R&D Amplifier
February 09, 2026 · 3 min read · 560 words

Reza E. Fazel, Arash Bakhtiary, Siavash A. Bigdeli

This article is AI-generated from a scientific publication. We recommend verifying information in the original source.

Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine

In Brief

This research improves credit card fraud detection by using a smart, transparent machine learning model called the Explainable Boosting Machine (EBM). Unlike many models that are "black boxes," EBM shows exactly how it makes decisions—making it trustworthy for banks and financial systems.

The Problem

Credit card fraud is a growing threat, but detecting it is difficult because fraudulent transactions are extremely rare compared to normal ones—a problem called "class imbalance." If a model is trained mostly on normal transactions, it might miss real fraud. Traditional methods to fix imbalance, like oversampling or undersampling, can distort data or lose important information. This makes it hard to build a system that’s both accurate and trustworthy. Financial institutions need models that not only catch fraud but also explain why a transaction was flagged, so they can act quickly and fairly.
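To see why class imbalance makes plain accuracy misleading, consider a toy calculation (the 0.2% fraud rate below is a made-up illustration, not a figure from the paper): a model that never flags anything still looks nearly perfect by accuracy while catching zero fraud.

```python
# Illustrative sketch: with heavy class imbalance, a model that always
# predicts "non-fraud" scores near-perfect accuracy yet catches no fraud.
# The transaction counts here are hypothetical.
normal, fraud = 99_800, 200           # hypothetical counts: 0.2% fraud rate

# "Always predict non-fraud" gets every normal transaction right.
accuracy = normal / (normal + fraud)
frauds_caught = 0

print(f"accuracy of a useless model: {accuracy:.3f}")  # 0.998
print(f"frauds caught: {frauds_caught}")               # 0
```

This is why fraud-detection work reports rank-based metrics such as ROC-AUC rather than raw accuracy.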

The Solution

The researchers used a powerful machine learning model called the Explainable Boosting Machine (EBM), which is designed to be both highly accurate and easy to understand. Unlike black-box models, EBM shows exactly how each feature (like transaction amount or time of day) contributes to a prediction. To make EBM even better, they carefully tuned its settings (hyperparameters), selected the most useful features, and refined how the data was prepared, without relying on potentially misleading sampling tricks. They also used the Taguchi method, a design-of-experiments technique, to find the best combination of data preprocessing steps and model settings, ensuring consistent and reliable results. This systematic approach helped them maximize performance while keeping the model interpretable.
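What makes an EBM transparent is its additive structure: the prediction is an intercept plus one learned "shape function" per feature, so every feature's contribution can be read off directly. The sketch below illustrates that idea with made-up shape functions and values; it is not the paper's model or the InterpretML library's API.

```python
import math

# Minimal sketch of EBM-style additive scoring. Each feature has a learned
# shape function (here, toy stand-ins with invented values); the log-odds
# of fraud is the intercept plus the per-feature contributions.
intercept = -3.0  # hypothetical baseline log-odds (fraud is rare)
shape = {
    "amount":      lambda x: 1.2 if x > 500 else -0.3,  # toy shape function
    "hour_of_day": lambda h: 0.8 if h < 6 else -0.1,    # toy shape function
}

def ebm_score(tx):
    # Per-feature contributions are computed explicitly, which is what
    # makes the decision auditable: each term can be shown to an analyst.
    contributions = {name: f(tx[name]) for name, f in shape.items()}
    log_odds = intercept + sum(contributions.values())
    prob = 1 / (1 + math.exp(-log_odds))  # logistic link
    return prob, contributions

prob, contribs = ebm_score({"amount": 900, "hour_of_day": 3})
print(f"fraud probability: {prob:.3f}")
print(f"contributions: {contribs}")
```

Charts like those in Figures 2, 4, and 5 are essentially visualizations of these per-feature contribution terms.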
Figure 2 shows which features had the biggest impact on predictions: feature_0004 had the highest importance score, while feature_0009 and feature_0006 had the lowest.

This helps experts understand what factors truly matter in fraud detection.

Key Findings

  • The optimized EBM achieved an ROC-AUC score of 0.983, meaning it’s very good at distinguishing between fraud and non-fraud transactions. This is higher than previous EBM models (0.975) and better than other common models like Logistic Regression, Random Forest, XGBoost, and Decision Tree.
  • The model’s interpretability allows experts to see how each feature contributes to a decision. Figure 4 illustrates this: the intercept (baseline prediction) has the highest positive contribution, while several features have negative contributions, showing how they reduce the chance of a transaction being labeled as fraud.
  • Figure 5 highlights that feature_0014 had the strongest positive impact on predictions, while the intercept had the largest negative contribution; this helps explain why certain transactions are flagged as suspicious.
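The ROC-AUC of 0.983 has a concrete interpretation: it is the probability that a randomly chosen fraud transaction receives a higher model score than a randomly chosen legitimate one. The sketch below computes ROC-AUC from that definition on invented toy scores (a pairwise comparison, feasible for small examples; real libraries use the faster threshold-sweep formulation).

```python
def roc_auc(fraud_scores, legit_scores):
    """ROC-AUC via its rank interpretation: the fraction of
    (fraud, legit) pairs where the fraud case scores higher
    (ties count as half a win)."""
    wins = sum(
        1.0 if f > l else 0.5 if f == l else 0.0
        for f in fraud_scores
        for l in legit_scores
    )
    return wins / (len(fraud_scores) * len(legit_scores))

# Toy model scores (made up): fraud cases mostly, but not always, rank higher.
auc = roc_auc([0.9, 0.8, 0.75], [0.1, 0.2, 0.8, 0.05])
print(f"toy ROC-AUC: {auc:.3f}")  # 0.875
```

An AUC of 0.983 therefore means the optimized EBM ranks a random fraud case above a random legitimate case about 98.3% of the time.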

Why It Matters

This research means financial institutions could use smarter, more trustworthy fraud detection systems. Because EBM explains its decisions, banks can quickly verify why a transaction was flagged, reduce false alarms, and respond faster to real fraud. This not only protects customers but also reduces financial losses and maintains trust in digital payments. The method’s transparency makes it easier for regulators and auditors to approve, which is crucial for real-world adoption.

Limitations

  • The researchers report that the method was tested only on a single benchmark dataset, so its performance on other real-world data remains to be confirmed.
  • The study does not compare EBM to newer, more complex models beyond those listed, so it’s unclear how it might perform against future algorithms.
  • The feature names (e.g., feature_0004) are not explained, so the practical meaning of these variables remains unclear without additional context.