Fraud Detection in Transactions with Python: A Machine Learning Project

By Rohit Sharma

Updated on Jul 28, 2025 | 10 min read | 1.24K+ views


With the rise of digital payments, fraudulent transactions have become more frequent and harder to detect. Traditional systems often fail to catch evolving fraud patterns. 

In this Fraud Detection in Transactions project, you’ll solve that issue by using real-world credit card transaction data to train models that can identify suspicious behavior. You'll apply techniques such as anomaly detection, isolation forests, and deep learning to classify transactions as fraudulent or genuine.

Want to turn skills into a career? Learn Python, Machine Learning, and more with upGrad’s job-ready Data Science Courses, built to get you hired faster. Explore now.

Build confidence through code. Explore top Python data science projects and start creating work that stands out to recruiters.

Pre-requisites Before You Begin

Before starting this Fraud Detection in Transactions project, it is helpful to have a basic understanding of the following concepts and tools:

  • Python programming: variables, functions, loops, and basic syntax
  • Pandas and NumPy: handling and analyzing tabular data
  • Matplotlib or Seaborn: creating charts and visualizing trends
  • TensorFlow: implementing and training deep learning models for fraud classification
  • Scikit-learn: building machine learning models such as Isolation Forest and Random Forest, and evaluating performance
  • Anomaly detection basics: how rare events can be modeled with statistical or machine learning techniques
  • Model tuning and validation: train-test split, cross-validation, handling imbalanced data, and preventing overfitting

Also Read: PyTorch vs TensorFlow: Making the Right Choice for 2025!

Level up your data science game with upGrad’s top-rated courses. Get mentored by industry pros, build real skills, and fast-track your path to a standout career.

Your Fraud Detection Journey Starts Here

  • Time required: Around 2 to 3 hours
  • Difficulty level: Moderate

This project is perfect if you’re comfortable with Python and want practical experience detecting fraud in real-world transaction data. You’ll learn how to identify fraud patterns, apply anomaly detection techniques, and build machine learning models.


Tools That Power This Project 

Here are the key tools and Python libraries we’ll use to build and evaluate the Fraud Detection system:

Tool / Library | Purpose
Python | Core language for building the end-to-end fraud detection pipeline
NumPy | Efficient handling of arrays and numerical computations
Pandas | Loading, cleaning, and exploring transaction datasets
Scikit-learn | Implementing machine learning models and anomaly detection techniques
TensorFlow | Designing and training deep learning models for advanced fraud detection
Google Colab | Cloud-based environment to run, test, and visualize your project
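
All of these come pre-installed on Google Colab. If you run the project locally instead, a single pip command covers them (standard package names; in a Colab cell, prefix the line with "!"):

pip install numpy pandas scikit-learn tensorflow matplotlib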

Also Read - Step-by-Step Guide to Learning Python for Data Science

How the Model Detects Fraud in Transactions

To identify fraudulent transactions effectively, this project uses machine learning and anomaly detection techniques tailored for financial data. Here’s what the approach focuses on:

  • Anomaly Detection Models – Isolation Forest isolates unusual transaction patterns that deviate from typical user behavior, while a supervised neural network learns fraud patterns directly from labeled data.
  • Data Preprocessing – We clean, scale, and transform the transaction data to improve model performance, handling imbalanced classes and missing values (a class-weighting sketch follows this list).
  • Feature Engineering – We use features such as transaction amount, time patterns, and behavioral metrics to enhance fraud detection accuracy.
  • Model Evaluation – We use precision, recall, the confusion matrix, and ROC AUC to validate and fine-tune model predictions for high-stakes financial decisions.
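
Because fraud makes up well under 1% of this data, one common way to handle the imbalance in supervised training is to weight the minority class. Below is a minimal sketch using scikit-learn's compute_class_weight; it assumes the y_train labels created later in Step 3, and passing the resulting dictionary to Keras is an optional variant, not part of the baseline pipeline:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Sketch: compute balanced class weights from the training labels.
# (y_train is created in Step 3; weights are inversely proportional to
# class frequency, so the rare fraud class gets a much larger weight.)
classes = np.array([0, 1])
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))
print(class_weight)  # roughly {0: 0.5, 1: 290} when fraud is ~0.17% of the data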

Also Read - Explaining 5 Layers of Convolutional Neural Network

From Transactions to Trust: Building a Fraud Detection System Step-by-Step

This section guides you through each stage of building a Fraud Detection in Transactions model using machine learning and deep learning methods:

  • Load and explore the credit card transaction dataset
  • Preprocess the data (handle imbalance, scale features, etc.)
  • Apply anomaly detection techniques like Isolation Forest
  • Build and train classification models (e.g., Logistic Regression, Random Forest)
  • Train deep learning models for better pattern recognition
  • Evaluate performance using metrics like ROC AUC, precision, and recall

So now let’s get started with detecting fraud in transactions.

Step 1: Download the Dataset

Download the transaction data from Kaggle by searching "Fraud Detection in Transactions", downloading the ZIP file, extracting it, and using the CSV file for analysis.

Now, after downloading the dataset, move to the next step.

Step 2: Load and Understand the Fraud Transaction Dataset

Now that you have downloaded the dataset, upload the CSV file to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()

We start by importing important libraries and loading the dataset.

import pandas as pd

# Load the dataset from CSV file
df = pd.read_csv('transaction.csv')

# Check class distribution to see how balanced the dataset is
# 'Class' column: 0 = Normal Transaction, 1 = Fraud
print("\nClass distribution:\n", df['Class'].value_counts(normalize=True))

# Preview the first 5 rows of the dataset
print("\nFirst 5 rows:\n", df.head())

Output:

Class distribution:

 Class

0    0.998273

1    0.001727

Name: proportion, dtype: float64

First 5 rows:

    Time        V1        V2        V3        V4        V5        V6        V7  \

0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   

1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   

2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   

3   1.0 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609   

4   2.0 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941   

         V8        V9  ...       V21       V22       V23       V24       V25  \

0  0.098698  0.363787  ... -0.018307  0.277838 -0.110474  0.066928  0.128539   

1  0.085102 -0.255425  ... -0.225775 -0.638672  0.101288 -0.339846  0.167170   

2  0.247676 -1.514654  ...  0.247998  0.771679  0.909412 -0.689281 -0.327642   

3  0.377436 -1.387024  ... -0.108300  0.005274 -0.190321 -1.175575  0.647376   

4 -0.270533  0.817739  ... -0.009431  0.798278 -0.137458  0.141267 -0.206010   

        V26       V27       V28  Amount  Class  

0 -0.189115  0.133558 -0.021053  149.62      0  

1  0.125895 -0.008983  0.014724    2.69      0  

2 -0.139097 -0.055353 -0.059752  378.66      0  

3 -0.221929  0.062723  0.061458  123.50      0  

4  0.502292  0.219422  0.215153   69.99      0  
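
Before moving on, it is worth a quick sanity check for missing values and basic statistics, since the preprocessing plan above mentions handling them. These are standard pandas calls and entirely optional:

# Optional sanity checks on the loaded DataFrame
print("Shape:", df.shape)                                  # rows x columns
print("Missing values per column:\n", df.isnull().sum())   # check before preprocessing
print("Amount summary:\n", df['Amount'].describe())        # spread of transaction amounts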

Also Read - Top 6 Python IDEs of 2025 That Will Change Your Workflow!

Step 3: Preparing Data for Model Training

Before training the model, we need to separate the target label (Class) from the features, scale the values for consistency, and split the dataset into training and testing sets.

The code for this step is below:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Step 1: Separate features and target label
X = df.drop('Class', axis=1)   # Features (all columns except 'Class')
y = df['Class']                # Target (0 for normal, 1 for fraud)

# Step 2: Scale the features
# Scaling standardizes the values to have mean 0 and standard deviation 1
# This helps models converge faster and perform better
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Split the data into training and testing sets
# Stratified split ensures the same proportion of fraud cases in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, 
    test_size=0.2,        # 20% of data goes to the test set
    stratify=y,           # Maintain class balance
    random_state=42       # Reproducible results
)

This setup prepares the data for feeding into machine learning models.
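
One caveat with the code above: the scaler is fitted on the full dataset before splitting, which leaks test-set statistics into training. For a stricter, leakage-free setup, split first and fit the scaler only on the training portion; a minimal sketch using the same imports and variable names:

# Leakage-free alternative: split the raw features first,
# then fit the scaler on the training portion only
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)  # learn mean/std from train data only
X_test = scaler.transform(X_test_raw)        # apply the same transform to test data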

Step 4: Detecting Fraud Using Isolation Forest

In this step, we apply the Isolation Forest algorithm, an unsupervised method for anomaly detection. It works well when fraudulent transactions are rare and behave differently from the majority of the data.

Here is the code for detecting:

from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report, confusion_matrix

# Step 1: Initialize the Isolation Forest model
# n_estimators = number of trees
# contamination = expected proportion of frauds (here, ~1%)
iso_forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)

# Step 2: Train the model on the training data (unsupervised)
iso_forest.fit(X_train)

# Step 3: Predict on the test data
# Output: -1 indicates anomaly (possible fraud), 1 indicates normal
y_pred_iso = iso_forest.predict(X_test)

# Step 4: Convert predictions to match target format
# 1 = fraud, 0 = normal
y_pred_iso = [1 if p == -1 else 0 for p in y_pred_iso]

# Step 5: Evaluate performance using classification metrics
print("Classification Report (Isolation Forest):")
print(classification_report(y_test, y_pred_iso, digits=4))  # Precision, recall, f1-score

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_iso))  # Shows true/false positives and negatives

Output: 

Classification Report (Isolation Forest):

              precision    recall  f1-score   support

           0     0.9994    0.9903    0.9948     56864

           1     0.1052    0.6633    0.1816        98

    accuracy                         0.9897     56962

   macro avg     0.5523    0.8268    0.5882     56962

weighted avg     0.9979    0.9897    0.9934     56962

Confusion Matrix:

[[56311   553]

 [   33    65]]
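
Recall for the fraud class is decent here, but precision is low. Rather than relying on the model's hard -1/1 labels, you can work with its continuous anomaly scores and pick your own threshold to trade precision against recall. A short optional sketch (decision_function is a standard scikit-learn API; the 1% cutoff below is purely illustrative):

import numpy as np

# Continuous anomaly scores: lower values = more anomalous
scores = iso_forest.decision_function(X_test)

# Illustrative threshold: flag the most anomalous 1% of test transactions
threshold = np.percentile(scores, 1)
y_pred_custom = (scores < threshold).astype(int)  # 1 = flagged as possible fraud

print(confusion_matrix(y_test, y_pred_custom))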

Also Read - CNN vs. RNN: Key Differences and Applications Explained

Step 5: Deep Learning Model for Fraud Detection

Now we build a neural network using TensorFlow/Keras to classify transactions as fraudulent or not. This is a supervised binary classification model using a dense feedforward network.

Here is the code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Step 1: Define the architecture of the neural network
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),  # Input layer with 32 units
    Dropout(0.2),  # Dropout to prevent overfitting
    Dense(64, activation='relu'),  # Hidden layer with 64 units
    Dropout(0.3),  # More dropout
    Dense(32, activation='relu'),  # Another hidden layer
    Dense(1, activation='sigmoid')  # Output layer for binary classification (fraud or not)
])

# Step 2: Compile the model
# - Adam optimizer: adaptive learning rate
# - Binary crossentropy: loss for binary classification
# - Accuracy: to monitor model performance
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Step 3: Use EarlyStopping to avoid overfitting
# Stops training if validation loss doesn’t improve for 3 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Step 4: Train the model on training data
# - validation_split: 20% of training data used for validation
# - batch_size: number of samples processed before model update
# - verbose=1: prints training progress
history = model.fit(
    X_train, y_train,
    epochs=15,
    batch_size=512,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=1
)

Output:  

Epoch 1/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.9842 - loss: 0.1136 - val_accuracy: 0.9982 - val_loss: 0.0055

Epoch 2/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9989 - loss: 0.0050 - val_accuracy: 0.9994 - val_loss: 0.0039

Epoch 3/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.9993 - loss: 0.0041 - val_accuracy: 0.9994 - val_loss: 0.0038

Epoch 4/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9993 - loss: 0.0034 - val_accuracy: 0.9994 - val_loss: 0.0036

Epoch 5/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9991 - loss: 0.0037 - val_accuracy: 0.9994 - val_loss: 0.0035

Epoch 6/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9993 - loss: 0.0035 - val_accuracy: 0.9994 - val_loss: 0.0035

Epoch 7/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.9991 - loss: 0.0035 - val_accuracy: 0.9995 - val_loss: 0.0035

Epoch 8/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9994 - loss: 0.0030 - val_accuracy: 0.9995 - val_loss: 0.0035

Epoch 9/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9993 - loss: 0.0032 - val_accuracy: 0.9994 - val_loss: 0.0036

Epoch 10/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.9993 - loss: 0.0035 - val_accuracy: 0.9994 - val_loss: 0.0035

Epoch 11/15

357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9994 - loss: 0.0030 - val_accuracy: 0.9995 - val_loss: 0.0036

Conclusion: This model learns the patterns of normal and fraudulent transactions to accurately classify future transactions.
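
As an optional variant, the class-weight dictionary sketched earlier can be passed to fit so the loss penalizes missed frauds more heavily. This was not part of the training run above, and the resulting metrics will differ:

# Optional variant: reweight the loss toward the rare fraud class
# (class_weight comes from the compute_class_weight sketch earlier)
history = model.fit(
    X_train, y_train,
    epochs=15,
    batch_size=512,
    validation_split=0.2,
    callbacks=[early_stop],
    class_weight=class_weight,  # e.g. {0: ~0.5, 1: ~290}
    verbose=1
)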

Step 6: Model Evaluation on Test Set

With training complete, we evaluate the model on the held-out test set: overall accuracy first, then per-class precision, recall, and the confusion matrix.

Here is the Code:

# Step 1: Evaluate the trained model on test data
# - Returns loss and accuracy
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {accuracy:.4f}")

# Step 2: Predict class probabilities on the test set
# - Model outputs probabilities between 0 and 1, with shape (n_samples, 1)
y_prob = model.predict(X_test).ravel()  # flatten to a 1-D array

# Step 3: Convert probabilities to binary class labels
# - Threshold of 0.5: if probability > 0.5, predict fraud (1); else, normal (0)
y_pred_dl = (y_prob > 0.5).astype(int)

# Step 4: Evaluate using classification metrics
from sklearn.metrics import classification_report, confusion_matrix

print("Classification Report (Deep Learning):")
print(classification_report(y_test, y_pred_dl, digits=4))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_dl))

Output:

Classification Report (Deep Learning):

              precision    recall  f1-score   support

           0     0.9997    0.9997    0.9997     56864

           1     0.8247    0.8163    0.8205        98

    accuracy                         0.9994     56962

   macro avg     0.9122    0.9080    0.9101     56962

weighted avg     0.9994    0.9994    0.9994     56962

Confusion Matrix:

[[56847    17]

 [   18    80]]
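
The evaluation plan earlier also lists ROC AUC, which the code above does not compute. It needs the raw predicted probabilities rather than the thresholded labels (roc_auc_score is a standard scikit-learn function):

from sklearn.metrics import roc_auc_score

# ROC AUC is computed from probabilities, not hard 0/1 predictions
y_prob_dl = model.predict(X_test).ravel()
print(f"ROC AUC (Deep Learning): {roc_auc_score(y_test, y_prob_dl):.4f}")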

Step 7: Visualizing Model Training History

Let’s plot how the model performed during training in terms of accuracy and loss over epochs. This helps you see whether the model overfitted or underfitted.

import matplotlib.pyplot as plt

# Function to plot training and validation accuracy/loss
def plot_training_history(history):
    plt.figure(figsize=(12, 4))

    # Plot 1: Accuracy over epochs
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Train Accuracy')       # Training accuracy
    plt.plot(history.history['val_accuracy'], label='Val Accuracy')     # Validation accuracy
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Model Accuracy')
    plt.legend()

    # Plot 2: Loss over epochs
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Train Loss')               # Training loss
    plt.plot(history.history['val_loss'], label='Val Loss')             # Validation loss
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Model Loss')
    plt.legend()

    # Layout adjustment
    plt.tight_layout()
    plt.show()

# Call the function to display plots
plot_training_history(history)

Output: (two side-by-side plots: training vs. validation accuracy, and training vs. validation loss, over epochs)

Conclusion: From the training history plots:

  • Model Accuracy: Both training and validation accuracy consistently stay above 99.9%, showing the model is highly effective in distinguishing fraudulent from normal transactions.
  • Model Loss: The loss values quickly dropped and stabilized, indicating fast convergence and no major signs of overfitting.

Final Conclusion: What We Learned from the Fraud Detection in Transactions Project

We built and evaluated two models, an Isolation Forest and a neural network, to detect fraudulent transactions. After preprocessing and scaling the data, both models were trained and tested.

Isolation Forest gave a quick unsupervised baseline, while the deep learning model achieved higher accuracy and handled class imbalance better. Overall, the project showed how combining preprocessing, anomaly detection, and neural networks can effectively flag fraud in transaction data.


Colab Link-
https://colab.research.google.com/drive/1PEQF-F3GZH7Y-90KyEY1B-GlcsJHptV4?usp=sharing

Frequently Asked Questions (FAQs)

1. What was the objective of this project?

To detect fraudulent credit card transactions by training anomaly detection and deep learning models on real-world transaction data, and to compare their performance.

2. Why was the Isolation Forest used?

It is an unsupervised anomaly detection algorithm that isolates rare, atypical points efficiently, making it a fast baseline when fraud cases are extremely rare relative to normal transactions.

3. Why did we scale the features?

Standardizing features to mean 0 and standard deviation 1 keeps large-valued columns like Amount from dominating, and helps both models converge and perform better.

4. How did we deal with class imbalance?

We used a stratified train-test split to preserve the fraud ratio in both sets, and judged the models on precision, recall, and ROC AUC rather than accuracy alone; class weighting (sketched earlier) is a further option.

5. Why use a neural network for this?

With labeled data available, a supervised network can learn complex, non-linear fraud patterns; in this project it achieved far better precision and f1-score on the fraud class than the unsupervised baseline.

Rohit Sharma

805 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
