Fraud Detection in Transactions with Python: A Machine Learning Project
By Rohit Sharma
Updated on Jul 28, 2025 | 10 min read | 1.24K+ views
With the rise of digital payments, fraudulent transactions have become more frequent and harder to detect. Traditional systems often fail to catch evolving fraud patterns.
In this Fraud Detection in Transactions project, you’ll solve that issue by using real-world credit card transaction data to train models that can identify suspicious behavior. You'll apply techniques such as anomaly detection, isolation forests, and deep learning to classify transactions as fraudulent or genuine.
Want to turn skills into a career? Learn Python, Machine Learning, and more with upGrad’s job-ready Data Science Courses, built to get you hired faster. Explore now.
Build confidence through code. Explore top Python data science projects and start creating work that stands out to recruiters.
Before starting this Fraud Detection in Transactions project, it is helpful to have a basic understanding of Python programming, core machine learning concepts, and working with tabular data in NumPy and Pandas.
Also Read: PyTorch vs TensorFlow: Making the Right Choice for 2025!
Level up your data science game with upGrad’s top-rated courses. Get mentored by industry pros, build real skills, and fast-track your path to a standout career.
This project is perfect if you’re comfortable with Python and want practical experience detecting fraud in real-world transaction data. You’ll learn how to identify fraud patterns, apply anomaly detection techniques, and build machine learning models.
Here are the key tools and Python libraries we’ll use to build and evaluate the Fraud Detection system:
Tool / Library | Purpose
Python | Core language for building the end-to-end fraud detection pipeline
NumPy | Efficient handling of arrays and numerical computations
Pandas | Loading, cleaning, and exploring transaction datasets
Scikit-learn | Implementing machine learning models and anomaly detection techniques
TensorFlow | Designing and training deep learning models for advanced fraud detection
Google Colab | Cloud-based environment to run, test, and visualize your project
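All of these come preinstalled in Google Colab. If you work in a fresh local environment instead, a minimal one-line setup (versions unpinned; adjust as needed) is:

!pip install numpy pandas scikit-learn tensorflow matplotlib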
Also Read: Step-by-Step Guide to Learning Python for Data Science
To identify fraudulent transactions effectively, this project uses machine learning and anomaly detection techniques tailored for financial data: an unsupervised Isolation Forest to flag outliers, and a supervised neural network trained on labeled transactions.
Also Read: Explaining 5 Layers of Convolutional Neural Network
This section guides you through each stage of building a Fraud Detection in Transactions model using machine learning and deep learning methods:
Now let’s get started with detecting fraud in transactions.
Download the transaction data from Kaggle by searching "Fraud Detection in Transactions," downloading the ZIP file, extracting it, and using the CSV file for analysis.
After downloading the dataset, upload the extracted CSV file to Google Colab using the code below:
from google.colab import files

# Opens a file picker in the notebook; select the extracted transaction CSV
uploaded = files.upload()
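Alternatively, you can upload the ZIP itself and extract it inside Colab. A minimal sketch (the ZIP filename below is an assumption; match it to whatever Kaggle names your download):

import zipfile

# Hypothetical filename: replace with the actual name of your Kaggle download
with zipfile.ZipFile('fraud-detection-in-transactions.zip') as zf:
    zf.extractall('.')  # extracts the CSV into the working directory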
We start by importing the required libraries and loading the dataset.
import pandas as pd
# Load the dataset from CSV file
df = pd.read_csv('transaction.csv')
# Check class distribution to see how balanced the dataset is
# 'Class' column: 0 = Normal Transaction, 1 = Fraud
print("\nClass distribution:\n", df['Class'].value_counts(normalize=True))
# Preview the first 5 rows of the dataset
print("\nFirst 5 rows:\n", df.head())
Output:
Class distribution:
Class
0 0.998273
1 0.001727
Name: proportion, dtype: float64
First 5 rows:
Time V1 V2 V3 V4 V5 V6 V7 \
0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599
1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803
2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461
3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609
4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941
V8 V9 ... V21 V22 V23 V24 V25 \
0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539
1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170
2 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642
3 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376
4 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010
V26 V27 V28 Amount Class
0 -0.189115 0.133558 -0.021053 149.62 0
1 0.125895 -0.008983 0.014724 2.69 0
2 -0.139097 -0.055353 -0.059752 378.66 0
3 -0.221929 0.062723 0.061458 123.50 0
4 0.502292 0.219422 0.215153 69.99 0
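Before preprocessing, a quick optional check (not part of the original walkthrough) confirms the dataset is complete and shows the scale of the Amount column:

# Optional sanity checks before preprocessing
print("Missing values:", df.isnull().sum().sum())  # expected to be 0 for this dataset
print(df['Amount'].describe())                     # amounts vary widely, which is why we scale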
Also Read: Top 6 Python IDEs of 2025 That Will Change Your Workflow!
Before training the model, we need to separate the target label (Class) from the features, scale the values for consistency, and split the dataset into training and testing sets.
The code for this step is below:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Step 1: Separate features and target label
X = df.drop('Class', axis=1) # Features (all columns except 'Class')
y = df['Class'] # Target (0 for normal, 1 for fraud)
# Step 2: Scale the features
# Scaling standardizes the values to have mean 0 and standard deviation 1
# This helps models converge faster and perform better
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Split the data into training and testing sets
# Stratified split ensures the same proportion of fraud cases in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y,
    test_size=0.2,    # 20% of data goes to the test set
    stratify=y,       # Maintain class balance
    random_state=42   # Reproducible results
)
This setup prepares the data for feeding into machine learning models.
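As an optional sanity check (not shown in the original output), you can verify that the stratified split preserved the ~0.17% fraud rate in both sets:

# Fraud rate should match in train and test thanks to stratify=y
print("Train fraud rate:", y_train.mean())
print("Test fraud rate: ", y_test.mean())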
In this step, we apply the Isolation Forest algorithm, an unsupervised method for anomaly detection. It works well when fraudulent transactions are rare and behave differently from the majority of the data.
Here is the code:
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report, confusion_matrix
# Step 1: Initialize the Isolation Forest model
# n_estimators = number of trees
# contamination = expected proportion of frauds (here, ~1%)
iso_forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
# Step 2: Train the model on the training data (unsupervised)
iso_forest.fit(X_train)
# Step 3: Predict on the test data
# Output: -1 indicates anomaly (possible fraud), 1 indicates normal
y_pred_iso = iso_forest.predict(X_test)
# Step 4: Convert predictions to match target format
# 1 = fraud, 0 = normal
y_pred_iso = [1 if p == -1 else 0 for p in y_pred_iso]
# Step 5: Evaluate performance using classification metrics
print("Classification Report (Isolation Forest):")
print(classification_report(y_test, y_pred_iso, digits=4)) # Precision, recall, f1-score
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_iso)) # Shows true/false positives and negatives
Output:
Classification Report (Isolation Forest):
precision recall f1-score support
0 0.9994 0.9903 0.9948 56864
1 0.1052 0.6633 0.1816 98
accuracy 0.9897 56962
macro avg 0.5523 0.8268 0.5882 56962
weighted avg 0.9979 0.9897 0.9934 56962
Confusion Matrix:
[[56311 553]
[ 33 65]]
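Beyond the hard -1/1 labels, Isolation Forest also exposes a continuous anomaly score through scikit-learn's decision_function, where lower means more anomalous. A minimal sketch for ranking the most suspicious test transactions:

import numpy as np

# Lower decision_function scores = more anomalous
scores = iso_forest.decision_function(X_test)

# Indices of the 10 most suspicious test transactions
top10 = np.argsort(scores)[:10]
print("Most anomalous rows:", top10)
print("True labels:        ", y_test.iloc[top10].values)

Ranking by score lets an analyst review the highest-risk transactions first instead of relying on a single contamination cutoff.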
Also Read: CNN vs. RNN: Key Differences and Applications Explained
Now we build a neural network using TensorFlow/Keras to classify transactions as fraudulent or not. This is a supervised binary classification model using a dense feedforward network.
Here is the code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
# Step 1: Define the architecture of the neural network
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),  # Input layer with 32 units
    Dropout(0.2),                   # Dropout to prevent overfitting
    Dense(64, activation='relu'),   # Hidden layer with 64 units
    Dropout(0.3),                   # More dropout
    Dense(32, activation='relu'),   # Another hidden layer
    Dense(1, activation='sigmoid')  # Output layer for binary classification (fraud or not)
])
# Step 2: Compile the model
# - Adam optimizer: adaptive learning rate
# - Binary crossentropy: loss for binary classification
# - Accuracy: to monitor model performance
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Step 3: Use EarlyStopping to avoid overfitting
# Stops training if validation loss doesn’t improve for 3 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# Step 4: Train the model on training data
# - validation_split: 20% of training data used for validation
# - batch_size: number of samples processed before model update
# - verbose=1: prints training progress
history = model.fit(
    X_train, y_train,
    epochs=15,
    batch_size=512,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=1
)
Output:
Epoch 1/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.9842 - loss: 0.1136 - val_accuracy: 0.9982 - val_loss: 0.0055
Epoch 2/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9989 - loss: 0.0050 - val_accuracy: 0.9994 - val_loss: 0.0039
Epoch 3/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.9993 - loss: 0.0041 - val_accuracy: 0.9994 - val_loss: 0.0038
Epoch 4/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9993 - loss: 0.0034 - val_accuracy: 0.9994 - val_loss: 0.0036
Epoch 5/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9991 - loss: 0.0037 - val_accuracy: 0.9994 - val_loss: 0.0035
Epoch 6/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9993 - loss: 0.0035 - val_accuracy: 0.9994 - val_loss: 0.0035
Epoch 7/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.9991 - loss: 0.0035 - val_accuracy: 0.9995 - val_loss: 0.0035
Epoch 8/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9994 - loss: 0.0030 - val_accuracy: 0.9995 - val_loss: 0.0035
Epoch 9/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9993 - loss: 0.0032 - val_accuracy: 0.9994 - val_loss: 0.0036
Epoch 10/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.9993 - loss: 0.0035 - val_accuracy: 0.9994 - val_loss: 0.0035
Epoch 11/15
357/357 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9994 - loss: 0.0030 - val_accuracy: 0.9995 - val_loss: 0.0036
Conclusion: The model learns the patterns of normal and fraudulent transactions; early stopping halted training after epoch 11 of 15, once validation loss had plateaued.
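Note that with only ~0.17% fraud, the loss above is dominated by normal transactions. One common refinement, not used in the run above, is to weight the fraud class more heavily. A minimal sketch using scikit-learn's compute_class_weight with Keras's class_weight argument:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Balanced weights: the rare fraud class receives a much larger weight
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_train)
class_weight = {0: weights[0], 1: weights[1]}

# Pass class_weight=class_weight to model.fit(...) to penalize missed frauds more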
Next, we evaluate the trained model on the held-out test set and convert its predicted probabilities into class labels.
Here is the code:
# Step 1: Evaluate the trained model on test data
# - Returns loss and accuracy
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {accuracy:.4f}")
# Step 2: Predict class probabilities on the test set
# - Model outputs probabilities between 0 and 1
y_pred_probs = model.predict(X_test)
# Step 3: Convert probabilities to binary class labels
# - Threshold of 0.5: if probability > 0.5, predict fraud (1); else, normal (0)
y_pred_dl = (y_pred_probs > 0.5).astype(int).ravel()
# Step 4: Evaluate using classification metrics
from sklearn.metrics import classification_report, confusion_matrix
print("Classification Report (Deep Learning):")
print(classification_report(y_test, y_pred_dl, digits=4))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_dl))
Output:
Classification Report (Deep Learning):
precision recall f1-score support
0 0.9997 0.9997 0.9997 56864
1 0.8247 0.8163 0.8205 98
accuracy 0.9994 56962
macro avg 0.9122 0.9080 0.9101 56962
weighted avg 0.9994 0.9994 0.9994 56962
Confusion Matrix:
[[56847 17]
[ 18 80]]
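Because fraud is so rare, the default 0.5 cutoff is not necessarily the best operating point. A sketch that scans thresholds with scikit-learn's precision_recall_curve and picks the one maximizing F1 (one reasonable heuristic, not the article's method):

import numpy as np
from sklearn.metrics import precision_recall_curve

# Raw sigmoid probabilities for the test set
probs = model.predict(X_test).ravel()

precision, recall, thresholds = precision_recall_curve(y_test, probs)

# F1 at each candidate threshold (thresholds has one fewer entry than precision/recall)
f1 = 2 * precision * recall / (precision + recall + 1e-9)
best = np.argmax(f1[:-1])
print(f"Best threshold: {thresholds[best]:.3f}  F1: {f1[best]:.4f}")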
Let’s plot how the model performed during training in terms of accuracy and loss over epochs. This helps you see whether the model overfitted or underfitted.
import matplotlib.pyplot as plt
# Function to plot training and validation accuracy/loss
def plot_training_history(history):
    plt.figure(figsize=(12, 4))

    # Plot 1: Accuracy over epochs
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Train Accuracy')    # Training accuracy
    plt.plot(history.history['val_accuracy'], label='Val Accuracy')  # Validation accuracy
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Model Accuracy')
    plt.legend()

    # Plot 2: Loss over epochs
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Train Loss')    # Training loss
    plt.plot(history.history['val_loss'], label='Val Loss')  # Validation loss
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Model Loss')
    plt.legend()

    # Layout adjustment
    plt.tight_layout()
    plt.show()
# Call the function to display plots
plot_training_history(history)
Output: side-by-side plots of training vs. validation accuracy and loss over the epochs.
Conclusion: the training and validation curves stay close together throughout, indicating the model neither overfits nor underfits.
We built and evaluated two models, an Isolation Forest and a neural network, to detect fraudulent transactions. After preprocessing and scaling the data, both models were trained and tested.
Isolation Forest gave a quick unsupervised baseline, while the deep learning model achieved higher accuracy and handled class imbalance better. Overall, the project showed how combining preprocessing, anomaly detection, and neural networks can effectively flag fraud in transaction data.
Colab Link:
https://colab.research.google.com/drive/1PEQF-F3GZH7Y-90KyEY1B-GlcsJHptV4?usp=sharing