HamedDaoud/FraudShieldML

FraudShieldML

[Image: Gradio UI for the FraudShieldML Space]

An end-to-end credit card fraud detection system built on the Kaggle Credit Card Fraud Detection dataset (~285k transactions, 0.17% fraud). It pairs a threshold-tuned Random Forest classifier with a FastAPI inference service and a Gradio web UI, both runnable via Docker. The severe class imbalance is handled with custom preprocessing, SMOTE resampling, and decision-threshold tuning.

Results

Evaluated on the held-out test set (stratified split, real fraud distribution preserved):

Metric                     Score
Precision (fraud class)    0.93
Recall (fraud class)       0.82
F1-score                   0.87
PR-AUC                     0.86
Accuracy                   ~100%
Best decision threshold    0.50 (F1-optimized)

The model is threshold-tuned for the precision-recall tradeoff rather than raw accuracy. Accuracy is misleading on a 0.17%-positive dataset, where a constant "always legitimate" classifier would score 99.83% while catching no fraud at all.
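
The arithmetic behind the accuracy claim can be checked directly. This is a minimal sketch, assuming an illustrative test set of 100,000 transactions with exactly 0.17% fraud (not the repository's actual split size); the helper names are hypothetical.

```python
def baseline_accuracy(n_total, n_fraud):
    """Accuracy of a classifier that labels every transaction legitimate."""
    return (n_total - n_fraud) / n_total


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


n_total = 100_000  # illustrative size, not the real test split
n_fraud = 170      # 0.17% of n_total

acc = baseline_accuracy(n_total, n_fraud)
f1 = f1_score(0.93, 0.82)  # precision and recall from the results table

print(f"always-legitimate accuracy: {acc:.4f}")
print(f"F1 from reported precision/recall: {f1:.2f}")
```

The constant classifier reaches its high accuracy with zero true positives, which is why the fraud-class precision, recall, and F1 are the figures to watch.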

Architecture

                    ┌─────────────────────────┐
  Transaction ----->│  Gradio UI (port 7860)  │
  features (30)     └─────────────────────────┘
                                │  in-process call
                                ▼
                    ┌─────────────────────────┐
                    │  src/predict.py         │
                    │   predict_new_data(df)  │
                    └────────────┬────────────┘
                                 │
                  ┌──────────────┼──────────────┐
                  ▼              ▼              ▼
            pipeline.pkl   rf_model.pkl   threshold.json
            (preprocessor)  (classifier)   (decision cut)

                                 ▲
                   ┌─────────────────────────┐
       HTTP    --->│  FastAPI (port 8000)    │
       /predict    │  POST /predict          │
                   │  GET  /generate_dummy   │
                   │  GET  /health           │
                   └─────────────────────────┘

Pipeline

Preprocessing

  • Amount: np.log1p transform to handle the heavy right tail.
  • Time: converted from elapsed seconds to hour-of-day (0–23) and encoded with sin/cos for cyclical continuity, so hour 23 and hour 0 stay adjacent.
  • V1–V28: PCA-transformed features in the original dataset, left as-is.
  • StandardScaler applied across features for consistent ranges.
  • SMOTE oversampling on the training set only to address class imbalance.
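
The Amount and Time transforms listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in src/preprocessing.py: it assumes Time is the dataset's elapsed-seconds column, uses math.log1p as a scalar stand-in for np.log1p, and the function names are hypothetical. Scaling and SMOTE are left out.

```python
import math

SECONDS_PER_HOUR = 3600


def transform_amount(amount):
    """Compress the heavy right tail of Amount; log1p(0) stays 0."""
    return math.log1p(amount)


def encode_time(seconds_elapsed):
    """Map elapsed seconds to hour-of-day (0-23) plus a sin/cos pair."""
    hour = int(seconds_elapsed // SECONDS_PER_HOUR) % 24
    angle = 2 * math.pi * hour / 24
    return hour, math.sin(angle), math.cos(angle)


# Hour 23 and hour 0 end up close together in (sin, cos) space,
# while hour 12 sits on the opposite side of the circle.
_, s23, c23 = encode_time(23 * SECONDS_PER_HOUR)
_, s0, c0 = encode_time(24 * SECONDS_PER_HOUR)  # wraps around to hour 0
_, s12, c12 = encode_time(12 * SECONDS_PER_HOUR)

gap_23_to_0 = math.dist((s23, c23), (s0, c0))
gap_12_to_0 = math.dist((s12, c12), (s0, c0))
print(f"23h vs 0h: {gap_23_to_0:.3f}, 12h vs 0h: {gap_12_to_0:.3f}")
```

A raw hour value would place 23 and 0 at opposite ends of the range; the two-column encoding removes that artificial jump at midnight.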

Modeling

  • RandomForestClassifier (100 trees, sklearn defaults otherwise).
  • Threshold tuning by scanning thresholds 0.00–1.00 and maximizing F1 on validation predictions; the chosen threshold is persisted in models/threshold.json and applied at inference time.
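
The threshold scan can be sketched as follows. This is a minimal pure-Python version under assumptions: a 0.01 step, ties resolved in favor of the lowest threshold, a prediction counted as fraud when its probability is at or above the threshold, and a "threshold" JSON key. None of these details are confirmed by the repository, and the toy labels and probabilities are made up for illustration.

```python
import json


def f1_at_threshold(y_true, y_prob, threshold):
    """F1 for the positive class when predicting 1 if prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def scan_thresholds(y_true, y_prob, steps=100):
    """Scan 0.00 to 1.00 and return (best_threshold, best_f1)."""
    best_threshold, best_f1 = 0.0, -1.0
    for i in range(steps + 1):
        threshold = i / steps
        f1 = f1_at_threshold(y_true, y_prob, threshold)
        if f1 > best_f1:  # strict: keep the lowest threshold on ties
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Toy validation labels and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]

best_threshold, best_f1 = scan_thresholds(y_true, y_prob)
payload = json.dumps({"threshold": best_threshold})  # hypothetical file layout
print(payload, round(best_f1, 3))
```

In a real run the probabilities would come from the classifier's predict_proba output on validation data, and the JSON string would be written to models/threshold.json.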

Artifacts (persisted under models/)

  • pipeline.pkl — fitted preprocessing pipeline
  • rf_model.pkl — trained classifier
  • threshold.json — selected decision threshold
  • feature_names.json — feature ordering required at inference
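
How feature_names.json and threshold.json might be used at inference time is sketched below. It is an illustration, not the logic in src/predict.py: the JSON layouts, the shortened four-feature list, the helper names, and the at-or-above comparison are all assumptions.

```python
import json


def order_features(transaction, feature_names):
    """Return the transaction's values in the order the model expects."""
    missing = [name for name in feature_names if name not in transaction]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return [float(transaction[name]) for name in feature_names]


def apply_threshold(fraud_probability, threshold):
    """Flag as fraud when the probability reaches the stored cut-off."""
    return fraud_probability >= threshold


# Stand-ins for the contents of the two JSON artifacts (layouts assumed,
# feature list shortened from 30 entries for readability).
feature_names = json.loads('["Time", "V1", "V2", "Amount"]')
threshold = json.loads('{"threshold": 0.5}')["threshold"]

# Dict key order does not matter; the stored feature order decides.
transaction = {"Amount": 12.5, "V2": 0.3, "Time": 3600, "V1": -1.2}
row = order_features(transaction, feature_names)

print(row)
print(apply_threshold(0.73, threshold), apply_threshold(0.12, threshold))
```

Persisting the column order alongside the model guards against silently feeding values into the wrong columns when a request arrives with its fields in a different order.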

Quick start

Local — Gradio UI (recommended)

pip install -r requirements.txt
python -m app.gradio_app
# open http://localhost:7860

The UI ships with three preset inputs: a legitimate-looking sample, a fraud-pattern sample, and a random sample generator — usable even without the original Kaggle dataset on disk.

Local — FastAPI service

uvicorn api.main:app --host 0.0.0.0 --port 8000
# health
curl http://localhost:8000/health
# prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @docs/sample_request.json
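
The same prediction call can be made from Python with the standard library. This is a sketch only: the payload below is a hypothetical, truncated example, and the real field names and shape are whatever the service's Pydantic model and docs/sample_request.json define.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical, truncated payload; a real request carries all 30 features.
payload = {"Time": 3600.0, "Amount": 12.5}
request = build_predict_request("http://localhost:8000", payload)
print(request.get_method(), request.full_url)

# With the service running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```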

Docker — both services together

Works the same on macOS, Linux, and Windows (with Docker Desktop installed):

docker compose up --build
# Gradio UI:    http://localhost:7860
# FastAPI docs: http://localhost:8000/docs

The docker-compose.yml runs the API and UI as separate services sharing the same image; a health check on /health confirms model readiness before the UI starts.

API endpoints

Endpoint          Method  Description
/health           GET     Liveness + confirms model loaded
/predict          POST    Predicts fraud from a 30-feature transaction (Pydantic-validated input)
/generate_dummy   GET     Generates a noisy sample from the underlying dataset (requires data/creditcard.csv mounted)

Project structure

FraudShieldML/
├── api/
│   └── main.py              # FastAPI service
├── app/
│   ├── __init__.py
│   └── gradio_app.py        # Gradio Blocks UI
├── assets/
│   └── demo.png
├── models/                  # Persisted artifacts (pipeline, model, threshold, features)
├── notebooks/               # 01_eda, 02_feature_engineering, 03_class_imbalances, 04_final_model
├── reports/eda/             # Generated EDA plots (distribution, correlations, violin plots V1–V28)
├── src/
│   ├── preprocessing.py
│   ├── feature_engineering.py
│   ├── train.py
│   ├── evaluate.py
│   ├── predict.py
│   └── utils.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

Tech stack

Layer             Tools
Data processing   pandas, scikit-learn, imbalanced-learn
Model             RandomForestClassifier (sklearn)
Serving           FastAPI + Uvicorn
UI                Gradio (Blocks, Soft theme)
Persistence       joblib, JSON
Packaging         Docker, Docker Compose

Dataset

Kaggle — Credit Card Fraud Detection. Not shipped in this repo; download creditcard.csv into data/ if you want to retrain or use the dummy-sample endpoint.

License

MIT — see LICENSE.
