FraudShieldML

▶ Try it live: hameddaoud/fraudshield-ml on Hugging Face Spaces

An end-to-end credit card fraud detection system built on the Kaggle Credit Card Fraud Detection dataset (~285k transactions, 0.17% fraud). It pairs a threshold-tuned Random Forest classifier with a FastAPI inference service and a Gradio web UI, both runnable via Docker. Severe class imbalance is handled with custom preprocessing, SMOTE resampling, and decision-threshold tuning.

Results

Evaluated on the held-out test set (stratified split, real fraud distribution preserved):

| Metric | Score |
| --- | --- |
| Precision (fraud class) | 0.93 |
| Recall (fraud class) | 0.82 |
| F1-score | 0.87 |
| PR-AUC | 0.86 |
| Accuracy | ~100% |
| Best decision threshold | 0.50 (F1-optimized) |

The model is threshold-tuned for the precision-recall tradeoff rather than raw accuracy. Accuracy is misleading on a dataset with only 0.17% positives, where a constant "always legitimate" classifier would already score 99.83%.
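As a quick consistency check on the figures above, the reported F1 follows from the reported precision and recall, and the 99.83% baseline follows from the 0.17% fraud rate. The snippet below only redoes that arithmetic; the variable names are illustrative and nothing here comes from the project code.

```python
# Reported fraud-class metrics from the Results table.
precision = 0.93
recall = 0.82

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# A classifier that always predicts "legitimate" is correct on every
# non-fraud row, so its accuracy equals the share of legitimate rows.
fraud_rate = 0.0017
baseline_accuracy = 1.0 - fraud_rate

print(f"F1 = {f1:.2f}")
print(f"always-legitimate accuracy = {baseline_accuracy:.2%}")
```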

Architecture

                    ┌─────────────────────────┐
  Transaction ----->│  Gradio UI (port 7860)  │
  features (30)     └─────────────────────────┘
                                 │  in-process call
                                 ▼
                    ┌─────────────────────────┐
                    │  src/predict.py         │
                    │   predict_new_data(df)  │
                    └────────────┬────────────┘
                                 │
                  ┌──────────────┼──────────────┐
                  ▼              ▼              ▼
            pipeline.pkl   rf_model.pkl   threshold.json
            (preprocessor)  (classifier)   (decision cut)

                                 ▲
                    ┌─────────────────────────┐
       HTTP     --->│  FastAPI (port 8000)    │
       /predict     │  POST /predict          │
                    │  GET  /generate_dummy   │
                    │  GET  /health           │
                    └─────────────────────────┘

Pipeline

Preprocessing

Amount — np.log1p transform to handle the heavy right tail.
Time — converted to hour-of-day (0–23) and encoded with sin/cos for cyclical continuity.
V1–V28 — PCA-transformed features in the original dataset, left as-is.
StandardScaler applied across features for consistent ranges.
SMOTE oversampling is applied to the training set only, so the held-out test set keeps the real fraud distribution.
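The Amount and Time transforms listed above can be sketched in a few lines of standard-library Python. This is an illustration, not the code in src/feature_engineering.py: the function name `engineer_features` and the output keys are hypothetical, and it assumes (as in the Kaggle file) that Time is the number of seconds since the first transaction, so the hour-of-day is relative to that starting point.

```python
import math

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def engineer_features(time_seconds: float, amount: float) -> dict:
    """Return the engineered Amount and Time features for one transaction."""
    # log1p compresses the heavy right tail and is defined at amount == 0.
    log_amount = math.log1p(amount)

    # Assumption: Time counts seconds since the first transaction,
    # so the hour-of-day is relative to that first transaction.
    hour = int(time_seconds // SECONDS_PER_HOUR) % HOURS_PER_DAY

    # Map the hour onto a circle so that hour 23 and hour 0 end up adjacent.
    angle = 2 * math.pi * hour / HOURS_PER_DAY
    return {
        "log_amount": log_amount,
        "hour": hour,
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
    }


late = engineer_features(time_seconds=23 * 3600, amount=0.0)
midnight = engineer_features(time_seconds=24 * 3600, amount=99.0)
```

With a plain 0–23 integer, hours 23 and 0 would sit at opposite ends of the range. On the sin/cos circle, `late` and `midnight` are neighbours, which is the cyclical continuity the encoding is meant to provide.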

Modeling

RandomForestClassifier (100 trees, sklearn defaults otherwise).
Threshold tuning by scanning thresholds 0.00–1.00 and maximizing F1 on validation predictions; the chosen threshold is persisted in models/threshold.json and applied at inference time.
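The threshold scan described above can be sketched as follows. This is a minimal pure-Python version under stated assumptions: the helper names `f1_at_threshold` and `best_f1_threshold`, the 0.01 step size, the `score >= threshold` convention, and the toy labels and scores are all illustrative and not taken from src/train.py or src/evaluate.py.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class when predicting 1 for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_f1_threshold(y_true, scores, steps=100):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and return (threshold, f1)."""
    candidates = [i / steps for i in range(steps + 1)]
    # max() keeps the first maximum, so ties resolve to the lowest threshold.
    best = max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))
    return best, f1_at_threshold(y_true, scores, best)


# Toy validation labels and predicted fraud probabilities (made-up values).
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
threshold, f1 = best_f1_threshold(y_val, p_val)
```

Several neighbouring thresholds often give the same F1, so the tie-breaking rule matters. This sketch keeps the lowest one, which favours recall; a real pipeline might prefer a different rule.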

Artifacts (persisted under `models/`)

pipeline.pkl — fitted preprocessing pipeline
rf_model.pkl — trained classifier
threshold.json — selected decision threshold
feature_names.json — feature ordering required at inference
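To illustrate how these four artifacts fit together at inference time, here is a minimal sketch. It is not the code in src/predict.py: the function names `load_threshold` and `classify`, the `"threshold"` JSON key, the output field names, and the assumption that the pipeline accepts plain rows in `feature_names.json` order are all hypothetical.

```python
import json


def load_threshold(path="models/threshold.json", key="threshold"):
    """Read the decision threshold; the JSON key name is an assumption."""
    with open(path) as fh:
        return float(json.load(fh)[key])


def classify(records, feature_names, pipeline, model, threshold):
    """Order features, preprocess, score, and apply the persisted threshold."""
    # Enforce the training-time column order from feature_names.json.
    rows = [[record[name] for name in feature_names] for record in records]
    transformed = pipeline.transform(rows)
    # For a scikit-learn binary classifier whose classes_ are [0, 1],
    # column 1 of predict_proba is the positive (fraud) class.
    fraud_scores = [float(row[1]) for row in model.predict_proba(transformed)]
    return [
        {"fraud_probability": score, "is_fraud": score >= threshold}
        for score in fraud_scores
    ]
```

In practice `pipeline` and `model` would presumably come from `joblib.load("models/pipeline.pkl")` and `joblib.load("models/rf_model.pkl")`. Reordering by `feature_names` before transforming is what guards against a caller sending the 30 features in a different order than the one used in training.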

Quick start

Local — Gradio UI (recommended)

pip install -r requirements.txt
python -m app.gradio_app
# open http://localhost:7860

The UI ships with three preset inputs: a legitimate-looking sample, a fraud-pattern sample, and a random sample generator — usable even without the original Kaggle dataset on disk.

Local — FastAPI service

uvicorn api.main:app --host 0.0.0.0 --port 8000
# health
curl http://localhost:8000/health
# prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @docs/sample_request.json

Docker — both services together

Works the same on macOS, Linux, and Windows (with Docker Desktop installed):

docker compose up --build
# Gradio UI:    http://localhost:7860
# FastAPI docs: http://localhost:8000/docs

The docker-compose.yml runs the API and UI as separate services sharing the same image; a health check on /health confirms model readiness before the UI starts.

API endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/health` | GET | Liveness + confirms model loaded |
| `/predict` | POST | Predicts fraud from a 30-feature transaction (Pydantic-validated input) |
| `/generate_dummy` | GET | Generates a noisy sample from the underlying dataset (requires `data/creditcard.csv` mounted) |
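For callers who prefer Python over curl, a client for `/predict` might look like the sketch below. The request schema is an assumption: it guesses that the 30 features are the Kaggle columns `Time`, `V1` to `V28`, and `Amount` sent as a flat JSON object, whereas the authoritative format is whatever docs/sample_request.json contains. The helper names `build_predict_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: the 30 features are the Kaggle columns Time, V1..V28, Amount.
FEATURE_NAMES = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def build_predict_request(transaction, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for /predict."""
    missing = [name for name in FEATURE_NAMES if name not in transaction]
    if missing:
        raise ValueError(f"missing features: {missing}")
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder transaction with every feature set to zero (not real data).
example = {name: 0.0 for name in FEATURE_NAMES}
request = build_predict_request(example)
```

With the service running locally, `send(request)` would return the decoded JSON response. Checking for missing features on the client side simply surfaces the problem earlier than the server's Pydantic validation would.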

Project structure

FraudShieldML/
├── api/
│   └── main.py              # FastAPI service
├── app/
│   ├── __init__.py
│   └── gradio_app.py        # Gradio Blocks UI
├── assets/
│   └── demo.png
├── models/                  # Persisted artifacts (pipeline, model, threshold, features)
├── notebooks/               # 01_eda, 02_feature_engineering, 03_class_imbalances, 04_final_model
├── reports/eda/             # Generated EDA plots (distribution, correlations, violin plots V1–V28)
├── src/
│   ├── preprocessing.py
│   ├── feature_engineering.py
│   ├── train.py
│   ├── evaluate.py
│   ├── predict.py
│   └── utils.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

Tech stack

| Layer | Tools |
| --- | --- |
| Data processing | pandas, scikit-learn, imbalanced-learn |
| Model | RandomForestClassifier (sklearn) |
| Serving | FastAPI + Uvicorn |
| UI | Gradio (Blocks, Soft theme) |
| Persistence | joblib, JSON |
| Packaging | Docker, Docker Compose |

Dataset

Kaggle — Credit Card Fraud Detection. Not shipped in this repo; download creditcard.csv into data/ if you want to retrain or use the dummy-sample endpoint.
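After downloading the file, a quick sanity check is to confirm that the fraud share is close to the 0.17% quoted at the top of this README. The sketch below uses only the standard library and relies on the Kaggle file's `Class` column (1 for fraud, 0 otherwise). The function name `fraud_rate` is hypothetical, and the in-memory sample is made-up data standing in for the real CSV.

```python
import csv
import io


def fraud_rate(csv_file):
    """Return (rows, frauds, rate) from a file object with a Class column."""
    rows = frauds = 0
    for record in csv.DictReader(csv_file):
        rows += 1
        # In the Kaggle file, Class is 1 for fraud and 0 otherwise.
        frauds += int(float(record["Class"]) == 1)
    return rows, frauds, (frauds / rows if rows else 0.0)


# Tiny in-memory stand-in for data/creditcard.csv (made-up values).
sample = io.StringIO(
    '"Time","Amount","Class"\n'
    '0,149.62,"0"\n'
    '1,2.69,"0"\n'
    '2,378.66,"1"\n'
    '3,10.0,"0"\n'
)
rows, frauds, rate = fraud_rate(sample)
```

To run it against the real file, open it with `open("data/creditcard.csv", newline="")` and pass the file object to `fraud_rate`.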

License

MIT — see LICENSE.
