▶ Try it live: hameddaoud/fraudshield-ml on Hugging Face Spaces
An end-to-end credit card fraud detection system built on the Kaggle dataset (~285k transactions, 0.17% fraud). Pairs a tuned Random Forest classifier with a FastAPI inference service and a Gradio web UI, both runnable via Docker. Designed to address the severe class-imbalance problem with custom preprocessing, SMOTE resampling, and threshold tuning.
Evaluated on the held-out test set (stratified split, real fraud distribution preserved):
| Metric | Score |
|---|---|
| Precision (fraud class) | 0.93 |
| Recall (fraud class) | 0.82 |
| F1-score | 0.87 |
| PR-AUC | 0.86 |
| Accuracy | ~100% |
| Best decision threshold | 0.50 (F1-optimized) |
The model is threshold-tuned for the precision-recall tradeoff rather than raw accuracy — accuracy is misleading on a 0.17%-positive dataset, where a constant "always legitimate" classifier would score 99.83%.
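As a quick sanity check on the table above, the reported F1 follows from the reported precision and recall, and the 99.83% baseline follows from the 0.17% fraud rate. The sketch below uses only figures quoted in this README; the variable names are illustrative and not taken from the project code.

```python
# Figures quoted in the results table and the dataset description.
precision = 0.93
recall = 0.82
fraud_rate = 0.0017  # 0.17% of transactions are fraud

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# A classifier that always answers "legitimate" is correct on every
# non-fraud row and wrong on every fraud row.
baseline_accuracy = 1.0 - fraud_rate

print(f"F1 from precision/recall: {f1:.2f}")                   # 0.87
print(f"Always-legitimate accuracy: {baseline_accuracy:.2%}")  # 99.83%
```

The baseline classifier reaches that accuracy with zero recall on fraud, which is why the table leads with precision, recall, F1, and PR-AUC.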

                      ┌─────────────────────────┐
    Transaction ----->│  Gradio UI (port 7860)  │
    features (30)     └─────────────────────────┘
                                   │ in-process call
                                   ▼
                      ┌─────────────────────────┐
                      │     src/predict.py      │
                      │  predict_new_data(df)   │
                      └────────────┬────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    ▼              ▼              ▼
              pipeline.pkl   rf_model.pkl  threshold.json
             (preprocessor)  (classifier)  (decision cut)
                                   ▲
                      ┌─────────────────────────┐
             HTTP --->│   FastAPI (port 8000)   │
             /predict │   POST /predict         │
                      │   GET  /generate_dummy  │
                      │   GET  /health          │
                      └─────────────────────────┘

- Amount: `np.log1p` transform to handle the heavy right tail.
- Time: converted to hour-of-day (0–23) and encoded with sin/cos for cyclical continuity.
- V1–V28: PCA-transformed features in the original dataset, left as-is.
- StandardScaler applied across features for consistent ranges.
- SMOTE oversampling on the training set only to address class imbalance.
- RandomForestClassifier (100 trees, sklearn defaults otherwise).
- Threshold tuning by scanning thresholds 0.00–1.00 and maximizing F1 on validation predictions; the chosen threshold is persisted in `models/threshold.json` and applied at inference time.
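
The Amount and Time transforms in the list above can be sketched with the standard library alone. This is a minimal illustration, not the code in `src/feature_engineering.py`: the function names are hypothetical, `math.log1p` stands in for `np.log1p` on a single value, and it assumes `Time` is seconds elapsed since the first transaction (as in the Kaggle dataset) with hour 0 aligned to that first transaction.

```python
import math

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def transform_amount(amount: float) -> float:
    """log(1 + amount): compresses the heavy right tail and maps 0 to 0."""
    return math.log1p(amount)


def encode_time(seconds_elapsed: float) -> tuple[int, float, float]:
    """Map elapsed seconds to hour-of-day (0-23) plus its sin/cos encoding."""
    hour = int(seconds_elapsed // SECONDS_PER_HOUR) % HOURS_PER_DAY
    angle = 2 * math.pi * hour / HOURS_PER_DAY
    return hour, math.sin(angle), math.cos(angle)


# Example calls (hypothetical inputs, not rows from the dataset).
small, large = transform_amount(10.0), transform_amount(10_000.0)
late_night = encode_time(23 * SECONDS_PER_HOUR)
midnight = encode_time(24 * SECONDS_PER_HOUR)  # wraps around to hour 0
```

The sin/cos pair places hour 23 and hour 0 next to each other on the unit circle, whereas the raw hour value would put them 23 units apart.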

Persisted artifacts in `models/`:

- `pipeline.pkl`: fitted preprocessing pipeline
- `rf_model.pkl`: trained classifier
- `threshold.json`: selected decision threshold
- `feature_names.json`: feature ordering required at inference
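
The threshold scan from the modeling steps, and the way its result could end up in `threshold.json`, might look roughly like the sketch below. It is a plain-Python illustration under stated assumptions: the helper names, the 0.01 step size, the toy validation data, and the `{"threshold": ...}` JSON shape are all hypothetical, and the project's own implementation in `src/` may differ.

```python
import json


def f1_at_threshold(y_true, y_prob, threshold):
    """F1 for the fraud class when predicting fraud if prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_fraud = prob >= threshold
        if predicted_fraud and label == 1:
            tp += 1
        elif predicted_fraud and label == 0:
            fp += 1
        elif not predicted_fraud and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, steps=100):
    """Scan 0.00..1.00 in equal steps; return (threshold, f1) maximizing F1.

    Ties are resolved in favor of the lowest threshold.
    """
    candidates = [i / steps for i in range(steps + 1)]
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    best_f1, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_f1


# Toy validation labels and fraud probabilities (made up for illustration).
y_true = [0, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.05, 0.10, 0.40, 0.45, 0.55, 0.70, 0.90, 0.20]

threshold, f1 = best_threshold(y_true, y_prob)
payload = json.dumps({"threshold": threshold})  # hypothetical file contents
```

On this toy data the scan settles just above 0.40, the point where all three fraud cases are caught with a single false positive. The scan should run on validation predictions only, so the reported test metrics stay untouched by the tuning.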
To run the Gradio UI locally:

    pip install -r requirements.txt
    python -m app.gradio_app
    # open http://localhost:7860

The UI ships with three preset inputs: a legitimate-looking sample, a fraud-pattern sample, and a random sample generator, so it is usable even without the original Kaggle dataset on disk.
To start the API and query it from a second terminal:

    uvicorn api.main:app --host 0.0.0.0 --port 8000

    # health
    curl http://localhost:8000/health

    # prediction
    curl -X POST http://localhost:8000/predict \
      -H "Content-Type: application/json" \
      -d @docs/sample_request.json

The Docker setup works the same on macOS, Linux, and Windows (with Docker Desktop installed):

    docker compose up --build
    # Gradio UI: http://localhost:7860
    # FastAPI docs: http://localhost:8000/docs

The `docker-compose.yml` runs the API and UI as separate services sharing the same image; a health check on `/health` confirms model readiness before the UI starts.

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Liveness + confirms model loaded |
| `/predict` | POST | Predicts fraud from a 30-feature transaction (Pydantic-validated input) |
| `/generate_dummy` | GET | Generates a noisy sample from the underlying dataset (requires `data/creditcard.csv` mounted) |
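
For callers who prefer Python over `curl`, a client for `/predict` could be sketched with the standard library as below. The request shape is an assumption: this README does not show the Pydantic schema, so the sketch guesses a flat JSON object keyed by the Kaggle column names (`Time`, `V1`..`V28`, `Amount`). `docs/sample_request.json` and `models/feature_names.json` are the authoritative references, and the helper names here are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # port from the uvicorn command above

# Kaggle dataset columns: Time, V1..V28, Amount (30 features in total).
FEATURE_NAMES = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def build_payload(values):
    """Pair 30 raw values with feature names (assumed request shape)."""
    if len(values) != len(FEATURE_NAMES):
        raise ValueError(
            f"expected {len(FEATURE_NAMES)} values, got {len(values)}"
        )
    return dict(zip(FEATURE_NAMES, values))


def post_predict(payload, base_url=API_URL, timeout=5.0):
    """POST the payload to /predict and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# All-zero transaction, purely illustrative.
payload = build_payload([0.0] * 30)
# result = post_predict(payload)  # uncomment with the API running
```

If the service rejects the payload with a validation error, compare the keys against `docs/sample_request.json` before changing anything else.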
FraudShieldML/
├── api/
│ └── main.py # FastAPI service
├── app/
│ ├── __init__.py
│ └── gradio_app.py # Gradio Blocks UI
├── assets/
│ └── demo.png
├── models/ # Persisted artifacts (pipeline, model, threshold, features)
├── notebooks/ # 01_eda, 02_feature_engineering, 03_class_imbalances, 04_final_model
├── reports/eda/ # Generated EDA plots (distribution, correlations, violin plots V1–V28)
├── src/
│ ├── preprocessing.py
│ ├── feature_engineering.py
│ ├── train.py
│ ├── evaluate.py
│ ├── predict.py
│ └── utils.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md
| Layer | Tools |
|---|---|
| Data processing | pandas, scikit-learn, imbalanced-learn |
| Model | RandomForestClassifier (sklearn) |
| Serving | FastAPI + Uvicorn |
| UI | Gradio (Blocks, Soft theme) |
| Persistence | joblib, JSON |
| Packaging | Docker, Docker Compose |
Kaggle — Credit Card Fraud Detection. Not shipped in this repo; download creditcard.csv into data/ if you want to retrain or use the dummy-sample endpoint.
MIT — see LICENSE.
