Human Digit Recognition

Updated Dec 2025

Python, Flask, PyTorch, Computer Vision, HTML

Human Digit Recognition is a two-stage perception project that detects a hand region in an image and then classifies how many digits (fingers) are raised. It includes the modeling workflow, annotation tooling, pretrained weights, and a lightweight Flask-based demo.

Two CNNs (~7.5M params each): 4-layer conv stack + 9-layer dense head on 28x28 grayscale inputs

Custom dataset built with a canvas-based bounding-box annotation tool and augmented with Albumentations

Cross-platform inference with automatic CUDA/MPS/CPU device detection
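A minimal sketch of how automatic device detection can be done in PyTorch, assuming the usual priority order of CUDA, then MPS, then CPU. The helper names pick_device_name and detect_device are hypothetical and are not taken from the project; the fallback to "cpu" when PyTorch is not installed is also an assumption made so the snippet runs anywhere.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch there is nothing to accelerate.
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return pick_device_name(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

The returned string can be passed to torch.device(...) and then to model.to(...), so the same inference code can run on an NVIDIA GPU, Apple Silicon, or a plain CPU.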

What I built

PyTorch models for hand-box detection and digit-count classification
Annotation tooling and notebooks for training and evaluation
Flask inference server and browser-facing prediction flow

How it works

1. Capture or upload an image frame
2. Predict a hand bounding box from the full frame
3. Crop the detected region and run digit classification
4. Return the predicted count through the Flask API
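The steps above can be sketched as a small two-stage function. This is an illustration under stated assumptions, not the project's actual code: the box format (normalized x, y, width, height), the helper names to_pixel_box, crop and predict_count, and the shape of the returned payload are all hypothetical. The two models are passed in as plain callables, and the frame is a nested list of grayscale values so the snippet needs no third-party packages.

```python
from typing import Callable, List, Tuple

Frame = List[List[float]]                 # grayscale image as rows of pixel values
Box = Tuple[float, float, float, float]   # assumed format: normalized (x, y, w, h)
Corners = Tuple[int, int, int, int]       # pixel (left, top, right, bottom)


def to_pixel_box(box: Box, width: int, height: int) -> Corners:
    """Convert a normalized box to pixel corners clamped to the frame."""
    x, y, w, h = box
    left = max(0, min(width - 1, int(x * width)))
    top = max(0, min(height - 1, int(y * height)))
    # Keep the crop at least one pixel wide and tall.
    right = max(left + 1, min(width, int((x + w) * width)))
    bottom = max(top + 1, min(height, int((y + h) * height)))
    return left, top, right, bottom


def crop(frame: Frame, corners: Corners) -> Frame:
    """Cut the detected region out of the frame."""
    left, top, right, bottom = corners
    return [row[left:right] for row in frame[top:bottom]]


def predict_count(
    frame: Frame,
    detect_box: Callable[[Frame], Box],
    classify: Callable[[Frame], int],
) -> dict:
    """Stage 1: find the hand box. Stage 2: classify the cropped region."""
    height, width = len(frame), len(frame[0])
    corners = to_pixel_box(detect_box(frame), width, height)
    hand = crop(frame, corners)
    # A dict like this could be serialized to JSON by the serving layer.
    return {"box": list(corners), "count": classify(hand)}


if __name__ == "__main__":
    # Stand-in models so the sketch runs without trained weights.
    frame = [[0.0] * 8 for _ in range(8)]
    result = predict_count(frame, lambda f: (0.25, 0.25, 0.5, 0.5), lambda c: 3)
    print(result)
```

In a real pipeline the crop would presumably also be resized to the classifier's 28x28 grayscale input before classification; that step is left out here to keep the sketch dependency-free.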

Results

✓ End-to-end perception workflow from annotation to serving
✓ Demonstrates staged CV inference rather than a single-model toy demo

Next steps

Package dependencies more cleanly for reproducible setup
Add screenshots and clearer example outputs to the repo and portfolio