
Human Digit Recognition
Updated Dec 2025
Python, Flask, PyTorch, Computer Vision, HTML
this is a two-stage perception project: first it detects a hand region in an image, then it classifies how many fingers are raised. i built the whole thing, from the annotation tooling through training to a flask demo you can actually interact with.
two CNNs (~7.5M params each): 4-layer conv stack + 9-layer dense head on 28x28 grayscale inputs
custom dataset with a canvas-based bounding-box annotation tool and albumentations augmentation
cross-platform inference with automatic CUDA/MPS/CPU device detection
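a minimal sketch of what the CUDA/MPS/CPU fallback could look like. the function names `pick_device` and `detect_device` are hypothetical, not taken from the project; the only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the priority order (CUDA, then MPS, then CPU) is an assumption.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name: CUDA first, then Apple MPS, then CPU.

    Hypothetical helper; the priority order is an assumption.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda_ok = torch.cuda.is_available()
    # torch.backends.mps only exists on newer PyTorch builds, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda_ok, mps_ok)
```

the returned string can be passed to `torch.device(...)` and then to `model.to(...)`. keeping the priority logic in a separate pure function makes it testable on machines without a GPU.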
what i built
- pytorch models for hand-box detection and digit-count classification
- annotation tooling and notebooks for training and evaluation
- flask inference server and browser-facing prediction flow
how it works
1. capture or upload an image
2. predict a hand bounding box from the full frame
3. crop the detected region and run digit classification
4. return the predicted count through the flask API
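the staged flow above can be sketched roughly as below. this is an illustration under assumptions, not the project's code: `clamp_box`, `crop`, and `predict_count` are hypothetical names, the two models are passed in as plain callables, images are nested lists of grayscale values standing in for tensors, the box format `(x_min, y_min, x_max, y_max)` is assumed, and any resizing of the crop to the classifier's input size is left out.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]          # rows of grayscale pixel values
Box = Tuple[int, int, int, int]    # assumed format: (x_min, y_min, x_max, y_max)


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip a predicted box to the image bounds, keeping it at least 1 pixel wide and tall."""
    x0, y0, x1, y1 = box
    x0 = max(0, min(x0, width - 1))
    y0 = max(0, min(y0, height - 1))
    x1 = max(x0 + 1, min(x1, width))
    y1 = max(y0 + 1, min(y1, height))
    return x0, y0, x1, y1


def crop(image: Image, box: Box) -> Image:
    """Cut the box region out of the image (x1 and y1 are exclusive)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def predict_count(
    image: Image,
    detect_hand: Callable[[Image], Box],
    classify_fingers: Callable[[Image], int],
) -> Dict[str, object]:
    """Stage 1: detect the hand box. Stage 2: classify the cropped region."""
    height, width = len(image), len(image[0])
    box = clamp_box(detect_hand(image), width, height)
    count = classify_fingers(crop(image, box))
    # A dict like this could be serialized to JSON by the serving layer.
    return {"box": list(box), "count": count}
```

clamping before the crop matters because a regression-style detector can predict coordinates outside the frame, and an empty crop would break the second stage.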
results
- ✓ covers the whole pipeline, from labeling images to serving predictions
- ✓ shows staged CV inference rather than a single-model toy demo
what's next
- clean up dependencies for easier reproducible setup
- capture an inference demo GIF showing hand detection through digit classification