
Human Digit Recognition
Updated Dec 2025
Python, Flask, PyTorch, Computer Vision, HTML
Human Digit Recognition is a two-stage perception project that detects a hand region in an image and then classifies how many fingers (digits) are raised. It includes the modeling workflow, annotation tooling, pretrained weights, and a lightweight Flask-based demo.
Two CNNs (~7.5M params each): 4-layer conv stack + 9-layer dense head on 28x28 grayscale inputs
Custom dataset with canvas-based bounding-box annotation tool and albumentations augmentation
Cross-platform inference with automatic CUDA/MPS/CPU device detection
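A minimal sketch of what automatic device detection might look like. The function names `choose_device` and `detect_device` are hypothetical, and the CUDA, then MPS, then CPU priority order is an assumption rather than the project's confirmed behavior. The selection logic is kept separate from the PyTorch queries so it can be checked without a GPU.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name, preferring CUDA, then MPS, then CPU (assumed order)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return choose_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

The returned string can be passed to `torch.device(...)` and then to `model.to(...)`, so the same inference script runs on an NVIDIA GPU, an Apple Silicon machine, or a CPU-only host.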
What I built
- PyTorch models for hand-box detection and digit-count classification
- Annotation tooling and notebooks for training and evaluation
- Flask inference server and browser-facing prediction flow
How it works
- 1. Capture or upload an image frame
- 2. Predict a hand bounding box from the full frame
- 3. Crop the detected region and run digit classification
- 4. Return the predicted count through the Flask API
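The steps above can be sketched as a small staged pipeline. This is an illustration under stated assumptions, not the project's code: `detect` and `classify` are hypothetical stand-ins for the two CNNs, the box format `(x0, y0, x1, y1)` is assumed, the image is a plain list of pixel rows, and resizing to the models' 28x28 input is omitted for brevity.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]        # rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip a predicted box to the frame and keep it at least one pixel in size."""
    x0, y0, x1, y1 = box
    x0 = max(0, min(x0, width - 1))
    y0 = max(0, min(y0, height - 1))
    x1 = max(x0 + 1, min(x1, width))
    y1 = max(y0 + 1, min(y1, height))
    return x0, y0, x1, y1


def crop(image: Image, box: Box) -> Image:
    """Cut the boxed region out of the frame."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def predict_count(
    image: Image,
    detect: Callable[[Image], Box],
    classify: Callable[[Image], int],
) -> Dict[str, object]:
    """Stage 1: locate the hand. Stage 2: classify the cropped region."""
    height, width = len(image), len(image[0])
    box = clamp_box(detect(image), width, height)
    count = classify(crop(image, box))
    # A dict like this could be serialized as the JSON response of the API.
    return {"box": list(box), "count": count}
```

Clamping the box before cropping guards the second stage against detector outputs that fall partly outside the frame, which would otherwise produce an empty crop.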
Results
- ✓ End-to-end perception workflow from annotation to serving
- ✓ Demonstrates staged CV inference rather than a single-model toy demo
Next steps
- Package dependencies more cleanly for reproducible setup
- Add screenshots and clearer example outputs to the repo and portfolio