Human Digit Recognition

Updated Dec 2025

Python · Flask · PyTorch · Computer Vision · HTML

Human Digit Recognition is a two-stage perception project: it first detects the hand region in an image, then classifies how many fingers (digits) are raised. The repo includes the modeling workflow, annotation tooling, pretrained weights, and a lightweight Flask-based demo.

Two CNNs (~7.5M params each): 4-layer conv stack + 9-layer dense head on 28x28 grayscale inputs

Custom dataset built with a canvas-based bounding-box annotation tool and augmented with Albumentations

Cross-platform inference with automatic CUDA/MPS/CPU device detection
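
The device fallback mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `choose_device_name` and `detect_device_name` are hypothetical, and the CUDA, then MPS, then CPU preference order is an assumption. The PyTorch calls `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the standard availability checks.

```python
def choose_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device name, preferring CUDA, then Apple MPS, then CPU (assumed order)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Query PyTorch for available backends and return the preferred device name."""
    import torch  # imported lazily so the selection logic above stays testable without PyTorch

    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return choose_device_name(torch.cuda.is_available(), mps_available)
```

With PyTorch installed, `torch.device(detect_device_name())` could then be passed to `model.to(...)` and to the input tensors before inference.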

What I built

  • PyTorch models for hand-box detection and digit-count classification
  • Annotation tooling and notebooks for training and evaluation
  • Flask inference server and browser-facing prediction flow
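
As an illustration of what a canvas-based annotation tool has to compute, here is a minimal sketch that turns a mouse drag into a normalized bounding box. Everything in it is an assumption for illustration: the function name `drag_to_box`, the `(x_min, y_min, x_max, y_max)` box layout, and the choice to normalize coordinates to the 0..1 range are not taken from the repo.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to 0..1


def drag_to_box(start: Point, end: Point, canvas_width: float, canvas_height: float) -> Box:
    """Convert a drag from `start` to `end` (canvas pixels) into a normalized box.

    The drag may go in any direction, so the corners are sorted, and points
    outside the canvas are clamped to its edges.
    """
    if canvas_width <= 0 or canvas_height <= 0:
        raise ValueError("canvas dimensions must be positive")

    def clamp(value: float, upper: float) -> float:
        return min(max(value, 0.0), upper)

    x_min, x_max = sorted((clamp(start[0], canvas_width), clamp(end[0], canvas_width)))
    y_min, y_max = sorted((clamp(start[1], canvas_height), clamp(end[1], canvas_height)))
    return (
        x_min / canvas_width,
        y_min / canvas_height,
        x_max / canvas_width,
        y_max / canvas_height,
    )
```

Storing normalized coordinates keeps the labels valid when images are resized before training, which is one common reason to choose this layout.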

How it works

  1. Capture or upload an image frame
  2. Predict a hand bounding box from the full frame
  3. Crop the detected region and run digit classification
  4. Return the predicted count through the Flask API
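
The detect, crop, classify flow in the steps above can be sketched in plain Python. This is a hedged illustration under stated assumptions, not the project's implementation: the box layout `(x_min, y_min, x_max, y_max)` in pixels, the nested-list frame representation, and the names `clamp_box`, `crop`, and `predict_count` are all hypothetical. The two models are passed in as callables so the sketch stays independent of any framework.

```python
import math
from typing import Callable, List, Tuple

Frame = List[List[int]]                  # grayscale image as rows of pixel values
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels (assumed layout)


def clamp_box(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Round a predicted box outward to whole pixels and keep it inside the frame."""
    x_min, y_min, x_max, y_max = box
    left = min(max(math.floor(x_min), 0), width - 1)
    top = min(max(math.floor(y_min), 0), height - 1)
    # Guarantee a crop of at least one pixel in each dimension.
    right = max(min(math.ceil(x_max), width), left + 1)
    bottom = max(min(math.ceil(y_max), height), top + 1)
    return left, top, right, bottom


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the box region out of the frame."""
    height, width = len(frame), len(frame[0])
    left, top, right, bottom = clamp_box(box, width, height)
    return [row[left:right] for row in frame[top:bottom]]


def predict_count(
    frame: Frame,
    detect_hand: Callable[[Frame], Box],
    classify_digits: Callable[[Frame], int],
) -> int:
    """Stage 1 predicts a hand box; stage 2 classifies the cropped region."""
    box = detect_hand(frame)
    hand_region = crop(frame, box)
    return classify_digits(hand_region)
```

Clamping matters because a regression-style detector can predict coordinates slightly outside the frame; without it, the crop passed to the classifier could be empty.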

Results

  • End-to-end perception workflow from annotation to serving
  • Demonstrates staged CV inference rather than a single-model toy demo

Next steps

  • Package dependencies more cleanly for reproducible setup
  • Add screenshots and clearer example outputs to the repo and portfolio