
Human Digit Recognition

Updated Dec 2025

Python, Flask, PyTorch, Computer Vision, HTML

this is a two-stage perception project: first it detects a hand region in an image, then it classifies how many fingers are raised. i built the whole thing, from the annotation tooling through training to a flask demo you can actually interact with.

two CNNs (~7.5M params each): 4-layer conv stack + 9-layer dense head on 28x28 grayscale inputs
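to get a feel for where the parameters sit in a design like this, here's a small stdlib-only calculator. it's a sketch under assumptions: 3x3 same-padded convs, a 2x2 max-pool after the second and fourth conv, and made-up layer widths. the project's actual widths, pooling, and output sizes aren't listed here, so the number it prints won't necessarily match ~7.5M.

```python
from typing import Iterable, Sequence


def conv2d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights (c_out * c_in * k * k) plus one bias per output channel."""
    return c_out * c_in * k * k + c_out


def linear_params(n_in: int, n_out: int) -> int:
    """Weight matrix (n_out * n_in) plus one bias per output unit."""
    return n_out * n_in + n_out


def cnn_params(
    conv_channels: Sequence[int],
    dense_widths: Sequence[int],
    in_channels: int = 1,                # grayscale input
    side: int = 28,                      # 28x28 input
    pool_after: Iterable[int] = (1, 3),  # assumed: pool after conv #2 and #4 (0-based index)
) -> int:
    """Count parameters of a conv stack followed by a dense head.

    Assumes 3x3 'same'-padded convs (spatial size unchanged) and a
    parameter-free 2x2 max-pool after each conv index in pool_after.
    """
    pools = set(pool_after)
    total, c_in = 0, in_channels
    for i, c_out in enumerate(conv_channels):
        total += conv2d_params(c_in, c_out)
        c_in = c_out
        if i in pools:
            side //= 2  # 2x2 pooling halves height and width (floored)
    n_in = c_in * side * side  # flattened feature size entering the dense head
    for n_out in dense_widths:
        total += linear_params(n_in, n_out)
        n_in = n_out
    return total


# hypothetical widths, NOT the project's real config: 4 convs, 9 dense layers,
# last width 6 as if classifying 0-5 raised fingers.
example = cnn_params(
    conv_channels=[32, 32, 64, 64],
    dense_widths=[1024, 1024, 512, 512, 256, 256, 128, 64, 6],
)
print(f"{example:,} parameters")
```

with these made-up widths the first dense layer alone (64 × 7 × 7 = 3,136 inputs into 1,024 units) holds 3,212,288 parameters, far more than all four conv layers combined, which is typical when a wide dense head sits on a flattened feature map.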

custom dataset labeled with a canvas-based bounding-box annotation tool and augmented with albumentations
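both the annotation and augmentation steps come down to box bookkeeping. a minimal sketch, assuming boxes are stored normalized as (x_min, y_min, x_max, y_max) in [0, 1]: `drag_to_box` is a hypothetical helper that turns two canvas drag points into such a box (the canvas tool itself runs in the browser; this only shows the math), and `hflip_box` shows why a horizontal flip has to mirror the box too. albumentations does this bookkeeping for you when boxes go through a `Compose` with `bbox_params`; these helpers are only illustrative.

```python
from typing import Tuple

Point = Tuple[float, float]              # canvas pixel coordinates (x, y)
Box = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


def drag_to_box(start: Point, end: Point, canvas_w: int, canvas_h: int) -> Box:
    """Convert a mouse drag on the canvas into a normalized box.

    Works for drags in any direction and clamps points dragged off-canvas.
    """
    def clamp(v: float, hi: int) -> float:
        return min(max(v, 0.0), float(hi))

    x_min, x_max = sorted((clamp(start[0], canvas_w), clamp(end[0], canvas_w)))
    y_min, y_max = sorted((clamp(start[1], canvas_h), clamp(end[1], canvas_h)))
    return (x_min / canvas_w, y_min / canvas_h, x_max / canvas_w, y_max / canvas_h)


def hflip_box(box: Box) -> Box:
    """Mirror a normalized box so it still covers the hand after a horizontal flip."""
    x_min, y_min, x_max, y_max = box
    # the old right edge becomes the new left edge, measured from the other side
    return (1.0 - x_max, y_min, 1.0 - x_min, y_max)


box = drag_to_box((150, 250), (50, 50), 400, 400)  # dragged up and to the left
print(box)             # → (0.125, 0.125, 0.375, 0.625)
print(hflip_box(box))  # → (0.625, 0.125, 0.875, 0.625)
```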

cross-platform inference with automatic CUDA/MPS/CPU device detection
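a minimal sketch of that device selection, assuming the usual priority order of CUDA, then apple-silicon MPS, then CPU (the project's exact order isn't stated). the decision is split into a pure function so it can be tested without a GPU; `detect_device` then asks pytorch through `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. both function names are hypothetical.

```python
def choose_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Pick a backend name: NVIDIA CUDA first, then Apple-silicon MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Query PyTorch at call time and return a torch.device."""
    import torch  # imported lazily so choose_device_name has no torch dependency

    return torch.device(
        choose_device_name(
            torch.cuda.is_available(),
            torch.backends.mps.is_available(),
        )
    )
```

in use this would look like `model = model.to(detect_device())`, with each input tensor moved to the same device before the forward pass.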

what i built

  • pytorch models for hand-box detection and digit-count classification
  • annotation tooling and notebooks for training and evaluation
  • flask inference server and browser-facing prediction flow
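a rough sketch of what the flask endpoint could look like. everything here is an assumption for illustration: the `/predict` route, the `image` form field, the JSON field names, and the `predict` callable (a stand-in for the two-stage pipeline that returns a count and a box). the flask and pillow imports sit inside `create_app` so the response-shaping helper can be used on its own.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Prediction = Tuple[int, Sequence[float]]  # (finger count, normalized box), assumed shape


def format_prediction(count: int, box: Sequence[float]) -> Dict[str, Any]:
    """Shape the JSON payload sent back to the browser (field names are hypothetical)."""
    return {"count": int(count), "box": [round(float(v), 4) for v in box]}


def create_app(predict: Callable[[Any], Prediction]):
    """Wrap a hypothetical predict(pil_image) -> (count, box) callable in a Flask app."""
    # imported lazily so format_prediction has no flask/pillow dependency
    from flask import Flask, jsonify, request
    from PIL import Image

    app = Flask(__name__)

    @app.post("/predict")  # route decorator shortcut, Flask 2.0+
    def predict_route():
        upload = request.files.get("image")
        if upload is None:
            return jsonify(error="expected a file in the 'image' form field"), 400
        image = Image.open(upload.stream).convert("L")  # "L" = 8-bit grayscale
        count, box = predict(image)
        return jsonify(format_prediction(count, box))

    return app
```

`create_app(my_predict).run(debug=True)` would start flask's development server locally, where `my_predict` is again a placeholder; on the browser side, a `fetch` POST with a `FormData` body holding the file under `image` would hit the route.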

how it works

  1. capture or upload an image
  2. predict a hand bounding box from the full frame
  3. crop the detected region and run digit classification
  4. return the predicted count through the flask API
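the four steps above can be sketched end to end in plain python. this is a minimal sketch under stated assumptions: the image is a grayscale 2D list, `detect` and `classify` are hypothetical stand-ins for the two trained CNNs, the detector returns a normalized (x_min, y_min, x_max, y_max) box, and both stages get 28x28 inputs made by nearest-neighbor resizing. a real server would use tensors and a proper resize, but the flow of full frame, then box, then crop, then count is the same.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]                # grayscale, indexed image[row][col]
Box = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


def box_to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to pixel bounds, clamped to the frame and >= 1 px per side."""
    x0 = min(max(int(box[0] * width), 0), width - 1)
    y0 = min(max(int(box[1] * height), 0), height - 1)
    x1 = min(max(round(box[2] * width), x0 + 1), width)
    y1 = min(max(round(box[3] * height), y0 + 1), height)
    return x0, y0, x1, y1


def crop(image: Image, bounds: Tuple[int, int, int, int]) -> Image:
    """Slice out rows y0:y1 and columns x0:x1."""
    x0, y0, x1, y1 = bounds
    return [row[x0:x1] for row in image[y0:y1]]


def resize_nearest(image: Image, size: int = 28) -> Image:
    """Nearest-neighbor resize to size x size (the models' assumed input shape)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], Box],
    classify: Callable[[Image], int],
) -> int:
    h, w = len(image), len(image[0])
    box = detect(resize_nearest(image))           # stage 1: box from the full frame
    hand = crop(image, box_to_pixels(box, w, h))  # crop at full resolution
    return classify(resize_nearest(hand))         # stage 2: count raised fingers
```

cropping the original frame before downsampling keeps more detail of the hand than cropping the already-shrunk 28x28 frame would, which is the main reason to stage the models this way.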

results

  • a working end-to-end pipeline, from labeling images to serving predictions
  • shows staged CV inference rather than just a single-model toy demo

what's next

  • clean up dependencies for easier reproducible setup
  • capture an inference demo GIF showing hand detection through digit classification