PP-OCRv6 Hits 86.2% Detection Hmean at Just 34.5M Parameters

PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy with just 34.5M parameters. That is a gain of 4.6 percentage points in text detection and 5.1 in recognition over PP-OCRv5_server, the previous generation's server-grade model. PaddlePaddle has released the models on Hugging Face, and they're worth your time if you deploy OCR anywhere outside a benchmark paper.
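As a quick sanity check on those deltas, the PP-OCRv5_server scores they imply can be backed out by subtraction. This is only a sketch: it assumes both generations were scored on the same benchmark, the baseline values are derived here rather than quoted from the source, and the variable and function names are illustrative.

```python
# Scores quoted for PP-OCRv6_medium, in percent.
V6_MEDIUM = {"det_hmean": 86.2, "rec_acc": 83.2}

# Quoted gains over PP-OCRv5_server, in percentage points.
GAIN_OVER_V5_SERVER = {"det_hmean": 4.6, "rec_acc": 5.1}


def implied_baseline(score: float, gain: float) -> float:
    """Subtract the quoted gain from the new score (rounded to one decimal)."""
    return round(score - gain, 1)


# Derived, not reported: what the deltas imply for the older model.
implied_v5_server = {
    metric: implied_baseline(V6_MEDIUM[metric], GAIN_OVER_V5_SERVER[metric])
    for metric in V6_MEDIUM
}
print(implied_v5_server)  # {'det_hmean': 81.6, 'rec_acc': 78.1}
```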

Three Model Tiers That Actually Matter for Deployment

PP-OCRv6 comes in three sizes: tiny (1.5M params, 80.6% det, 73.5% rec), small (7.7M params, 84.1% det, 81.3% rec), and medium (34.5M params, 86.2% det, 83.2% rec). These aren't arbitrary checkpoints. Each tier uses the same PPLCNetV4 backbone, so you get architectural consistency whether you're running on an edge device or a server pipeline. The medium and small tiers support 50 languages: Simplified and Traditional Chinese, English, Japanese, plus 46 Latin-script languages. One model family, one training approach, multiple deployment profiles.
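One way to reason about the tiers is as a lookup keyed by parameter budget. The sketch below uses only the figures quoted above; the `Tier` class and the `pick_tier` helper are hypothetical names for illustration, not part of PaddleOCR, and parameter count is only a rough proxy for memory and latency on a given device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    params_m: float   # parameters, in millions
    det_hmean: float  # detection Hmean, percent
    rec_acc: float    # recognition accuracy, percent


# Figures as quoted in the article.
TIERS = [
    Tier("tiny", 1.5, 80.6, 73.5),
    Tier("small", 7.7, 84.1, 81.3),
    Tier("medium", 34.5, 86.2, 83.2),
]


def pick_tier(max_params_m: float) -> Optional[Tier]:
    """Return the largest tier that fits the parameter budget, or None."""
    fitting = [t for t in TIERS if t.params_m <= max_params_m]
    return max(fitting, key=lambda t: t.params_m) if fitting else None


# A 10M-parameter budget rules out medium and selects small.
print(pick_tier(10.0).name)  # small
```

In practice the budget would come from measured memory or latency limits on the target hardware rather than a raw parameter count.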

What Changed Under the Hood: RepLKFPN and LightSVTR

Detection gets a new neck: RepLKFPN, a lightweight large-kernel feature pyramid network. For real-world inputs (small text, dense layouts, rotated labels, industrial characters), RepLKFPN handles multi-scale text without blowing up inference cost. Recognition switches to EncoderWithLightSVTR, which mixes local convolutions with global attention. That mix cuts error rates on challenging crops like screen text, noisy images, and special symbols. PaddlePaddle's own benchmarks show the tiny model alone (1.5M params) hitting 80.6% detection Hmean, about one point short of the 81.6% that the stated 4.6-point gain implies for PP-OCRv5_server (86.2 - 4.6).
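For readers unfamiliar with the metric, detection Hmean is the harmonic mean of precision and recall over matched text boxes. The sketch below shows only that final step. How predicted boxes are matched to ground truth (for example, the overlap threshold) depends on the benchmark and is not specified in the source, and the counts in the example are made up for illustration.

```python
def hmean(true_positives: int, num_predicted: int, num_ground_truth: int) -> float:
    """Harmonic mean of detection precision and recall, as a fraction in [0, 1]."""
    precision = true_positives / num_predicted if num_predicted else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 80 correct boxes out of 100 predicted, 90 in the ground truth.
score = hmean(true_positives=80, num_predicted=100, num_ground_truth=90)
print(f"{score:.1%}")  # 84.2%
```

Because the harmonic mean is dragged down by the weaker of the two terms, a detector cannot reach a high Hmean by over-predicting boxes or by being overly conservative.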

Fifty Languages, One Model, Multiple Backends

PP-OCRv6 runs on Paddle Inference (the default), Hugging Face Transformers, or ONNX Runtime. Switching backends is a one-liner: engine="transformers". The structured JSON output feeds directly into document parsing, RAG pipelines, search indexing, or agent workflows. PaddleOCR 3.7 unifies the inference interface, so you don't rewrite your pipeline when moving from dev to production. A demo is live on Hugging Face Spaces.
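To make the "feeds directly into" claim concrete, here is a minimal sketch of turning OCR JSON into text records for indexing. It does not call PaddleOCR. The field names `res`, `rec_texts`, and `rec_scores` are assumptions about the output schema, so check them against the JSON your installed version actually emits. The `extract_lines` helper and the sample payload are hypothetical.

```python
import json
from typing import Dict, List


def extract_lines(ocr_json: str, min_score: float = 0.5) -> List[Dict[str, object]]:
    """Keep recognized lines whose confidence is at least min_score."""
    payload = json.loads(ocr_json)
    # Assumed schema: results may be nested under a top-level "res" key.
    res = payload.get("res", payload)
    texts = res.get("rec_texts", [])
    scores = res.get("rec_scores", [])
    return [
        {"text": text, "score": score}
        for text, score in zip(texts, scores)
        if score >= min_score
    ]


# Hand-written sample payload, not real model output.
sample = json.dumps(
    {"res": {"rec_texts": ["Invoice 0042", "t0tal"], "rec_scores": [0.98, 0.31]}}
)
records = extract_lines(sample)
print(records)
```

Dropping low-confidence lines before indexing is a design choice, not a requirement. A search index may prefer to keep them with their scores attached and filter at query time.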

For anyone building document ingestion or multilingual OCR into production, PP-OCRv6 is worth a serious look.

Source: PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
Domain: huggingface.co
