In 22 months, Doczy.ai processed 2.5 million contracts at 99 % accuracy, up from the 55 % achieved by rule‑based systems.
Architecture that Scales
Doczy.ai’s pipeline starts with a Next.js front‑end whose users authenticate through Amazon Cognito. Uploaded PDFs land in Amazon S3, then a Lambda function invokes Amazon Textract to extract text and metadata. The core of the system is a patented "smart chunking" algorithm that preserves hierarchical structure and one‑to‑many relationships rather than treating the document as a flat sequence of tokens. Chunked data feeds a dual clustering engine: semantic embeddings cluster similar ideas, while structural pattern recognition identifies clause types, tables, and nested exhibits. Projection algorithms fuse the two views into a unified model, which is passed to large language models that generate structured JSON.
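The ingestion step described above (S3 upload, Lambda trigger, Textract) might look roughly like the following Python sketch. The article does not say which Textract API Doczy.ai calls or how its Lambda is written; the asynchronous `start_document_text_detection` call, the `handler` name, and the injectable `textract` parameter are assumptions made for illustration.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context, textract=None):
    """Hypothetical Lambda entry point: start one Textract job per uploaded PDF."""
    if textract is None:
        # Imported lazily so the parsing logic can be tested without AWS access.
        import boto3
        textract = boto3.client("textract")

    job_ids = []
    for bucket, key in extract_s3_objects(event):
        response = textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        job_ids.append(response["JobId"])
    return {"job_ids": job_ids}
```

Passing the client as a parameter keeps the event parsing separate from the AWS call, so the handler can be exercised locally with a stub before it is deployed.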
The system writes results to Snowflake, powering dashboards that let users query contract terms, flag discrepancies, and feed data into downstream systems like Coupa and Icertis. Amazon CloudWatch monitors latency and error rates; AWS Secrets Manager protects credentials. Over 22 months, Doczy.ai made 137 million API calls to Amazon Bedrock and processed 442 billion tokens.
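As a back-of-the-envelope check, the totals quoted in this article imply the averages computed below. This assumes the 2.5 million contracts, the 137 million Bedrock calls, and the 442 billion tokens all describe the same 22-month workload, which the article implies but does not state outright.

```python
# Totals as quoted in the article.
contracts = 2_500_000
bedrock_calls = 137_000_000
tokens = 442_000_000_000
months = 22

tokens_per_call = tokens / bedrock_calls        # about 3,226
calls_per_contract = bedrock_calls / contracts  # 54.8
tokens_per_contract = tokens / contracts        # 176,800
contracts_per_month = contracts / months        # about 113,636

print(f"{tokens_per_call:,.0f} tokens per Bedrock call")
print(f"{calls_per_contract:.1f} Bedrock calls per contract")
print(f"{tokens_per_contract:,.0f} tokens per contract")
print(f"{contracts_per_month:,.0f} contracts per month")
```

These are averages only; individual contracts will vary widely in length and in the number of model calls they need.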
Smart Chunking and Dual Clustering
Smart chunking keeps logical relationships intact by assigning sequential identifiers to chunks and grouping them by metadata. It removes duplicated content while preserving the document’s natural flow, so the dual clustering engine can operate on both meaning and structure. The semantic side converts text into embeddings; the structural side maps clause types and formatting. Projection aligns the two sets of clusters, producing a richer document model that underpins the 99 % accuracy.
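The patented algorithm itself is not described in the article, so the following is only a toy illustration of the ideas named above: sequential chunk identifiers, parent links that keep the hierarchy, de-duplication, and two independent views of the same chunks. Every name here is hypothetical. Token overlap stands in for embeddings, nesting depth stands in for structural pattern recognition, and the projection step is not attempted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches numbered clauses such as "2.1 Either party may terminate".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


@dataclass
class Chunk:
    chunk_id: int             # sequential identifier, preserves reading order
    number: str               # clause number, e.g. "2.1"
    parent_id: Optional[int]  # chunk_id of the enclosing clause, if any
    text: str


def chunk_contract(lines):
    """Split numbered clauses into chunks, dropping repeated clause text."""
    chunks, by_number, seen = [], {}, set()
    for line in lines:
        match = HEADING.match(line.strip())
        if not match:
            continue  # this toy ignores unnumbered lines
        number, text = match.groups()
        normalized = " ".join(text.lower().split())
        if normalized in seen:
            continue  # duplicate clause text
        seen.add(normalized)
        parent_id = by_number.get(number.rpartition(".")[0])
        chunk = Chunk(len(chunks) + 1, number, parent_id, text)
        by_number[number] = chunk.chunk_id
        chunks.append(chunk)
    return chunks


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def semantic_clusters(chunks, threshold=0.3):
    """Greedy clustering on Jaccard token overlap (a stand-in for embeddings)."""
    clusters = []  # each entry: (representative token set, [chunk ids])
    for chunk in chunks:
        words = tokens(chunk.text)
        for rep, members in clusters:
            union = rep | words
            if union and len(rep & words) / len(union) >= threshold:
                members.append(chunk.chunk_id)
                break
        else:
            clusters.append((words, [chunk.chunk_id]))
    return [members for _, members in clusters]


def structural_groups(chunks):
    """Group chunk ids by nesting depth, a crude structural signal."""
    groups = {}
    for chunk in chunks:
        groups.setdefault(chunk.number.count(".") + 1, []).append(chunk.chunk_id)
    return groups


contract = [
    "1 Payment Terms",
    "1.1 Provider is paid within 30 days of invoice",
    "1.2 Late invoices are paid within 60 days of invoice",
    "2 Termination",
    "2.1 Either party may terminate with 90 days notice",
    "2.2 Either party may terminate with 90 days notice",  # repeated text
]
chunks = chunk_contract(contract)
print(semantic_clusters(chunks))   # groups the two payment sub-clauses together
print(structural_groups(chunks))   # separates top-level clauses from sub-clauses
```

The two functions deliberately disagree about how to group the chunks: the semantic view pairs clauses 1.1 and 1.2 by wording, while the structural view puts every sub-clause in one group regardless of topic. A production system would need a way to reconcile those views, which is the role the article assigns to projection.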
LLMs then generate structured output, guided by prompts that evolve through few‑shot and multi‑shot examples. Each iteration refines the prompt based on real outputs, creating a feedback loop that continually boosts precision.
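A minimal sketch of that prompt loop, assuming a plain-text prompt format and a JSON reply: the function names, the field names, and the prompt layout are all hypothetical, and the actual model call (for example through Amazon Bedrock) is left out.

```python
import json


def build_prompt(instruction, examples, clause):
    """Assemble a few-shot prompt: instruction, worked examples, then the new clause."""
    parts = [instruction.strip(), ""]
    for example in examples:
        parts.append(f"Clause: {example['clause']}")
        parts.append(f"JSON: {json.dumps(example['fields'], sort_keys=True)}")
        parts.append("")
    parts.append(f"Clause: {clause}")
    parts.append("JSON:")
    return "\n".join(parts)


def parse_output(raw, required_keys):
    """Return the parsed fields, or None if the model output is not usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not set(required_keys).issubset(data):
        return None
    return data


def add_corrected_example(examples, clause, corrected_fields):
    """Feedback step: a reviewed correction becomes a new few-shot example."""
    return examples + [{"clause": clause, "fields": corrected_fields}]


examples = [
    {"clause": "Invoices are payable within 30 days.",
     "fields": {"payment_days": 30}},
]
prompt = build_prompt(
    "Extract the payment terms from the clause as JSON.",
    examples,
    "Payment is due 45 days after receipt.",
)
print(prompt)
```

When `parse_output` returns `None` or a reviewer corrects a result, the corrected pair can be fed back through `add_corrected_example`, so the next prompt carries one more worked example.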
Business Impact
Doczy.ai’s automation reduced manual processing time by 97 %, freeing staff for higher‑value tasks. Clients in healthcare and financial services realized roughly $330 million in direct and indirect savings. The platform processes up to 250 000 contracts per week, automatically translating reimbursement terms into claims systems and flagging potential payment discrepancies before they occur.
By turning unstructured contracts into a queryable data asset, Doczy.ai gives organizations a strategic advantage: faster contract lifecycle management, reduced errors, and a clear path to monetizing contractual information.
Future deployments will integrate Doczy.ai’s SaaS offering with existing CLM systems, enabling near‑real‑time contract analytics and continuous optimization of financial outcomes.
Source: Automating contract intelligence with Doczy.ai™ on AWS
Domain: aws.amazon.com