Batch Bedrock Inference Cuts Document Extraction Costs 50% at Scale

Fifty percent cheaper inference for bulk document extraction, and still under 15 minutes for a thousand PDFs — that's the number that matters here.

AWS engineers Tim Shear, Cecilia Li, and Said Benallal published a production-ready pattern that pairs an on-demand Amazon Bedrock inference pipeline with a batch one, all driven by SQS queues and managed through Bedrock Prompt Management. The batch pipeline costs half what the on-demand pipeline costs for the same Bedrock model invocations, as confirmed in their testing.

Dual Pipelines, Single Trigger — SQS Routes the Work

The architecture is straightforward: an on-demand FIFO SQS queue triggers a Lambda function that converts scanned PDF pages to images and calls Bedrock's Converse API. Results land in DynamoDB within seconds. For batch jobs, a standard SQS queue feeds a separate Lambda that groups messages, creates JSONL files, and submits a single Bedrock batch inference job. A post-processing Lambda handles output when the batch completes.

Why two queues? FIFO guarantees exactly-once delivery and ordered processing for time-sensitive requests. The standard queue drops ordering guarantees but supports higher throughput for bulk — sensible engineering.

Dynamic Model and Prompt Selection Per Document

Land lease documents from different counties arrive in wildly different formats — numbered lists, tables, even handwritten drawings. This system solves that by encoding the prompt ID, prompt version, and model ID in the SQS message body. The Lambda fetches the appropriate prompt from Bedrock Prompt Management at runtime, so a single pipeline can handle varied layouts without per-document code changes.

Same approach for batch: each message in the queue carries its own prompt and version. The batch Lambda retrieves the prompt text from Bedrock Prompt Management, writes it into the JSONL input, and the batch job uses the correct prompt per record. Only restriction: all documents in a single batch job must use the same model ID. The Lambda handles that by polling the most frequent model ID from the queue.

1,000 Documents in 15 Minutes, 50% Cheaper

Using Python's multiprocessing module inside the Lambda functions, the batch pipeline processes 1,000 documents in under 15 minutes. That's Lambda execution time, not the entire batch job lifecycle, but still a concrete benchmark. And the 50% cost reduction on Bedrock inference for batch vs. on-demand makes the tradeoff obvious for any backlog of hundreds of millions of scanned PDFs — the exact scenario the authors cite from a real customer.

AWS Batch is mentioned as a next-step scaling option for tens of thousands of documents per single batch inference job. The pattern is designed to evolve.

What This Enables

Dynamic prompt management at the document level within a single pipeline means you can onboard new document types without rebuilding infrastructure. That's the real unlock — not just cheaper inference, but cheaper operations. For any organization sitting on a mountain of scanned records, this architecture turns a static batch process into a flexible, multi-format extraction engine.

Source: Extract Data with On-demand and Batch Pipelines Dynamically
Domain: aws.amazon.com