Mistral OCR 4 adds structured document extraction for enterprise RAG

Mistral released OCR 4 on June 23 as a document AI model for turning files into structured input for search, retrieval, and agent systems. The main change is not text extraction itself but the layout-aware structure returned with it: bounding boxes, block classification, and confidence scores alongside the content.

That matters for enterprise AI because documents are not just strings. Contracts, filings, invoices, slide decks, forms, tables, signatures, and scanned PDFs carry meaning through layout. A retrieval system that loses that structure can cite the wrong region, miss a table, or make human review harder.

Mistral is positioning OCR 4 as the ingestion layer for that problem.

The output is the product

Mistral says OCR 4 classifies blocks such as titles, tables, equations, and signatures. It returns bounding boxes and inline confidence scores per page and per word. That gives downstream systems more than raw text to chunk.

The practical use cases are familiar: semantic chunking for RAG, source-grounded citations, redaction, form filling, invoice processing, compliance checks, and human-in-the-loop review. The difference is that the model output carries coordinates and confidence, so a user or system can trace an answer back to a document region.
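A minimal sketch of how a downstream system might use that kind of output, assuming a simplified block record. The Block type, its field names, the coordinate units, and the 0.9 threshold are hypothetical and are not taken from Mistral's API; only the idea of block type, bounding box, and confidence comes from the announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical record for one extracted block (not Mistral's schema)."""
    page: int
    kind: str  # e.g. "title", "table", "equation", "signature"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), units assumed
    confidence: float  # assumed to be on a 0.0 to 1.0 scale
    text: str


def split_for_review(blocks, threshold=0.9):
    """Separate blocks that can be indexed directly from those needing review."""
    accepted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return accepted, needs_review


def cite(block):
    """Build a human-readable pointer back to the source region."""
    x0, y0, x1, y1 = block.bbox
    return f"page {block.page}, {block.kind} at ({x0:g}, {y0:g})-({x1:g}, {y1:g})"
```

Here low-confidence blocks would go to a human reviewer, while accepted blocks keep enough location data for an answer to point at the exact region it came from.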

Mistral says OCR 4 accepts common enterprise formats including PDF, DOC, PPT, and OpenDocument. It also says the model supports 170 languages across 10 language groups, with gains on specialized and low-resource languages.

Those are broad claims, so teams should still test their own documents. The useful point is the shape of the product: OCR is becoming document understanding infrastructure, not just a preprocessing utility.

Pricing makes batch workflows legible

Mistral lists OCR 4 API pricing at $4 per 1,000 pages. Batch API pricing is $2 per 1,000 pages, and Document AI is priced at $5 per 1,000 pages.

That page-based pricing matters because document AI often arrives as a back-office batch problem. A bank, law firm, insurer, or enterprise search team may need to process millions of pages before a user ever asks a question. Per-token pricing can be harder to budget for that workload because token counts for scanned documents, tables, images, and layout artifacts vary widely.
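The arithmetic is simple enough to sketch. The rates below are the listed per-1,000-page prices; the tier keys and the estimate_cost helper are illustrative names, and the sketch ignores any volume discounts, taxes, or platform fees that may apply.

```python
from decimal import Decimal

# Listed prices in USD per 1,000 pages; the dictionary keys are illustrative.
PRICE_PER_1000_PAGES = {
    "ocr_api": Decimal("4"),
    "batch_api": Decimal("2"),
    "document_ai": Decimal("5"),
}


def estimate_cost(pages: int, tier: str) -> Decimal:
    """Return the estimated cost in USD for processing `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return Decimal(pages) * PRICE_PER_1000_PAGES[tier] / Decimal(1000)
```

For a hypothetical backlog of 2 million pages, this works out to $8,000 at the standard API rate, $4,000 through the batch API, and $10,000 for Document AI.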

Mistral also says OCR 4 is compact enough to run in a single container and offers a self-hosting option for enterprise customers. That is the other half of the enterprise pitch. Some document sets cannot leave a controlled environment because of privacy, residency, regulatory, or customer requirements.

Benchmarks need attribution

Mistral says independent annotators preferred OCR 4 over every leading OCR and document AI system it tested, with an average win rate of 72%, and that OCR 4 had the top overall score on OlmOCRBench at 85.20.

Those are useful claims, but they are still Mistral’s reported benchmark results. The post itself notes that automated benchmarks can carry scoring artifacts, which is why Mistral paired them with a human preference evaluation across more than 600 documents and more than 12 languages.

That is a reasonable evaluation pattern for document AI. Exact-string scoring can punish harmless formatting differences or miss whether the output is actually useful to a downstream workflow. Human preference is also subjective. The right buying test is still a team’s own corpus: messy scans, rotated pages, low-resource languages, tables, equations, handwriting, stamps, and the retrieval tasks that follow.
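The exact-string problem is easy to illustrate. The sketch below is not how OlmOCRBench or Mistral scores output; it is a toy comparison, with hypothetical function names, showing how whitespace and case differences alone can fail a strict match.

```python
import re


def exact_match(expected: str, actual: str) -> bool:
    """Strict comparison: any formatting difference counts as a miss."""
    return expected == actual


def normalized_match(expected: str, actual: str) -> bool:
    """Compare after collapsing whitespace and ignoring case."""
    def normalize(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip().lower()

    return normalize(expected) == normalize(actual)
```

Two outputs that differ only in line breaks fail the first check and pass the second. Normalization has its own cost: it hides differences that may matter, such as a table row split in the wrong place.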

Document AI is becoming an agent dependency

The larger story is where OCR sits in the AI stack. Agents that reason over enterprise knowledge need reliable ingestion before they can retrieve, cite, redact, or act. If the document parser drops a table boundary or loses a confidence signal, the agent inherits that weakness.

Mistral links OCR 4 to its Search Toolkit public preview and says both OCR 4 and Document AI are available through Mistral Studio, Amazon SageMaker, Microsoft Foundry, and, later, Snowflake Parse Document. That distribution is aimed at teams that already build retrieval and data workflows in enterprise platforms.

The next checkpoint is evidence from production deployments. OCR 4 is promising if it helps teams keep source grounding intact from page to answer. It is less useful if structure disappears once the document enters a generic chunking pipeline.
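One way to keep structure through chunking is to carry page, region, and confidence on every chunk and never merge a table into surrounding text. The sketch below assumes blocks arrive as plain dictionaries with hypothetical keys ("page", "kind", "bbox", "confidence", "text"); it is one possible grouping policy, not a description of Mistral's Search Toolkit.

```python
def union_bbox(boxes):
    """Smallest box (x0, y0, x1, y1) that covers every box in `boxes`."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def chunk_blocks(blocks, max_chars=500):
    """Group consecutive same-page blocks; tables always get their own chunk."""
    chunks, current = [], []

    def flush():
        # Emit the pending blocks as one chunk that keeps its provenance.
        if current:
            chunks.append({
                "page": current[0]["page"],
                "kinds": [b["kind"] for b in current],
                "bbox": union_bbox([b["bbox"] for b in current]),
                "min_confidence": min(b["confidence"] for b in current),
                "text": "\n".join(b["text"] for b in current),
            })
            current.clear()

    for block in blocks:
        size = sum(len(b["text"]) for b in current)
        starts_new = (
            block["kind"] == "table"
            or (current and current[0]["page"] != block["page"])
            or (current and current[-1]["kind"] == "table")
            or size + len(block["text"]) > max_chars
        )
        if starts_new:
            flush()
        current.append(block)
    flush()
    return chunks
```

Each chunk can then be embedded with its metadata attached, so a retrieved passage still points to a page and region, and a low minimum confidence can flag it for review.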

Sources

Mistral AI: OCR 4