Model
mistral-medium-latest
Request Payload
POST /v1/ocr (via client.ocr.process) with model mistral-ocr-latest (also reproduced on mistral-ocr-4-0), using a structured document annotation with strict: true and number fields:
{
  "model": "mistral-ocr-latest",
  "document": {
    "type": "document_url",
    "document_url": "data:application/pdf;base64,<PDF_BASE64>"
  },
  "include_image_base64": true,
  "include_blocks": true,
  "document_annotation_prompt": "Extract the invoice fields into the schema. Numbers as plain decimals.",
  "document_annotation_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "invoice",
      "strict": true,
      "schema": {
        "type": "object",
        "additionalProperties": false,
        "required": ["company", "invoice_number", "items"],
        "$defs": {
          "Item": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
              "product_code": {"anyOf": [{"type": "string"}, {"type": "null"}]},
              "description": {"anyOf": [{"type": "string"}, {"type": "null"}]},
              "quantity": {"anyOf": [{"type": "number"}, {"type": "null"}]},
              "unit_price": {"anyOf": [{"type": "number"}, {"type": "null"}]},
              "line_total_price": {"anyOf": [{"type": "number"}, {"type": "null"}]}
            }
          }
        },
        "properties": {
          "company": {"type": "string"},
          "invoice_number": {"type": "string"},
          "items": {"type": "array", "items": {"$ref": "#/$defs/Item"}},
          "subtotal_amount": {"anyOf": [{"type": "number"}, {"type": "null"}]},
          "tax_amount": {"anyOf": [{"type": "number"}, {"type": "null"}]},
          "total_amount": {"anyOf": [{"type": "number"}, {"type": "null"}]}
        }
      }
    }
  }
}
Input: a single-page PDF invoice with decimal line amounts (attached: sample.pdf).
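The <PDF_BASE64> placeholder in the payload is the base64-encoded PDF wrapped in a data: URL. A minimal sketch of assembling the same request body by hand in Python, assuming you build the dict yourself rather than through SDK model classes; pdf_to_data_url and build_payload are hypothetical helper names, not part of the mistralai SDK.

```python
import base64


def pdf_to_data_url(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as the data: URL used in document.document_url."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded


def build_payload(pdf_bytes: bytes, schema: dict) -> dict:
    """Hypothetical helper: rebuild the request body above around a JSON Schema."""
    return {
        "model": "mistral-ocr-latest",
        "document": {"type": "document_url", "document_url": pdf_to_data_url(pdf_bytes)},
        "include_image_base64": True,
        "include_blocks": True,
        "document_annotation_prompt": (
            "Extract the invoice fields into the schema. Numbers as plain decimals."
        ),
        "document_annotation_format": {
            "type": "json_schema",
            # strict: True is the setting that triggers the runaway numbers.
            "json_schema": {"name": "invoice", "strict": True, "schema": schema},
        },
    }
```

Keeping the body as a plain dict makes it easy to diff the strict and non-strict variants of the same request.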
Output
The returned document_annotation is truncated and is not valid JSON: a number field starts with the correct value, then continues with a long tail of digits that looks like a binary-to-decimal floating-point expansion, followed by an unbounded run of digits, so the value is never closed (company and invoice number anonymized; the runaway line_total_price is verbatim):
{
  "model": "mistral-ocr-latest",
  "document_annotation": "{\"company\": \"ACME ELECTRICAL\", \"invoice_number\": \"PO-000000\", \"items\": [{\"product_code\": \"99999\", \"description\": \"LIGHT 1200MM\", \"line_total_price\": 1487.500000000000255795384873613816452026367187500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001402960000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  "usage_info": {
    "pages_processed": 1,
    "doc_size_bytes": 45532
  }
}
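The intended value was 1487.5. The output is 1487.5, then a float-like tail (...0255795384873613816...), then an unbounded run of digits, so neither the value nor the rest of the document is ever closed. Note that 1487.5 (= 2975/2) is exactly representable in float64, so this tail is not the exact expansion of 1487.5 itself; it only resembles one.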
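The float64 side of this can be checked locally: Python's decimal.Decimal converts a float to its exact binary value, digit for digit, so it shows whether a value carries any binary-to-decimal tail at all. A small sketch; exact_float64_decimal is a hypothetical helper name.

```python
import json
from decimal import Decimal


def exact_float64_decimal(x: float) -> str:
    """Exact decimal value of the float64 stored for x (no rounding)."""
    return str(Decimal(x))


# 1487.5 = 2975/2 is a dyadic fraction, so float64 stores it exactly: no tail.
print(exact_float64_decimal(1487.5))  # 1487.5
# 0.1 is not dyadic, so its stored value has a long exact expansion.
print(exact_float64_decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
# Standard serializers emit the shortest repr that round-trips.
print(json.dumps({"line_total_price": 1487.5}))  # {"line_total_price": 1487.5}
```

So the digits after 1487.5 are not what ordinary float64 formatting of the intended value would produce.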
Expected Behavior
Each number field is emitted as a finite decimal and the response is well-formed JSON conforming to the schema (e.g. "line_total_price": 1487.5).
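The expected behavior translates into a simple client-side check: the annotation parses, every number is finite, and no numeric literal is absurdly long. A minimal sketch, assuming the annotation arrives as the raw document_annotation string; check_annotation and MAX_NUMBER_DIGITS are hypothetical names, and the digit threshold is an arbitrary assumption to tune per document set.

```python
import json
import math
import re
from typing import Iterator

# Hypothetical threshold: no invoice amount should need more digits than this.
MAX_NUMBER_DIGITS = 32

# Loose match for number-like runs anywhere in the raw text (including inside
# strings), so it also works on truncated output that json.loads rejects.
NUMBER_LIKE = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def _non_finite_paths(value: object, path: str = "$") -> Iterator[str]:
    """Yield JSONPath-like locations of inf/nan floats in parsed JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from _non_finite_paths(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from _non_finite_paths(item, f"{path}[{index}]")


def check_annotation(annotation: str) -> list[str]:
    """Return the problems found in a document_annotation string (empty if OK)."""
    problems = []
    for match in NUMBER_LIKE.finditer(annotation):
        digits = sum(ch.isdigit() for ch in match.group())
        if digits > MAX_NUMBER_DIGITS:
            problems.append(f"runaway number at offset {match.start()} ({digits} digits)")
    try:
        parsed = json.loads(annotation)
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON: {exc.msg} at offset {exc.pos}")
        return problems
    for path in _non_finite_paths(parsed):
        problems.append(f"non-finite number at {path}")
    return problems
```

Running the digit-length scan before parsing means a truncated annotation reports both the runaway literal and the parse error, which helps separate this failure from other malformed output.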
Additional Context
Environment: mistralai Python SDK, ocr.process endpoint with document annotations. Models mistral-ocr-latest (= mistral-ocr-2505) and mistral-ocr-4-0 are both affected.
Reproduction conditions:
- Reproduces only with strict: true. The same schema / model / prompt / PDF does not degenerate when strict is off, so this appears to be an interaction between constrained decoding and the unbounded JSON number grammar.
- Requires number (float) fields in the schema and a PDF with decimal values.
- Intermittent and document-dependent: ~70% of runs on one specific invoice, ~0% on other invoices, ~10% across a set of 20 invoices. With the same document and the same request, re-running eventually triggers it.
- include_blocks / include_image_base64 have no effect.
- The truncation always lands inside a number field.
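The "unbounded JSON number grammar" point can be illustrated with the number grammar from RFC 8259: a fraction is "." followed by one or more digits, with no upper limit. Every prefix of the runaway value is already a complete number and can still be extended by a digit, so a grammar alone never forces the literal to close; only the model's own token choice ends it. This is a sketch of the hypothesis only and does not model Mistral's actual decoder; is_json_number and digit_can_follow are hypothetical names.

```python
import re

# JSON number grammar (RFC 8259, section 6). Note the fraction: "." then
# one or more digits, and nothing bounds how many.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_json_number(text: str) -> bool:
    """True if text is a complete JSON number literal."""
    return JSON_NUMBER.fullmatch(text) is not None


def digit_can_follow(prefix: str) -> bool:
    """True if appending some digit still yields a valid JSON number."""
    return any(is_json_number(prefix + d) for d in "0123456789")


for extra in (0, 100, 10_000):
    literal = "1487.5" + "0" * extra
    # Each literal is complete AND extendable, so a grammar-constrained
    # decoder is always allowed to emit one more digit.
    print(extra, is_json_number(literal), digit_can_follow(literal))
```

By contrast, a literal such as "0" cannot be followed by another digit, which is the only way the grammar itself ever rules out continuing.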
How we found it: while extracting invoices at scale, we saw some documents intermittently return unparseable JSON. Re-running the exact same request on the same PDF reproduced it ~70% of the time on the worst document.
Reproduction note: the attached sample.pdf triggers it on ~70% of runs, so please re-run a few times if the first run is clean. Happy to share more sample documents privately.
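Because the failure is intermittent, a reproduction rate is more useful than a single run. A minimal sketch, assuming run_once is a hypothetical callable you supply (for example a closure that sends the request above and returns the response's document_annotation string); failure_rate is likewise a hypothetical helper, not an SDK function.

```python
import json
from typing import Callable


def failure_rate(run_once: Callable[[], str], runs: int = 10) -> float:
    """Fraction of runs whose document_annotation does not parse as JSON."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            json.loads(run_once())
        except json.JSONDecodeError:
            # Truncated annotations (like the runaway number) land here.
            failures += 1
    return failures / runs
```

Comparing failure_rate for the same document with strict on and strict off gives a direct measure of the strict-only behavior described above.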
Suggested Solutions
No response