# documentai bbox benchmark
In my previous post, I talked a bit about the recent developments in the field of DocumentAI. Now comes the practical part. For the Attention v3 paper from the ExtractBench dataset, ExtractBench focused only on extraction, but I am also interested in the bounding box reference that the models return.
Because ExtractBench had only a very limited selection of models without any open-weight ones among them, I ran a few extractions via OpenRouter especially to see how well Qwen, Kimi, and Mistral are doing. So I took pages 1 and 13 from the FlashAttention-3 example from there and added "reference" bounding boxes with pdfplumber (it is a native PDF) as a reference. They are not perfect, but for a rough indication they are more than enough.
leaderboard
* for some models I did not manage to generate extraction and bbox in one run. For these I ran separate extraction + bbox prompts.
Note: For some models I could not really get consistent scores on OpenRouter even after several runs.
The bbox score is a bit over-engineered with coverage (for how many fields were bboxes generated?), intersection-over-union (to check how well the bbox "fits" the original one, also known as the Jaccard index), and centroid distance (to check if the bbox is roughly in the correct area):
prompts
ONE_SHOT_SYSTEM_PROMPT = "Return only valid JSON matching the provided JSON Schema."
one_shot_user_prompt = f"""
Only use the provided page images. They are not necessarily consecutive pages.
The original PDF has 22 pages. If the schema asks for number_of_pages, use 22.
Page mapping:
- input image 1: original PDF page 1, page_index 0
- input image 2: original PDF page 13, page_index 12
Each scalar extraction field is an object with value and bbox. Use bbox null when
the value is not visible in the provided page images. Boxes are [x1, y1, x2, y2].
JSON Schema:
{annotated_extraction_schema_json}
"""
I modified the original JSON schema a bit and added an additional bbox field to every value. See the example for the ids field:
{
"ids": {
"value": {
"type": ["string", "null"]
},
"bbox": {
"type": ["object", "null"],
"properties": {
"page_index": {
"type": "integer"
},
"box": {
"type": "array",
"items": {
"type": "number"
},
"minItems": 4,
"maxItems": 4
}
},
"required": ["page_index", "box"]
}
}
}
page 1
page 13