YOLOv11m - Question Segmentation for PDFs

A fine-tuned YOLOv11m model designed to detect and segment questions in Turkish educational documents (PDFs).

Task: Object Detection
Classes: question (Single class)
Resolution: 1280 x 1280

Model Details

Base Model: yolo11m (Ultralytics)
Parameters: ~20M
Training Epochs: 44
Compute: Trained for ~4 hours on an NVIDIA RTX 4090 Mobile GPU.
Precision: 0.971 (Very Low False Positives)
mAP@50: 0.716

Intended Use & Limitations

This model is optimized for extracting question blocks from dense test papers, worksheets, and exam booklets.

✅ Best For

Two-Column Tests: Standard exam layouts where questions are split into columns.
Dense Worksheets: Pages packed with questions.
LGS Style: It may also work on single-column, LGS-style next-generation questions with detailed graphics, though performance is most robust on two-column layouts.

⚠️ Limitations

Header Merging: In some cases, the model merges the test header/title into the first question's box.
Answer Format: The model is heavily biased towards typical "choice-based" questions (A, B, C, D, E). It may fail to detect questions that lack these choice markers or follow an open-ended format.
Irregular Layouts: Extremely irregular layouts or overlapping text boxes can lead to inaccurate question boundaries.
Confidence: Use a confidence threshold between 0.3 and 0.4, tuned to the specific test.

Usage & Best Practices

from ultralytics import YOLO
from ultralytics.utils.downloads import safe_download

# Load the model
model_url = "https://huggingface.co/erayyapagci/yolo11m-question-segmentation/resolve/main/yolov11m-question-seg.pt"
model = YOLO(safe_download(model_url))

# Run Inference
# Recommended conf: 0.3 - 0.4
results = model("page_image.jpg", imgsz=1280, conf=0.35)

# Show results
results[0].show()
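To pass detections on to later steps, each box usually has to be turned into a crop rectangle. The following is a minimal sketch, not part of the original model card: the helper name `crop_rectangles` and the `padding` default are hypothetical, and the boxes are assumed to be in pixel `xyxy` format.

```python
from typing import List, Sequence, Tuple


def crop_rectangles(
    boxes_xyxy: Sequence[Sequence[float]],
    image_width: int,
    image_height: int,
    padding: int = 8,  # hypothetical margin in pixels, not a value from the model card
) -> List[Tuple[int, int, int, int]]:
    """Convert float xyxy boxes into padded integer rectangles clamped to the image."""
    rectangles = []
    for x1, y1, x2, y2 in boxes_xyxy:
        left = max(0, int(x1) - padding)
        top = max(0, int(y1) - padding)
        right = min(image_width, int(x2) + padding)
        bottom = min(image_height, int(y2) + padding)
        # Skip degenerate boxes that collapse after clamping.
        if right > left and bottom > top:
            rectangles.append((left, top, right, bottom))
    return rectangles
```

With Ultralytics, `results[0].boxes.xyxy.tolist()` provides the boxes and `results[0].orig_shape` provides `(height, width)`; each rectangle can then be passed to Pillow's `Image.crop`.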

Example Output

Here is a side-by-side comparison on a real ÖSYM test sample:

Filtering False Positives (Heuristics)

To further reduce false positives (e.g., ordinary paragraphs detected as questions), run an OCR library (such as Tesseract, EasyOCR, or PaddleOCR) on each cropped question image and apply the checks below.

Check for Question Numbers: Verify whether the text starts with a number pattern such as 1., 2), or Soru 3:.
Check for Choices: Verify whether the text contains multiple-choice markers such as A), B), C), D), E).
If a detected box contains neither, it is likely a false positive (header, instruction text) and can be discarded.
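The two checks above can be sketched with regular expressions applied to the OCR output. This is an illustrative sketch under assumptions: the function name `looks_like_question`, the exact patterns, and the `min_choices` default are hypothetical, and real OCR text may need extra normalization.

```python
import re

# Leading question number such as "1.", "2)", or "Soru 3:".
QUESTION_NUMBER = re.compile(r"^\s*(?:soru\s*)?\d{1,3}\s*[.):]", re.IGNORECASE)
# Choice markers such as "A)" through "E)", not preceded by another letter.
CHOICE_MARKER = re.compile(r"(?<![A-Za-z])([A-E])\s*\)")


def looks_like_question(ocr_text: str, min_choices: int = 2) -> bool:
    """Return True if the text starts with a question number or has several choice markers."""
    has_number = QUESTION_NUMBER.search(ocr_text) is not None
    distinct_choices = {m.group(1) for m in CHOICE_MARKER.finditer(ocr_text)}
    has_choices = len(distinct_choices) >= min_choices
    # A box is discarded only when it contains neither signal.
    return has_number or has_choices
```

Keeping a box when either signal is present follows the rule stated above; requiring both would be stricter and would drop open-ended questions that the model did detect.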

Training Data & Citations

Trained on a dataset of 14,693 images (after strict filtering) sourced from 10 public Roboflow datasets. The data was split using Document-Aware Splitting to ensure no data leakage between training and validation sets.
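The model card does not describe how Document-Aware Splitting was implemented. One common reading is that all pages from the same source document are assigned to the same split. The sketch below illustrates that idea under stated assumptions: the filename convention handled by `document_id` (a trailing `_page_N` or `-pN` suffix), the hash-based assignment, and the `val_fraction` default are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def document_id(filename: str) -> str:
    """Hypothetical naming rule: drop the extension and a trailing page suffix."""
    stem = filename.rsplit(".", 1)[0]
    return re.sub(r"[_-](?:page|p)[_-]?\d+$", "", stem, flags=re.IGNORECASE)


def split_by_document(
    filenames: Iterable[str], val_fraction: float = 0.2
) -> Dict[str, List[str]]:
    """Assign whole documents (never individual pages) to train or val."""
    groups = defaultdict(list)
    for name in filenames:
        groups[document_id(name)].append(name)

    splits: Dict[str, List[str]] = {"train": [], "val": []}
    for doc, pages in sorted(groups.items()):
        # A stable hash of the document ID keeps the assignment reproducible.
        digest = hashlib.sha256(doc.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        splits["val" if bucket < val_fraction else "train"].extend(pages)
    return splits
```

Because assignment happens per document, the realized validation share only approximates `val_fraction` when the number of documents is small.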

We gratefully acknowledge the key datasets used in this training:

PDF Soru Cikarma (tanimazsinu)
WholeQuestionDetection (Gazi University)
ExamBuddy (ExamBuddy)
Questions (Terry Li)
Question Parsing from Document (Sefa)
Question Dedector (Nur Etinkaya)
Sorukes (Sorualgilama)
Question Detection (Cognizen)
Questions2 (Fiver)
Question-New (Question)
