YOLOv11m - Question Segmentation for PDFs

A fine-tuned YOLOv11m model designed to detect and segment questions in Turkish educational documents (PDFs).

  • Task: Object Detection
  • Classes: question (Single class)
  • Resolution: 1280 x 1280

Model Details

  • Base Model: yolo11m (Ultralytics)
  • Parameters: ~20M
  • Training Epochs: 44
  • Compute: Trained for ~4 hours on an NVIDIA RTX 4090 Mobile GPU.
  • Precision: 0.971 (Very Low False Positives)
  • mAP@50: 0.716

Intended Use & Limitations

This model is optimized for extracting question blocks from dense test papers, worksheets, and exam booklets.

✅ Best For

  • Two-Column Tests: Standard exam layouts where questions are split into columns.
  • Dense Worksheets: Pages packed with questions.
  • LGS Style: May also work on single-column, LGS-style next-generation questions that include detailed graphics, though performance is most robust on two-column layouts.

⚠️ Limitations

  • Header Merging: In some cases, the model might accidentally merge the test header/title with the first question.
  • Answer Format: The model is heavily biased towards typical "choice-based" questions (A, B, C, D, E). It may fail to detect questions that lack these choice markers or follow an open-ended format.
  • Irregular Layouts: Extremely irregular layouts or overlapping text boxes can lead to inaccurate box boundaries.
  • Confidence: It is recommended to use a confidence threshold of 0.3 - 0.4 depending on the specific test.

Usage & Best Practices

```python
from ultralytics import YOLO
from ultralytics.utils.downloads import safe_download

# Load the model
model_url = "https://huggingface.co/erayyapagci/yolo11m-question-segmentation/resolve/main/yolov11m-question-seg.pt"
model = YOLO(safe_download(model_url))

# Run inference
# Recommended conf: 0.3 - 0.4
results = model("page_image.jpg", imgsz=1280, conf=0.35)

# Show results
results[0].show()
```
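
To pass detections to a later step such as OCR, each box usually has to be turned into a crop region. The following is a minimal sketch under stated assumptions, not part of the released model: the helper name `padded_crop_box` and the 10-pixel padding are illustrative choices. It expands a box slightly so that text touching the border is not cut off, then clamps it to the page.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def padded_crop_box(box: Box, img_w: int, img_h: int, pad: int = 10) -> Tuple[int, int, int, int]:
    """Expand a box by `pad` pixels per side and clamp it to the image bounds.

    Hypothetical helper for illustration; the padding value is an assumption.
    """
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(img_w, math.ceil(x2) + pad)
    bottom = min(img_h, math.ceil(y2) + pad)
    return left, top, right, bottom


# A box near the left edge of a 1240 x 1754 page: the left side is clamped to 0.
print(padded_crop_box((5.2, 100.7, 600.4, 480.9), img_w=1240, img_h=1754))
# (0, 90, 611, 491)
```

With Ultralytics, the boxes should be available as `results[0].boxes.xyxy.tolist()` and the page size as `results[0].orig_shape` (height, width). The returned tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects.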

Example Output

Here is a side-by-side comparison on a real ÖSYM test sample:

[Image: Example Output]

Filtering False Positives (Heuristics)

To further eliminate false positives (e.g., random paragraphs detected as questions), run an OCR library (such as Tesseract, EasyOCR, or PaddleOCR) on each cropped question image and check the recognized text:

  • Check for Question Numbers: Verify that the text starts with a number pattern such as 1., 2), or Soru 3:.
  • Check for Choices: Verify that the text contains multiple choice markers such as A), B), C), D), E).
  • If a detected box contains neither, it is likely a false positive (header, instruction text) and can be discarded.
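
The checks above can be sketched with two regular expressions applied to the OCR text. This is a minimal sketch and not the author's implementation: the function name `looks_like_question`, the exact patterns, and the `min_choices` threshold are illustrative assumptions, and the OCR step itself is left out.

```python
import re

# Question number at the start of the text: "1.", "2)", "Soru 3:".
QUESTION_NUMBER = re.compile(r"^\s*(?:soru\s*)?\d{1,3}\s*[.):]", re.IGNORECASE)
# Choice markers "A)" to "E)" that are not preceded by a letter or digit.
CHOICE_MARKER = re.compile(r"(?<![A-Za-z0-9])([A-E])\)")


def looks_like_question(text: str, min_choices: int = 2) -> bool:
    """Return True if OCR text has a leading question number or several choice markers.

    `min_choices` is an assumed threshold for "multiple" choice markers.
    """
    has_number = QUESTION_NUMBER.search(text) is not None
    distinct_choices = set(CHOICE_MARKER.findall(text))
    has_choices = len(distinct_choices) >= min_choices
    # Discard only when neither signal is present.
    return has_number or has_choices


samples = [
    "1. Aşağıdakilerden hangisi doğrudur?\nA) x  B) y  C) z  D) t",
    "Soru 3: Açıklayınız.",
    "TEMEL MATEMATİK TESTİ",
]
for text in samples:
    print(looks_like_question(text))
# True
# True
# False
```

OCR output is noisy, so the patterns may need loosening in practice, for example to accept "A." as well as "A)".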

Training Data & Citations

Trained on a dataset of 14,693 images (after strict filtering) sourced from 10 public Roboflow datasets. The data was split using Document-Aware Splitting to ensure no data leakage between training and validation sets.
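
The card does not describe how the split was implemented. One common way to get a document-aware split is to group pages by source document and assign whole groups to one side, as in the sketch below. The `document_id` filename convention (`<document>_page<N>.jpg`), the function names, and the 20% validation fraction are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def document_id(filename: str) -> str:
    """Hypothetical convention: the part before the last '_page' names the document."""
    return filename.rsplit("_page", 1)[0]


def document_aware_split(
    filenames: Iterable[str],
    val_fraction: float = 0.2,
    key: Callable[[str], str] = document_id,
) -> Tuple[List[str], List[str]]:
    """Split pages so that all pages of one document land on the same side."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        groups[key(name)].append(name)

    train: List[str] = []
    val: List[str] = []
    for doc, pages in sorted(groups.items()):
        # Stable hash of the document id mapped to [0, 1), so reruns give the same split.
        digest = hashlib.md5(doc.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 0x100000000
        (val if bucket < val_fraction else train).extend(pages)
    return train, val


files = [f"doc{d}_page{p}.jpg" for d in range(20) for p in range(5)]
train, val = document_aware_split(files)
# No document contributes pages to both sets.
print({document_id(f) for f in train}.isdisjoint({document_id(f) for f in val}))
```

Because the split is made per document, the realized validation fraction only approximates `val_fraction`, especially when there are few documents.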

We gratefully acknowledge the key datasets used in this training:

  1. PDF Soru Cikarma (tanimazsinu): Link
  2. WholeQuestionDetection (Gazi University): Link
  3. ExamBuddy (ExamBuddy): Link
  4. Questions (Terry Li): Link
  5. Question Parsing from Document (Sefa): Link
  6. Question Dedector (Nur Etinkaya): Link
  7. Sorukes (Sorualgilama): Link
  8. Question Detection (Cognizen): Link
  9. Questions2 (Fiver): Link
  10. Question-New (Question): Link