DocHighlight: A Real-World Dataset for Document Specular Highlight Removal

This repository contains the DocHighlight dataset for the paper "Towards Real-World Document Specular Highlight Removal: The DocHighlight Dataset and DocSHRNet Method" published in Pattern Recognition and Computer Vision (PRCV 2025).

DocHighlight is a large-scale, high-resolution dataset specifically designed for document specular highlight removal. The dataset comprises 2,201 rigorously aligned paired images captured under diverse real-world conditions using a polarization-based acquisition pipeline, featuring:

Various document types: books, magazines, receipts, and graphical content
Diverse illumination conditions: varying color temperatures, brightness levels, and lighting angles
Multiple capture devices: different camera types to ensure diversity
High resolution: average 2924×3672 pixels (range: 1034×737 – 3468×4624)
Real-world highlights: pairs are manually verified for quality to provide reliable ground truth

The reference implementation DocSHRNet with training and inference code is available at 👉 https://github.com/shallweiwei/DocSHRNet.

📥 Download

The dataset is available via the following links:

Baidu Netdisk
Quark Netdisk
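
Once the archive is extracted, the paired images can be indexed with a few lines of Python. The following is only a minimal sketch: the on-disk layout is not described in this README, so the directory names (`input` for highlight images, `gt` for highlight-free ground truth) and the convention that paired files share a filename stem are assumptions. Adjust them to match the actual archive.

```python
from pathlib import Path

# Assumed image extensions; extend this set if the archive uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_pairs(highlight_dir, clean_dir):
    """Return (highlight_path, clean_path) tuples matched by filename stem.

    Both directory arguments are hypothetical locations; the pairing rule
    (identical stems) is an assumption, not a documented property of DocHighlight.
    """
    highlight_dir, clean_dir = Path(highlight_dir), Path(clean_dir)

    # Map each ground-truth image to its stem for constant-time lookup.
    clean_by_stem = {
        p.stem: p
        for p in clean_dir.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }

    pairs = []
    for p in sorted(highlight_dir.iterdir()):
        if p.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        match = clean_by_stem.get(p.stem)
        if match is not None:
            pairs.append((p, match))  # unmatched images are silently skipped
    return pairs
```

Calling `index_pairs("DocHighlight/input", "DocHighlight/gt")` (hypothetical paths) returns a sorted list of path pairs. Comparing its length with the 2,201 pairs reported above is a quick check that the download is complete and that the assumed layout is correct.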

📝 Usage Notes

🔒 The dataset is released under the CC BY-NC-SA 4.0 license and is for non-commercial use only.

📚 Citation

If this dataset is useful in your research or product, please cite our paper:

@InProceedings{xu2026dochighlight,
  author="Xu, Haowei
  and Zhang, Jiaxin
  and Cheng, Hiuyi
  and Zhang, Peirong
  and Zheng, Xuhan
  and Jin, Lianwen",
  title={{Towards Real-World Document Specular Highlight Removal: The DocHighlight Dataset and DocSHRNet Method}},
  booktitle="Pattern Recognition and Computer Vision",
  year="2026",
  publisher="Springer Nature Singapore",
  address="Singapore",
  pages="109--124",
  isbn="978-981-95-5676-2"
}
