Citation

A Global Review of Publicly Available Datasets for Ophthalmological Imaging: Barriers to Access, Usability and Generalizability

Author:
Khan, Saad M.; Liu, Xiaoxuan; Nath, Siddharth; Korot, Edward; Faes, Livia; Wagner, Siegfried K.; Keane, Pearse A.; Sebire, Neil J.; Burton, Matthew J.; Denniston, Alastair K.
Publication:
The Lancet Digital Health
Year:
2021

Health data that are publicly available are valuable resources for digital health research. Several public datasets containing ophthalmological imaging have been frequently used in machine learning research; however, the total number of datasets containing ophthalmological health information and their respective content is unclear. This Review aimed to identify all publicly available ophthalmological imaging datasets, detail their accessibility, describe which diseases and populations are represented, and report on the completeness of the associated metadata. With the use of MEDLINE, Google's search engine, and Google Dataset Search, we identified 94 open access datasets containing 507 724 images and 125 videos from 122 364 patients. Most datasets originated from Asia, North America, and Europe. Disease populations were unevenly represented, with glaucoma, diabetic retinopathy, and age-related macular degeneration disproportionately overrepresented in comparison with other eye diseases. The reporting of basic demographic characteristics such as age, sex, and ethnicity was poor, even at the aggregate level. This Review provides greater visibility for ophthalmological datasets that are publicly available as powerful resources for research. Our paper also exposes an increasing divide in the representation of different population and disease groups in health data repositories. The improved reporting of metadata would enable researchers to access the most appropriate datasets for their needs and maximise the potential of such resources.