Challenges With Quality of Race and Ethnicity Data in Observational Databases

Polubriaginof, Fernanda C G; Ryan, Patrick; Salmasian, Hojjat; Shapiro, Andrea Wells; Perotte, Adler; Safford, Monika M; Hripcsak, George; Smith, Shaun; Tatonetti, Nicholas P; Vawdrey, David K
Journal of the American Medical Informatics Association

We sought to assess the quality of race and ethnicity information in observational health databases, including electronic health records (EHRs), and to propose patient self-recording as an improvement strategy.

We assessed completeness of race and ethnicity information in large observational health databases in the United States (Healthcare Cost and Utilization Project and Optum Labs), and at a single healthcare system in New York City serving a racially and ethnically diverse population. We compared race and ethnicity data collected via administrative processes with data recorded directly by respondents via paper surveys (National Health and Nutrition Examination Survey and Hospital Consumer Assessment of Healthcare Providers and Systems). Respondent-recorded data were considered the gold standard for the collection of race and ethnicity information.

Among the 160 million patients from the Healthcare Cost and Utilization Project and Optum Labs datasets, race or ethnicity was unknown for 25%. Among the 2.4 million patients in the single New York City healthcare system’s EHR, race or ethnicity was unknown for 57%. However, when patients directly recorded their race and ethnicity, 86% provided clinically meaningful information, and 66% of patients reported information that was discrepant with the EHR.

Race and ethnicity data are critical to support precision medicine initiatives and to determine healthcare disparities; however, the quality of this information in observational databases is concerning. Patient self-recording through the use of patient-facing tools can substantially increase the quality of the information while engaging patients in their health.

Patient self-recording may improve the completeness of race and ethnicity information.