To Link or Not to Link: Insights from comparison of CPRD Aurum and CPRD GOLD with linked Hospital Episode Statistics (HES) APC, HES Outpatient, and Cancer Registry data

Presented: ICPE 2023 in Halifax, Nova Scotia

Authors: Katrina Wilcox Hagberg, Catherine Vasilakis-Scaramozza, Rebecca Persson, George Kafatos, David E Neasham, Susan S Jick

Background: Clinical Practice Research Datalink (CPRD) Aurum and GOLD are valuable sources of primary care (GP) data. Linked data add information about care provided in secondary and hospital settings. The decision to use GP data on their own or with linked data affects patient selection, study design and interpretation of results. There are also practical factors to consider.

Objectives: To assess the presence of malignant breast cancer diagnoses in Aurum and GOLD, and in linked Hospital Episode Statistics (HES) Admitted Patient Care (APC), HES Outpatient (OP), and Cancer Registry (CR) data, and to describe lessons learned and relevance for future study decisions.

Methods: We selected female patients eligible for linkage who had a malignant breast cancer diagnosis recorded in at least one data source (Aurum or GOLD, HES APC, HES OP, or CR) between 2004 and 2016 (the end of CR data). We excluded patients who had malignant cancer at any site (except non-melanoma skin cancer), mastectomy, or prescriptions for tamoxifen or aromatase inhibitors before the start of follow-up. We described which data source(s) contained a malignant breast cancer diagnosis code.
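
The source-by-source description in the Methods can be illustrated with a minimal sketch. It assumes a hypothetical input with one record per patient and a boolean flag per data source; the field names, category labels, and example patients are invented for illustration and are not CPRD, HES, or CR variable names or study data.

```python
from collections import Counter

# Hypothetical input: one record per patient, with a flag saying whether a
# malignant breast cancer code was found in each source. All values invented.
patients = [
    {"patid": 1, "gp": True,  "hes_apc": True,  "hes_op": False, "cr": True},
    {"patid": 2, "gp": True,  "hes_apc": False, "hes_op": False, "cr": False},
    {"patid": 3, "gp": False, "hes_apc": True,  "hes_op": False, "cr": False},
    {"patid": 4, "gp": False, "hes_apc": False, "hes_op": False, "cr": True},
]

# Linked (non-GP) sources considered in this sketch.
LINKED_SOURCES = ("hes_apc", "hes_op", "cr")


def classify(patient):
    """Return 'gp_and_linked', 'gp_only', or 'linked_only' for one patient."""
    in_gp = patient["gp"]
    in_linked = any(patient[source] for source in LINKED_SOURCES)
    if in_gp and in_linked:
        return "gp_and_linked"
    if in_gp:
        return "gp_only"
    if in_linked:
        return "linked_only"
    # Eligibility requires a diagnosis in at least one source.
    raise ValueError("patient has no diagnosis in any source")


def summarise(records):
    """Return the percentage of patients in each category, to one decimal."""
    counts = Counter(classify(p) for p in records)
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


summary = summarise(patients)
```

Grouping HES APC, HES OP, and CR into one "linked" category mirrors the headline comparison of GP record versus linked data; a finer breakdown by individual source would need one category per combination of flags.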

Results: There were 71,113 eligible patients in Aurum and 29,928 in GOLD. Most patients had a malignant breast cancer diagnosis coded in the GP record and in HES APC/OP and/or CR (Aurum 84.9%, GOLD 83.7%). Approximately 7% of patients in Aurum or GOLD had a malignant breast cancer diagnosis recorded in the GP record only. While it is possible that patients missing concordant diagnoses in HES or CR represent provisional breast cancer diagnoses, most (92.9% Aurum, 89.0% GOLD) had codes for relevant treatments or care in their GP record that supported the diagnosis. Additional breast cancer cases (those with no such diagnosis in the GP record) were captured in HES APC and CR, while very few additional cases were added through HES OP alone. Median age was higher for cases recorded in linked data only (HES 67 years, CR 68 years) than for cases recorded only in GP data (60 years).

Conclusion: These findings indicate that patients with breast cancer are well captured in Aurum, as they are in the well-described GOLD, providing confidence in the use of Aurum for research on malignant breast cancer. Where complete case capture is important for specific study questions, researchers should consider linkage to HES APC or CR. The benefits of using linked data depend on the study question and must be weighed against shorter patient follow-up, data lags (specifically CR), reduced sample size, and limited geographic generalizability (linkages are limited to England). These decisions may introduce selection bias and affect interpretation of results. Practical considerations include longer study timelines, increased administrative complexity, and added costs to access linked data.