Identification and Mapping Real-World Data Sources for Heart Failure, Acute Coronary Syndrome, and Atrial Fibrillation

Background: Transparent and robust real-world evidence sources are increasingly important for global health, including cardiovascular (CV) diseases. We aimed to identify global real-world data (RWD) sources for heart failure (HF), acute coronary syndrome (ACS), and atrial fibrillation (AF). Methods: We conducted a systematic review of publications with RWD pertaining to HF, ACS, and AF (2010–2018), generating a list of unique data sources. Metadata were extracted based on the source type (e.g., electronic health records, genomics, and clinical data), study design, population size, clinical characteristics, follow-up duration, outcomes, and assessment of data availability for future studies and linkage. Results: Overall, 11,889 publications were retrieved for HF, 10,729 for ACS, and 6,262 for AF. From these, 322 (HF), 287 (ACS), and 220 (AF) data sources were selected for detailed review. The majority of data sources had near complete data on demographic variables (HF: 94%, ACS: 99%, and AF: 100%) and considerable data on comorbidities (HF: 77%, ACS: 93%, and AF: 97%). The least reported data categories were drug codes (HF, ACS, and AF: 10%) and caregiver involvement (HF: 6%, ACS: 1%, and AF: 1%). Only a minority of data sources provided information on access to data for other researchers (11%) or whether data could be linked to other data sources to maximize clinical impact (20%). The list and metadata for the RWD sources are publicly available at www.escardio.org/bigdata. Conclusions: This review has created a comprehensive resource of CV data sources, providing new avenues to improve future real-world research and to achieve better patient outcomes.


Background
Cardiovascular (CV) disease is the leading cause of death worldwide [1], accounting for >17 million deaths in 2015 alone [2]. According to the World Health Organization (WHO), the annual number of deaths due to CV diseases globally is projected to increase to 20.5 million by 2020 and 24.5 million by 2030 [3]. Moreover, in both high-income and middle-income countries, the main cause of death has shifted over time from communicable to non-communicable diseases, with a high burden on national health systems [4].
Real-world data (RWD) have played a key role in CV disease-related decision-making, especially in recent years, due to a widening range of new therapies and increasing demands for justification of their effectiveness. Translating RWD into real-world evidence (RWE) can provide information throughout a product's life cycle [5]. RWE can help design pivotal phase 3 trials by reducing the required sample size, supporting recruitment, and thereby saving time [6] and informing the appropriate selection criteria [7,8]. RWE can provide outcomes of care in real-world settings, thus improving the external validity of clinical trial findings, and offer insights into coverage and payment decisions to support health authority decision-making [9,10]. However, limitations of RWD should also be acknowledged, which broadly include bias and confounding, incomplete data, different legal frameworks leading to restricted data sharing, and lack of universally accepted methodological standards [9–11]. In addition, the evidence landscape is constantly evolving with respect to the conduct and reporting of RWE studies. The recent retraction from major medical journals of apparently fraudulent RWD on COVID-19 [12] highlights the urgent need for more transparency and access to global data sources.
This review aimed to identify global RWD sources pertaining to heart failure (HF), acute coronary syndrome (ACS), and atrial fibrillation (AF) in order to facilitate new evidence research and improve patient outcomes. Our objective was to help global researchers move toward the FAIR principles for RWD: Findable, Accessible, Interoperable, and Reusable [13].

Methods
The European Union Innovative Medicines Initiative (IMI) public-private consortium launched the BigData@Heart project with the goal of developing a big data-driven translational research platform from RWE focussing on HF, ACS, and AF. Through this translational research platform, BigData@Heart aims to deliver clinically relevant disease phenotypes and support drug development and personalized medicine [14–16]. One of the undertakings of this initiative is to identify and characterize available RWD sources that would serve as a starting point to identify existing datasets that could help address research questions at scale.
A systematic literature search was conducted in MEDLINE and EMBASE using the OvidSP platform for the period January 2010-March 2018 to identify publications using RWD sources for HF, ACS, and AF. The review was not prospectively registered. We did not include publications before 2010 because older RWD sources may not be relevant to current practice. Disease-specific search strategies (using Medical Subject Headings terms) were combined with study design terms to identify research publications that either generated primary RWD or used existing RWD sources. Identified data sources from these publications were categorized according to predefined geographical locations: Europe; USA; Latin America/Canada (LaCan); and Asia-Pacific, Middle East, and Africa (APMA).

Inclusion and Exclusion Criteria
We included English-language publications using different data sources as defined by the authors, such as structured data sources (administrative data and registries), medical records or charts, insurance claims, health surveys, and observational studies for HF, ACS, and AF. Publications that did not generate primary RWD or did not study existing RWD sources, as well as guidelines, editorials, letters, and reviews, were excluded. Additionally, we excluded clinical trials or interventional studies, in vitro/preclinical studies, and data sources with <50 patients.
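The eligibility rules above can be read as a simple filter over publication records. The following is a minimal sketch in Python, not the screening tool used in the review; the record fields (`language`, `publication_type`, `study_design`, `n_patients`, `uses_rwd`) and the category labels are hypothetical and serve only to illustrate the logic.

```python
from dataclasses import dataclass

# Hypothetical labels; the review does not describe its screening tool at this level.
EXCLUDED_TYPES = {"guideline", "editorial", "letter", "review"}
EXCLUDED_DESIGNS = {"clinical_trial", "interventional", "in_vitro", "preclinical"}
MIN_PATIENTS = 50  # data sources with <50 patients were excluded


@dataclass
class Publication:
    title: str
    language: str
    publication_type: str
    study_design: str
    n_patients: int
    uses_rwd: bool  # generates primary RWD or studies an existing RWD source


def is_eligible(pub: Publication) -> bool:
    """Return True only if the publication passes every criterion listed above."""
    return (
        pub.language == "English"
        and pub.uses_rwd
        and pub.publication_type not in EXCLUDED_TYPES
        and pub.study_design not in EXCLUDED_DESIGNS
        and pub.n_patients >= MIN_PATIENTS
    )
```

Keeping the thresholds and excluded categories as named constants makes the criteria easy to audit against the text.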

Screening, Selection, and Extraction of Data Sources
The search strategies are presented in the additional files, available online at www.karger.com/doi/10.1159/000520674 (HF: Additional File 1 [online suppl. Table 1], ACS: Additional File 2 [online suppl. Table 2], and AF: Additional File 3 [online suppl. Table 3]). All publications identified from the literature searches were first screened based on the title and abstract by a single reviewer, and duplicates were removed. The inclusion and exclusion criteria were applied at this stage to generate a list of publications for full-text review. Of the included publications, 10% were randomly selected and checked for discrepancies, which were reconciled through group discussions. For the included publications, names of identified data sources, type (single-centre or multicentre), and geographical location were extracted using a predefined screening tool. Publications with the same data source were grouped by name, and data were extracted into a single record to avoid double counting of data sources in subsequent analyses. Thereafter, a list of unique data sources available from the literature search was prepared for each indication.
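The grouping of publications into one record per data source, and the 10% random check, can be sketched as follows. This is an illustrative Python outline under stated assumptions: the dictionary keys (`id`, `source_name`, `location`), the name-normalization rule, and the fixed random seed are all hypothetical and were not specified in the review.

```python
import random
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Crude matching key: lower-case and collapse whitespace (an assumption, not the review's rule)."""
    return " ".join(name.lower().split())


def group_by_source(publications):
    """Merge publications that cite the same data source into one record per source."""
    records = defaultdict(lambda: {"name": None, "locations": set(), "publication_ids": []})
    for pub in publications:
        rec = records[normalize_name(pub["source_name"])]
        if rec["name"] is None:
            rec["name"] = pub["source_name"].strip()  # keep the first spelling seen
        rec["locations"].add(pub["location"])
        rec["publication_ids"].append(pub["id"])
    return dict(records)


def sample_for_check(publication_ids, fraction=0.10, seed=0):
    """Draw a random subset (10% by default) of included publications for a second check."""
    ids = list(publication_ids)
    k = max(1, round(len(ids) * fraction)) if ids else 0
    return random.Random(seed).sample(ids, k)
```

In practice, matching by name alone would miss abbreviations and alternative spellings of the same registry, so a manual reconciliation step would still be needed.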
From this list, selected data sources were further mapped and extracted in detail based on different criteria for each disease indication; data sources with larger sample sizes were prioritized with a view to big data and potentially more robust analyses. For the data sources identified based on the above criteria, information presented in the included publications was extracted, including additional information on data source details (description, coverage, and follow-up) and availability of clinically relevant key variables related to HF, ACS, and AF (diagnosis and staging, demographics, management [including procedures], test results and treatments, burden of disease [including costs], deaths and resource use, quality of life, and adverse events). In addition, publicly available information related to the data sources, such as the data source holder/owner, access and linkage possibility, supporting documentation, and governance aspects, was extracted and recorded.

Results

Heart Failure
Of the 11,889 publications retrieved from the HF literature search, 1,326 unique data sources were identified, of which 322 RWD sources were selected for detailed mapping (Additional File 4: online suppl. Fig. 1). Overall, 74% of these data sources were disease-specific, with registries being the most common type of data source (45%). Geographic distribution is shown in Figure 1; 47% of the published HF data sources were from Europe, followed by the USA (21%), APMA (20%), LaCan (8%), and multiregional (4%). Germany had the highest number of data sources in Europe (n = 15); Japan, in APMA (n = 10); and Canada, in LaCan (n = 12). The top 5 HF data sources based on the highest number of publications are presented in Figure 2.
Completeness of variables varied across the mapped data sources and ranged from 0 to 78%. The most commonly recorded variables were age and gender (94%), hospital admissions (81%), comorbidities (77%), mortality (75%), and LVEF (73%; increased by selection criteria). The least recorded data variables were drug codes (10%), dates of procedures and prescriptions (7%), and caregiver involvement (6%). In terms of comorbidities, the proportion of HF data sources reporting ACS and AF as a comorbidity was 16% and 28%, respectively. Information on access to these data sources through purchasing, licencing, or collaboration with the dataset owners was reported for 6% of the sources, whereas it was unknown for the remaining sources. Linking of these data with other data sources was reported in 18% of the sources, whereas the possibility of linkage was unknown for the remainder.

Acute Coronary Syndromes
From the 10,729 publications retrieved through the literature search, 1,560 unique data sources were identified, of which 287 were further selected and mapped (Additional File 5: online suppl. Fig. 2). Over half of these data sources (52%) were from Europe; 25%, APMA; 9%, USA; and 8%, LaCan; 6% of the sources were multiregional (Fig. 3). The highest number of data sources was from Germany in Europe (n = 20), Japan in APMA (n = 21), and Canada in LaCan (n = 17). Over 80% of the mapped data sources were registries (Fig. 4). The Swedish Web-System for Enhancement and Development of Evidence-Based Care in Heart Disease Evaluated According to Recommended Therapies registry had the highest number of publications (n = 100) identified during the search period (Fig. 2). Completeness for recorded variables varied from 5% to 70%. The most commonly available clinical variables were age and gender (99%), mortality (95%), comorbidities (93%), inpatient diagnostic or therapeutic procedures (84%), and prescribed drugs (74%). The least recorded variables were date of ACS diagnosis (6%), dates of procedures and prescriptions (5%), procedure costs (3%), drug codes, and caregiver involvement and costs (1% each). The proportion of ACS data sources capturing the presence of HF and AF as a comorbidity was 27% and 8%, respectively. Information on access to these data sources was provided in 6% of the sources, and linkage of these data sources with other datasets was possible in 20%.

Atrial Fibrillation
From the 6,262 publications retrieved via the literature search, 701 unique data sources were identified, of which 220 data sources were further mapped (Additional File 6: online suppl. Fig. 3). Geographically, Europe had the highest number of data sources (40%), followed by the USA (30%), APMA (20%), and LaCan (7%); 4% of the sources were multiregional (Fig. 5). The highest number of data sources was from the United Kingdom in Europe (n = 13), Japan in APMA (n = 12), and Canada in LaCan (n = 14). Registries (42%) were the most common type of data sources, followed by administrative databases (18%), observational studies (17%), claims (13%), and surveys (10%) (Fig. 4). The top 5 data sources based on the highest number of publications are presented in Figure 2.
Coverage of variables differed across the mapped data sources, and their completeness ranged from 10% to 60%. The most widely reported data variables were age and gender (100%), comorbidities (97%), prescribed drugs (91%), stroke risk (81%), mortality (67%), and hospitalizations (66%), whereas the least reported variables were date of AF diagnosis (10%), drug codes (10%), quality of life (10%), and caregiver involvement (1%). HF and ACS as comorbidities were recorded for 92% and 49% of the AF data sources, respectively. Information on access to data sources was reported in 25% of the mapped sources, whereas for the remaining sources, the possibility of access was unknown. Linkage of these data sources with other data sources was possible in 28%.
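As a rough cross-check of the counts reported in the three subsections above, the share of unique data sources taken forward to detailed mapping can be computed directly from those counts. The snippet below is illustrative only; the names `COUNTS` and `mapped_share` are hypothetical, and the resulting percentages are derived here rather than reported by the review.

```python
# Counts as reported above: (publications retrieved, unique data sources, sources mapped)
COUNTS = {
    "HF": (11_889, 1_326, 322),
    "ACS": (10_729, 1_560, 287),
    "AF": (6_262, 701, 220),
}


def mapped_share(indication: str) -> float:
    """Percentage of unique data sources that were selected for detailed mapping."""
    _, unique_sources, mapped = COUNTS[indication]
    return 100 * mapped / unique_sources


for name in COUNTS:
    print(f"{name}: {mapped_share(name):.1f}% of unique sources mapped")
```

On these figures, roughly a quarter of the unique HF sources, under a fifth of the ACS sources, and just under a third of the AF sources were mapped in detail.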

Discussion
This review aimed to identify global RWD sources focussing on 3 common CV diseases and make them publicly available as a resource for researchers. Previous studies have identified RWD sources in disease areas such as chronic obstructive pulmonary disease [17] and Parkinson's disease [18] as well as generic RWD sources [19], but to our knowledge, no study has reported RWD sources focussing on CV diseases across different geographies. We were able to map 322 RWD sources for HF, 287 for ACS, and 220 for AF. The mapping and provision of these sources in this review aim to enhance the generation of RWE across CV diseases. Importantly, we also define current limitations, such as lack of access to data, linkage with other sources, and insight on cross-comorbidity, that should be addressed in order to achieve maximum patient benefit from future RWE.
In December 2018, the US Food and Drug Administration (FDA) released a guidance document for the use of RWE to support regulatory decision-making for drugs and medical devices [20]. Similarly, in Europe, the European Medicines Agency (EMA), with its adaptive pathway initiative, highlighted RWE as an important source to further support evidence collected through randomized controlled trials (RCTs) [9]. In addition to the EMA and FDA, Health Technology Assessment International, in its global policy forum, presented the availability and use of RWE for health technology assessment [21], and the National Institute for Health and Care Excellence in the UK has documented the use of RWE in its decision-making [22]. In the context of the coronavirus pandemic, RWD has been used extensively to manage public health programmes, although the recent controversy and retraction of studies by leading journals have highlighted the need for robust evaluation before apparent RWD becomes RWE [12].
The growing importance of RWE can further be ascertained through many examples, including selected drug approvals during 1999–2014 by the FDA and EMA, which were largely based on uncontrolled studies for oncology and orphan indications [23]. For health technology assessments, certain outcomes such as costs and quality-adjusted life-years are often retrieved from non-RCT data [24]. With the growing need for RWE, we require varied, high-quality, and transparent sources of RWD to cater to different research objectives related to the epidemiology or burden of disease. Across the 3 CV indications, we found that most data sources were currently from Europe and North America, but a growing number are now presented from the Middle East, Asia, Russia, and South America. The collection of RWD requires relatively high upfront investment, which might be more feasible in high-income countries. Among the European HF data sources, and consistent with other published data, the Swedish Heart Failure Registry was the most frequent source for generating RWE [25]. The most published data sources for ACS and AF were the Swedish Web-System for Enhancement and Development of Evidence-Based Care in Heart Disease Evaluated According to Recommended Therapies registry and the Danish nationwide-linked administrative registries, respectively.
In the present review, for all the 3 CV conditions, demographics and comorbidities were the most commonly available variables, whereas costs and caregiver involvement were least reported. This could be because most of the identified data sources were registries. Moreover, cost to the healthcare system and caregiver involvement cannot be collected directly from patients, existing healthcare records, or medical charts. Data sources for HF also provided information on mortality, hospitalization, and LVEF. For ACS, information on mortality and prescribed drugs was captured in 95% and 74% of the data sources, respectively. For AF, other commonly reported variables were prescribed drugs, stroke risk, mortality, and hospitalization. Taken together, these data sources provide a wealth of information on patient characteristics and clinical burden; however, data pertaining to humanistic and economic burden are limited. These trends are similar to those observed in non-CV conditions [18,25].
In the absence of universally accepted methodological standards for data models and infrastructure, the accessibility, linkage, and comparability of RWD sources are currently a challenge. This can prevent the establishment of larger datasets by linking RWD to generate more robust and representative RWE [9,26]. In line with this, this review reports low accessibility and possibility of linkage based on information retrieved from the public domain. The alternative, i.e., personal communication with data holders, can be time-consuming and potentially unproductive. This challenge may be addressed through efforts in private-public collaborations and within the European framework; for example, dataset owners could be invited to the European Medical Information Framework catalogue [27], which allows users to explore population-based data sources. Linking of these data sources requires harmonization similar to that in other large IMI projects such as the European Health Data & Evidence Network [28]. Translation to clinical practice and the development of new RWE will be aided by integration and linkage of molecular and genetic studies with RWD sources; this developing field has the potential to enable more rapid translation of mechanistic studies to improve patient care.
This review has certain limitations, including incomplete information on the RWD sources because of the limited information available in the public domain. Many databases may contain more data than are currently reported in the tool, and inversely, some variables may be recorded for only a subset of patients (e.g., LVEF). This review reflects the current state of the art; however, RWD sources are continually being generated and revised. Key recent publications from the identified data sources are presented in Table 1 and demonstrate the broad impact that RWE can have on clinical practice. The consortium will update this review periodically (see www.escardio.org/bigdata for future updates), and the European Medical Information Framework catalogue is open for investigators to add or update information on their data sources. The risk of bias in the data sources was not assessed, and the selection of predominantly disease-specific registries may have introduced a bias with respect to the type of variables available; for example, we reported a large number of data sources with LVEF due to the selection criteria for detailed mapping. For some research questions, however, other data sources could be more suitable. Furthermore, this review was limited to English-language publications and may consequently underrepresent data sources from other regions. Finally, RWD are observational in nature and cannot replace RCTs to determine the unbiased efficacy of therapy. Treatment choices in clinical practice are dependent on a large array of prescription biases and confounding factors that limit the value of observational data [29]. However, RWE can complement clinical trial data and allow an understanding of the epidemiology and interaction of diseases.

Conclusions
In summary, this review identified and mapped worldwide RWD sources pertaining to HF, ACS, and AF, thus providing researchers with a knowledge base to conduct feasibility assessments of these data sources for RWE studies. The list of and metadata for the data sources are publicly available at www.escardio.org/bigdata. Epidemiological research can be conducted using the wealth of individual data sources available. However, further details and access to the RWD sources, enhanced collaboration and harmonization between data holders (academia and industry), as well as integration of datasets would allow for the generation of more complex and impactful evidence. This could support CV disease drug development, market access, and use of interventions in clinical practice, eventually leading to improved CV outcomes and patient well-being.

Statement of Ethics
An ethics statement was not required for this study type; no human or animal subjects or materials were used.