Report with publications on results of component algorithms

Summary
Task 7.7 Quantification of the impact of choosing different algorithms (M9-M60) (ARS, UMCU, GSK, JAN, Sanofi, CNR-IFC, FISABIO, UOSL, ULST, CHUT, UMCG, LAREB, in collaboration with the Outcomes Validation Task Force (WP1, 2) and third parties who will provide expertise and/or data).

This task will document and quantify the impact of using different algorithms to extract data from health care databases. Moreover, the strategy will ensure support for sensitivity analyses aimed at addressing assumptions on data semantics. Finally, the strategy will aim to ensure compatibility with the OHDSI tools adopted by the EHDEN initiative, to accommodate data sources that are mapped to the OMOP CDM.

Some examples of the data transformations that we envision follow:
•Last Menstrual Period (LMP)/Gestational age: We will document how the start and end dates of pregnancy can be derived for each pregnancy in each data source, and allow for sensitivity analyses (for instance, restricting to women whose LMP is directly recorded).
•Drugs: Transforming drug prescriptions or dispensings into time-stamped episodes of treatment requires harmonisation or derivation from existing data, which is currently missing. We will develop standard algorithms using available information and standard sets of assumptions (for instance, that each study subject takes the drug according to the recorded prescribed dose, or takes one DDD per day, or takes one tablet per day, or takes the dispensed drug uniformly between dispensings) so that dose- and duration-specific analyses may be possible. In particular, standard sets of assumptions will be developed addressing the timing of exposure relative to the Last Menstrual Period.
•Breastfeeding: Different case identification algorithms will be tested to identify breastfeeding in different data sources. These algorithms will be validated where possible in collaboration with the outcomes taskforce of WP1/2/7.
•Maternal-child linkage: To study the effects of drugs during pregnancy and lactation on the child, it is essential to link maternal health records with those of the child. Based on a literature overview and the work from EU-Peristat, we will further identify and test methods for both deterministic and probabilistic linkage of mothers and children in each of the DAPs, and cross-validate linkage whenever possible.

The process of documenting and testing the study variables will follow the ‘component analysis’ workflow. We will test different case-finding algorithms for WP1/2 maternal, neonatal and long-term paediatric outcomes, as well as for other key variables (see examples above). The outcomes taskforce of WP1/2/7 will validate the main variables whenever possible using a gold standard. Statistical techniques for outcome validation and misclassification ascertainment will be further explored, including (Bayesian) latent class modelling, imputation-based techniques like regression calibration, multiple imputation of measurement error, and the simulation extrapolation technique. Component analysis using data from different sources will be used to evaluate the impact of different algorithms on population-based incidence rates of events and to produce estimates of validity. Such ‘novel’ validation techniques will be compared with ‘classic’ validation studies in the demonstration projects that will be conducted in WP1. We will also apply existing methods that use the results of validation to correct the EHR-based disease risk estimates for misclassification.
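
As an illustration of the drug-exposure assumptions listed among the examples above, the following minimal Python sketch derives treatment episodes from dispensing records under the "one DDD per day" assumption. It is not the task's standard algorithm: the names `Dispensing`, `build_episodes`, `ddd_per_day` and `grace_days` are hypothetical, and the handling of overlaps (no carry-forward of early refills) is itself an assumption that a sensitivity analysis could vary.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dispensing:
    dispensed_on: date
    n_ddd: float  # total defined daily doses dispensed on that date


def build_episodes(dispensings, ddd_per_day=1.0, grace_days=0):
    """Turn dispensings into a list of (start, end) treatment episodes.

    Assumption: the subject takes `ddd_per_day` DDDs every day from the
    dispensing date onwards, so one dispensing covers n_ddd / ddd_per_day
    days. Dispensings whose coverage overlaps, or is separated by at most
    `grace_days` uncovered days, are merged into a single episode.
    Early refills are not carried forward (no stockpiling) in this sketch.
    """
    episodes = []
    for d in sorted(dispensings, key=lambda x: x.dispensed_on):
        days = max(1, round(d.n_ddd / ddd_per_day))
        start = d.dispensed_on
        end = start + timedelta(days=days - 1)  # last covered day, inclusive
        if episodes and start <= episodes[-1][1] + timedelta(days=grace_days + 1):
            # Contiguous or within the grace period: extend the last episode.
            prev_start, prev_end = episodes[-1]
            episodes[-1] = (prev_start, max(prev_end, end))
        else:
            episodes.append((start, end))
    return episodes
```

Calling `build_episodes` on the same records with different values of `ddd_per_day` or `grace_days` gives a simple way to see how sensitive the derived exposure windows are to each assumption.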
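
As an illustration of how validation results can be used to correct a risk estimate for misclassification, the sketch below applies the Rogan-Gladen estimator. The text above does not name a specific correction method, so this choice is an assumption; the function name `rogan_gladen` is hypothetical, and the sketch assumes non-differential misclassification with sensitivity and specificity taken as known from a validation study.

```python
def rogan_gladen(observed_proportion, sensitivity, specificity):
    """Correct an observed proportion (e.g. a risk) for misclassification.

    corrected = (observed + specificity - 1) / (sensitivity + specificity - 1)

    The result is clipped to [0, 1], because sampling error in the inputs
    can push the raw estimate outside that range.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # The case-finding algorithm is no better than chance.
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed_proportion + specificity - 1.0) / denom
    return min(1.0, max(0.0, corrected))
```

For example, an observed risk of 10% from an algorithm with 80% sensitivity and 95% specificity corresponds to a corrected risk of about 6.7%, since (0.10 + 0.95 - 1) / (0.80 + 0.95 - 1) = 0.05 / 0.75. This point correction does not propagate the uncertainty in the validation estimates; the latent class and imputation-based techniques mentioned above address that.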