Implementation of CDISC at BI
Overview CDISC

1. Motivation, Objectives and Expected Benefits
2. System Landscape, Data Flow and Processes
3.

Implementing CDISC at BI (ICBI) - Motivation
– Requests for analyses on substance/project databases (SDB/PDB) are increasing
  • need to use and exploit clinical data effectively beyond single trials
  • need to build efficient substance databases
– A harmonized data model based on CDISC allows for
  • a wider range of standard reporting tools
  • re-use of standard programs
  • facilitated familiarization with new trials/projects
  • higher flexibility in assignments to projects
  • quicker response to regulatory requests (same view on data)
– BI has taken the decision to implement the CDISC data standards to effectively manage, exploit and report clinical data
© 2009 IBM Corporation

ICBI - Objectives
Corporate-wide, Harmonized Clinical Data Structure
1. Operational data structure, allowing:
   - data quality checks
   - ADS/ADaM generation
   - ad hoc statistical analysis
2. Effective for:
   - single clinical trials
   - pooled databases (PDB)
3. Based on the principles of the CDISC data standards

ICBI - Business Benefits
Shown in three categories:
1. Submission / Regulatory Compliance
2.

ICBI - Business Benefits 1: Submission / Regulatory Compliance
– Working with a data structure close to the one requested for submission
  • allows traceability from analysis data (ADaM) back to raw data (BI-CDISC and plain SDTM)
  • allows for semi-automated generation of plain SDTM and define.xml
  • is a one-time effort per submission
  • is less time consuming
  • creates no external costs
– Having the same view on data as authorities
  • increases transparency
  • leads to higher efficiency and shorter turn-around time in answering questions
Standardized Data Structure will
- further enhance compliance with regulatory requirements
- allow more efficient creation of the submission package

ICBI - Business Benefits 2: Knowledge Generation
Working with one data structure across trials:
• Allows easier creation of PDB and pooling of trial data
• Leads to effective meta-analyses on project and/or substance level
• Increases re-use of standard programs, program templates and views
• Supports exchange between OPUs and functions (e.g. PK/PD, PGx, partners, …)
• Allows (semi-)automated load, transformation and incorporation of external data from vendors, suppliers, pharmaceutical and collaboration partners
• Leads to higher flexibility in assignments to trial & project tasks
• Reduces the time needed to answer internal requests (various customers, e.g. medical affairs)
• Reduces the time needed to answer external (regulatory) questions
Standardized Data Structure will further enhance effective pooling of data and pooled analyses

ICBI - Business Benefits 3: Effort & Time Saving
Working with BI-CDISC facilitates downstream processes:
• Semi-automated generation of define.xml for SDTM and ADS/ADaM
  – no review cycles for define.xml generated externally
• Same view on data as authorities
  – increases transparency
  – results in higher efficiency in answering questions
• A higher degree of automation, making use of metadata (CDR)
  – enables more efficient programming
  – reduces validation efforts
• Reduces effort for creation of standard ADS/ADaM
Standardized Data Structure will
- establish a higher level of standardization
- further enhance analysis with reduced timelines

Chosen Approach for BI-CDISC
§ In line with the recommendations of the SDTM and Analysis Datasets Implementation Expert Team for a CDISC data standards implementation, we defined the following cornerstones for our data model:
1. Define a sponsor-specific in-house data structure (BI-CDISC) and create SDTM and ADaM/ADS in parallel from there
2. Define transformation rules from BI-CDISC to SDTM and from BI-CDISC to ADaM/ADS (but do not create ADS from SDTM)
3. The data model contains both collected and derived data
4. The data model will omit RELREC and SUPPQUAL (these will only be created upon generation of plain SDTM for submission)
5. BI-CDISC will make use of the SDTM vocabulary
   • SDTM vocabulary is defined as variable metadata and controlled terminology, not the SDTM structure
6. BI-CDISC is defined by metadata and (long-term vision) metadata shall drive the transformations from BI-CDISC to SDTM and ADaM/ADS
Traceability between SDTM and ADaM is sufficiently ensured by including the SEQ variable in CDR and passing it on to SDTM/ADaM, and/or by metadata defining the various transformation steps

ICBI Data Flow through System Landscape
Load from O*C and Transform in CDR (LSH)
[Diagram: study setup and the O*C trial database feed an O*C export and data load into CDR. Transformation steps in CDR, supported by the Master Mapping Table and trial-specific meta information, produce SDTM+, ADaM, plain SDTM and define.xml for Trial 1, Trial 2 and the pooled database. Outputs are SDTM, ADaM, tables, listings, profiles, metadata and define.xml for the final report and the submission to FDA.]

Cornerstones of ICBI
§ There will be no impact on early processes like study setup, data entry, and user friendliness of RDC. Data cleaning and discrepancy management remain in O*C
§ ICBI requires a certain upfront effort (once for each trial) for the trial-specific transformation to SDTM+ and its QC/validation
§ Once data are available in the O*C database, they are loaded into LSH. Loading is triggered by a completed Batch Validation session in O*C
§ After the data have been loaded into LSH, they can be automatically transformed into the SDTM+ structure (load and transformation steps can be combined in one LSH workflow)
§ ADS/ADaM will be created from SDTM+ and form the basis for reporting
§ The submission data sets in plain SDTM are created from SDTM+ by sub-setting and restructuring (can be automated)

Cornerstones of ICBI (continued)
§ The define.xml can be created semi-automatically from the metadata available in LSH, thus improving quality (fewer inconsistencies) and timely delivery of final submission data sets
§ To gather all meta information needed for SDTM, ADS and define.xml, a process needs to be implemented that captures the meta information throughout the process (see Module “Meta Data Collection and Master Mapping Table”)
§ To enable DQRM reporting to be based on SDTM+, the data need to be available in SDTM+ structure early, close to First Patient In
§ Training would be required for all functions working with the data in LSH. The O*C part of the process would not be affected (overview training recommended only)

Overall Approach: Mapping Table Sources
[Diagram labels: O*C Views, T/PSAP, ADS Plan, Protocol, aCRF; O*C, BI-DM, Plain SDTM]
• SDTM Implementation Guide
• CDISC Controlled Terminology
• BI-DM User Requirements
• BI PDB Requirements
• BI GLIB CT (formats)
• ADaM IG
• BI ADS Guideline
• Data Quality Requirements

Overall Approach – Trials
§ Design Data Model based on two trials of indication A
§ Expand Data Model with two trials of indication B
§ Prove Data Model (PoC)
  – Create Pooled Database (PDB) of all four trials
  – Re-create trial ADS from PDB
  – Create submission SDTM from PDB

Overall Approach – Teams
Teams: Safety, Treat/Exposure, Lab/Ext. Data, Efficacy
• One Rep from each

Overall Approach – Scope for Teams
Study A, Study B
O*C Views available for the studies used for mapping
• are the starting point for the mapping
• are divided up among the groups according to topics
• topics are based on logical grouping of SDTM domains
  – Treat. - Exposure - TD
  – Efficacy
  – Safety
  – Lab - External Data

Using --SEQ…
Ø --SEQ should not be used for any SAS/SQL evaluation
Ø --SEQ is dynamically assigned and might change until a database is locked
  • If BI-CDISC datasets are created multiple times prior to lock, then --SEQ will be assigned differently whenever rows/observations of data have been added or removed
Ø In different snapshots of the same trial the value of --SEQ will not be consistently applied to common observations
Ø The Keys and Relations team does not consider the above points to be issues (maintaining consistency in --SEQ would be very difficult or impossible to achieve, with little or no gain)
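The instability of --SEQ across snapshots can be illustrated with a minimal Python sketch. The numbering rule used here (sort by subject, start date and term, then count per subject) is an assumption for illustration only, not the rule applied in CDR, and the records are made up.

```python
from typing import Dict, List

Record = Dict[str, object]


def assign_aeseq(records: List[Record]) -> List[Record]:
    """Number each subject's AE records 1..n in sort order (illustrative rule)."""
    ordered = sorted(
        records, key=lambda r: (r["USUBJID"], r["AESTDTC"], r["AETERM"])
    )
    counters: Dict[object, int] = {}
    numbered = []
    for rec in ordered:
        counters[rec["USUBJID"]] = counters.get(rec["USUBJID"], 0) + 1
        numbered.append({**rec, "AESEQ": counters[rec["USUBJID"]]})
    return numbered


def seq_by_term(records: List[Record]) -> Dict[object, object]:
    """Map AETERM -> AESEQ for a single-subject snapshot."""
    return {rec["AETERM"]: rec["AESEQ"] for rec in assign_aeseq(records)}


# Snapshot 1: two adverse events for one subject (made-up data).
snapshot_1 = [
    {"USUBJID": "001", "AETERM": "HEADACHE", "AESTDTC": "2009-03-10"},
    {"USUBJID": "001", "AETERM": "NAUSEA", "AESTDTC": "2009-03-15"},
]
# Snapshot 2: an earlier event was entered later, before database lock.
snapshot_2 = snapshot_1 + [
    {"USUBJID": "001", "AETERM": "RASH", "AESTDTC": "2009-03-01"},
]

print(seq_by_term(snapshot_1))
print(seq_by_term(snapshot_2))
```

The same HEADACHE observation receives a different AESEQ in the second snapshot, which is why evaluations should select on content variables rather than on --SEQ.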

I. Pooling Identifiers / Keys
Proposed variables are:
1. SUBSTANCE
2. PROJECT
3. STUDYID
4. USUBJID/PTNO
5. VISITNUM
6. TPTNUM
7. VISDT
8. --DT
9. --ONDT
10. --ENDT
11. --CAT
12. --SCAT
13. --TESTCD
14. --METHOD
15. --SPEC
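As a rough sketch of how such keys might be used when pooling, the following Python example checks whether a composite key identifies records uniquely after two trials have been stacked. Only a subset of the proposed variables is used, the "--" prefix is instantiated as LB, and all values are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Subset of the proposed pooling keys, instantiated for a lab (LB) dataset.
POOLING_KEYS = ("SUBSTANCE", "PROJECT", "STUDYID", "USUBJID", "VISITNUM", "LBTESTCD")

Record = Dict[str, object]


def key_of(record: Record) -> Tuple[object, ...]:
    """Return the composite pooling key of one record."""
    return tuple(record.get(name) for name in POOLING_KEYS)


def duplicate_keys(records: List[Record]) -> List[Tuple[object, ...]]:
    """Return every composite key that occurs more than once."""
    counts = Counter(key_of(rec) for rec in records)
    return [key for key, n in counts.items() if n > 1]


trial_1 = [
    {"SUBSTANCE": "S1", "PROJECT": "P1", "STUDYID": "T1", "USUBJID": "001",
     "VISITNUM": 1, "LBTESTCD": "GLUC", "LBORRES": "5.1"},
]
trial_2 = [
    {"SUBSTANCE": "S1", "PROJECT": "P1", "STUDYID": "T2", "USUBJID": "001",
     "VISITNUM": 1, "LBTESTCD": "GLUC", "LBORRES": "4.8"},
]

pooled = trial_1 + trial_2
print(duplicate_keys(pooled))  # empty list: STUDYID separates the two trials
```

A non-empty result would indicate that further key variables (for example TPTNUM or --SPEC) are needed to distinguish the records.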

ICBI – Interdomain Dependencies
§ Mappings are often not trivial
  – BI-CDISC variables should be derived only once and from one single source
  – Domains have to be created/populated in a defined order
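One way to obtain a defined creation order is a topological sort over the domain dependencies. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The dependency table is hypothetical and only illustrates the idea; the actual dependencies between BI-CDISC domains are not given here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependencies: each domain maps to the domains it is derived from.
DEPENDS_ON: Dict[str, Set[str]] = {
    "DM": set(),          # demographics needs no other domain
    "EX": {"DM"},         # exposure uses subject information from DM
    "AE": {"DM", "EX"},   # e.g. treatment-emergent flags need exposure dates
    "LB": {"DM"},
}


def build_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every domain comes after its sources.

    TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

Several valid orders can exist; the only guarantee is that each domain appears after all domains it depends on.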

CT Consolidation – LABNM Format
§ For LABNM (>1000 code/decodes) it was decided to split the format into three variables (LBTESTCD, LBSPEC and LBMETHOD)
§ In special cases additional variables are required (position, fasting status, time, …)
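The split can be pictured as a lookup from one LABNM code to the three target variables. The following Python sketch assumes such a mapping table exists; the codes and terms shown are invented for illustration and are not actual LABNM entries.

```python
from typing import Dict, Tuple

# Hypothetical excerpt of a mapping table: LABNM code -> (LBTESTCD, LBSPEC, LBMETHOD).
LABNM_MAP: Dict[int, Tuple[str, str, str]] = {
    101: ("GLUC", "SERUM", "ENZYMATIC"),
    102: ("GLUC", "URINE", "DIPSTICK"),
}


def split_labnm(code: int) -> Dict[str, str]:
    """Translate one LABNM code into the three BI-CDISC variables.

    Raises KeyError for unmapped codes so that gaps in the mapping table
    are noticed instead of silently producing empty values.
    """
    testcd, spec, method = LABNM_MAP[code]
    return {"LBTESTCD": testcd, "LBSPEC": spec, "LBMETHOD": method}


print(split_labnm(101))
print(split_labnm(102))
```

Two codes that previously differed only by name now share LBTESTCD and differ in LBSPEC and LBMETHOD, which makes selection by test straightforward.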

Identified SDTM+ (1)

Topic: Numeric dates/times (R/B*: B)
  SDTM: All dates are CHAR (ISO 8601)
  SDTM(+): Keep O*C dates (NUM) and ISO 8601 dates in parallel
  Workload plain: Medium, because all dates have to be transformed to ISO 8601 and to NUM for analysis
  Workload plus: Low, because NUM dates are kept and used for analysis. No back-transformation necessary

Topic: Missing SDTM definitions (R/B*: R)
  SDTM: No definition available for some variables in SDTM V3.1.2
  SDTM(+): Have to be kept as plus variables: variables required as input to the current XAE or XTRTGEN macro (N.B. closely evaluate future need of the variable as input to new X-Macros)
  Workload plain: Not possible to create ADS from plain SDTM, because variables required for the XAE and/or XGENTRT macro will not be available with plain SDTM
  Workload plus: Very low effort expected, because the variable needed in the macros can be extracted as is from the available PLUS variable without complex referencing, transformations, derivations or imputations

Topic: Key concept (R/B*: R)
  SDTM: STUDYID, USUBJID, DOMAIN, --SEQ, --GRPID, --REFID, --SPID
  SDTM(+): STUDYID, USUBJID, DOMAIN, plus meaningful keys to be defined (based on content)
  Workload plain: Very high. Values of ID variables are not unique across subjects. Only designed for merging parent domains to SUPPQUAL, CO, RELREC. Does not support merging by content across domains (e.g. XR to XD)
  Workload plus: Medium. Needs to be defined when creating SDTM+, beneficial for analysis & reporting (no additional work)

* R = required, B = beneficial
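Keeping numeric and ISO 8601 dates in parallel could look like the following Python sketch. It assumes that the numeric O*C date arrives as a SAS date value (days since 1 January 1960); that representation and the variable name AESTDT are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date


def sas_date_to_iso(days: int) -> str:
    """Convert a SAS date value into an ISO 8601 calendar date string."""
    return (SAS_EPOCH + timedelta(days=days)).isoformat()


def with_parallel_dates(record: Dict[str, object], num_var: str, iso_var: str) -> Dict[str, object]:
    """Return a copy of the record that keeps the NUM date and adds the ISO date."""
    return {**record, iso_var: sas_date_to_iso(int(record[num_var]))}


# AESTDT (NUM) is a hypothetical plus variable; AESTDTC is the SDTM character date.
row = with_parallel_dates({"USUBJID": "001", "AESTDT": 17898}, "AESTDT", "AESTDTC")
print(row)
```

Analysis programs can keep using the numeric variable directly, while the character variable is ready for plain SDTM without any back-transformation.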

Identified SDTM+ (2)

Topic: NUM - CHAR (R/B*: B)
  SDTM: Variables are of type CHAR in general. Example: USUBJID, --ORRES
  SDTM(+): Keep both CHAR and NUM-type variables. Example: USUBJID and "PTNO", --ORRES and "--ORRESN"
  Workload plain: Medium. Numeric O*C values are converted to CHAR, then need to be converted back to NUM for analysis & reporting
  Workload plus: Low. Convert once to CHAR for SDTM. Keep numeric values from O*C as a plus for analysis & reporting (no re-conversion)

Topic: Code / Decode (R/B*: R)
  SDTM: Only Decode (CHAR). Example: XRCAT, EPOCH
  SDTM(+): Have
    • Code (NUM)
    • associated SAS format
    • Decode (CHAR)
  Workload plain: Medium. Without formats it is not possible to reproduce all the options offered in the CRF
  Workload plus: Very low

Topic: No SUPPQUAL (R/B*: B)
  SDTM: SUPPQUAL domain
  SDTM(+): No SUPPQUAL domain, variables included in parent domain. Additional metadata required to identify qualifier information destined for SUPPQUAL, or an additional variable that contains the qualifier information destined for SUPPQUAL
  Workload plain: High. Merging needed because information that clinically belongs together is scattered (search and merge)
  Workload plus: Medium. Information that clinically belongs together is located in one domain. One-time effort to create plain SDTM (selecting and splitting)

* R = required, B = beneficial
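The "selecting and splitting" step for plain SDTM might be sketched in Python as below. The list of variables destined for SUPPQUAL stands in for the additional metadata mentioned above; the variable name AEPLUS1 and the record values are hypothetical. The output columns follow the standard SUPPQUAL layout (RDOMAIN, IDVAR, IDVARVAL, QNAM, QVAL); QLABEL, QORIG and QEVAL are left out for brevity.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def split_for_submission(
    record: Record, domain: str, supp_vars: Sequence[str]
) -> Tuple[Record, List[Record]]:
    """Split one SDTM+ record into a parent record and SUPPQUAL rows.

    supp_vars lists the plus variables that metadata marks as destined
    for SUPPQUAL (hypothetical metadata in this sketch).
    """
    seq_var = f"{domain}SEQ"
    parent = {k: v for k, v in record.items() if k not in supp_vars}
    supp_rows = [
        {
            "STUDYID": record["STUDYID"],
            "RDOMAIN": domain,
            "USUBJID": record["USUBJID"],
            "IDVAR": seq_var,
            "IDVARVAL": str(record[seq_var]),
            "QNAM": var,
            "QVAL": str(record[var]),
        }
        for var in supp_vars
        if record.get(var) not in (None, "")  # empty qualifiers produce no row
    ]
    return parent, supp_rows


ae_plus = {"STUDYID": "T1", "USUBJID": "001", "AESEQ": 1,
           "AETERM": "HEADACHE", "AEPLUS1": "Y"}
parent, supp = split_for_submission(ae_plus, "AE", ["AEPLUS1"])
print(parent)
print(supp)
```

Because the qualifier sits in the parent record throughout analysis, the split only has to happen once, when the submission data sets are produced.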

Identified SDTM+ (3)

Topic: Date/time imputation (R/B*: B)
  SDTM: Reported date/time (ISO 8601)
  SDTM(+): Have in parallel
    • reported date/time
    • imputed date/time
    • imputation rule
  Workload plain: High. In case of incomplete dates, imputation needs to be done by hand (error-prone process)
  Workload plus: Low if the imputation rule is implemented in O*C views. Otherwise it needs to be defined once for creation of SDTM+

Topic: Relationship to CRF/DCM (R/B*: B)
  SDTM: Not included
  SDTM(+): Keep the name of the DCM where the variable originated
  Workload plain: Medium. Connection between SDTM data and CRF is not readily available
  Workload plus: Low. Primarily to ease programming and help with debugging

Topic: Tracking of same patient in multiple trials, e.g. extension trial information (R/B*: R)
  SDTM: Previous Trial Number and Previous Patient Number could possibly be stored in the Subject Characteristic domain (SC). This needs to be investigated
  SDTM(+): Previous Trial Number and Previous Patient Number. These two variables are collected at the site and need to be available in SDTM+ for CTR reporting and to facilitate reporting from the P/SDB
  Workload plain: Low. Previous Trial Number and Previous Patient Number should be placed in the Subject Characteristic domain (SC)
  Workload plus: Very low. The collected variables need to be copied from O*C into SDTM+ (DM domain?)

* R = required, B = beneficial
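Holding reported date, imputed date and imputation rule side by side can be sketched as follows in Python. The rule itself (missing day becomes the first of the month, missing month and day become 1 January) and the rule labels are assumptions for illustration; the actual imputation rules are not specified here.

```python
from datetime import date
from typing import Dict


def impute_date(reported: str) -> Dict[str, str]:
    """Return reported date, imputed date and the rule applied, in parallel.

    `reported` is an ISO 8601 date that may be partial: "YYYY-MM-DD",
    "YYYY-MM" or "YYYY". The imputation rules are illustrative only.
    """
    parts = reported.split("-")
    if len(parts) == 3:
        imputed, rule = reported, "NONE"
    elif len(parts) == 2:
        imputed, rule = f"{reported}-01", "FIRST_OF_MONTH"
    elif len(parts) == 1 and parts[0]:
        imputed, rule = f"{reported}-01-01", "FIRST_OF_YEAR"
    else:
        raise ValueError(f"cannot impute from {reported!r}")
    date.fromisoformat(imputed)  # raises ValueError if the result is not a real date
    return {"reported": reported, "imputed": imputed, "rule": rule}


for value in ("2009-03-17", "2009-03", "2009"):
    print(impute_date(value))
```

Storing the rule next to the imputed value keeps the derivation traceable, and the reported value remains available unchanged for plain SDTM.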

Contacts
IBM Global Business Services

Dr. Jens Wientges
Mailto: wientges@de.ibm.com
Mobile: +49 160 5826897

Peter Leister
Mailto: peter.leister@de.ibm.com
Mobile: +49 160 3671761