  • Number of slides: 41

Case Study for Clinical Relevancy: Asthma
Scott T. Weiss, M.D., M.S.
Professor of Medicine, Harvard Medical School
Director, Center for Genomic Medicine
Director, Program in Bioinformatics
Associate Director, Channing Laboratory
Brigham and Women’s Hospital, Boston, MA

Outline
• Context: focus on process and data
• Overview of Asthma DBP
• Smoking as an example of the data issues
• Predicting COPD in those with asthma
• Predicting asthma exacerbations
• Genetic prediction of asthma exacerbations: current status
• DNA collection
• Lessons Learned
• Conclusions

Context
• Channing Lab: extensive genetics & pharmacogenetics resources focused on airways diseases
• Faculty with clinical, epidemiology, genetic, and bioinformatics training and experience
• Multidisciplinary research collaborative track record
• Good i2b2 driver: from bench to clinic
• Strong focus and direction for Cores

Broad Goals of Channing Program in Predictive Medicine
• Genetic variation → clinical practice
• Disease risk (asthma diagnosis)
• Natural history (exacerbations)
• Individual response to medication (pharmacogenetics)
• Develop predictive tests (genetic and nongenetic) in Channing populations
• Validate these tests in Partners asthma cohort (PAC) at least as proof of concept

I2B2 Airways DBP: Overview
[Workflow diagram; box labels:]
• Partners Clinical Services
• Develop statistical models
• Predict clinical outcomes after adjustment for covariates
• RPDR
• Extract important phenotypes from text: NLP
• Extract data from Airways Disease patients
• Extract relevant quantitative and coded phenotypes
• RPDR: Recruit, validate, genotype

Before we start
• Numerous important covariates
• e.g. age, tobacco, comorbidities, medications
• Adjust outcomes for covariates
• Some (e.g. age, gender, Dx, encounter) readily available
• Obtained through Core 4
• Others require substantial effort, e.g. medications, tobacco use, comorbid conditions
• Collaboration: NLP experts in Core 1

Phenotypes from text
• Extract specific data items
  – Medication
  – Smoking status
  – Diagnoses (Co-morbidity)
• Extract findings to assist with case selection
• Extract findings to assist with clinical predictions

Smoking Status: Examples
• Smoker
  SOCIAL HISTORY: The patient is married with four grown daughters, uses tobacco, has wine with dinner.
• Non-Smoker
  SOCIAL HISTORY: The patient is a nonsmoker. No alcohol.
  SOCIAL HISTORY: Negative for tobacco, alcohol, and IV drug abuse.
• Past Smoker
  BRIEF RESUME OF HOSPITAL COURSE: 63 yo woman with COPD, 50 pack-yr tobacco (quit 3 wks ago), spinal stenosis, ...
• ???
  SOCIAL HISTORY: The patient lives in rehab, married. Unclear smoking history from the admission note…
• Hard to pick
  HOSPITAL COURSE: ... It was recommended that she receive … We also added Lactinax, oral form of Lactobacillus acidophilus to attempt a repopulation of her gut.
• Hard to pick
  SH: widow, lives alone, 2 children, no tob/alcohol.

Smoking: Text Processing
No. Cases        2796
No. Attributes   50
No. Classes      5

Cases per class (manually classified):
Denies smoking   146
Never smoked     427
Past smoker      952
Current smoker   1010
Control cases    261

Smoking Status: Preliminary results
• Raw sample: ~20,000 reports
• Feature extraction: >3000
• Feature selection: 25 - 1000
• “Gold standard” sample cases: ~2,800
• Correct classification rate: 46 - 81% (compared to Gold Standard)

Smoking Status: Preliminary results

Data Set           Classification Method   Test Cases   No. Features   % Correctly Classified
Stemmed one-gram   Naïve Bayes             CV 10x       917            80.92
Stemmed one-gram   Naïve Bayes             CV 10x       231            80.46
One-gram           SVM                     Split 2/3    50             79.70
One-gram           Naïve Bayes             Split 2/3    50             78.02
Bi-gram            SVM                     Split 2/3    25             49.57
Bi-gram            Naïve Bayes             Split 2/3    25             70.73
Tri-gram           SVM                     Split 2/3    25             44.63
Tri-gram           Naïve Bayes             Split 2/3    25             65.05
More …

• Baseline performance
• Increasing and combining features should improve performance
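To make the "one-gram Naïve Bayes" rows concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier over single-word features with add-one smoothing. It is not the project's actual classifier: the class names follow the slides, but the training notes, the `tokens` helper, and the `NaiveBayes` class are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Split a note into lowercase one-gram (single-word) features."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokens(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokens(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy notes, NOT study data.
TRAIN = [
    ("patient smokes one pack per day", "current smoker"),
    ("uses tobacco daily", "current smoker"),
    ("quit smoking ten years ago", "past smoker"),
    ("former smoker quit tobacco", "past smoker"),
    ("patient is a nonsmoker", "never smoked"),
    ("never smoked no alcohol", "never smoked"),
]
model = NaiveBayes().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("she quit three weeks ago"))
```

With real data, the word counts would come from the manually classified gold-standard reports, and accuracy would be estimated by cross-validation or a held-out split as in the table.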

Data Extraction: Data Mining Pipeline
“Raw” Patient Data
Text Processing
§ Word/pattern filters
§ Stemming
§ Lexicon matching
§ Parsing
§ …
Feature Analysis
§ Classification
§ Clustering
§ Statistical Analysis
§ …
“Smart Data”
§ Medications
§ Smoking status
§ Co-morbidity
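The first three text-processing steps (word filter, stemming, lexicon matching) can be sketched roughly as follows. This is an illustrative toy only: `crude_stem` is a hypothetical suffix stripper, not a real stemming algorithm such as Porter's, and the lexicon and negation list are invented rather than taken from the project.

```python
import re

# Hypothetical lexicon of stemmed smoking-related terms (assumption).
SMOKING_LEXICON = {"smok", "nonsmok", "tobacco", "tob", "cigarett"}
# Hypothetical list of negation cue words (assumption).
NEGATION_WORDS = {"no", "denies", "negative", "never"}


def crude_stem(word):
    """Strip one common English suffix, keeping at least 3 characters."""
    for suffix in ("ing", "ers", "er", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(note):
    """Turn a free-text note into simple counts for a downstream classifier."""
    words = re.findall(r"[a-z]+", note.lower())            # word filter
    stems = [crude_stem(w) for w in words]                 # stemming
    hits = [s for s in stems if s in SMOKING_LEXICON]      # lexicon matching
    return {
        "smoking_terms": len(hits),
        "negation_terms": sum(w in NEGATION_WORDS for w in words),
    }


print(extract_features("SH: widow, lives alone, 2 children, no tob/alcohol."))
```

Features like these would then feed the "Feature Analysis" stage; counting negation cues separately is one simple way to help distinguish "uses tobacco" from "no tob".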

Asthma Preceding COPD
• Significant overlap of asthma and COPD DX
• Common denominator = smoking
• Asthma is known to precede and predict the development of COPD independent of smoking
• Could we develop a multivariate clinical predictor that would predict which asthmatics would get COPD?

Study Design
Source: Partners Healthcare Research Patient Data Repository (RPDR).
RPDR: MGH, BWH, etc. clinical repository for researchers.
Training: 9349 asthmatics (843 COPD, 8506 controls), first encounter 1988-1998.
Test: A future set of 992 asthmatics (46 COPD, 946 controls), first encounter 1999-2002.

Data Collection
Criteria: Patients observed for at least 5 years, at least 18 years old at the first encounter, and with race, sex, height, weight, and smoking available.
Comorbidities: International Classification of Diseases, 9th Revision (ICD-9) codes as admission diagnosis or ER primary diagnosis (104).
COPD: ICD-9 code for “Chronic Bronchitis”, “Emphysema”, or “Chronic Airways Obstruction, not otherwise specified.”

Analysis
Model: A Bayesian network was generated from the training set of 9349 asthmatics (843 COPD, 8506 controls) encountered between 1988 and 1998, using 104 comorbidities plus race, gender, age, and smoking.
Results: The risk of COPD is modulated by gender, race, and smoking history, and by 14 comorbidities: viral and chlamydial infections, diabetes mellitus, volume depletion, acute myocardial infarction, intermediate coronary syndrome, cardiac dysrhythmias, heart failure, acute upper respiratory infections, acute bronchitis and bronchiolitis, pneumonia, early or threatened labor, normal delivery, shortness of breath, respiratory distress.

Network Model

Validation
Propagation: a Bayesian network can compute the probability distribution of any variable given an instance of some or all the other variables.
Test data: a future set of 992 asthmatics (46 COPD, 946 controls), first encounter 1999-2002.
Prediction: for each patient, predict the probability of COPD given the other elements in the network (comorbidities and demographics).
Validation: compare the predicted with the observed COPD status.
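As a rough illustration of propagation, the sketch below computes P(COPD | evidence) in a deliberately simplified network where each finding depends only on COPD status. That structure is an assumption for illustration and is not the network learned in the study. The prior is the training-set prevalence from the slides (843 of 9349); every conditional probability in `CPT` is hypothetical.

```python
# Training-set prevalence of COPD among asthmatics, from the slides.
PRIOR_COPD = 843 / 9349

# HYPOTHETICAL values of P(finding present | COPD status); not study estimates.
CPT = {
    "smoker": {True: 0.60, False: 0.30},
    "pneumonia": {True: 0.20, False: 0.05},
}


def posterior_copd(evidence):
    """Return P(COPD | evidence).

    evidence maps a finding name to True/False. Findings that are not
    observed are simply left out, which marginalises them away in this
    simplified structure.
    """
    joint = {True: PRIOR_COPD, False: 1 - PRIOR_COPD}
    for finding, present in evidence.items():
        for copd in (True, False):
            p = CPT[finding][copd]
            joint[copd] *= p if present else 1 - p
    return joint[True] / (joint[True] + joint[False])


print(posterior_copd({}))                                   # no evidence: the prior
print(posterior_copd({"smoker": True}))                     # one finding observed
print(posterior_copd({"smoker": True, "pneumonia": True}))  # two findings observed
```

Validation then amounts to computing such a probability for each of the 992 test patients and comparing it with the observed COPD status.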

Predictive Validation

One variable at a time

Asthma Exacerbations
• Asthma attacks involve worsening of asthma symptoms including bronchoconstriction and inflammatory response
• Major cause of morbidity and mortality in asthma
• 11.7 million Americans have an exacerbation every year (3.9 million children)
• In US children, exacerbations are the third leading cause of hospitalizations (198,000 occurrences per year)
• Cost of asthma exacerbations: US = 4 billion dollars, Partners = 20 million dollars

RPDR Exacerbation Prediction

Genetic Prediction of Asthma Exacerbation
Objective: Predict asthma exacerbation from genetic data
Subjects: 290 CAMP participants
• Not on steroids
• Followed for 10+ years
• Have genetic data available
Phenotype
• Case: Reported overnight hospitalization(s) (n=83)
• Control: No overnight hospitalizations or ER visits (n=207)
Genotype: 2443 SNPs from 349 candidate genes
• In Hardy-Weinberg equilibrium among controls
• Minor allele frequency > 0.05
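The two SNP quality filters can be sketched from genotype counts as below. The slides give the MAF cutoff (> 0.05) but not how Hardy-Weinberg equilibrium was tested, so the chi-square goodness-of-fit test and the 0.05 significance level here are assumptions, and the genotype counts are invented.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRIT_1DF_05 = 3.841


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of genotypes AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_qc(control_counts, maf_min=0.05):
    """Keep a SNP if MAF exceeds the cutoff and HWE is not rejected in controls."""
    return (maf(*control_counts) > maf_min
            and hwe_chi2(*control_counts) < CHI2_CRIT_1DF_05)


# Invented genotype counts (AA, AB, BB) for three hypothetical SNPs.
for counts in [(81, 18, 1), (90, 0, 10), (99, 1, 0)]:
    print(counts, round(maf(*counts), 3), round(hwe_chi2(*counts), 2), passes_qc(counts))
```

Testing HWE in controls only, as the slide specifies, avoids discarding SNPs whose departure from equilibrium among cases reflects a true association.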

Exacerbation Model
132 of 2443 SNPs in 55 of 349 genes predict exacerbation

Validation
Method: Prediction on fitted values
Result: Area under the ROC curve (AUROC) is 0.97
AUROC measures accuracy as trade-off between sensitivity and specificity

AUROC       Rating
0.5 - 0.6   Fail
0.6 - 0.7   Poor
0.7 - 0.8   Fair
0.8 - 0.9   Good
0.9 - 1.0   Excellent
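For reference, AUROC can be computed directly as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below uses that pairwise definition together with the rating bands from the table. Assigning a boundary value such as 0.9 to the higher band, and rating values below 0.5 as "Fail", are assumptions, since the table does not say. The example scores are invented.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels: 1=case, 0=control)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rating(auc):
    """Map an AUROC value to the rating bands shown on the slide."""
    for cutoff, name in ((0.9, "Excellent"), (0.8, "Good"), (0.7, "Fair"), (0.6, "Poor")):
        if auc >= cutoff:
            return name
    return "Fail"


# Invented predicted probabilities for two cases and two controls.
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auroc(scores, labels), rating(auroc(scores, labels)))
print(rating(0.97), rating(0.84))
```

The pairwise form is quadratic in the number of subjects, which is fine for a few hundred subjects such as the 290 here.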

Cross-Validation
Method: 20-fold cross-validation to test robustness
1. Data is split into 20 groups
2. One group is held out as the independent set and the remaining 19 are used to quantify the model
3. Step 2 is repeated until each group has served as the independent set
Result: AUROC is 0.84 (good)
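The three steps can be sketched as a generic k-fold loop in which every subject's prediction comes from a model that never saw that subject. The function names, the random shuffling, and the `fit`/`predict` callables are illustrative assumptions; the slides do not describe how the folds were formed.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Step 1: shuffle subject indices and split them into k groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_predict(X, y, fit, predict, k=20, seed=0):
    """Steps 2-3: hold out each group once and predict it from the other k-1."""
    preds = [None] * len(X)
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = predict(model, X[i])
    return preds


# With 290 subjects and 20 folds, each fold holds 14 or 15 subjects.
folds = k_fold_indices(290, 20)
print(sorted({len(f) for f in folds}))
```

Pooling the held-out predictions and computing AUROC on them is what separates the cross-validated 0.84 from the 0.97 obtained on fitted values.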

Partners Asthma DNA collection #1
• Recruit Partners asthma patients
• Partners Asthma Center, NWH, MGH
• High quality spirometric phenotyping
• Blood for DNA extraction and storage
• Children and adults
• High cost (>$1000/subject)
• Low intensity
• 6 months, only 100 subjects recruited
• Doctors and patients need education

Partners Asthma DNA collection #2
• Recruit Partners asthma cohort patients
• Leverage CRIMSON blood samples
• Leverage data mart for phenotype data
• Blood for DNA extraction and storage
• Children and adults
• Cases and controls
• Low cost (<$30/subject)
• High intensity
• 9 months, >3000 subjects recruited

Figure 1: Data Flow for Asthma DBP
• Channing: ADMPN#, sent to RPDR
• RPDR: converts ADMPN# to MRN, sends to pathology
• Pathology (Crimson): MRN to Crimson ID# ≡ ADMPN, sends back to Channing with sample for DNA extraction
Figure 1 Legend: Deidentified data file analyzed by Channing; subjects for DNA collection selected. File sent to RPDR, converted back to MR#, and sent to Crimson. Samples identified and given Crimson ID# ≡ ADMPN, and sample sent back to Channing.
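One way to picture this flow is as a lookup table that only RPDR holds, so that the research side handles deidentified numbers at both ends. The sketch below is a toy model under that assumption: the identifiers, the function names, and the idea that RPDR passes (MRN, ADMPN#) pairs to Crimson are all hypothetical details not stated in the figure.

```python
# Invented identifiers. This key table is assumed to live only inside RPDR.
RPDR_KEY = {"ADMPN-001": "MRN-12345", "ADMPN-002": "MRN-67890"}


def channing_select(deidentified_rows):
    """Channing analyses the deidentified file and picks subjects for DNA collection."""
    return [row["admpn"] for row in deidentified_rows if row["selected"]]


def rpdr_convert(admpns):
    """RPDR converts each ADMPN# back to an MRN for the pathology lab."""
    return [(RPDR_KEY[a], a) for a in admpns]


def crimson_label(pairs):
    """Crimson finds the sample by MRN and labels it with an ID equal to the ADMPN#."""
    return [{"crimson_id": admpn, "sample": "blood"} for _mrn, admpn in pairs]


rows = [
    {"admpn": "ADMPN-001", "selected": True},
    {"admpn": "ADMPN-002", "selected": False},
]
returned = crimson_label(rpdr_convert(channing_select(rows)))
print(returned)
```

In this model the MRN appears only between RPDR and Crimson, and the labels returned to Channing carry no medical record number.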

Recruitment for DBP from Crimson at BWH: Asthma Cases by Utilization and Race

Recruitment for DBP from Crimson at BWH: Asthma Cases and Controls by Race

Summary of Samples to 04/07/08
Running total:
High African American:       111
Low African American:        222
Controls African American:   880
High Caucasian:              59
Low Caucasian:               454
Controls Caucasian:          1,341

Lessons learned 1
• Get what you ask for
• Regular meetings, regular meetings
• Negotiate your demands
• Tools are not enough
• Leverage your peers
• Recruiting patients is hard work
• IRB is hard work

Lessons learned 2
• You can never have enough statistics or bioinformatics
• Genotyping and its technologies are secondary
• The RPDR data are dirty!
• Listen to Shawn
• Be flexible

Summary: Airways disease as a driver for i2b2
• “Typical” complex disease challenge
• Big impact on health care system
• Potential for large clinical impact
• Core 1: Extracting phenotypes from free text; statistical models
• Core 2: Viewer for CRC
• Core 4: Data provisioning

Conclusions
• The stronger the existing program, the more successful the I2B2 collaboration
• Communication is key
• Fit the question to the data, not the other way around
• Data access will be an issue for the future

Collaborators (and what they did)
• Scott, Zak, John, and Susanne: money, project management, IRB, and big picture
• Ross: Channing bioinformatics, file structures, geek to geek translation with the cores, beta testing, 850 collection, IRB, links to other genetic bioinformatics tools and projects
• Shawn and Vivian: asthma and control data mart
• Anne, LJ, James: nongenetic predictors in CAMP
• Marco and Blanca: nongenetic predictors in PAC
• Marco and Blanca: genetic predictors in CAMP
• Marco and Blanca: genetic predictors in PAC
• Lynn: Crimson

Acknowledgments: Ross Lazarus, Blanca E. Himes, Marco F. Ramoni, Isaac Kohane, Shawn Murphy, Susanne Churchill, Anne Fuhlbrigge, LJ Wei, James Sigornivitch, Lynn Bry