- Количество слайдов: 62
Driving Discovery Through Data Integration and Analysis John Quackenbush Molecular Diagnostics World 2010 28 October 2010
Disease Progression and Personalized Care Birth Treatment Natural History of Disease Clinical Care Environment + Lifestyle Outcomes Treatment Options Disease Staging Patient Stratification Early Detection Genetic Risk Biomarkers Quality Of Life Death
Turning the vision into a reality Assure access to samples and rational consent Develop a technology platform Make information integration as a central mission Conduct research as a vital component Present data and information to the local community Enable research beyond your own Engage corporate partners Communicating the mission to the community.
Assure Access to Samples
Access, Research, Security Patients want to be part of the process of curing disease Informed consent needs to be structured to allow patients to be partners in the research process HIPPA requires both informed consent and that we assure patient confidentiality But “identifiability” is a moving target in a genomic age With the <$1000 genome, in the age of Facebook, what this means remains unclear The new Genomics is a disruptive technology.
Develop a Technology Platform
2006: State of the Art Sequencing PRODUCTION Rooms of equipment Subcloning > picking > prepping 35 FTEs 3 -4 weeks SEQUENCING 74 x Capillary Sequencers 10 FTEs 15 -40 runs per day 1 -2 Mb per instrument per day 120 Mb total capacity per day Sequencing the genome took ~15 years and $3 B
2008: Enabling a New Era in Genome Analysis PRODUCTION 1 x Cluster Station 1 FTE 1 day SEQUENCING 1 x Genome Analyzer Same FTE as above 1 run per 5 days 15 Gb per instrument per run >3 Gb per day (1 x genome coverage) We can now re-sequence the genome in a ~1 week
The Challenge New technologies inspired by the Human Genome Project are transforming biomedical research from a laboratory science to an information science We need new approaches to making sense of the data we generate The winners in the race to understand disease are going to be those best able to collect, manage, analyze, and interpret the data.
Make information integration as a central mission
http: //compbio. dfci. harvard. edu Gene RNA Gene Index Databases Protein TM 4 Microarray Software Network Patient Predict Network Candidate Gene(s) Perturb Network (RNAi) Assay Response (m. A) Resourcerer Other Databases Other tools Me. SHer Cluster. Med Bayesian Nets Central Warehouse DNA Microarray Analysis Other Things: Mesoscopic Expression Correlated Signatures State Space Gene Models Tiling Arrays to Genes
Dealing with an Information Overload
Beating Information Overload Clinical Data Genomics Cytogenomics Metabolomics Transcriptomics Central Warehouse Chemical Biology Clinical Trials Etc. Epigenomics Proteomics Improved Diagnostics Individualized Therapies More Effective Agents Pub. Med The Hap. Map The Genome Disease Databases (OMIM) Published Datasets Drug Bank
misc Dana Farber Clinical Systems Pub. Med Gen. Bank Rules Engine Web Center Portal BAM Dashboard Portals Business Intelligence Partners OMICS IDX Rx Lab E n te r p r i s e S e r v i c e B u s Dana Farber Lab External Dana-Farber Research DB Conceptual Architecture Clinical Trial Idm & Security HTB ODS genomics Web Service Directory BPEL …… Custom De-identification Terminology EMPI A Facts C …. . Mapping A D Facts C B D Severity Score Clinical Pathways B Security Auditing RFID Build or Buy Oracle Existing
An Example: Signature Analysis Warehouse Array Express Fenglong Liu GEO Random Websites Aedin Culhane, Thomas Schwarzl, Joe White, Fenglong Liu, Kerm Picard
Gene. Chip Oncology Database Fenglong Liu
Gene. Chip Oncology Database I’s B E by las d e At c n la io ep ss r e be xpr to E n e oo en S G Fenglong Liu
An Example: Signature Analysis Pub. Med Kerm Picard Array Express Warehouse Analysis Fenglong Liu GEO Random Websites In-House Studies Aedin Culhane, Thomas Schwarzl, Joe White, Fenglong Liu, Kerm Picard
Gene. Sig. DB – release 2 http: //compbio. dfci. harvard. edu/genesigdb
Gene. Sig. DB – comparing cancers
Cancer is a Cell-Cycle Disease Aedin Culhane, Daniel Gusenleitner
Breast Cancer has unique signatures Aedin Culhane, Daniel Gusenleitner
A sample research question How many Multiple Myeloma patients, with bone marrow or blood samples in the bank, and who have a chromosome 13 deletion, responded (complete, partial, or minor remission) to therapy and how many did not respond?
A Path Forward We are working to develop a two-way strategy for future Clinic → Lab → Clinic Consider Oncotype. Dx This approach represents the intellectual framework for future success – and the bridges between the various laboratories and programs.
Conduct research as a vital component
Bayesian Networks Amira Djebbari Raktin Sinha Dan Schlauch
When we say “Networks” we mean… Genes are represented as “nodes” Interactions are represented by “edges” Edges can be directed to show “causal” interactions Edges are not necessarily direct interactions
Bayesian network - example Conditional Edges represent dependencies probability table at Gene 1 node “Gene 2” Gene 1 -1 0 1 Gene 2=1|Gene 1 0. 2 0. 7 Gene 2 Gene 3 Gene 4 Learning Bayesian networks: Structure Conditional probability tables
Bayesian networks - priors No free lunch theorem (Wolpert & Mac. Ready, 1996): The performance of general-purpose optimization algorithm iterated on cost function is independent of the algorithm when averaged over all cost functions. Suggests that when considering a specific application one can introduce a potentially useful bias using domain knowledge
A low-cost lunch? One can “help” the search along by providing a seed structure representing what we believe is the most likely network The network search process will then use gene expression data to look for perturbations on the structure that are supported by the data There are many possible sources of prior structures including the Biomedical literature and large-scale interaction studies (PPI)
Bayesian networks using microarray data and literature Test Set: Golub et al. ALL/AML dataset Learn BN with literature network as prior structure, Protein-Protein Interaction data (PPI), and literature+PPI Perform 200 bootstrap network estimations and find links that are “high confidence” Compare without prior (microarray data only) vs. with prior structure from the literature to look for known interactions.
BN: No Priors Amira Djebbari
BN: PPI Data Amira Djebbari
BN: Literature Priors Amira Djebbari
BN: Literature + PPI Cell Cycle Gene Subnetwork Amira Djebbari
Improving the Seeds Co-occurrence does not a provide directionality for interactions, but a BN is a DAG and our assignment is ad hoc The literature contains information about how we the genes (and their products) interact The challenge is extracting that information from the literature—there is too much to read Text mining doesn’t work well for the biomedical literature.
Improving the Seeds (2) Solution: Use a hybrid approach! Use text-mining tools to find sentences that contain names of two or more genes Use the Amazon Mechanical Turk to extract [subject]—[predicate]—[object] triples Define relationships between genes based on the “consensus” interaction Combine these results with pathway databases to build seed networks.
“Predictive. Networks” seeds from the literature
Present data and information to the local community
LGRC Research Portal
LGRC Research Portal
PAGE DETAILS - View aggregate statistics - View cohort details - Build cohort sets - Build composite phenotypes Actions: -Go to data download for selected cohort -Go to assay detail for selected cohort -Go to cohort manager
LGRC Research Portal
PAGE DETAILS Search -Facets -Search within results -Keyword prompts -Search history Table: -Paged results -Sortable columns Actions: -Go to Gene detail page -Add genes to ‘gene set’
PAGE DETAILS Annotation summary & summary view for each assay/data type: Accordion style sections Annotation Summary Gene Expression Summary -GEXP – expression profile across major Dx categories -RNASeq – Exon structure of the gene -SNPs – Table of SNPs in region of gene, highlighting association with major Dx group - Methylation – Methylation profile in region around gene -Genomic alterations – table of CNVs & alterations observed w/ freq in region around gene Actions: - Click through to assay detail page -Add gene to set RNASeq
LGRC Research Portal
Analysis Tools Cohort 1: Cohort 2: PAGE DETAILS Set 1 Set 2 Job name: -Very minimal parameters and options…here just 2 cohorts of interest, maybe p-value cutoff My job 1 View analysis parameters Generates comprehensive report Start Analysis Edit in place results – Don’t set parameters, edit the results Analysis goes into queue, email notification when finished Job Status Running
Analysis of Differential Expression: My Job 1 PAGE DETAILS -Very minimal parameters and options. Supervised Analysis Generates comprehensive report Edit in place results – Don’t set parameters, edit the results Accordion style result sections Meta analysis Generate PDF report of analysis Analysis goes into queue, email notification when finished Unsupervised analysis
Engage corporate partners
We need to find the best tools We received an $1 M Oracle Commitment grant to create our integrated clinical/research data warehouse We’ve partnered with IDBS to create data portals We are working with Illumina on a variety of projects We are forging relationships with Thomson-Reuters to link genomic profiling data to drug, trial, and patent information We are building partnerships with Roche, Genomatix, NEB, and others interested in entering the personal genomics space.
Enable research beyond your own
John Quackenbush, Director Mick Correll, Associate Director
The Mission The mission of the CCCB is to provide broad-based support for the analysis and interpretation of ‘omic data and in doing so to further basic, clinical and translational research. CCCB also will conduct research that opens new ways of understanding cancer.
CCCB Service Offering IT Infrastructure -Application hosting -Data management -Custom software development -Comprehensive collaboration portals
CCCB Service Offering IT Infrastructure Next-Gen Sequencing -Competitive per-lane pricing -Integrated informatics -Major focus for development in 2010
CCCB Service Offering Sequencing IT Infrastructure Analytical Consulting -Bioinformatics / statistical data analysis -Experimental design -Value-add for IT/Sequencing services
CCCB Collaborative Consulting Model 1. Initial meeting to understand project scope and objectives 2. Development of an analysis plan and time/cost estimate Sequencing IT Infrastructure Consulting 3. During project execution, data and results are exchanged through a secure, password-protected collaboration portal 4. Available as ad-hoc service, or larger scale support agreements
Communicate the mission to the community.
Genomics is here to stay
Acknowledgments The Gene Index Team Corina Antonescu Valentin Antonescu Fenglong Liu Geo Pertea Razvan Sultana John Quackenbush Array Software Hit Team Katie Franklin Eleanor Howe Sarita Nair Jerry Papenhausen John Quackenbush Dan Schlauch Raktim Sinha Joseph White H. Lee Moffitt Center/USF Timothy J. Yeatman Greg Bloom