Normalization: Getting the numbers comparable

  • Number of slides: 24

Normalization: Getting the numbers comparable. DNA Microarray Bioinformatics - #27612

The DNA Array Analysis Pipeline
• Question
• Experimental design
• Array design and probe design (or: buy chip/array)
• Sample preparation
• Hybridization
• Image analysis
• Normalization
• Expression index calculation
• Comparable gene expression data
• Statistical analysis; fit to model (time series)
• Advanced data analysis: clustering, meta analysis, PCA, classification, survival analysis, promoter analysis, regulatory network

Expression intensities are not just target concentrations
• Sample contamination
• RNA quality
• Sample preparation
• Dye effect (Cy3/Cy5)
• Probe affinity
• Hybridization
• Nonspecific signal (background)
• Saturation
• Spotting
• Other issues related to array manufacturing
• Image segmentation
• Array spatial effects

Two kinds of variation in the signal
• Global variation (systematic): RNA quality, sample preparation, dye, hybridization, photodetection
• Gene-specific variation (stochastic): spotting (size and shape), cross-hybridization, dye, biological variation (effect and noise)

Sources of variation
• Global variation (systematic): similar effect on many measurements; corrections can be estimated from the data. Handled by normalization.
• Gene-specific variation (stochastic): too random to be explicitly accounted for ("noise"). Handled by statistical testing.

Calibration = Normalization = Scaling
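
The slide names scaling without giving a formula. As a minimal sketch, assume (this is not stated in the deck) that each array is multiplied by one constant so that its median intensity matches a common target. The function name `scale_normalize` and the toy intensities are illustrative only.

```python
from statistics import median

def scale_normalize(arrays, target=None):
    """Multiply each array by a constant so that its median equals `target`.

    If no target is given, the median of the per-array medians is used
    (an assumption made for this sketch).
    """
    medians = [median(a) for a in arrays]
    if target is None:
        target = median(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

# Three toy arrays that differ only by a global factor.
arrays = [
    [100.0, 200.0, 400.0],
    [50.0, 100.0, 200.0],
    [200.0, 400.0, 800.0],
]
scaled = scale_normalize(arrays)
```

A single factor per array can only remove a global, intensity-independent effect; intensity-dependent bias needs a nonlinear method.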

Nonlinear normalization

Lowess Normalization
[Figure: scatter plot of M against A]
One of the most commonly used normalization techniques is the LOcally WEighted Scatterplot Smoothing (LOWESS) algorithm.
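
A rough sketch of the idea for a two-channel array, assuming the usual definitions M = log2(R/G) and A = 0.5·log2(R·G), which the slide does not spell out. The smoother below is a single-pass, tricube-weighted local linear fit; full LOWESS adds robustness iterations that are left out here. All function names and the simulated intensities are illustrative.

```python
import math

def ma_values(red, green):
    """Return (M, A) with M = log2(R/G) and A = 0.5 * log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def local_linear_fit(x, y, x0, frac):
    """Tricube-weighted linear fit of y on x, evaluated at x0.

    Assumes x0 is one of the x values, so at least one weight is positive.
    """
    k = min(max(3, math.ceil(frac * len(x))), len(x))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = max(sorted(abs(xi - x0) for xi in x)[k - 1], 1e-12)
    w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def lowess_normalize(red, green, frac=0.5):
    """Subtract the smoothed intensity-dependent trend from M."""
    m, a = ma_values(red, green)
    trend = [local_linear_fit(a, m, ai, frac) for ai in a]
    return [mi - ti for mi, ti in zip(m, trend)]

# Simulated dye bias that grows linearly with A (no real expression change).
a_sim = [6.0 + 0.5 * i for i in range(12)]
m_sim = [0.2 * a - 1.0 for a in a_sim]
red = [2.0 ** (a + m / 2) for a, m in zip(a_sim, m_sim)]
green = [2.0 ** (a - m / 2) for a, m in zip(a_sim, m_sim)]
normalized_m = lowess_normalize(red, green)
```

Because the simulated bias is purely intensity dependent, the normalized M values come out close to zero.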

The Qspline method
• From the empirical distribution, a number of quantiles are calculated for each of the channels to be normalized (one channel shown in red) and for the reference distribution (shown in black).
• A QQ-plot is made, and a normalization curve is constructed by fitting a cubic spline function.
• As reference, one can use an artificial "median array" for a set of arrays, or a log-normal distribution, which is a good approximation.
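
The steps above can be sketched with the standard library alone. Note the substitution: this sketch connects the QQ-plot points with straight lines instead of fitting a cubic spline, so it shows the quantile-matching idea but is not the qspline method itself. The number of quantiles, the function names and the toy data are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles

def qq_points(channel, reference, n=20):
    """Matched quantiles of channel and reference (the points of the QQ-plot)."""
    qx = quantiles(channel, n=n, method="inclusive")
    qy = quantiles(reference, n=n, method="inclusive")
    return qx, qy

def apply_curve(value, qx, qy):
    """Map a channel intensity onto the reference scale.

    Piecewise-linear interpolation through the QQ points; values outside
    the quantile range are extrapolated along the first or last segment.
    """
    i = min(max(bisect_right(qx, value) - 1, 0), len(qx) - 2)
    x0, x1, y0, y1 = qx[i], qx[i + 1], qy[i], qy[i + 1]
    if x1 == x0:
        return y0
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)

# Toy data: the channel is a distorted (scaled and shifted) copy of the reference.
reference = [8.0 + 0.1 * i for i in range(40)]
channel = [2.0 * r + 3.0 for r in reference]
qx, qy = qq_points(channel, reference)
normalized = [apply_curve(v, qx, qy) for v in channel]
```

A cubic spline gives a smoother curve than the straight segments used here, which matters when the distortion between channel and reference is curved.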

Once again… qspline
Accumulating quantiles. When many microarrays are to be normalized to each other, an average array can be used as the target.

Invariant set normalization (Li and Wong)
An invariant set of probes is used:
• Probes that do not change intensity rank between arrays
• A piecewise linear median line is calculated
• This curve is used for normalization
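
A simplified sketch of these three steps. The published procedure selects the invariant set iteratively, with a rank threshold that depends on intensity. Here a single fixed threshold `max_rank_diff` is assumed, and the median line is built from per-bin medians. Function names, the bin count and the toy data are hypothetical.

```python
from bisect import bisect_right
from statistics import median

def ranks(values):
    """Rank of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def invariant_probes(array, baseline, max_rank_diff):
    """Indices of probes whose rank differs little between the two arrays."""
    ra, rb = ranks(array), ranks(baseline)
    return [i for i in range(len(array)) if abs(ra[i] - rb[i]) <= max_rank_diff]

def median_line(array, baseline, probes, n_bins=4):
    """Knots of a piecewise linear curve: per-bin medians of the invariant probes."""
    points = sorted((array[i], baseline[i]) for i in probes)
    size = max(1, len(points) // n_bins)
    knots = []
    for start in range(0, len(points), size):
        chunk = points[start:start + size]
        knots.append((median(x for x, _ in chunk), median(y for _, y in chunk)))
    return knots

def apply_line(value, knots):
    """Interpolate along the knots; end segments are extended outward.

    Assumes at least two knots with distinct x values.
    """
    xs = [x for x, _ in knots]
    i = min(max(bisect_right(xs, value) - 1, 0), len(knots) - 2)
    (x0, y0), (x1, y1) = knots[i], knots[i + 1]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)

# Toy data: the array is twice as bright as the baseline, and probe 0 truly changes.
baseline = [10.0 * (i + 1) for i in range(20)]
array = [2.0 * b for b in baseline]
array[0] = 500.0
probes = invariant_probes(array, baseline, max_rank_diff=2)
knots = median_line(array, baseline, probes)
normalized = [apply_line(v, knots) for v in array]
```

The changed probe is excluded from the invariant set, so it does not distort the curve, and its difference from the baseline survives normalization.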

Spatial normalization
[Figure panels: raw data; after intensity normalization; spatial bias estimate; after spatial normalization]

Expression index value
• Some microarrays have multiple probes addressing the expression of the same target. Affymetrix GeneChips have 11-20 probe pairs per gene, each pair consisting of a Perfect Match (PM) and a MisMatch (MM) probe:
  PM: CGATCAATTGCACTATGTCATTTCT
  MM: CGATCAATTGCAGTATGTCATTTCT
• However, for downstream analysis we often want to deal with only one value per gene.
• Therefore we want to collapse the intensities from many probes into one value: a gene expression index value.

Expression index calculation
• Simplest method? Median
• But more sophisticated methods exist: dChip, RMA and MAS 5
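
The simplest method can be written in a few lines. This sketch assumes the PM intensities are already grouped by gene in a dictionary; the gene names and values are made up.

```python
from statistics import median

def median_index(probe_sets):
    """Collapse each probe set to the median of its probe intensities."""
    return {gene: median(values) for gene, values in probe_sets.items()}

probe_sets = {
    "geneA": [120.0, 150.0, 135.0, 900.0, 140.0],  # one outlying probe
    "geneB": [30.0, 28.0, 35.0, 31.0],
}
index = median_index(probe_sets)
```

The outlying probe in `geneA` does not move the median, which is why the median is a reasonable baseline even without a model.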

dChip (Li & Wong)
Model: $PM_{ij} = \theta_i \phi_j + e_{ij}$, where $\theta_i$ is the expression index on array $i$ and $\phi_j$ is the affinity of probe $j$.
Outlier removal:
• Identify extreme residuals
• Remove
• Re-fit
• Iterate
Distribution of errors $e_{ij}$ assumed independent of signal strength (Li and Wong, 2001).
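
One way to fit the multiplicative model is alternating least squares, sketched below for a single probe set. The scaling constraint (sum of squared probe affinities equal to the number of probes) is an assumption made here to fix the scale, and the outlier-removal loop from the slide is left out. The function name and the toy data are illustrative, not dChip's own code.

```python
def fit_theta_phi(pm, n_iter=20):
    """Fit pm[i][j] ~ theta[i] * phi[j] by alternating least squares.

    i indexes arrays, j indexes probes. phi is rescaled so that
    sum(phi_j^2) equals the number of probes.
    """
    n_arrays, n_probes = len(pm), len(pm[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_arrays
    for _ in range(n_iter):
        # Least-squares theta_i for fixed phi.
        phi_sq = sum(p * p for p in phi)
        theta = [sum(row[j] * phi[j] for j in range(n_probes)) / phi_sq for row in pm]
        # Least-squares phi_j for fixed theta.
        theta_sq = sum(t * t for t in theta)
        phi = [
            sum(pm[i][j] * theta[i] for i in range(n_arrays)) / theta_sq
            for j in range(n_probes)
        ]
        # Fix the scale of phi and move the factor into theta.
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]
        theta = [t / scale for t in theta]
    return theta, phi

# Noise-free toy probe set: 3 arrays, 4 probes.
true_theta = [100.0, 250.0, 400.0]
true_phi = [0.5, 1.0, 1.5, 2.0]
pm = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_theta_phi(pm)
```

Only the products and the ratios between arrays are identifiable; the absolute scale of `theta` depends on the constraint chosen for `phi`.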

RMA
Robust Multi-array Average (RMA) expression measure (Irizarry et al., Biostatistics, 2003)
• For each probe set, re-write $PM_{ij} = \theta_i \phi_j$ as $\log(PM_{ij}) = \log(\theta_i) + \log(\phi_j)$
• Fit this additive model by iteratively re-weighted least squares or median polish
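
A minimal median polish for one probe set, assuming base-2 logarithms and a matrix with arrays as rows and probes as columns. Only the model-fitting step named on the slide is shown; preprocessing before the fit is out of scope. The names and the toy data are illustrative.

```python
import math
from statistics import median

def median_polish(x, n_iter=10):
    """Median polish of x[i][j].

    Returns (overall, row_effects, col_effects, residuals).
    """
    r = [row[:] for row in x]
    n_rows, n_cols = len(r), len(r[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians into the row effects.
        for i in range(n_rows):
            m = median(r[i])
            row_eff[i] += m
            r[i] = [v - m for v in r[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians into the column effects.
        for j in range(n_cols):
            m = median(r[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                r[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [v - m for v in row_eff]
    return overall, row_eff, col_eff, r

# Noise-free toy probe set: rows are arrays, columns are probes.
true_theta = [100.0, 200.0, 800.0]
true_phi = [0.5, 1.0, 2.0, 4.0, 1.0]
log_pm = [[math.log2(t * p) for p in true_phi] for t in true_theta]
overall, array_eff, probe_eff, resid = median_polish(log_pm)
expression = [overall + a for a in array_eff]  # log2 expression per array
```

Taking logs turns the multiplicative model into an additive one, so array effects and probe effects can be separated by simple sweeps of medians.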

MAS 5
MicroArray Suite version 5 uses $\mathrm{Signal} = \mathrm{TukeyBiweight}\{\log(PM_j - MM^*_j)\}$
• MM* is an adjusted MM that is never bigger than PM
• Tukey biweight is a robust average procedure with weights and outlier rejection
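
A sketch of a one-step Tukey biweight average, to show how the weights and the outlier rejection work. The tuning constant `c=5`, the small `eps`, the base-2 logarithm, and the already adjusted `mm_star` values are all assumptions; how MM* is derived from MM is not covered here.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that ignores far outliers."""
    m = median(values)
    s = median(abs(v - m) for v in values)  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * s + eps)
        # Weight falls smoothly to zero at |u| = 1 and stays zero beyond it.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Toy probe set; the last probe pair is an outlier.
pm = [1200.0, 1350.0, 1100.0, 1280.0, 9000.0]
mm_star = [200.0, 300.0, 150.0, 260.0, 400.0]  # assumed already adjusted, MM* < PM
log_diffs = [math.log2(p - m) for p, m in zip(pm, mm_star)]
signal = tukey_biweight(log_diffs)
plain_mean = sum(log_diffs) / len(log_diffs)
```

The outlying probe pair receives weight zero, so `signal` stays within the range of the other four values, while `plain_mean` is pulled upward.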

Methods compared on expression variance
[Figure: standard deviation of gene measures from 20 replicate arrays, plotted against expression level. RMA: blue and red; MAS 5: green; dChip: black. From Terry Speed.]

Robustness: MAS 5.0
[Figure: MAS 5.0 log fold change estimate from 20 µg cRNA vs. MAS 5.0 log fold change estimate from 1.25 µg cRNA] (Irizarry et al., Biostatistics, 2003)

Robustness: dChip
[Figure: dChip log fold change estimate from 20 µg cRNA vs. dChip log fold change estimate from 1.25 µg cRNA] (Irizarry et al., Biostatistics, 2003)

Robustness: RMA
[Figure: RMA log fold change estimate from 20 µg cRNA vs. RMA log fold change estimate from 1.25 µg cRNA] (Irizarry et al., Biostatistics, 2003)

All of this is implemented in… R, in the Bioconductor package 'affy' (Gautier et al., 2003).

References
• Li and Wong (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2: 1-11.
• Irizarry, Bolstad, Collin, Cope, Hobbs and Speed (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4): e15.
• Affymetrix Microarray Suite User Guide. Affymetrix, Santa Clara, CA, version 5 edition, 2001.
• Gautier, Cope, Bolstad and Irizarry (2003). affy - an R package for the analysis of Affymetrix GeneChip data at the probe level. Bioinformatics