
  • Number of slides: 52

Mapping Disease Genes

Why
• A fundamental problem in human genetics today is locating and identifying the specific gene responsible for a given genetic disease.
• However, the disease is just a phenotype, and the gene responsible for that phenotype might be very different from what we would expect.
  – For instance, the most striking manifestation of Lesch-Nyhan syndrome is self-mutilating behavior, yet the Lesch-Nyhan gene codes for hypoxanthine-guanine phosphoribosyltransferase, which helps salvage nucleotides derived from the breakdown of nucleic acids.
• So, we need to reduce the number of candidate genes to a manageable level.
• Using the naturally occurring recombination process to map genes remains the best way to localize the gene responsible for a genetic disease. The goal is to reduce the amount of DNA that needs to be searched to a small region, a few million base pairs or so. Below that level, molecular tools need to be employed.

Markers for Mapping
• What makes a good marker:
  – co-dominant (so homozygotes and heterozygotes can be distinguished)
  – many alleles at each locus (so most people will be heterozygous and different from each other)
  – many loci, well distributed throughout the genome
  – easy to detect, especially with automated machinery
• No system is perfect.

Marker Systems
• Originally, genetic markers were visible phenotypes and blood groups. There simply aren't enough of these markers, and many of them are dominant. Also, very few people display visible phenotypes that can be attributed to single genes.
  – Before the advent of molecular markers, very few genes had been mapped, and most of them were on the X.
• Protein electrophoresis. Isozymes are enzymes that have different electrophoretic mobility because they are produced by different alleles of the same gene.
  – They are usually co-dominant, but frequently form dimers that can confuse interpretation.
  – However, no more than 100 have ever been described, and many of these are not very polymorphic.
  – Each enzyme requires a unique set of reaction conditions, which makes automation difficult.
• Figure: isozymes of alcohol dehydrogenase (ADH). The enzyme is a dimer of 2 identical subunits, and there are 3 alleles here: Slow, Medium, and Fast. The heterozygotes show 3 bands, with the middle band having 1 subunit from each allele.

More Marker Systems
• Restriction Fragment Length Polymorphisms (RFLPs). The original DNA-based marker system.
  – These markers are (usually) single nucleotide polymorphisms which create or destroy a restriction site (a 6-8 bp sequence that can be cut by a restriction enzyme). Thus, they have only 2 alleles per locus.
  – The original detection technique, the Southern blot, was expensive, time-consuming and finicky (and radioactive too).
• Microsatellites (also called Simple Sequence Repeats, SSRs, or Short Tandem Repeats, STRs). Short repeats of 2-5 bp in a tandem array. During replication, DNA polymerase occasionally "stutters": it increases or decreases the number of repeats, which creates new alleles.
  – Lots of loci, well scattered throughout the genome. Most loci have multiple alleles that are easily distinguishable.
  – Detected by PCR followed by electrophoresis.
  – Electrophoresis needs to be high resolution, to easily detect length differences of 2 bp.

Single Nucleotide Polymorphisms
• Single Nucleotide Polymorphisms (SNPs): which of the 4 possible nucleotides is present at an exact position in the DNA.
  – The current method of choice.
  – Each locus has a maximum of 4 alleles (with 2 being the usual case).
  – There are very large numbers of SNP loci, often several per gene, even within exons.
  – Detection can be done with assays that don't require electrophoresis and so are very fast and easy to automate.
  – At present there are approximately 12 million human SNPs recorded in the NCBI database.

Fingerprinting Markers
• Fingerprinting markers are used to distinguish the DNA of one person from another. Not generally useful for mapping.
  – Criminal investigations
  – Paternity tests
  – Body identification
• Major Histocompatibility Complex (MHC), also called Human Leukocyte Antigen (HLA). The main gene locus involved in the immune system's ability to distinguish self from non-self.
  – Lots of haplotypes, but all at one location on chromosome 6.
• Minisatellites, also called Variable Number Tandem Repeats (VNTRs).
  – Longer than microsatellites: 10-60 bp, thus easier to detect with electrophoresis.
  – Many loci (about 1000 known), but mostly clustered near telomeres.
  – No general method of finding them.

• CODIS (Combined DNA Index System) is the marker system used by the FBI and foreign police agencies for DNA-based identification.
  – Based on Short Tandem Repeats (STRs).
• The FBI currently uses a set of 13 markers, located on many different chromosomes, plus a marker for distinguishing the X and Y chromosomes.
• The European Union uses a somewhat different set of markers, and there are proposals to add and drop several of the current CODIS markers. The FBI's plan is to expand from 13 to 18 markers soon.
• All are 4 or 5 bp repeats, which PCR-amplify better than 2 bp repeats and are easier to tell apart.
  – The markers aren't associated with any disease genes or other visible phenotypes.
• Detected with commercially available kits, with PCR amplification products run on a DNA sequencing machine, which gives precise band sizes (which are easily compared between labs).
• CODIS markers are multiplexed: several different loci are run in the same electrophoresis gel lane. PCR primers are chosen to give different, non-overlapping sizes to the amplified bands.

STR Alleles
• Alleles are named by the number of complete repeats they have.
• Some variant alleles have a partial repeat: the number of bases in the partial repeat is given after the decimal point.
• For example, the TH01 locus has an allele called 9.3 that is common in Caucasians. It has 9 complete repeats plus another partial repeat that has only 3 bases in it.
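
The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 4 bp repeat unit for TH01 is an assumption taken from the statement that CODIS loci are 4 or 5 bp repeats.

```python
def parse_str_allele(name: str) -> tuple:
    """Split an allele name such as '9.3' into (complete repeats, extra bases)."""
    whole, _, partial = name.partition(".")
    return int(whole), int(partial) if partial else 0


def repeat_region_length(name: str, repeat_unit_bp: int) -> int:
    """Length in bp of the repeat region implied by the allele name."""
    complete, extra = parse_str_allele(name)
    return complete * repeat_unit_bp + extra


# Assumed 4 bp repeat unit: allele 9.3 is one base shorter than allele 10.
print(parse_str_allele("9.3"))         # (9, 3)
print(repeat_region_length("9.3", 4))  # 39
print(repeat_region_length("10", 4))   # 40
```

The one-base difference between 9.3 and 10 is why the detection system needs single-base size resolution.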

Some CODIS Markers for 10 Random Individuals
• D1S80
• D21S11

Probability of Identity
• The fundamental question with fingerprinting: what is the chance that two unrelated individuals will have the same genotype? (Probability of identity, Pi)
  – More alleles at any given locus reduce the chance that unrelated people match.
  – Since the loci are genetically independent, Pi for several loci together is just the product of the individual Pi's.
  – For perspective: there are about 7 x 10^9 people living today, which means there are about 2.5 x 10^19 possible pairs of individuals. To be sure that you don't misidentify someone, you need a Pi that is much less than 1 in 2.5 x 10^19, that is, much less than about 4 x 10^-20.
• Study done by the National Institute of Standards and Technology (NIST) in 2012.
  – Examined 1036 unrelated individuals from the US, divided into these groups: Caucasian, African-American, Hispanic, and Asian. Ethnicity was self-identified, a procedure that obviously has some issues.
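
The arithmetic behind these bullets can be checked with a short sketch. The per-locus value of 0.05 is a hypothetical round number chosen for illustration, not a figure from the NIST study, and the function names are made up here.

```python
import math


def unordered_pairs(n: int) -> int:
    """Number of distinct pairs that can be formed from n individuals."""
    return n * (n - 1) // 2


def combined_pi(per_locus_pi) -> float:
    """Pi over several independent loci is the product of the per-locus values."""
    return math.prod(per_locus_pi)


population = 7_000_000_000
pairs = unordered_pairs(population)
threshold = 1 / pairs  # Pi should be well below this value

# Hypothetical: every locus has Pi = 0.05.
per_locus = 0.05
thirteen_loci = combined_pi([per_locus] * 13)
loci_needed = math.ceil(math.log(threshold) / math.log(per_locus))

print(f"pairs               = {pairs:.2e}")
print(f"1 / pairs           = {threshold:.1e}")
print(f"Pi for 13 such loci = {thirteen_loci:.1e}")
print(f"loci needed to drop below 1 / pairs: {loci_needed}")
```

Under this assumed per-locus value, 13 loci are not quite enough to push Pi below the reciprocal of the number of pairs, which is consistent with the plan to expand the marker set.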

Probability of Identity for Individual Loci
• This table shows the probability that two people of the same ethnicity share the same genotype at specific loci.
• Range is about 0.5% to 20%, depending on ethnicity and locus.

Pi with different marker sets and ethnic groups

Mutations in STR Loci
• STR loci have a high mutation rate relative to base change mutations (SNPs).
• This phenomenon produces multiple alleles, which is very useful for easy identification of individuals.
• However, it also complicates paternity tests and other relationship studies.
• Mutations are detected in situations where both parents and their child have been tested, it is clear that they are the real parents, and the child carries an allele not found in either parent.
• Data from the American Association of Blood Banks:
  – For 19 alleles, examined in roughly 500,000 cases, mutation rates are between 0.1% and 0.3% in most cases.

CODIS Issues
• NIST works to understand unusual variants by sequencing them when they are reported.
• Variant alleles. The more individuals are tested, the more new, rare variants appear.
  – Different numbers of repeat units as well as partial repeats.
  – Sometimes a large change in repeat number moves a band out of the expected range on the gel.
• Images from http://www.cstl.nist.gov/biotech/strbase/pub_pres/Kline_DuckKey2005.pdf

More Problems
• Null alleles and drop-outs. No amplification occurs for one allele at a specific locus:
  – Appears as if the subject were a homozygote.
  – Can be caused by a mutation in one of the primer sites, or by the deletion of the entire locus.
  – This event is detected when two different sets of primers are used to amplify the same locus: one set produces a band and the other doesn't.
• Tri-allelic cases. Sometimes due to duplications of the locus (including trisomy 21), sometimes due to mosaic tissue (or even mixed samples).
• Images from http://www.cstl.nist.gov/biotech/strbase/pub_pres/Kline_DuckKey2005.pdf

Ethnicity Prediction
• Some loci have very different allele frequencies in different ethnic groups.
• However, self-reported ethnicity isn't very reliable, and ethnicity isn't a well-defined concept anyway.
• Mutation rates in STRs matter here: identity by state (2 people have the same allele) is not the same as identity by descent (2 people have inherited an allele from the same common ancestor).
• SNPs and Alu element insertions are more stable than STRs and probably work better for ethnicity prediction.
• A related issue: linkage to disease genes. A DNA profile may give information about susceptibility to diseases.
  – Note that many disease genes were mapped using STR markers.

Genetic Diseases and Pedigree Analysis

Genetic Diseases and Genes
• A genetic disease is a condition that "runs in families": if one person in a family has the condition, others are likely to get it as well.
  – The recurrence risk is the probability that a newly born member of a family will have a genetic disease given that another family member also has the disease.
• A major goal in modern human genetics is to locate the gene responsible for a given genetic disease.
• Many genetic diseases map to specific loci (chromosomal locations), and when the DNA of that region is examined, a gene (transcription unit) is found there, with mutated versions associated with having the disease.
  – There are several thousand known human genetic diseases with known genes. They are documented at OMIM, the Online Mendelian Inheritance in Man web site: http://www.ncbi.nlm.nih.gov/omim
  – Occasionally a disease behaves in a Mendelian way but there is no molecular gene present at the mapped location: facioscapulohumeral muscular dystrophy (OMIM 158900) maps to 4q35. It is caused by a set of small deletions, but there is no protein-coding or RNA gene nearby.
• Some genetic diseases are "complex": they are caused by a combination of many genes (i.e. multifactorial inheritance) and environmental factors. Complex traits have proven more difficult to understand than single-gene traits.

Empiric Risk
• The inheritance of a complex trait does not follow Mendelian rules. In these cases, recurrence risks can't be based on theory. Instead, they are based on empiric risks.
• The empiric risk of a genetic disease is estimated using population surveys. It is affected by how closely related you are to the affected person. It is also affected by ethnicity in many cases.
  – First degree relatives: parents, siblings, children. Share 1/2 of their genes.
  – Second degree relatives: half-siblings, grandparents, aunts and uncles. Share 1/4 of their genes.
  – Third degree relatives: first cousins. Share 1/8 of their genes.

Pedigree Analysis and Complications
• The large majority of genetic diseases are inherited as autosomal dominants or recessives.
• Sex-linked traits are also well known, because they are expressed in males much more frequently than in females.
• There are other known modes of inheritance that are rarer.
• Many issues complicate simple pedigree analysis.

Pedigree Symbols

Basic Pedigree Patterns
• The main ones:
1. Autosomal dominant
  • Affects either sex, transmitted by either sex
  • Child of affected x unaffected has a 50% risk
2. Autosomal recessive
  • Usually neither parent is affected
  • Increased risk with consanguineous (i.e. related) parents
  • Recurrence risk = 25%
3. X-linked dominant
  • Females affected more than males (they have 2 X's), but females often have milder or more varied symptoms
  • Children of an affected female have a 50% recurrence risk
  • Daughters of affected males have 100% risk, sons have 0% risk
4. X-linked recessive
  • Mainly males affected
  • No father-to-son (male-to-male) transmission
  • Usually neither parent is affected: mother is usually an unaffected heterozygote

Autosomal Dominant Pedigree
• Every affected person has at least one affected parent.
• D = dominant affected allele; d = recessive normal allele.
• Dd x dd = 1/2 chance of affected offspring.
• Dd x Dd = 3/4 chance of affected offspring.
• The ? person has a 1/2 chance of being affected regardless of gender.

Autosomal Recessive Pedigree
• Commonly seen in consanguineous matings (between close relatives).
• Often appears when neither parent is affected.
• R = dominant normal allele, r = recessive affected allele.
• Rr x Rr: neither parent is affected, but each child has a 1/4 chance of being affected.
• The ? person has a 1/4 chance of being affected.
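
The 1/2, 3/4 and 1/4 risks quoted for the autosomal pedigrees can be reproduced by enumerating a Punnett square. The sketch below is a minimal illustration with hypothetical function names; it assumes each parent passes on either allele with equal probability.

```python
from fractions import Fraction
from itertools import product


def affected_fraction(parent1: str, parent2: str, is_affected) -> Fraction:
    """Fraction of the Punnett-square cells for which is_affected(genotype) is true."""
    # Each parent contributes one of its two alleles with equal probability.
    offspring = ["".join(sorted(a + b)) for a, b in product(parent1, parent2)]
    hits = sum(1 for genotype in offspring if is_affected(genotype))
    return Fraction(hits, len(offspring))


def dominant_affected(genotype: str) -> bool:
    return "D" in genotype       # D = dominant affected allele


def recessive_affected(genotype: str) -> bool:
    return genotype == "rr"      # r = recessive affected allele


print(affected_fraction("Dd", "dd", dominant_affected))   # 1/2
print(affected_fraction("Dd", "Dd", dominant_affected))   # 3/4
print(affected_fraction("Rr", "Rr", recessive_affected))  # 1/4
```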

X-linked Recessive
• Sex-linked = X-linked.
• Males never pass it on to their sons.
• Usually an unaffected heterozygous mother has affected sons.
• The ? person has a 1/2 chance of being affected if male, and a 0 chance if female.

Hemophilia in European Royal Families
• An X-linked recessive condition that apparently arose as a new mutation in Queen Victoria.

X-linked Dominant
• If the father is affected, none of the sons are affected, but all the daughters are affected.
• The ? person has a 0 chance of being affected if male, and a 100% chance if female.

Complications: Autosomal Dominants
• For Mendel, a dominant allele gave the same phenotype for both homozygotes and heterozygotes. Thus, PP and Pp both give purple flowers.
• Most human dominant traits are actually partial dominants: the heterozygote is affected less than the homozygote.
  – In most cases, homozygotes are affected far more than heterozygotes. In achondroplasia (OMIM 100800), for example, homozygotes often die before birth.
  – Nevertheless, these conditions are referred to as "dominant", because heterozygotes are affected.
• Most matings are affected heterozygote x unaffected (Dd x dd, where D is the dominant affected allele).
• But two affected heterozygotes often mate (Dd x Dd), giving a 3/4 chance of affected offspring, or 2/3 if homozygotes die before birth.
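
The step from 3/4 to 2/3 is a conditional probability: once DD offspring are removed, the remaining classes are renormalized. A minimal sketch, assuming the usual 1 : 2 : 1 genotype ratio for Dd x Dd (the variable names are illustrative):

```python
from fractions import Fraction

# Genotype probabilities among conceptions from Dd x Dd.
genotypes = {"DD": Fraction(1, 4), "Dd": Fraction(1, 2), "dd": Fraction(1, 4)}

# If every genotype survives, DD and Dd are both affected.
affected_all = genotypes["DD"] + genotypes["Dd"]

# If DD dies before birth, condition on being live-born (Dd or dd).
live_born = genotypes["Dd"] + genotypes["dd"]
affected_among_live_born = genotypes["Dd"] / live_born

print(affected_all)              # 3/4
print(affected_among_live_born)  # 2/3
```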

Complications: X Chromosome Inactivation
• Only 1 X is active in human cells. At about the 1000-cell stage of the embryo, each cell in a female (XX) inactivates one randomly chosen X, converting it to a Barr body (facultative heterochromatin).
  – This means that females are mosaics of cells expressing different X chromosomes.
• For traits affecting the blood, all the cells are mixed, and the effect is averaged out. For example, in hemophilia (failure of the blood to clot), the phenotype of a heterozygote is normal blood clotting even though half the cells lack a critical enzyme.
• For cells in fixed locations, patches can appear. Best seen in calico cats, where black and orange are 2 different alleles of an X-chromosome gene.
  – Human condition: hypohidrotic ectodermal dysplasia (OMIM 305100; abnormalities of sweat glands, hair, teeth, nails). Some patches of skin have sweat glands and others don't.
• X-linked lethal alleles will kill all males before birth, as well as female cells having the lethal allele active. However, a female heterozygote will live because those cells with the normal X active will function.

Ascertainment Bias
• Ascertainment bias is a big problem: families with no affected children are not diagnosed as carriers of a genetic condition.
• For example, the offspring of two recessive heterozygotes (Rr x Rr) should be 1/4 affected. However, you only see families with at least 1 affected child, so you miss all those families that by chance didn't have any affected children.
  – Consider 16 families, each with 2 children. Each child has a 3/4 chance of being normal, so with 2 children the chance of having no affected children is 3/4 x 3/4 = 9/16. So 9 of the 16 families go unnoticed and you see only the other 7. Of them, 6 have 1 affected child and the last family has 2 affected. So, you have seen 14 children with 8 affected, or an 8/14 = 4/7 chance of being affected. That looks like about the ratio expected for an autosomal dominant.
• Dealt with by mathematical corrections.
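
The 16-family example can be generalized with a few lines of exact arithmetic. This is a sketch under the slide's assumptions (every family with at least one affected child is seen, families with none are never seen); the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def ascertained_affected_fraction(n_children: int, q: Fraction) -> Fraction:
    """Fraction of affected children among families with at least 1 affected child.

    q is the true per-child risk (1/4 for Rr x Rr).
    """
    p = 1 - q
    # Binomial probability of exactly k affected children in a family.
    prob_k = [comb(n_children, k) * q**k * p**(n_children - k)
              for k in range(n_children + 1)]
    families_seen = 1 - prob_k[0]  # families with no affected child are missed
    expected_affected = sum(k * pk for k, pk in enumerate(prob_k))
    return expected_affected / (n_children * families_seen)


q = Fraction(1, 4)
print(ascertained_affected_fraction(2, q))   # 4/7, as in the 16-family example
print(ascertained_affected_fraction(10, q))  # much closer to the true 1/4
```

The bias shrinks as families get larger, because large families with no affected child become rare.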

Heterogeneity
• Locus heterogeneity: the same phenotype (disease symptoms, as determined by a physician) might be caused by more than one gene.
  – Example: recessive congenital deafness. There are lots of ways to be born deaf.
  – Complementation test: if two people are deaf because they are both homozygous for mutations in the same gene, all of their offspring will be deaf. Both copies of the gene are mutant in the offspring.
  – However, if the two people are deaf because they are homozygous for mutations in different genes, their children will have normal hearing. The children are heterozygotes for both genes.
• Allele heterogeneity: different mutant alleles of the same gene can have effects that are different enough that they appear to be different diseases. This is often determined only after the genes have been mapped and cloned.
  – Becker muscular dystrophy (OMIM 300376) is caused by partially inactive alleles of the same gene that causes Duchenne muscular dystrophy (DMD, OMIM 310200). DMD is more severe, caused by mutations that completely inactivate the gene.
• The bottom line: there isn't a one-to-one correspondence between disease phenotypes and transcription units.

Penetrance and Expressivity
• Some dominant genes don't always get expressed in heterozygotes: we say they have incomplete penetrance.
  – Other genes or environmental factors can affect gene expression.
  – In a dominant pedigree, every affected person has at least one affected parent. If the parent carries the allele but does not express the trait, affected offspring will apparently have no affected parents, which is the mark of a recessive trait.
• Variations in the degree of expression (expressivity) are also seen.
• Also, some diseases don't manifest until later in life.
  – A trait that is present at birth is called congenital.
  – Huntington's disease (OMIM 143100): neural degeneration that generally isn't expressed until age 40 or so.

Population Genetics Issues
• A recessive condition that is common in the population can look like a dominant: every affected person can have an affected parent. An example is the ABO blood type: the O allele is recessive to both A and B, yet it appears in many individuals because many people with A blood are AO heterozygotes.
• Inbreeding (mating with close relatives) causes similar problems, by greatly increasing the probability of getting a homozygous offspring.

New Mutations
• The average rate of new mutations is about 1 new mutation in a given gene per 10^5-10^6 births. This implies that many new mutations appear in the population every year.
• Given that most cell divisions are somatic and don't involve the germ line, most new mutations are not passed on to the offspring. However, mutations in early cell divisions can create mosaic individuals. Large portions of such a person's body can be mutant, enough to give them a mutant phenotype.
  – Thus, a person whose mutant phenotype comes from a purely somatic mutation would not pass the mutation on to their offspring.
• It is also possible to have a mosaic germ line, so some gametes are derived from mutant cells and others from normal cells. This skews the ratio of offspring that appear.

Gene Mapping

Recombination Basics
• In prophase of meiosis I, homologous chromosomes synapse (pair up). The chromosomes break at approximately the same location and are rejoined to each other. This is called crossing over or recombination.
  – The recombinase enzyme complex catalyzes this reaction.
• A crossing over event has 2 possible outcomes:
  – Crossover: genetic markers outside the site of crossing over switch chromosomes. This is what we usually think of.
  – Gene conversion: markers outside the site of crossing over stay on the same homologues, but a short region of DNA at the site is made homozygous: one allele is replaced by another allele.

More Basics
• Recombination appears in the offspring's phenotype as an exchange of marker genes on either side of the crossover.
• Thus, to detect crossing over we examine two marker genes. The parent we are observing must be heterozygous for both genes.
  – If both dominant alleles are on one homologue and both recessives are on the other, the alleles are in coupling phase.
  – If one dominant and one recessive are on each homologue, the alleles are in repulsion phase.
  – Coupling and repulsion can also be used to describe relationships between co-dominant markers.
• The marker alleles in an offspring are either in the Parental configuration (same as they were in the parents) or in the Recombinant configuration (marker exchange has occurred).

Map Distances
• Crossing over occurs at random along the chromosome, which means that the closer 2 genes are, the less frequently recombination occurs between them. This is the basis for mapping.
• Recombination Fraction (RF, theta, or θ) is the percentage of recombinant gametes produced.
  – One complicating factor when looking at offspring: meiosis occurs in both parents.
• RF is never more than 50%, because only 2 of the 4 chromatids recombine at any one crossover.
• 1% recombination = 1 map unit = 1 centiMorgan (cM), but only for short distances.
• For longer distances, double crossovers decrease the observed recombination frequency.
  – Two crossovers between marker genes leave the markers in the parental configuration: there is no way to tell there were any crossovers.
• Double crossovers should occur at a frequency predictable from the distances between genes, but there is also interference, which affects the chance of a crossover in any interval.
  – Interference: one crossover inhibits the occurrence of another nearby.

Mapping Function
• We want a gene map to be calibrated in map units that accurately reflect the frequency of crossovers between genes.
• The equation used to convert the observed recombination fraction into map units is called the mapping function.
• For a simple model of randomly placed crossovers and no interference, Haldane's function works well:
  – w = -½ ln(1 - 2θ), where w is the map distance and θ is the observed proportion of recombinants.
  – This expression produces the curve on the previous slide.
• Interference complicates things, and a variety of functions can be used. Kosambi's function is a common one:
  – w = ¼ ln[(1 + 2θ) / (1 - 2θ)]
• Interference has been estimated for human genes, and it seems to be a very small effect. For a 10 cM interval, only 0.01% of the potential crossovers are inhibited by interference.
• Also, from a practical point of view, the main value of recombination mapping is finding a small region of DNA to search with molecular tools. Worrying about interference seems (to me) to be a lot of work for very little benefit.
• Further, it is clear that a crossover is not equally probable at every nucleotide: at the level of the DNA sequence, recombination primarily occurs at hot spots, with very little in between.
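
To see how the two mapping functions behave, they can be evaluated side by side. A minimal sketch: the function names are illustrative, and it assumes θ is given as a fraction (0 to 0.5) so that w comes out in Morgans and is multiplied by 100 to get cM.

```python
import math


def haldane_map_distance(theta: float) -> float:
    """Haldane: w = -1/2 ln(1 - 2*theta), no interference. Result in Morgans."""
    return -0.5 * math.log(1 - 2 * theta)


def kosambi_map_distance(theta: float) -> float:
    """Kosambi: w = 1/4 ln[(1 + 2*theta) / (1 - 2*theta)]. Result in Morgans."""
    return 0.25 * math.log((1 + 2 * theta) / (1 - 2 * theta))


for theta in (0.01, 0.10, 0.20, 0.30, 0.40):
    haldane_cm = 100 * haldane_map_distance(theta)
    kosambi_cm = 100 * kosambi_map_distance(theta)
    print(f"theta = {theta:.2f}   Haldane = {haldane_cm:6.1f} cM   Kosambi = {kosambi_cm:6.1f} cM")
```

For small θ both functions return nearly θ itself, which is the "1% recombination = 1 cM" rule; they diverge from it, and from each other, as θ approaches 0.5, where both distances grow without bound.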

Chiasmata
• Crossing over is visible in the microscope as chiasmata (the plural form of chiasma).
• It is possible to count chiasmata. Each one counts as 50 map units (one crossover between 2 of the 4 DNA molecules at prophase of meiosis I).
• In male meiosis (testicular biopsy), one study showed an average of 50.6 chiasmata per cell. Multiplying by 50, this gives 2530 cM as the length of the genetic map in males.
• In female meiosis (between 16 and 24 weeks of fetal life), an average of 70.3 chiasmata per cell were seen. This gives a female map of 3515 cM.
• Recombination mapping has given estimates of 2590 cM for males and 4281 cM for females.
• So, females have more crossovers and a larger map than males. The total map length in humans is about 3000 cM.
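
The chiasma-count arithmetic is simple enough to check directly. A short sketch using only the figures quoted on the slide (the constant and function names are made up for illustration):

```python
CM_PER_CHIASMA = 50  # one crossover involves 2 of the 4 chromatids: 50% recombinants


def map_length_cm(mean_chiasmata_per_cell: float) -> float:
    """Genetic map length implied by the average chiasma count per meiotic cell."""
    return mean_chiasmata_per_cell * CM_PER_CHIASMA


male_cm = map_length_cm(50.6)
female_cm = map_length_cm(70.3)

print(f"male map:   {male_cm:.0f} cM")    # 2530 cM
print(f"female map: {female_cm:.0f} cM")  # 3515 cM
print(f"female / male ratio from chiasmata:     {female_cm / male_cm:.2f}")
print(f"female / male ratio from recombination: {4281 / 2590:.2f}")
```

Both methods agree that the female map is longer, although the recombination-based estimate gives a larger female-to-male ratio.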

LOD Score Mapping
• The general problems with mapping genes in humans: small families, uncontrolled matings, uncertain paternity.
  – Thus you can't set up a test cross, where one parent is a double heterozygote and the other is homozygous recessive, and count parental and recombinant offspring.
• Given a pedigree, the LOD score method involves determining the probability (the likelihood) of that family at different values of θ, the recombination fraction.
• Then, the method allows you to combine the evidence from different families (by adding their LOD scores), even if some information about them is missing or ambiguous.
• Also, each family can start with different parental arrangements of markers, and can have different numbers and types of children.
• The LOD score method is an example of a maximum likelihood procedure.
• The point of the maximum likelihood procedure is to estimate the value of a parameter that can't be directly observed, in this case the recombination fraction.
• The likelihood (probability) of an observed set of data (the phenotypes seen in a family, in this case) is calculated as a function of that parameter.
• The parameter value that gives the maximum likelihood is taken as the best estimate of the parameter.

LOD Procedure
1. Start with a model of inheritance for the gene of interest: an equation that gives the expected frequency of various types of offspring given an arbitrary value of θ.
2. Using a form of the binomial expansion, determine the likelihood of your data (family) at a number of different values of θ: L(θ).
3. Determine the odds ratio: the likelihood at each value of θ divided by the likelihood at θ = 0.5 (unlinked).
  – The LOD score is the base 10 logarithm of the odds ratio. This is the log of the odds, the LOD score for each value of θ.
4. For each value of θ, add the LOD scores across families. This is the beauty of logarithms: adding them is equivalent to multiplying the odds ratios of independent families. Thus, data from many small families can be added to achieve a statistically significant value for θ.
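
The four steps can be sketched with the simplest possible inheritance model. This is an illustration, not the model developed later in the deck: it assumes phase-known meioses that can each be scored directly as recombinant or non-recombinant, so that L(θ) = θ^R (1 - θ)^NR. The family counts are hypothetical.

```python
import math


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, directly scored meioses."""
    n = recombinants + nonrecombinants
    likelihood = theta**recombinants * (1 - theta) ** nonrecombinants  # steps 1-2
    return math.log10(likelihood / 0.5**n)                             # step 3


# Hypothetical families: (recombinant, non-recombinant) offspring counts.
families = [(1, 7), (0, 5), (2, 8)]
thetas = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50]

# Step 4: at each theta, add the LOD scores of the individual families.
total_lod = {t: sum(lod(t, r, nr) for r, nr in families) for t in thetas}
best_theta = max(total_lod, key=total_lod.get)

for t in thetas:
    print(f"theta = {t:.2f}   total LOD = {total_lod[t]:6.2f}")
print("best theta on this grid:", best_theta)
```

In this made-up data set no single family comes close to a LOD of 3, but the sum does, which is the point of step 4. θ = 0 is left off the grid because any recombinant offspring makes its likelihood zero.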

Statistical significance
• A LOD score of 3.0 for some value of θ is considered the threshold for accepting that the two genes are linked, with a 5% chance of a false positive (p = 0.05).
• A LOD score of -2 is considered evidence for the genes not being linked.
• Generally more than one value of θ will go over the 3.0 level. The θ with the highest LOD score is the point estimate of the true map distance. All adjacent θ values whose LOD score is within 1 unit of the maximum form the "support interval", the region in which the true linkage value is expected to lie.
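
These decision rules can be applied mechanically to a table of summed LOD scores. The sketch below uses a hypothetical LOD curve and hypothetical function names, and it assumes the curve has a single peak so that the θ values within 1 unit of the maximum are contiguous.

```python
def summarize_lod_curve(lod_by_theta: dict, drop: float = 1.0):
    """Return (best theta, maximum LOD, 1-LOD support interval) for a LOD table."""
    best_theta = max(lod_by_theta, key=lod_by_theta.get)
    max_lod = lod_by_theta[best_theta]
    supported = sorted(t for t, z in lod_by_theta.items() if z >= max_lod - drop)
    return best_theta, max_lod, (supported[0], supported[-1])


def verdict(max_lod: float) -> str:
    """Apply the conventional thresholds of 3.0 (linked) and -2 (not linked)."""
    if max_lod >= 3.0:
        return "linked"
    if max_lod <= -2.0:
        return "not linked"
    return "inconclusive: more families needed"


# Hypothetical summed LOD scores at a grid of theta values.
lod_curve = {0.01: 0.84, 0.05: 2.58, 0.10: 3.01, 0.15: 3.04,
             0.20: 2.89, 0.30: 2.26, 0.40: 1.29, 0.50: 0.00}

best_theta, max_lod, support = summarize_lod_curve(lod_curve)
print(best_theta, max_lod, support, verdict(max_lod))
```

With these made-up values the point estimate is θ = 0.15 and the support interval runs from 0.05 to 0.30, which shows how wide the interval can be even when the threshold of 3.0 is reached.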

Developing a Model
• We will use an example of two heterozygotes mating. We want to estimate the recombination distance between genes A and B, which both show complete dominance.
• Both parents produce recombinant and parental gametes, which we can combine using a Punnett square.
• θ is the proportion of recombinant gametes. Since there are two recombinant gametes, each has a proportion of 1/2 θ.
• 1 - θ is the proportion of parental gametes. Each of the two parental gametes has a proportion of 1/2 (1 - θ).
• Gametes:
  – Parental: A B and a b
  – Recombinant: A b and a B

Punnett Squares with Frequency Equations
• The next step is to create equations showing the frequency of each phenotype of offspring. This is most easily done using a Punnett square.
• For each cell, the equations for the gamete frequencies are multiplied together.
• Then all cells with the same phenotype are added together.
• Final result: 4 equations showing the expected frequency of each phenotype as a function of θ (the proportion of recombinant gametes, the map distance).
• Note that the sum of the 4 equations is 1.0.
• Figure: Punnett square with equations for the frequency of each type of offspring. The equations are generated by multiplying the gamete frequencies together.
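
The Punnett-square bookkeeping can be done by enumeration. The sketch below assumes both parents are A B / a b (coupling phase) with the same θ, as in the gamete list of the model; the closed forms checked at the end were derived for this note and are not taken from the slides.

```python
from fractions import Fraction
from itertools import product


def gamete_frequencies(theta):
    """Gamete frequencies for an A B / a b parent."""
    parental, recombinant = (1 - theta) / 2, theta / 2
    return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}


def phenotype(gamete1: str, gamete2: str) -> str:
    """Phenotype class of the offspring, with A and B completely dominant."""
    a = "A_" if "A" in (gamete1[0], gamete2[0]) else "aa"
    b = "B_" if "B" in (gamete1[1], gamete2[1]) else "bb"
    return f"{a} {b}"


def phenotype_frequencies(theta):
    """Multiply gamete frequencies in each cell, then add cells by phenotype."""
    gametes = gamete_frequencies(theta)
    freqs = {"A_ B_": 0, "A_ bb": 0, "aa B_": 0, "aa bb": 0}
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        freqs[phenotype(g1, g2)] += f1 * f2
    return freqs


theta = Fraction(3, 10)
freqs = phenotype_frequencies(theta)
print(freqs, "sum =", sum(freqs.values()))

# Closed forms implied by the enumeration, with x = (1 - theta)^2.
x = (1 - theta) ** 2
assert freqs == {"A_ B_": (2 + x) / 4, "A_ bb": (1 - x) / 4,
                 "aa B_": (1 - x) / 4, "aa bb": x / 4}
```

At θ = 0.5 the same function returns the familiar unlinked 9 : 3 : 3 : 1 ratio.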

Expected Frequencies at Different Values of θ
• Once the equations for phenotype frequencies as a function of recombination frequency have been generated, it is easy to substitute in different values.
• This generates a table of expected frequencies of the phenotypes.
• Range: from RF = 0.0, which is completely linked, to RF = 0.5, which is unlinked.

Likelihood of a Family
• Likelihood functions determine the probability of the observed data in terms of the parameter being estimated.
• For LOD scores, a version of the binomial expansion is used.
• The binomial expansion of (p + q)^n describes the probability of families with two different phenotypes:
  – p = probability of a normal child
  – q = probability of a mutant child
  – n = total number of children
  – each term describes a different family composition
  – the exponents on p and q represent the number of children with each phenotype.
• Consider a family of 3 children whose parents are heterozygous for a recessive genetic disease.
  – p = chance of normal child = 3/4
  – q = chance of mutant child = 1/4
• (p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3. Here, p^3 is a family of 3 normal children, 3p^2q is 2 normal plus 1 affected, 3pq^2 is 1 normal plus 2 affected, and q^3 is 3 affected.
• The chance of 2 normal + 1 affected is described by the term 3p^2q. Thus, 3 x (3/4)^2 x 1/4 = 27/64.
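
Each term of the binomial expansion can be computed directly. A minimal sketch with a hypothetical function name, using exact fractions so the result can be compared with the 27/64 worked out above:

```python
from fractions import Fraction
from math import comb


def family_probability(n_normal: int, n_affected: int, p: Fraction, q: Fraction) -> Fraction:
    """One term of (p + q)^n: coefficient x p^(number normal) x q^(number affected)."""
    n = n_normal + n_affected
    return comb(n, n_affected) * p**n_normal * q**n_affected


p, q = Fraction(3, 4), Fraction(1, 4)
terms = {affected: family_probability(3 - affected, affected, p, q) for affected in range(4)}

for affected, prob in terms.items():
    print(f"{3 - affected} normal + {affected} affected: {prob}")
print("sum of all terms:", sum(terms.values()))
```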

Multinomial Distribution
• The multinomial distribution extends the binomial to more than two phenotypes. It is very simple: just add more components to each term.
  – For example, for 4 phenotypes, C p^2 q^1 r^3 s^1 (where C is some coefficient) describes the probability of a family of 7 children, where 2 of them have the "p" phenotype, 1 has the "q" phenotype, 3 have the "r" phenotype, and 1 has the "s" phenotype.
• The coefficients in front of each term represent the number of possible families of the given composition. For the binomial we can calculate the coefficients using Pascal's triangle (or a useful formula).
• However, for LOD score mapping we don't need to bother with the coefficients because they get divided out.
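
The coefficient C and the reason it drops out of a likelihood ratio can both be shown in a few lines. The two probability vectors below are arbitrary illustrative values, not frequencies from the deck, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_coefficient(counts) -> int:
    """Number of birth orders giving these phenotype counts: n! / (k1! k2! ...)."""
    return factorial(sum(counts)) // prod(factorial(k) for k in counts)


def kernel(counts, probs) -> Fraction:
    """The p^2 q^1 r^3 s^1 part of the term, without the coefficient."""
    return prod(p**k for p, k in zip(probs, counts))


counts = [2, 1, 3, 1]  # the 7-child family from the slide
C = multinomial_coefficient(counts)
print("C =", C)  # 420

# Arbitrary illustrative probability vectors (each sums to 1).
probs_a = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
probs_b = [Fraction(1, 4)] * 4

ratio_with_c = (C * kernel(counts, probs_a)) / (C * kernel(counts, probs_b))
ratio_without_c = kernel(counts, probs_a) / kernel(counts, probs_b)
print(ratio_with_c == ratio_without_c)  # True: C cancels in the ratio
```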

Likelihood Ratio
• Using a spreadsheet, we first calculate the expected frequency of each type of offspring at different values of θ.
• Then we use the data from actual families to calculate the likelihood of each family at each value of θ.
• Then we take the likelihood ratio: divide the likelihood at each θ by the likelihood at θ = 0.50 (i.e. unlinked).
• Then we take the logarithm (base 10) of each likelihood ratio.

Example
• Consider a family of 7 children:
  – A_ B_ : 4 children
  – A_ bb : 2 children
  – aa B_ : 0 children
  – aa bb : 1 child
• The expression we will use to determine the likelihood L(θ) is p^4 q^2 r^0 s^1, where p, q, r, and s are the probabilities of the 4 types of offspring (A_ B_, A_ bb, aa B_, and aa bb) at different values of θ.
• The likelihood ratio L(θ) / L(0.5) is obtained by dividing each L(θ) value by the unlinked likelihood L(0.5), which is 0.00021997 for this family.
• The LOD score is the base 10 logarithm of the likelihood ratio.
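
This calculation can be reproduced without a spreadsheet. The sketch below assumes both parents are A B / a b (coupling phase) with the same θ; under that assumption the phenotype frequencies are (2 + x)/4, (1 - x)/4, (1 - x)/4 and x/4 with x = (1 - θ)^2, a closed form derived for this note rather than copied from the slides. The check is that it gives the quoted L(0.5) = 0.00021997.

```python
import math

# Offspring counts for the 7-child family in the example.
COUNTS = {"A_ B_": 4, "A_ bb": 2, "aa B_": 0, "aa bb": 1}


def phenotype_probabilities(theta: float) -> dict:
    """Expected phenotype frequencies for A B / a b x A B / a b (assumed phase)."""
    x = (1 - theta) ** 2
    return {"A_ B_": (2 + x) / 4, "A_ bb": (1 - x) / 4,
            "aa B_": (1 - x) / 4, "aa bb": x / 4}


def likelihood(theta: float, counts: dict = COUNTS) -> float:
    """p^4 q^2 r^0 s^1; the multinomial coefficient is omitted because it cancels."""
    probs = phenotype_probabilities(theta)
    return math.prod(probs[k] ** n for k, n in counts.items())


def lod(theta: float) -> float:
    return math.log10(likelihood(theta) / likelihood(0.5))


thetas = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50]
for t in thetas:
    print(f"theta = {t:.2f}   L = {likelihood(t):.8f}   LOD = {lod(t):7.3f}")

best = max(thetas, key=lod)
print("most likely theta on this grid:", best)
```

θ = 0 is left off the grid because the two A_ bb children require a recombinant gamete under this model, which makes L(0) zero and its logarithm undefined.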

Maximum LOD Score
• The LOD score data for this family show that a recombination frequency of 0.3 is the most likely.
• However, the maximum LOD score is only 0.133, far less than the value of 3.0 needed to prove linkage.
• More data from other families are needed. LOD scores for each value of θ can be added together.
  – It typically requires about 20 families to prove linkage.