
Alexei Fedorov, Ph.D.
Associate Professor, Head of Bioinformatics Lab, Department of Medicine
Vice Director, Program in Bioinformatics and Genomics/Proteomics
Tel: (419) 383-5270
Email: alexei. [email protected] edu
http://bpg.utoledo.edu/~afedorov/lab/

May 2011

Bioinformatics Lab in 2013-2014
• Ph.D. students: Shuhao Qiu
• Masters students: Ahmed Al-Khudair
• Current grants: NSF Career Development 2007-2012, “Investigation of intron cellular roles”

MAJOR GOAL: Bioinformatics Investigation of the Human Genome

Education in Bioinformatics (TWO TYPES OF STUDENTS)
• Computer/math background: gain experience in biology (Sam, Andy)
• Biological background: gain experience in programming (Dave, Maryam)
• Example of computational projects: Binary-abstracted Markov models and their application to sequence classification
  http://etd.ohiolink.edu/view.cgi?acc_num=mco1271271172
  http://bpg.utoledo.edu/~sshepard/defense/ (video)

Genomic MRI
http://bpg.utoledo.edu/gmri/
http://www.jove.com/Details.php?ID=2663

Job prospects (example: Ashwin Prakash)
• Ph.D.: November 2011, HSC UT
• Ph.D. research fellow from January 2011, Johns Hopkins School of Medicine
• Declined offers:
  • Cold Spring Harbor Laboratory
  • Baylor College of Medicine

The PI’s students received the following awards:
• Jason Bechtel, Outstanding MSBS student in 2008 at HSC UT.
• Theodor Rais, Second/Third Poster award by Ohio Bioinformatics Consortium, 2009.
• Samuel Shepard, Outstanding Ph.D. student in 2010 at HSC UT.
• Lorraine Walters, Undergraduate Research Recognition Award, UT, May 2012.
• Arnab Saha-Mandal, 1) Outstanding MSBS student in 2013 at HSC UT; and 2) Canadian Institute of Health Research fellowship support ($20,000).
• Jasmine Serpen, 1) Ohio Governor's Thomas Edison Award for Excellence in Biotechnology & Biomedical Technologies, 1st place; and 2) OSERA Biomedical Research/Bioengineering Award, 1st place (for high school students).

Program in Bioinformatics and Genomics/Proteomics (BPG)
• http://hsc.utoledo.edu/depts/bioinfo/
• BPG offers a Certificate in association with the degrees of Doctor of Philosophy (Ph.D.) or Doctor of Medicine (M.D.). BPG also offers a Master of Science in Biomedical Sciences (MSBS).

Two courses in Spring semester:
• Application of Bioinformatics, Proteomics, and Genomics (BIPG 640), or “Advanced Bioinformatics” (should be taken after Dr. Trumbly's “Fundamental Bioinformatics”)
• Introduction to Bioinformatic Computation (BIPG 610)
  The main goal of this course is to provide basic programming skills to biological and medical students who may lack a background in computer science. Programming is taught through important biological examples, focusing in particular on the Perl language. No programming skills are required!

In the “Introduction to Bioinformatic Computation” course, rather than doing “cookbook” lab exercises, students work on real-world, challenging problems whose resolution advances the field of genome biology. In addition to learning programming and other bioinformatic skills, the students of this course learn how to present the final product of bioinformatic research and how to write a scientific paper on the subject.
• In 2005 the class developed a program to identify novel genes for non-coding RNAs in humans and other mammals. This work resulted in publication of an article in Nucleic Acids Research [1], coauthored by the group of students who were actively working on this project.
• In 2006 course students created a novel public database (ASMD) and also a novel computational resource, “Splicing Potential”. Ten students were co-authors in two manuscripts [2, 3].
• In 2007 the class participated in the “Genomic MRI” project. Seven of these students are co-authors in BMC Genomics, 2008 [4].
• The 2008 class continued the “Genomic MRI” project. They performed whole-genome comparisons for human, chimpanzee, and macaque and also analyzed the distribution of 4 million SNPs inside and outside MRI regions. The results are in preparation for publication in Genome Research with 6 students among the authors.

Publications with IBC students
54. Prakash A., Shepard S., Mileyeva-Biebesheimer O., He J., Hart B., Chen M., Amarachiniha S., Bechtel J., Fedorov A. Molecular forces shaping human genomic sequence at midrange scales. BMC Genomics 2009, 10: 513.
53. Bechtel J.M., Wittenschlaeger T., Dwyer T., Song J., Arunachalam S., Ramakrishnan S.K., Shepard S., Fedorov A. Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures. BMC Genomics 2008, 9: 284.
52. Bechtel J.M., Rajesh P., Ilikchyan I., Deng Y., Mishra P.K., Wang G., Wu X., Afonin K., Grose W., Wang Y., Khuder S., and Fedorov A. Calculation of Splicing Potential from the Alternative Splicing Mutation Database. BMC Research Notes 2008, 1: 4.
51. Bechtel J.M., Rajesh P., Ilikchyan I., Deng Y., Mishra P.K., Wang G., Wu X., Afonin K., Grose W., Wang Y., Khuder S., and Fedorov A. The Alternative Splicing Mutation Database: a hub for investigations of alternative splicing using mutational evidence. BMC Research Notes 2008, 1: 3.
44. Fedorov A., Stombaugh J., Harr M.W., Yu S., Nasalean L., Shepelev V. Computer identification of snoRNA genes using a Mammalian Orthologous Intron Database. Nucl. Acids Res. 2005, 33: 4578-4583.

http://www.utoledo.edu/centers/brim/index.html

COURSE: Bioinformatics of Biomarkers and Individualized Medicine, Spring 2012
• Course time line: 14 weeks
• No prerequisites; recommended: introduction to bioinformatics and molecular biology
• Reserve materials: None
• Unit 1: Biomarker discovery and validation
• Unit 2: Individualized Medicine

Investigation of the human genome
BASE COUNT 846302 a 578512 c 575805 g 843114 t 1703 others
ORIGIN
[Slide image: GenBank-style listing of raw sequence in rows of 10-base blocks, starting at position 1 (gaattcaaaa ...)]
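The BASE COUNT line on this slide tallies how often each base occurs in the record. As a minimal sketch of how such a tally could be computed (in Python, not the Perl used in the course), the function below counts a, c, g and t and lumps every other symbol under "others". The function name `base_count` and the short `fragment` string are illustrative assumptions, and the "n" characters stand in for unresolved bases.

```python
from collections import Counter


def base_count(sequence: str) -> dict:
    """Count a, c, g, t and lump every other symbol under 'others'."""
    # Ignore the whitespace and position numbers that appear in flat-file listings.
    cleaned = [ch for ch in sequence.lower() if ch.isalpha()]
    counts = Counter(cleaned)
    result = {base: counts.get(base, 0) for base in "acgt"}
    result["others"] = sum(n for ch, n in counts.items() if ch not in "acgt")
    return result


# Hypothetical fragment for illustration only.
fragment = "gaattcaaaa ggacggcatt tgagaaaatc nnacagtgg"
fragment_counts = base_count(fragment)
print(fragment_counts)

# Totals quoted on the slide, used to derive sequence length and GC fraction.
slide_counts = {"a": 846302, "c": 578512, "g": 575805, "t": 843114, "others": 1703}
total = sum(slide_counts.values())
gc_fraction = (slide_counts["g"] + slide_counts["c"]) / total
print(f"{total:,} bases, GC fraction {gc_fraction:.3f}")
```

Applied to the counts on the slide, the same arithmetic gives the length of the record and its GC fraction without reading the sequence itself.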

... after the first 50 pages ...
[Slide image: raw sequence listing, positions 141601 to 143341]

... after next 200 pages ...
[Slide image: raw sequence listing, positions 683041 to 684901]

Human chromosome 1: 4,814,628 lines ≈ 100,000 pages = 100 books (1000 pages each)
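The page and book estimate can be checked with a few lines of arithmetic. This is a sketch only: the slide does not state how many lines fit on a page, so `LINES_PER_PAGE = 48` is an assumption chosen because it reproduces the quoted figure of roughly 100,000 pages.

```python
LINES = 4_814_628        # line count quoted on the slide
LINES_PER_PAGE = 48      # assumption: not stated in the source
PAGES_PER_BOOK = 1000    # from the slide

pages = LINES / LINES_PER_PAGE
books = pages / PAGES_PER_BOOK
print(f"{pages:,.0f} pages, about {books:.0f} books")
```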

Nature 2012, Sept 6th, v. 489, p. 46

Lab 2013

The 1000 Genomes Project: a guide to your ancestry

The pattern of human genetic variation is believed to be a key to understanding human population history and diversity. The 1000 Genomes Project has sequenced 1092 genomes from different populations. By identifying the sequences that correspond to LWK, GBR, JPT and FIN, we aim to learn more about population genetic patterns and to get a picture of the genetic diversity within these populations. This project uses the 1000 Genomes Project catalogue of human genetic variation to calculate and compare genetic differences between 14 populations. I am presenting the results that our bioinformatics lab's team has obtained so far; we are working on writing them up as a paper.

We used Perl programs to compute the differences between the genomes of every pair of individuals from the 1000 Genomes Project, for the 14 populations:
• ASW: HapMap African ancestry individuals from SW US
• CEU: CEPH individuals
• CHB: Han Chinese in Beijing
• CHS: Han Chinese South
• CLM: Colombian in Medellin, Colombia
• FIN: HapMap Finnish individuals from Finland
• GBR: British individuals from England and Scotland
• IBS: Iberian populations in Spain
• JPT: Japanese individuals
• LWK: Luhya individuals
• MXL: HapMap Mexican individuals from LA, California
• PUR: Puerto Rican in Puerto Rico
• TSI: Toscani individuals
• YRI: Yoruba individuals
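The slides state that the pairwise comparison was done in Perl, but the lab's code and its exact definition of a "difference" are not shown. The following is a minimal sketch in Python, assuming each individual is represented by a list of genotype calls (alternate-allele counts 0, 1 or 2) over the same variant sites, and that a difference is a site where two individuals' genotypes are not equal. The names `count_differences`, `pairwise_differences` and the toy individuals are hypothetical.

```python
from itertools import combinations


def count_differences(genotypes_a, genotypes_b):
    """Number of variant sites at which two individuals' genotypes differ."""
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype lists must cover the same sites")
    return sum(1 for a, b in zip(genotypes_a, genotypes_b) if a != b)


def pairwise_differences(individuals):
    """Map each unordered pair of IDs to its difference count."""
    return {
        (id_a, id_b): count_differences(individuals[id_a], individuals[id_b])
        for id_a, id_b in combinations(sorted(individuals), 2)
    }


# Hypothetical toy data: six variant sites per individual.
toy = {
    "IND_A": [0, 1, 2, 0, 0, 1],
    "IND_B": [0, 1, 2, 0, 1, 1],
    "IND_C": [2, 0, 0, 1, 1, 0],
}
diffs = pairwise_differences(toy)
for pair, total in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(pair, total)
```

With 1092 individuals this produces 1092 × 1091 / 2 = 595,686 pairs, so a real run over millions of sites would need a more compact genotype representation than Python lists.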

The graph above illustrates the distribution of genetic differences among the 14 populations. The X axis shows the number of differences (2.7 million to 5.5 million). The Y axis shows the number of pairs, where a pair is two individuals compared by counting the genetic differences between their genomes.
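A distribution like the one described (pairs counted per range of differences) can be built by binning the pairwise totals. This is a sketch under assumptions: the bin width of 100,000 and the sample totals are invented for illustration, and `histogram` is a hypothetical helper, not the lab's code.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter((v // bin_width) * bin_width for v in values)


# Hypothetical pairwise totals, in number of differences.
pair_totals = [2_756_691, 2_777_456, 3_004_459, 3_450_000, 3_460_000, 4_900_000]
bins = histogram(pair_totals, bin_width=100_000)
for lower_edge in sorted(bins):
    print(f"{lower_edge:>9,} to {lower_edge + 99_999:>9,}: {bins[lower_edge]} pairs")
```

Plotting the bin edges on the X axis against the pair counts on the Y axis, one curve per population, gives the kind of graph the slide describes.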

Figure 2: The graph below shows that the 14 populations fall into 4 distinct origins, which we call 4 ancestries: 1 African, 2 Hybrid, 3 European, 4 Asian.

Figure 3: The three populations of African origin have total-difference distributions that lie close to each other. The LWK population (Luhya individuals) includes some pairs of individuals with almost half the usual number of differences (2.7 million to 4.8 million); almost all of these have been declared as siblings or other relatives. Some of them are not declared as relatives by the 1000 Genomes Project, so our results suggest that there may be some undeclared relatives in the 1000 Genomes Project.

We further examined some populations for declared relationships between individuals; relatives showed the smallest differences in their genetic variation. For example, in the LWK population, as shown in the table below, the relatives fall at the top of the list when the total differences are sorted from lowest to highest. The green highlighted cells mark individuals who are related to each other as declared in the 1000 Genomes Project appendix. For the pairs that are not highlighted, we suggest that they are relatives of some kind whose relationship has not been declared by the 1000 Genomes Project.

#    ID 1 (LWK)   ID 2 (LWK)   Total LWK differences   Relationship
1    NA19374      NA19373      2756691                 Siblings
2    NA19352      NA19347      2777456                 Siblings
3    NA19470      NA19443      2848500                 Aunt/Uncle
4    NA19397      NA19396      2871776                 Siblings
5    NA19444      NA19434      3004459                 Siblings
6    NA19334      NA19331      3007478                 ?
7    NA19382      NA19381      3070661                 Uncertain parent/child relationship
8    NA19453      NA19445      3077137                 ?
9    NA19470      NA19469      3111728                 Niece/Nephew
10   NA19331      NA19313      3119208                 ?
11   NA19382      NA19380      3970915                 Half siblings
12   NA19453      NA19444      4106949                 ?
13   NA19334      NA19313      4178970                 Unknown relation
14   NA19469      NA19443      4236592                 Niece/Nephew
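The screening step described here (sort pairs by total differences, then see which low-difference pairs lack a declared relationship) can be sketched as follows. The threshold of 4,500,000, the function name `flag_close_pairs` and the pair labelled "UNRELATED_1"/"UNRELATED_2" are assumptions for illustration; the three LWK rows are taken from the table.

```python
def flag_close_pairs(pair_totals, declared, threshold):
    """Return (pair, total, declared relation or None) for pairs below threshold,
    sorted from lowest to highest total."""
    rows = []
    for pair, total in sorted(pair_totals.items(), key=lambda kv: kv[1]):
        if total < threshold:
            rows.append((pair, total, declared.get(pair)))
    return rows


pair_totals = {
    ("NA19374", "NA19373"): 2756691,
    ("NA19334", "NA19331"): 3007478,
    ("NA19382", "NA19380"): 3970915,
    ("UNRELATED_1", "UNRELATED_2"): 5100000,  # hypothetical unrelated pair
}
declared = {
    ("NA19374", "NA19373"): "Siblings",
    ("NA19382", "NA19380"): "Half siblings",
}

rows = flag_close_pairs(pair_totals, declared, threshold=4_500_000)  # threshold is an assumption
undeclared = [pair for pair, _, relation in rows if relation is None]
print(undeclared)
```

Pairs that fall below the threshold but have no entry in `declared` correspond to the rows marked "?" in the table, which are the candidates for undeclared relatives.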

Figure 4: The CLM, PUR and MXL populations show a very wide distribution, ranging from 3.1 to 4.86 million differences. Our results indicate that these populations have a wide range of mixed ancestry. The PUR population has a second peak on the right side (between 4.74 and 4.9 million); we expect that these individuals have a different ancestry. Further investigation of these individuals is being conducted to determine where that ancestry comes from.

Figure 5: Populations FIN, GBR, TSI, CEU and IBS. All of these populations fall under European origin. The IBS population appears as a very low curve because only 13 individuals from this population have been sequenced.

Figure 6: The populations of Asian origin are genetically close to each other: their distributions have very similar shapes, ranging between 3.4 million and 3.69 million differences.

We are further investigating the pairs of individuals with the highest differences, since we suggest that they may have a different origin. We examined the 40 highest-difference pairs in some populations and found that certain individuals appeared in these pairs significantly more often than others. An example is shown in the figure below.

The list below shows the CLM individuals with the highest genetic differences from each other. Looking at them individually, we noticed that some of them appear significantly more often than others, as shown in the list of repeats on the right side. HG01551 and HG01342 each appear about 20 times among the highest-difference pairs, while others appear only 2 or 3 times. We therefore decided to investigate the possibility that these individuals have a different origin.

[Slide table: the 40 highest-difference CLM pairs, with columns ID 1, total differences, ID 2. Totals range from 4479513 (HG01551 and HG01136) to 4678948. HG01551 and HG01342 recur throughout both ID columns.]
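Counting how often each individual appears among the top pairs is a simple tally. In this sketch only the first pair (HG01551 with HG01136) is taken from the slide; the other pairings are invented for illustration, and `repeat_counts` is a hypothetical helper name.

```python
from collections import Counter


def repeat_counts(pairs):
    """Count how many of the given pairs each individual appears in."""
    counts = Counter()
    for id_a, id_b in pairs:
        counts[id_a] += 1
        counts[id_b] += 1
    return counts


# First pair is from the slide; the remaining pairings are hypothetical.
top_pairs = [
    ("HG01551", "HG01136"),
    ("HG01551", "HG01342"),
    ("HG01342", "HG01375"),
    ("HG01551", "HG01125"),
]
counts = repeat_counts(top_pairs)
print(counts.most_common(2))
```

Individuals whose count stands far above the rest are the ones worth comparing against other populations.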

The idea was to take the individuals who recur among the high-difference pairs, together with 10 controls from the same population whose number of genetic differences is average for that population. We then randomly took individuals from other populations and calculated the genetic differences between our 12 individuals (10 controls plus 2 high-repeat individuals) and 1 control from each of the other populations. The comparison below is between the 10 CLM controls plus the 2 high-repeat, high-difference individuals (HG01551 and HG01342) and one control individual from the YRI population (Yoruba individuals, African ancestry). HG01551 and HG01342 had the lowest difference, indicating that these two individuals might be of African origin.

We further compared the CLM individuals with an individual from another African population (LWK) and with an individual from an Asian population (CHS). The two high-repeat individuals showed the lowest genetic difference against the LWK control and the highest difference against the CHS individual. This suggests that these two individuals from the CLM population are of African origin.

Panels: CLM vs. LWK; CLM vs. CHS
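The comparison against a reference individual from another population amounts to ranking a group by its difference count to that reference. A minimal sketch, assuming genotypes are lists of alternate-allele counts over shared sites; all IDs and genotype values here are hypothetical toy data, and `rank_against_reference` is an illustrative name.

```python
def rank_against_reference(group, reference):
    """Sort group members by their difference count to one reference, ascending."""
    scored = {
        ind: sum(1 for a, b in zip(genotypes, reference) if a != b)
        for ind, genotypes in group.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1])


# Hypothetical toy genotypes over six variant sites.
group = {
    "CONTROL_1": [0, 0, 1, 1, 0, 0],
    "CONTROL_2": [0, 1, 1, 1, 0, 0],
    "CANDIDATE": [2, 1, 0, 0, 1, 0],
}
reference_individual = [2, 1, 0, 0, 1, 1]

ranking = rank_against_reference(group, reference_individual)
print(ranking)
```

An individual that ranks lowest against a reference from one ancestry and highest against a reference from another is the pattern the slides report for HG01551 and HG01342.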

Conclusions
• Total variants showed substantial geographic differentiation.
• The total number of differences distinguishes populations that are more geographically and ancestrally remote.
• Populations are grouped by the predominant component of ancestry: Europe (CEU, TSI, GBR, FIN and IBS), Africa (YRI, LWK and ASW), East Asia (CHB, JPT and CHS) and the Americas (MXL, CLM and PUR).
• Relatives within the same population have significantly fewer genotype variations (almost half the number) compared to non-relatives.
• The study of human genetic variation has evolutionary significance. It can help us understand ancient human population migrations as well as how different human groups are biologically related to one another.