
Genetic Variation and Human Disease

Take a Class

This guide supports the Galter Library class called Genetic Variation and Human Disease. See our Classes schedule for the next available offering. If this class is not on our upcoming schedule, it is still available to you or your group by request.

Background

Two types of genetic variation events are the sources of most human genetic variation:

  • Single base mutation which substitutes one nucleotide for another
    • Single Nucleotide Polymorphisms (SNPs)
      • SNPs are the most common form of polymorphism
      • Example: a single T-to-A substitution at the hemoglobin beta locus on chromosome 11 causes a missense change from glutamic acid to valine, leading to the sickle cell anemia phenotype.
  • Insertion or deletion of one or more nucleotide(s) 
    • Tandem Repeat Polymorphisms
      • Tandem repeats, or variable number of tandem repeats (VNTRs), are a very common class of polymorphism, consisting of sequence motifs that are repeated in tandem in a variable copy number
      • VNTRs are subdivided into two subgroups based on the size of the tandem repeat units.
        • Microsatellites or Short Tandem Repeats (STRs)
          repeat unit: 1-6 bp (e.g., dinucleotide repeat: CACACACACACA)
        • Minisatellites
          repeat unit: 14-100 bp
      • Example: Spinocerebellar ataxia type 10 (SCA10) is caused by the largest tandem repeat expansion seen in the human genome. The normal population has 10-22 copies of the pentanucleotide ATTCT repeat in intron 9 of the SCA10 gene, whereas SCA10 patients have 800-4500 repeat units, which makes the disease allele up to 22.5 kb larger than the normal one.
    • Insertion/Deletion Polymorphisms (indels)
      • Common and widely distributed in the human population
      • An association between coronary heart disease and a 287 bp indel polymorphism located in intron 16 of the angiotensin-converting enzyme (ACE) gene has been reported. This indel, known as ACE/ID, is reported to account for 50% of the inter-individual variability in plasma ACE concentration.
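
The sickle cell and SCA10 examples in the list above can be checked with a short Python sketch. The function names and the four-entry codon table are illustrative only, and the flanking bases in the test allele are invented. The sickle cell change is written here on the coding strand (GAG to GTG, an A-to-T change), which reads as T to A on the complementary strand.

```python
# Minimal sketch; names and the partial codon table are illustrative.

# Only the codons needed for this example (standard genetic code).
CODON_TABLE = {"GAG": "Glu", "GAA": "Glu", "GTG": "Val", "CAG": "Gln"}


def substitute(codon, index, new_base):
    """Return the codon with the base at `index` replaced by `new_base`."""
    return codon[:index] + new_base + codon[index + 1:]


def count_tandem_repeats(sequence, unit):
    """Longest uninterrupted run of `unit` in `sequence`, in repeat units."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence) - len(unit) + 1):
        run = 0
        # Extend the run while the next copy of the unit follows directly.
        while sequence.startswith(unit, start + run * len(unit)):
            run += 1
        best = max(best, run)
    return best


def tract_length_kb(copies, unit_length):
    """Length of a repeat tract in kilobases."""
    return copies * unit_length / 1000


# Sickle cell: the middle base of the glutamic acid codon changes.
sickle_codon = substitute("GAG", 1, "T")
print(CODON_TABLE["GAG"], "->", CODON_TABLE[sickle_codon])  # Glu -> Val

# SCA10: a hypothetical normal allele with 12 ATTCT copies.
normal_allele = "GG" + "ATTCT" * 12 + "CC"
print(count_tandem_repeats(normal_allele, "ATTCT"))  # 12

# 4500 pentanucleotide units span 22.5 kb.
print(tract_length_kb(4500, 5))  # 22.5
```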
In addition to the events listed above, gross chromosomal aberrations like deletions, inversions or translocations of large segments of DNA have been associated with numerous clinically characterized genomic syndromes.

Example:  Velocardiofacial syndrome (VCFS), characterized by features such as cleft palate, cardiac anomalies and learning disabilities, is associated with a deletion on chromosome 22q11.2.

Very few polymorphisms show direct impact by creating deleterious phenotypes.  However, non-disease-causing polymorphisms, when mapped to the genome, may serve as markers to identify and map other genes that do cause disease when mutated.  If these non-disease-causing variations are found to be inherited with a particular trait, but do not cause the trait, they may provide evidence of where the trait's gene is located in the genome.

Terminology

  • Allele: Alternative form of a genetic locus; a single allele for each locus is inherited separately from each parent
  • Polymorphism: Difference in DNA sequence among individuals
  • Linkage Disequilibrium (LD): If alleles at two loci tend to be inherited together more often than would be predicted by chance, then the alleles are in linkage disequilibrium
  • Haplotype:  The set of alleles on one particular chromosome; each person has two haplotypes in a given region, and each haplotype will be passed on as a complete unit
  • SNP or mutation?
    • A single base change occurring in a population at a frequency of greater than 1% is a SNP
    • A single base change occurring in a population at a frequency of less than 1% is a mutation
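
The 1% convention and the idea of linkage disequilibrium can be made concrete with a small calculation. The sketch below assumes the commonly used coefficient D = p_AB - p_A * p_B and its normalized form r^2, neither of which is defined in this guide. The function names and the frequencies in the usage lines are invented for illustration, and a frequency of exactly 1%, which the definitions above do not cover, is treated as a mutation here.

```python
def classify_single_base_change(frequency, threshold=0.01):
    """Apply the 1% convention: above the threshold is a SNP.

    Assumption: a frequency exactly at the threshold is called a mutation.
    """
    return "SNP" if frequency > threshold else "mutation"


def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for allele A at one locus and B at another.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denominator


print(classify_single_base_change(0.05))   # SNP
print(classify_single_base_change(0.001))  # mutation

# If A and B were independent, the AB haplotype would occur at 0.5 * 0.5 = 0.25.
d, r_squared = linkage_disequilibrium(p_ab=0.4, p_a=0.5, p_b=0.5)
print(f"D = {d:.2f}, r^2 = {r_squared:.2f}")  # D = 0.15, r^2 = 0.36
```

A D of zero means the two alleles occur together exactly as often as chance predicts; the further D is from zero, the stronger the disequilibrium.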

NCBI's dbSNP

NCBI's single nucleotide polymorphism database dbSNP is the most-used SNP database worldwide. 

The direct URL for dbSNP is:

http://www.ncbi.nlm.nih.gov/SNP/index.html

but most often SNP records are accessed through links from other databases, such as NCBI's Gene or OMIM databases, or from European Bioinformatics Institute (EBI) databases.  The URL above is useful for searching submitted SNP identifiers or searching for batches of SNPs or experiments.

In addition to SNPs, dbSNP contains data from small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs) and microsatellite repeat variations (also called short tandem repeats or STRs).

You can view summary statistics for the current build of dbSNP or for past builds.   dbSNP also has extensive documentation and a submission system for researchers to submit SNP data from their own experiments.  These features can be accessed from the menu on the left side of the dbSNP home page.

 

[Figure: dbSNP home page menu]

 

When a researcher uploads SNP data, each SNP is assigned a submitted SNP identifier (ss#).  Submitted SNPs are then checked against the current contents of dbSNP.  SNPs that are redundant (i.e., match already-submitted SNP types) are assigned to the appropriate reference SNP (rs#) cluster.  Unique SNPs are assigned new rs numbers upon the next build of dbSNP.  Submitted SNP records contain information on the experimental procedures used to identify the SNP, the research project in which the SNP was identified and more data on the SNP.

 

[Figure: submitted SNP (ss) record page]
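
The ss#/rs# bookkeeping described above can be pictured with a toy model. This is only a sketch and not dbSNP's actual implementation: the `SnpRegistry` class is hypothetical, and it assumes, as a simplification, that two submissions are redundant when they map to the same chromosome and position.

```python
class SnpRegistry:
    """Toy model of ss#/rs# assignment (hypothetical, for illustration)."""

    def __init__(self):
        self.next_ss = 1
        self.next_rs = 1
        self.rs_by_location = {}  # (chromosome, position) -> rs number
        self.pending = []         # unique submissions awaiting the next build
        self.rs_of_ss = {}        # ss number -> rs number

    def submit(self, chromosome, position):
        """Every submission gets its own ss number immediately."""
        ss = self.next_ss
        self.next_ss += 1
        key = (chromosome, position)
        if key in self.rs_by_location:
            # Redundant: join the existing reference SNP cluster.
            self.rs_of_ss[ss] = self.rs_by_location[key]
        else:
            # Unique so far: wait for the next build.
            self.pending.append((ss, key))
        return ss

    def build(self):
        """Assign new rs numbers to the unique submissions collected so far."""
        for ss, key in self.pending:
            if key not in self.rs_by_location:
                self.rs_by_location[key] = self.next_rs
                self.next_rs += 1
            self.rs_of_ss[ss] = self.rs_by_location[key]
        self.pending = []


registry = SnpRegistry()
registry.submit("11", 5000)  # ss1: unique, waits for a build
registry.build()             # rs1 is created for ss1
registry.submit("11", 5000)  # ss2: redundant, joins rs1 right away
registry.submit("4", 9000)   # ss3: unique, still waiting
print(registry.rs_of_ss)     # {1: 1, 2: 1}
```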

 

A different searchable interface for NCBI's dbSNP is found at the URL:

http://www.ncbi.nlm.nih.gov/snp/

This page allows you to search more efficiently for SNPs associated with a specific gene or disease in specific organisms or in a particular region of a chromosome.  You can do this by using the Limits tab on this page.

Example:  Mutations in the BRCA1 gene have been reported to be associated with early-onset breast cancer. Retrieve all non-synonymous and validated coding refSNPs for human BRCA1 from dbSNP.

Solution:

  • Go to the NCBI SNP home page
  • Enter BRCA1 [Gene Name] in the search box
  • Click on Limits
  • Go to Organism(s) and select Homo sapiens
  • Go to Function Class and select the checkbox next to coding nonsynonymous
  • Go to Validation Status and select all options except "no-info"
  • You can click on Details and review your Query Translation (optional)
  • Click on Go
[Figure: dbSNP search for BRCA1]
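
A similar search can be prepared programmatically through NCBI's E-utilities ESearch service. The sketch below only builds the request URL and does not contact NCBI. The `[Gene Name]` tag comes from the steps above; the `[Organism]` and `[Function Class]` tags are assumptions taken from the labels on the Limits form and may need adjusting, since dbSNP's search fields and vocabularies change between builds. The validation-status limit is left out, and the function names are illustrative.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_snp_term(gene, organism="Homo sapiens",
                   function_class="coding nonsynonymous"):
    """Combine the limits into one Entrez query string.

    Assumption: the field tags mirror the labels on the Limits form.
    """
    return " AND ".join([
        f"{gene}[Gene Name]",
        f'"{organism}"[Organism]',
        f'"{function_class}"[Function Class]',
    ])


def build_snp_search_url(gene, retmax=20):
    """Return an ESearch URL for the SNP database (no request is sent)."""
    params = {
        "db": "snp",
        "term": build_snp_term(gene),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


print(build_snp_term("BRCA1"))
print(build_snp_search_url("BRCA1"))
```

The printed URL can be opened in a browser or fetched with `urllib.request.urlopen`. Comparing the query string with the Query Translation shown under Details is a quick way to confirm that the field tags are being interpreted as intended.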



Limits in dbSNP Searches

  • Organism
  • Chromosome (including entries for nonmammals)
  • Chromosome Range
  • Map Weight (how many times in genome)
  • Function Class (coding nonsynonymous, intron, etc.)
  • SNP Class (heterozygosity, indels, etc.)
  • Variation Alleles (using IUPAC, International Union of Pure and Applied Chemistry, codes)
  • "Created" and "Updated" Builds
  • Annotation: Records with links to other NCBI data domains (OMIM, Nucleotide, Protein, Structure, PubMed)
  • Type of validation 
  • % Heterozygosity 
  • Success Rate (likelihood that the SNP is real; = 1 minus false positive rate) 
  • Method Class
  • Individual SNP maps (Venter, Watson and Chinese_YH1)
  • Minor Allele Frequency (by HapMap population class) 

dbSNP Search Results

 

[Figure: dbSNP search results graphic]

 

You can access specific links from the dbSNP search results by using the colored graphic representation bar, or view the full record by clicking the rs number.

From the full record, you can view SNPs in Sequence Viewer to see their genomic orientation, plus mapping to mRNA and protein products.  You can also view population diversity for the alleles and the FASTA sequence of the polymorphism.

The default view of a record from a search of dbSNP shows only coding SNPs.  To view all SNPs in a gene region, click the radio button next to "in gene region" in the GeneView section of the SNP record, then click Go.

 

[Figure: SNPs in gene region]

 

Viewing SNPs in Map Viewer

You can also view SNPs in a chromosomal context by using NCBI's MapViewer.

Example:  Mutations in the dopamine receptor D5 (DRD5) gene have been observed in patients with various neurological disorders. How many refSNP records can you find for DRD5, and how many of them are present in its coding region? Show all refSNPs in the context of a chromosome.

Solution:

  • Start at Entrez Gene
  • Enter DRD5 in the search box
  • Click on the Limits tab
  • Select Gene Name from the drop down list of "To limit your search to a specific field"
  • Go to Limit by Taxonomy and select Homo sapiens
  • Click Go
  • Click on DRD5 from Entrez Gene search results to view gene information
  • Under Links, click Map Viewer to display all SNPs in Map Viewer
  • In Map Viewer, click on "Map and Options" (appears at left side bar or near upper right corner)
  • A new pop up window will appear
  • Select Variation under Sequence Maps in the Available Maps section and click the ADD button to include it in the Maps Displayed (left to right) box
[Figure: Map Viewer options with Variation map]


  • Select Variation in the Maps Displayed (left to right) box and click the Make Master/Move to Bottom button
  • Click the Apply button

 

[Figure: Variation in Map Viewer]

 



You will now see SNPs that align to the gene region of interest, with links to details on each SNP in dbSNP.  NCBI has a descriptive legend for all of the symbols shown for refSNPs in Map Viewer, so you can quickly learn to tell which SNPs map to transcript and coding regions, for example.
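
Once SNP positions have been read off the map, answering the "how many are in the coding region" part of the example is a matter of comparing positions with intervals. The sketch below shows that comparison; every coordinate in it is invented for illustration and none is a real DRD5 coordinate.

```python
def count_in_intervals(positions, intervals):
    """Count positions inside any (start, end) interval, ends inclusive."""
    return sum(
        any(start <= position <= end for start, end in intervals)
        for position in positions
    )


# Hypothetical coordinates, not real DRD5 data.
gene_region = [(1000, 3000)]
coding_region = [(1200, 2600)]
snp_positions = [900, 1100, 1500, 2000, 2700, 3100]

print(count_in_intervals(snp_positions, gene_region))    # 4
print(count_in_intervals(snp_positions, coding_region))  # 2
```

Passing a list of intervals rather than a single pair lets the same function handle a coding region that is split across several exons.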

 

 

Online Mendelian Inheritance in Man (OMIM)

Online Mendelian Inheritance in Man (OMIM) is a database of human gene-phenotype correlations that is authored and edited at Johns Hopkins University and made available through the NCBI.  Records in OMIM are presented as summary reviews of the current state of knowledge on known Mendelian disorders and over 12,000 genes.  OMIM records are good starting points for learning about disease-gene links, because a great deal of information can be accessed from each record.

 

[Figure: OMIM]

Accessing Allelic Variants in OMIM

You can view a list of allelic variants for a gene from its record in OMIM.  

Example:  Search OMIM for information on the gene glucose-6-phosphate dehydrogenase (G6PD).  What known disorders are caused by allelic variants in this gene?

Solution:

  • Start at OMIM
  • Type G6PD in the search box
  • Click the first entry (+305900...)
[Figure: OMIM search result]
  • On the record for G6PD, click the Allelic Variants link in the Table of Contents near the upper right of the page.  You can see all of the allelic variants in the record and whether any of them are linked to diseases or disorders.  Usually, pathological conditions linked to an allelic variant are listed just below the allele description.
  • You can also see a table of allelic variants by clicking the See List link in the Table of Contents

Limitations of Allelic Variants in OMIM

Not all possible allelic variants will be listed in OMIM.  The Gene View in dbSNP will give you a better listing of all polymorphisms for a gene.  However, the allelic variants in OMIM are usually selected because they have some significance.  Reasons for allelic variants to be included in OMIM include:

  • the first mutation to be discovered
  • high population frequency
  • distinctive phenotype 
  • historic significance 
  • unusual mechanism of mutation 
  • unusual pathogenic mechanism 
  • distinctive inheritance

HapMap and 1000 Genomes Projects

NCBI's dbSNP includes data from both the International HapMap Project and the 1000 Genomes Project.  When viewing a GeneView in dbSNP, you can see which SNPs have been sequenced or mapped by either project in the Validation column of the gene report. 

 

[Figure: HapMap and 1000 Genomes]

 

 

You can also browse data from the International HapMap project directly from the HapMap Project website.  Links in the left menu will take you to a search interface for the most current data from the project.

Haplotypes are contiguous, linear sets of SNP alleles along a genome that are inherited as a block.  Haplotypes can give information on what markers are inherited together and thus can be used to find markers for disease trait inheritance.

  • Search by rs number, chromosomal region or gene name
  • Example: Type ACE in the search box
  • On the resulting page, click one of the chromosomal regions available to get a more detailed view of a region
[Figure: HapMap search]


  • Now you can view detailed SNPs in the region, plus haplotype frequencies in each population for each SNP
[Figure: HapMap frequencies]
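
Haplotype frequencies like those shown in the browser are simply the share of chromosomes in a population sample that carry each haplotype. A minimal sketch, assuming the haplotypes have already been phased and are written as strings of SNP alleles; the sample below is invented and the function name is illustrative.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return the fraction of sampled chromosomes carrying each haplotype."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {haplotype: n / total for haplotype, n in counts.items()}


# Hypothetical sample: four people, two haplotypes each, three SNPs per haplotype.
sample = ["ACG", "ACG", "ATG", "ACG", "GTA", "ATG", "ACG", "ACG"]

for haplotype, frequency in haplotype_frequencies(sample).items():
    print(haplotype, frequency)
# ACG 0.625
# ATG 0.25
# GTA 0.125
```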

 

Data from the 1000 Genomes Project does not currently have its own separate search interface, but the 1000 Genomes website has links through which you can download entire sequence datasets for your own research use.

Other Genome Variation Resources

Other genome variation resources and servers are available on the Web. 

For further information, contact us

Last Updated: 08/05/2010
