Nucleotide Sequence and Structure

Take a Class

This guide supports the Galter Library class called Nucleotide Sequence and Structure. See our Classes schedule for the next available offering. If this class is not on our upcoming schedule, it is still available to you or your group by request.

Acknowledgements

This class is based on the excellent NCBI classes formerly offered as the NCBI Advanced Workshop for Bioinformatics Information Specialists. Specifically, it draws on the DNA Analysis module, taught by David Osterbur of Countway Library at Harvard University, and the RNA Analysis module, developed by Nicola Gaedeke of Biotools.info, Rana C. Morris of the NCBI, Donna Messersmith of LABS-Now LLC, and Kristi Holmes of Becker Library at Washington University in St. Louis.

Searching for Nucleotide Sequences

The best way to find a nucleotide sequence is to begin with NCBI's GenBank, which is searched through the NCBI Nucleotide database.

Go to the NCBI home page:

http://www.ncbi.nlm.nih.gov/

You can begin your search simply by typing search terms in the search box. You can use Boolean operators (AND, OR, NOT) to create a more focused search by adding organism names, keywords, gene names, etc. This guide uses the example of early growth response 1 (EGR1). Type EGR1 in the search box and click Search.
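
The Boolean logic described above can be sketched in Python. This is a minimal illustration, not an NCBI tool: the helper name build_query is hypothetical, and it only assembles a query string that you could paste into the search box. Alternatives joined with OR are wrapped in parentheses because Entrez processes Boolean operators from left to right.

```python
def build_query(all_of, any_of=(), none_of=()):
    """Assemble a Boolean search string (hypothetical helper, for illustration)."""
    if not all_of and not any_of:
        raise ValueError("need at least one term to search for")
    parts = list(all_of)
    # Group the OR alternatives so they are evaluated together
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    query = " AND ".join(parts)
    # Each NOT term removes matching records from the result set
    for term in none_of:
        query += " NOT " + term
    return query


print(build_query(["EGR1", "Homo sapiens[Organism]"],
                  any_of=["mRNA", "cDNA"], none_of=["predicted"]))
# EGR1 AND Homo sapiens[Organism] AND (mRNA OR cDNA) NOT predicted
```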

[Figure: NCBI search results for EGR1]

Now click the Nucleotide category in the results.

Entrez Nucleotide Database

[Figure: Entrez Nucleotide results]

If the sequence you are searching for is easy to spot in your results, click its description to open the record. However, a Nucleotide database search often returns many sequences. To narrow your search within Entrez Nucleotide, use the features available on the results page:

  • Click on the Limits text link to set limits by species, gene name, and other categories
  • Choose to view only reference sequences (RefSeq) or mRNA sequences by clicking the links in the upper right of this page
  • Choose sequences from just one organism by clicking on the numbers in the organisms box on the right side of the page
  • To get the most reliable, validated sequences, try clicking on the Gene results displayed in the gold box near the top of the page
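
The same limits can also be written as search terms and sent to NCBI's E-utilities esearch service instead of being set on the web page. The sketch below only builds the URL and does not contact NCBI. The function name esearch_url is hypothetical, and the filter terms refseq[filter] and biomol_mrna[PROP] are assumptions you should confirm in the Nucleotide database's Advanced search builder.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, organism=None, refseq_only=False, mrna_only=False, retmax=20):
    """Build (but do not fetch) a Nucleotide search URL with optional limits."""
    parts = [term]
    if organism:
        parts.append(f"{organism}[Organism]")  # limit to one species
    if refseq_only:
        parts.append("refseq[filter]")         # assumed RefSeq filter term
    if mrna_only:
        parts.append("biomol_mrna[PROP]")      # assumed mRNA property term
    params = {"db": "nucleotide", "term": " AND ".join(parts), "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


print(esearch_url("EGR1", organism="Homo sapiens", refseq_only=True))
```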

Click on the blue link for the number of results in Gene (152 in this example).

Accessing Sequences Through the Entrez Gene Record

Scroll down the results in Entrez Gene until you see the record for the entry you are seeking.  Species are listed in square brackets in the descriptions for each gene entry. 
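
If you copy a list of gene descriptions out of the results, the bracketed species names make the list easy to filter. A small sketch, assuming each description ends with the species in square brackets; the function names and example descriptions are illustrative, not copied from NCBI.

```python
import re

# Matches a trailing "[Genus species]" at the end of a description
SPECIES_RE = re.compile(r"\[([^\]]+)\]\s*$")


def species_of(description):
    """Return the bracketed species name, or None if there is none."""
    match = SPECIES_RE.search(description)
    return match.group(1) if match else None


def filter_by_species(descriptions, species):
    """Keep only descriptions whose bracketed species matches exactly."""
    return [d for d in descriptions if species_of(d) == species]


listing = [
    "early growth response 1 [Homo sapiens]",
    "early growth response 1 [Mus musculus]",
]
print(filter_by_species(listing, "Homo sapiens"))
```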

[Figure: EGR1 listing in Entrez Gene]

Click the blue gene symbol link to access the detailed record.

On the Entrez Gene record for your selected gene, use the RefSeq links in the blue Links menu on the right side of the page to find reference sequences for the gene or its mRNAs.

To access the best curated reference sequence for your gene, scroll down the page to the section labeled Genomic regions, transcripts and products.

[Figure: Genomic regions, transcripts and products section of an Entrez Gene record]

In this area you can:

  • Follow text links to NCBI's reference sequences for mRNA, protein and consensus coding DNA sequences (CCDS)
  • Click on the link above the gene schematic to view the GenBank record for this gene on its chromosome sequence page, with only the gene's region selected
  • Click on the reference sequence details link to jump down the page to other reference sequences that are available independent of the annotated genome build
    • The mRNA and protein sequences in this area may be the same as or different from those shown in the Genomic regions, transcripts and products area of the record, so compare the records to find the most suitable sequence
    • The sequences in the reference sequence details area also link to source data from Ensembl, SwissProt and the protein domain family database Pfam
  • Link to NCBI's Sequence Viewer
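
RefSeq accession numbers encode the record type in a two-letter prefix followed by an underscore, which helps when you compare the sequences listed in these areas. The sketch covers only a few common prefixes and is not exhaustive; the function name describe_accession and the zero-filled accessions are hypothetical placeholders. Records with X prefixes are computationally predicted models, so when both exist the N-prefixed record is usually the better starting point.

```python
# A few common RefSeq accession prefixes (not an exhaustive list)
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule, e.g. a chromosome",
    "NG": "genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted (model) mRNA",
    "XR": "predicted (model) non-coding RNA",
    "XP": "predicted (model) protein",
}


def describe_accession(accession):
    """Classify an accession such as 'NM_000000.1' by its RefSeq prefix."""
    prefix, underscore, _ = accession.partition("_")
    if not underscore:
        return "not a RefSeq accession"  # GenBank accessions have no underscore
    return REFSEQ_PREFIXES.get(prefix.upper(), "other RefSeq type")


for acc in ["NM_000000.1", "XM_000000.1", "AB000000"]:  # placeholder accessions
    print(acc, "->", describe_accession(acc))
```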

Click on Try our new Sequence Viewer.

NCBI's Sequence Viewer

[Figure: NCBI Sequence Viewer]

Sequence Viewer allows you to view and link to many features of your genomic or RNA sequence from a single interface.  You can

  • Click and drag to move the map left or right
  • Use the Tools menu to BLAST the region shown
  • Zoom in to sequence level view
  • Change the view style using the drop-down menu (Default is selected automatically)
  • Use the Options menu to add more maps
  • Right-click on any feature (the colored bars in the view) to open a menu that links directly to that sequence in FASTA or GenBank format, to OMIM records for a gene, and to many other options

Right-click on the green (genomic view) bar, move your mouse down to Views & Tools, and click on the FASTA link. Your record will open in a new window. Only the sequence for your gene's region is displayed in the chromosome assembly record.

[Figure: GenBank FASTA record for the gene region]

You can change the region shown or display the entire chromosome sequence.

Copy the sequence in FASTA format from this page.

Restriction Mapping Tools

There are many web servers that provide restriction maps to help you select enzymes to cut DNA sequences. Unfortunately, many of these sites are not updated to reflect current enzyme availability and are therefore not listed in this guide. Below are some restriction mapping servers that are available. Users are advised to try several of them to determine which best suits their needs.

NEBcutter

NEBcutter is provided by New England BioLabs and reflects the enzymes currently available in its catalog, but it also allows users to explore restriction enzymes supplied by all commercial suppliers and some non-commercial suppliers. NEBcutter can be found at the URL:

http://tools.neb.com/NEBcutter2/

  • Input your DNA sequence in FASTA format (limit for sequence length is 300 kilobases)
  • Select your DNA type and preferred enzyme set
  • Click Submit

Your restriction map will be returned in a graphical format, with the names of the restriction enzymes as clickable links to their available vendors' pages.

[Figure: NEBcutter restriction map]

  • Zoom in to see sequence features
  • Use the List menu to get a table of enzymes with details about their cleavage type (blunt, 5', or 3' overhang) and sequence recognition features
  • View details on open reading frames 
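
To make the cleavage types in the List menu concrete, here is a minimal sketch of how a restriction map is computed: scan the sequence for each recognition site, then classify the ends from where the two strands are cut. It assumes palindromic recognition sites (so scanning the top strand is enough) and a hand-entered three-enzyme table; the class and function names are hypothetical, and NEBcutter itself handles many more cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # recognition sequence, top strand, 5'->3'
    cut: int   # top-strand cut position, counted from the 5' end of the site

    def overhang(self):
        """Classify the ends left by cutting (palindromic sites only)."""
        # On a palindromic site the bottom strand is cut at len(site) - cut,
        # expressed in top-strand coordinates.
        bottom = len(self.site) - self.cut
        if self.cut == bottom:
            return "blunt"
        if self.cut < bottom:
            return f"5' overhang ({bottom - self.cut} nt)"
        return f"3' overhang ({self.cut - bottom} nt)"


def find_sites(seq, enzyme):
    """0-based index of the first base after each top-strand cut."""
    seq = seq.upper()
    cuts = []
    start = seq.find(enzyme.site)
    while start != -1:
        cuts.append(start + enzyme.cut)
        start = seq.find(enzyme.site, start + 1)
    return cuts


# Hand-entered examples; ^ marks the top-strand cut
ENZYMES = [
    Enzyme("EcoRI", "GAATTC", 1),  # G^AATTC
    Enzyme("SmaI", "CCCGGG", 3),   # CCC^GGG
    Enzyme("PstI", "CTGCAG", 5),   # CTGCA^G
]

sequence = "ttgaattcggcccgggatctgcagaa"  # hypothetical input
for enzyme in ENZYMES:
    print(enzyme.name, find_sites(sequence, enzyme), enzyme.overhang())
```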

RestrictionMapper

RestrictionMapper can be found at the URL:

http://www.restrictionmapper.org/

This site is updated approximately yearly to include all enzymes in the REBASE restriction enzyme database, so it is more current than some restriction mapping sites.

  • Input your sequence in plain nucleotide format (remove the FASTA header line: the > and the text description following it)
  • Specify the DNA type, the number of cuts or recognition sequence length, and the enzymes, or leave the settings at their defaults
  • Click Map Sites
  • Your sites will be returned to you in a table format
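
Converting a FASTA record to plain nucleotide format is a small text operation. A minimal sketch, assuming the input holds a single FASTA record; the function name fasta_to_plain and the example record are hypothetical.

```python
def fasta_to_plain(fasta_text):
    """Strip the '>' header line and join the sequence lines (single record only)."""
    lines = [line.strip() for line in fasta_text.strip().splitlines()]
    headers = [line for line in lines if line.startswith(">")]
    if len(headers) > 1:
        raise ValueError("expected a single FASTA record")
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


record = """>example_sequence hypothetical description
acgtacgtac
GGCCTTAA
"""
print(fasta_to_plain(record))
```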

RestrictionMapper does not return a graphical map of restriction sites, but the table of results is easy to read and includes links to suppliers of each restriction enzyme.

Other Restriction Mapping Websites

  • Silent - from the Pasteur Institute 
    • Returns results in table format
  • In Silico Restriction Mapper - from the University of the Basque Country
    • Returns results in table format
    • Current and updated regularly

Generating Primers with Primer-BLAST

As with restriction mapping, a number of Web servers will generate primers for a nucleotide sequence. Selecting good primer pairs for your sequence is important, because you want to avoid primer dimers: primer pairs that bind to each other instead of to your nucleotide template strands.
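
A primer dimer is most troublesome when the 3' end of one primer can anneal to the other primer, because polymerase can then extend it. The sketch below checks for that with exact reverse-complement matching only; it is a simplified heuristic, not the thermodynamic alignment that Primer3 and Primer-BLAST use, and the function names and primers are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_dimer(primer_a, primer_b, min_len=4):
    """Length of the longest 3'-end stretch of primer_a (at least min_len bases)
    whose exact complement occurs in primer_b; 0 if there is none."""
    a, b = primer_a.upper(), primer_b.upper()
    for k in range(len(a), min_len - 1, -1):
        tail = a[-k:]  # last k bases of primer_a, i.e. its 3' end
        # Antiparallel pairing: tail anneals where b contains its reverse complement
        if reverse_complement(tail) in b:
            return k
    return 0


forward = "CAGTCGGATCC"  # hypothetical primers
reverse = "GGATCCTTT"
print(three_prime_dimer(forward, reverse))  # prints 6
```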

NCBI's Primer-BLAST is based on the Primer3 software program developed at the Whitehead Institute and the Howard Hughes Medical Institute. Primer-BLAST gives users the convenience and speed of NCBI's BLAST servers: it designs primers for your sequence and uses BLAST to check their specificity against the database you choose.

There are two ways to access Primer-BLAST at the NCBI:

  1. Go to NCBI's BLAST page:  http://blast.ncbi.nlm.nih.gov/ and click on the Primer-BLAST link in the Specialized BLAST section

    [Figure: Primer-BLAST link on the BLAST home page]
  2. Click on the Pick Primers link from the Nucleotide database record for your sequence.  These links are located on the right side of NCBI's Nucleotide records in the Analyze this sequence section

    [Figure: Pick Primers link in a Nucleotide record]

Primer-BLAST allows you to specify:
  • Melting temperature of your primers or PCR products
  • PCR product size and primer size
  • Your own 5' or 3' primer sequence string
  • Exon junction spanning characteristics (for mRNAs)
  • Stringency characteristics (numbers of matches between primers and unintended targets)
  • Primer GC content or poly-nucleotide content
  • Primer self-complementarity and primer-pair complementarity
  • Hybridization oligo properties
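
GC content and melting temperature are the two settings most people adjust first. The sketch below computes GC percentage and a rough Tm with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C, a rule of thumb intended for short oligonucleotides. Primer3 instead uses a nearest-neighbor thermodynamic model, so expect its values to differ; the function names and the primer are hypothetical.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


primer = "AGCTAGCTAGCTAGCTAGCT"  # hypothetical 20-mer
print(f"GC = {gc_content(primer):.0f}%, Tm (Wallace) = {wallace_tm(primer)} C")
```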

After you click the Get Primers button:

  • Be patient! The primer designer may take several minutes to finish
  • Results are returned to you aligned to your template nucleotide sequence, with details on product length and other characteristics

[Figure: Primer-BLAST results]

Primer3 and Primer3Plus

If you prefer, you can use Primer3 directly through the Primer3 Web interface. This interface lets you specify more settings, so you have more control over primer selection. Primer3 also permits you to enter runs of N's to mask undesirable elements such as vector sequences, LINEs or Alu repeats.
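
Masking works by overwriting the unwanted region with N's, so that primers containing those positions are rejected. A minimal sketch, assuming you already know the 0-based, end-exclusive coordinates of the regions to mask (for example, a repeat you identified); mask_regions is a hypothetical helper, not part of Primer3.

```python
def mask_regions(seq, regions):
    """Return seq with each (start, end) region replaced by N's.
    Coordinates are 0-based and end-exclusive."""
    chars = list(seq)
    for start, end in regions:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"region {start}-{end} lies outside the sequence")
        chars[start:end] = "N" * (end - start)
    return "".join(chars)


print(mask_regions("ACGTACGTAC", [(2, 5)]))  # ACNNNCGTAC
```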

The Primer3Plus Web interface is an updated version of the Primer3 interface, although it runs the same program source code as the Primer3 interface mentioned above. Primer3Plus has a more user-friendly interface and also offers expanded features for specifying the purpose of the primer design (detection, cloning, sequencing) and for entering specific sequences to target or mask.

Users who wish to know more about Primer3's design and use are encouraged to consult the following manuscript:
Steve Rozen and Helen J. Skaletsky (2000). Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386.

Nucleotide Structure & Other Nucleotide Analysis Resources

RNA can take on many complex three-dimensional structures because it is (usually) a single-stranded molecule and thus more flexible than DNA. To find nucleotide structures in the NCBI Structure database (MMDB), use the Limits or Preview/Index tabs to focus your search on RNA or DNA structures.

NCBI's Structure Database, the Molecular Modeling Database (MMDB)

Go to NCBI's Structure database home page:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure

Example:  You want to see any structures in the database that include 2 chains of tRNA.

  • Click on the Preview/Index tab near the top of the Structure database home page.
  • Use the pull-down menu (default says All Fields) to select the field RNAChainCount
  • Put the number 2 in the text box next to the field pull-down menu
  • Click the Index button
  • In the Index menu that appears, click the line of results for the number 2
    • Note: If you want to select structures that contain more or fewer than 2 strands of RNA, you can hold down the Control key (Command key on a Mac) and select more than one result in the Index menu.
  • Click the AND button
[Figure: RNA structure search]
  • Now go to the search box at the top of the page
  • Type AND after [RNAChainCount]
  • Now type tRNA and click Go
[Figure: tRNA search]


Your results will list structures that contain 2 RNA chains and match the term tRNA.

The Nucleic Acid Database (NDB)

If you want to search in a database that contains only nucleic acid structures, the NDB is a good option. 

The NDB's home page:

http://ndbserver.rutgers.edu/

Select Search on the NDB home page.

[Figure: NDB home page]

The search page has numerous fields that will help you narrow your search to find out whether any NMR or X-ray crystallography structures exist for your nucleotide of interest.

NDB is also home to a number of other tools and resources:

  • Reports
  • A musical DNA atlas
  • Base pair viewer (predicts base pairing in nucleotide structures from .pdb files)

Other Nucleotide Structure Sites

Many other good nucleotide structure sites are available on the Web.

  • Mfold RNA folding prediction server - Rensselaer Polytechnic Institute 
    • Try entering this sequence at the Mfold server:
      GUCUACGGCCAUACCACCCUGAACGCGCCCGAUCUCGUCUGAUCUCGGAA
      GCUAAGCAGGGUCGGGCCUGGUUAGUACUUGGAUGGGAGACCGCCUGGG
      AAUACCGGGUGCUGUAGGCUU
  • RNAfold structure prediction web server - University of Vienna
  • The Comparative RNA Web Site and Project - University of Texas at Austin 
    • Structure models for reference RNA molecules and RNA comparison data
  • RNAstructure - University of Rochester Medical Center
    • Not a Web server, but a downloadable Windows application for structural analysis of RNA
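
To give a feel for what these folding servers compute, here is a minimal sketch of the Nussinov algorithm, a classic dynamic-programming method that simply maximizes the number of nested base pairs (Watson-Crick plus G-U). This is a swapped-in, much cruder technique: Mfold and RNAfold minimize free energy with nearest-neighbor energy parameters, so their predictions will differ. The function name, the minimum hairpin loop of 3 unpaired bases, and the dot-bracket output are choices made for this illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximize nested base pairs; return (pair_count, dot_bracket)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = most pairs possible within seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Trace back through the table to recover one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to close a hairpin loop
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


example = (
    "GUCUACGGCCAUACCACCCUGAACGCGCCCGAUCUCGUCUGAUCUCGGAA"
    "GCUAAGCAGGGUCGGGCCUGGUUAGUACUUGGAUGGGAGACCGCCUGGG"
    "AAUACCGGGUGCUGUAGGCUU"
)
count, dot_bracket = nussinov(example)
print(count)
print(dot_bracket)
```

Comparing this output with the Mfold result for the same sequence shows why energy-based models are preferred: maximizing pair count ignores stacking and loop penalties.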

Other Useful Nucleotide Sites (Miscellaneous)

  • The RNA Modification Database - University of Utah
    • Database of post-transcriptional modifications of RNA
  • NCBI's Probe database of reagents for functional genomics
    • Search for reagents used for experimental projects on gene silencing, genome mapping, SNP discovery, expression mapping and gene discovery

For further information, contact us

Last Updated: 03/07/2012
