Protein Motifs and Proteomics Tools
Take a Class
This guide supports the Galter Library class called Protein Motifs and Proteomics Tools. See our Classes schedule for the next available offering. If this class is not on our upcoming schedule, it is still available to you or your group by request.
Searching for Protein Sequences
Protein sequences can be found at the NCBI site or at the UniProt database through the EBI.
NCBI's Protein Database
Go to the NCBI home page at:
Select the Protein database from the pull-down menu above the search box or from the Popular Resources box on the right of the page, then enter your search terms in the search box. This guide uses microtubule-associated protein tau in rat as its example. For this sample search, enter the terms "MAPT AND rat[organism]".
From the results page, you can:
- See the number of results
- Choose Limits or Advanced Search to refine your search results
- Choose only reference sequences by selecting RefSeq from the right side filtering box
- Link to the suggested gene records by using links in the Gene results box above your results list
- Run BLAST or create a multiple alignment of all sequences in your results using COBALT (COBALT is only available for Protein database results)
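The same sample search can also be expressed as a URL for NCBI's E-utilities ESearch service. The following is a minimal sketch rather than part of the class workflow: the function name build_protein_search_url and the retmax default are illustrative assumptions, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_protein_search_url(term, retmax=20):
    """Return an ESearch URL that queries the NCBI Protein database.

    build_protein_search_url is a hypothetical helper for this guide.
    """
    params = {"db": "protein", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)

# Same search terms as typed into the Protein search box.
url = build_protein_search_url("MAPT AND rat[organism]")
print(url)
```

Opening the printed URL in a browser should return an XML list of record identifiers matching the search terms.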
Choose the reference sequence result either by clicking RefSeq and then the description on the next page, or by clicking the description of the second result: microtubule-associated protein tau [Rattus norvegicus].
You are now viewing the NCBI Protein record for tau in rat. Use the links on the right side of the page to link to relevant information on this particular protein.
- Link to manuscripts in PubMed dealing with your protein
- View the pathways that involve the gene encoding your protein in NCBI's BioSystems database
- View identical proteins or homologous genes in other species
- Link to the RefSeq mRNA record
- Link to external sources of antibodies or cDNA clones for your protein
- Link to other related records using the All links from this record list
- Note: If a structure exists for the protein record you are viewing, there will be a Structure link in this list. If a structure exists for a protein that shares some sequence similarity with yours, you will see a Related Structures link.
- Link to conserved domains by clicking on the Identify Conserved Domains link in the Analyze this sequence section near the top right of the protein record view
Click the Identify Conserved Domains link.
The NCBI Conserved Domains Database (CDD)
On the Conserved Domains page, you will see all of the regions of your protein that map to one or more conserved domains in NCBI's Conserved Domains Database. These domains are curated from the Pfam, SMART and COGs databases of protein families.
- Click on the colored bars aligned to the query sequence in the graphic to view your query aligned to that specific domain
- Click on the plus signs (+) to expand each domain hit region to see your query sequence aligned to a consensus sequence for that domain family
- Click on Search for similar domain architectures to open the Conserved Domain Architecture Retrieval Tool (CDART), which will retrieve all sequences in NCBI's protein database that have a predicted match to this domain.
Searching for Protein Sequences in UniProt
UniProt's search page makes it easy to construct an efficient search for a protein of interest. Access the UniProt home page from:
You can simply enter your search terms in the query box (e.g., MAPT AND rat), but you can get more precise results by clicking the blue Fields link (to the right of the Clear button).
If you don't know the gene name associated with your protein of interest, you can search the Protein name field instead: select it from the Field pull-down menu and type the protein name.
- Try typing tau in the term box next to the Protein name field.
- Click the Add & Search button.
- The search returns many tau results, and you can probably find your protein of interest in the list. To refine the search, click on Fields again.
- Select Organism from the Field menu and type rat in the term box. An index list will appear; select rat from that list.
- Click the Add & Search button.
Now you have much more specific search results.
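The fielded search built above can be sketched as a query string for UniProt's REST interface. This is an illustration under stated assumptions: the field names protein_name and organism_name and the search address come from UniProt's current REST API, which may not match the web form described in this guide, and the helper names are hypothetical. The code only builds the URL and does not contact UniProt.

```python
from urllib.parse import urlencode

# Assumed address of the UniProtKB search endpoint.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

def build_uniprot_query(**fields):
    """Join field:value pairs with AND, mirroring the Add & Search steps."""
    return " AND ".join(f"{name}:{value}" for name, value in fields.items())

def build_uniprot_search_url(query, fmt="tsv", size=25):
    """Return a search URL for the query (hypothetical helper)."""
    return UNIPROT_SEARCH + "?" + urlencode({"query": query, "format": fmt, "size": size})

# Protein name first, then organism, as in the steps above.
query = build_uniprot_query(protein_name="tau", organism_name="rat")
url = build_uniprot_search_url(query)
print(query)
print(url)
```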
You can view clusters of similar sequences or browse by taxonomy, gene ontology, enzyme class or pathway using links at the top.
Click the Accession number link to access the record for TAU_RAT.
- Use the tabs at the top to BLAST the sequence, align all of the isoforms on the page, retrieve sequences or data, or map IDs to other databases
- Note: For the best results when mapping between UniProt and NCBI databases, use the pull-down menus on the ID Mapping page to map from UniProtKB AC/ID to GI number, then follow the links to the GI records at NCBI. To find RefSeq information for these records, look for the reference sequence information on the right side of the NCBI records.
- View clusters of similar sequences using the percent identity links
- Internal links to document features will take you to detailed sections on alternative products, sequences, attributes, ontologies and sequence annotation features such as mutated residues
- Download the sequence in many formats, including FASTA
Click on the FASTA button, and copy the sequence from the page.
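A FASTA record is a header line beginning with ">" followed by one or more lines of sequence. If you want to handle the copied text in a script, the sketch below shows one way to read it. The function name parse_fasta is an illustrative choice, and the example record is made up for demonstration; it is not the tau sequence.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Made-up record used only to demonstrate the format.
example = """>demo_protein hypothetical example record
MKTAYIAKQR
QISFVKSHFS
"""
records = parse_fasta(example)
print(records)
```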
Protein Sequence Tools
There are numerous protein sequence tools available on the Web, and finding them all is sometimes difficult. Fortunately, ExPASy has assembled a large list of protein analysis tools.
ExPASy is located at the URL:
There are many databases and tools linked from the ExPASy site. This guide cannot describe them all, so it will concentrate on the tools linked from the Proteomics tools link.
To identify a protein's composition and isoelectric point:
- Scroll down the page to the link for ProtParam
- Input the protein's UniProt/SwissProt identifier, or paste your protein's sequence in the text box (this example uses UniProt/SwissProt identifier P19332)
- Click Compute parameters
- If multiple chains are identified, select the chain you are most interested in
You will get results for the molecular weight, isoelectric point, amino acid percentage composition, stability and many other features.
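Of the values listed above, the amino acid percentage composition is the simplest to reproduce by hand. The sketch below shows the arithmetic only; it is not ProtParam's implementation, the function name amino_acid_composition is illustrative, and the input sequence is made up.

```python
from collections import Counter

def amino_acid_composition(sequence):
    """Return the percentage of each residue in a one-letter protein sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}

# Made-up sequence used only to demonstrate the calculation.
composition = amino_acid_composition("MKTAYIAKQRQISFVKSHFS")
for residue, percent in composition.items():
    print(f"{residue}: {percent:.1f}%")
```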
To find predicted protease cleavage sites:
- Scroll down the ExPASy proteomics tools site to PeptideCutter
- Input the protein's UniProt/SwissProt identifier or paste an amino acid sequence in the text box
- You can choose all cutters, or specify certain enzymes. You can also select enzymes or chemicals that cut your protein a certain number of times. This is useful if you are interested in seeing what enzymes cleave your protein just once.
- Click Perform
- The program will retrieve a table of all of the substances that cleave your protein along with the locations of the cleavage sites.
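The idea behind a cleavage-site table, and behind filtering for enzymes that cut a set number of times, can be sketched with a few simplified textbook rules. These rules are assumptions for illustration and are much cruder than the ones PeptideCutter applies: trypsin is taken to cut after K or R unless the next residue is P, CNBr after M, and Asp-N before D. The names RULES, cleavage_sites and cutters_with_n_sites are hypothetical, and the sequence is made up.

```python
import re

# Each rule is a zero-width regex that matches at the position of the cut.
RULES = {
    "Trypsin (simplified)": r"(?<=[KR])(?!P)",  # after K or R, not before P
    "CNBr": r"(?<=M)",                          # after M
    "Asp-N (simplified)": r"(?=D)",             # before D
}

def cleavage_sites(sequence, pattern):
    """Return 1-based positions of the residues after which a cut occurs."""
    n = len(sequence)
    # Positions 0 and n are the ends of the chain, not real cuts.
    return [m.start() for m in re.finditer(pattern, sequence) if 0 < m.start() < n]

def cutters_with_n_sites(sequence, rules, n_cuts):
    """Return the names of rules that cut the sequence exactly n_cuts times."""
    return sorted(name for name, pattern in rules.items()
                  if len(cleavage_sites(sequence, pattern)) == n_cuts)

sequence = "MAKRPDGKL"  # made-up sequence
for name, pattern in RULES.items():
    print(name, cleavage_sites(sequence, pattern))
print("Cut exactly once:", cutters_with_n_sites(sequence, RULES, 1))
```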
To find patterns or profiles in a protein sequence:
- In ExPASy proteomics tools, find InterPro Scan (under the section Pattern and profile searches)
  - InterPro Scan will search Pfam, Prosite, Prints, SCOP, PANTHER and other protein pattern databases
  - Input your sequence in FASTA or plain sequence format in the input box (you cannot use an identifier with this tool)
  - Use the checkboxes to choose specific profile tools (by default, all available tools are selected)
  - Click Submit Job. Be patient: results may take a few minutes, depending on the length of your sequence
  - Results are returned in a graphic table format, with links to results in each protein profile database
- Motif Scan is also located in the Pattern and profile searches section of the ExPASy tools page
  - Input your sequence or a UniProt/SwissProt identifier in the text box
  - Select the motif databases you want to search using the checkboxes in the Parameters section
  - Click Search
  - Results are returned in an interactive graphical format. Motif Scan reports more types of Prosite profiles than InterPro Scan does, but it searches fewer profile databases.
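To see what a pattern search does underneath, a Prosite-style pattern can be translated into a regular expression and scanned against a sequence. The sketch below handles only the basic pattern syntax (x, [..], {..}, repeat counts and the < and > anchors); it is not how InterPro Scan or Motif Scan are implemented, and it does not cover profiles. The function names are hypothetical and the sequence is made up. The pattern shown is the familiar N-glycosylation site pattern, N-{P}-[ST]-{P}.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern into a Python regex string."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""     # C-terminal anchor
        element = element.strip("<>")
        # Separate the residue part from an optional repeat such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {P} means any residue except P
        # [ST] and single letters are already valid regex syntax.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        regex += prefix + core + suffix
    return regex

def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = prosite_to_regex(pattern)
    return [(m.start() + 1, m.group(1))
            for m in re.finditer("(?=(" + regex + "))", sequence)]

hits = find_motif("MNGSALNPTQ", "N-{P}-[ST]-{P}")  # made-up sequence
print(prosite_to_regex("N-{P}-[ST]-{P}"), hits)
```

The lookahead wrapper in find_motif is a design choice that lets overlapping matches be reported, which a plain left-to-right scan would skip.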
Structure Prediction Tools
The ExPASy site has a number of structure prediction tools, from primary sequence prediction/analysis (coiled-coil predictions, protein coloring applications, disulfide bond prediction), to secondary and tertiary structure prediction. Many of these tools are useful for finding areas in your sequence that have features that may imply function.
Use caution if trying tertiary structure predictors: they return only hypothetical results, so use the tertiary structure assessment tools on any predictions you create.
Secondary structure prediction servers generate maps of predicted loops, helices and beta strands in your sequence, as well as predictions for disulfide bonds, transmembrane domains and solvent accessibility. Some good secondary structure prediction sites listed at ExPASy are:
- Jpred - consensus secondary structure prediction
  - Jpred will notify you if a match has been found in PDB that is close to your sequence. You then will have the choice of viewing this PDB file or continuing with a unique Jpred prediction.
- PredictProtein - provides multiple structure feature predictions from a sequence, generates a multiple sequence alignment to best BLAST matches and searches Prosite for patterns
  - PredictProtein will run 3 sequence queries for free using the "premium service". After that, you can choose to use the older server for predictions for free or use the premium server for a fee. The free (older) server works fine for most smaller proteins.
- PSIpred - combines multiple prediction tools into one server
Users are encouraged to try many of the tools listed at the ExPASy site, in addition to any others they may find elsewhere.
Try Practicing with a Protein
Here is a protein to practice with:
- NCBI accession number: NP_083049.1
- UniProt accession number: Q8K352
These two accession numbers are for the same protein sequence, but in two different databases.
- What is the name of this protein? In which species is it found? How many amino acids long is it?
- Finding Conserved Domains in CDD at NCBI (Hint: use the conserved domains link from the NCBI protein record)
  - How many conserved domains does this protein contain according to the Conserved Domains Database at NCBI?
  - Can you find structure(s) for any of these conserved domains?
- Finding protein families from the UniProt record
  - How many unique InterPro family identifiers are there for this protein? How many Pfam families?
- Using PeptideCutter, what enzymes/reagents cleave this protein exactly 2 times?
Proteomics Experimental Data
If you are seeking data from proteomics experiments, there are a few sites that have data integrated under searchable interfaces.
- The Swiss-2D PAGE site at ExPASy has data from 2D PAGE and SDS PAGE experiments, plus protocols and standards.
- The PRIDE database at EBI is a searchable database of proteomics data from a number of experiments.
- The IntAct database at EBI is another searchable proteomics data collection. It deals specifically with protein interaction experiments.
Resources and Help
Here are help pages and resources on using some of the tools and databases shown in this guide.
For further information, contact us