BLAST: fastacmd

By Haktan Suren, PhD
Oct 3rd, 2012

I'll list two useful fastacmd commands.

1-) Get a brief summary of a BLAST database

Usage:

fastacmd -d database_name -I T

Example:

fastacmd -d nr -I T

Output:

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
 excluding environmental samples from WGS projects
 13,123,072 sequences; 4,491,037,066 total letters

File names:
 nr.00
 Date: Feb 16, 2011 6:27 PM Version: 4 Longest sequence: 36,805 res
 nr.01
 Date: Feb 16, 2011 6:27 PM Version: 4 Longest sequence: 35,213 res
 nr.02
 Date: Feb 16, 2011 6:27 PM Version: 4 Longest sequence: 33,423 res
 nr.03
 Date: Feb 16, 2011 6:27 PM Version: 4 Longest sequence: 34,170 res
 nr.04
 Date: Feb 16, 2011 6:27 PM Version: 4 Longest sequence: 33,452 res
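If you need these numbers in a script, the summary text is easy to parse. Here is a minimal sketch in Python, assuming the output keeps the layout shown above (a "sequences; ... total letters" line, and each volume name on its own line right before a "Date:" line). The function name parse_db_summary and the shortened SAMPLE text are my own illustration, not part of fastacmd.

```python
import re

# Shortened copy of the summary shown above (first two volumes only).
SAMPLE = """Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
 excluding environmental samples from WGS projects
 13,123,072 sequences; 4,491,037,066 total letters

File names:
 nr.00
 Date: Feb 16, 2011 6:27 PM Version: 4 Longest sequence: 36,805 res
 nr.01
 Date: Feb 16, 2011 6:27 PM Version: 4 Longest sequence: 35,213 res
"""


def parse_db_summary(text):
    """Pull the sequence count, letter count and volume names out of a summary."""
    totals = re.search(r"([\d,]+) sequences; ([\d,]+) total letters", text)
    if totals is None:
        raise ValueError("no 'N sequences; M total letters' line found")
    # Assumption: a volume name sits alone on the line just before a 'Date:' line.
    volumes = re.findall(
        r"^[ \t]*(\S+)[ \t]*\n[ \t]*Date:", text, flags=re.MULTILINE
    )
    return {
        "sequences": int(totals.group(1).replace(",", "")),
        "letters": int(totals.group(2).replace(",", "")),
        "volumes": volumes,
    }


summary = parse_db_summary(SAMPLE)
print(summary["sequences"], summary["letters"], summary["volumes"])
```

In practice you would pass the captured output of the fastacmd command above instead of SAMPLE.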

2-) Get the sequence(s) for a given query or list of queries. You can use either a GI number or an accession.

Usage:

fastacmd -d database_name -s "seq_name_1,seq_name_2,..."
fastacmd -d database_name -i file_listing_the_seq_names

Example:

fastacmd -d nr -s 90110050

Output:

>gi|157266326|ref|NP_000266.2| P protein [Homo sapiens] >gi|90110050|sp|Q04671.2|P_HUMAN RecName: Full=P protein; AltName: Full=Melanocyte-specific transporter protein; AltName: Full=Pink-eyed dilution protein homolog >gi|773328|gb|AAC13784.1| P protein [Homo sapiens] >gi|119578067|gb|EAW57663.1| oculocutaneous albinism II (pink-eye dilution homolog, mouse), isoform CRA_c [Homo sapiens]
MHLEGRDGRRYPGAPAVELLQTSVPSGLAELVAGKRRLPRGAGGADPSHSCPRGAAGQSSWAPAGQEFASFLTKGRSHSS
LPQMSSSRSKDSCFTENTPLLRNSLQEKGSRCIPVYHPEFITAEESWEDSSADWERRYLLSREVSGLSASASSEKGDLLD
SPHIRLRLSKLRRCVQWLKVMGLFAFVVLCSILFSLYPDQGKLWQLLALSPLENYSVNLSSHVDSTLLQVDLAGALVASG
PSRPGREEHIVVELTQADALGSRWRRPQQVTHNWTVYLNPRRSEHSVMSRTFEVLTRETVSISIRASLQQTQAVPLLMAH
QYLRGSVETQVTIATAILAGVYALIIFEIVHRTLAAMLGSLAALAALAVIGDRPSLTHVVEWIDFETLALLFGMMILVAI
FSETGFFDYCAVKAYRLSRGRVWAMIIMLCLIAAVLSAFLDNVTTMLLFTPVTIRLCEVLNLDPRQVLIAEVIFTNIGGA
ATAIGDPPNVIIVSNQELRKMGLDFAGFTAHMFIGICLVLLVCFPLLRLLYWNRKLYNKEPSEIVELKHEIHVWRLTAQR
ISPASREETAVRRLLLGKVLALEHLLARRLHTFHRQISQEDKNWETNIQELQKKHRISDGILLAKCLTVLGFVIFMFFLN
SFVPGIHLDLGWIAILGAIWLLILADIHDFEIILHRVEWATLLFFAALFVLMEALAHLHLIEYVGEQTALLIKMVPEEQR
LIAAIVLVVWVSALASSLIDNIPFTATMIPVLLNLSHDPEVGLPAPPLMYALAFGACLGGNGTLIGASANVVCAGIAEQH
GYGFSFMEFFRLGFPMMVVSCTVGMCYLLVAHVVVGWN
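Notice that the header line above merges several database entries for the same sequence, each introduced by its own ">gi|...". A rough sketch of how to split such a header in Python follows. It assumes every entry after the first starts with " >" and that identifiers look like gi|number|db|accession|name; the function names split_defline and parse_entry and the shortened HEADER are only for illustration.

```python
# Shortened copy of the header shown above (first three entries only).
HEADER = (
    ">gi|157266326|ref|NP_000266.2| P protein [Homo sapiens] "
    ">gi|90110050|sp|Q04671.2|P_HUMAN RecName: Full=P protein "
    ">gi|773328|gb|AAC13784.1| P protein [Homo sapiens]"
)


def split_defline(defline):
    """Split a merged header into one string per database entry."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    # Assumption: ' >' never occurs inside a description, only between entries.
    return defline[1:].split(" >")


def parse_entry(entry):
    """Return (gi, db, accession, description) for 'gi|N|db|ACC|name text'."""
    ident, _, description = entry.partition(" ")
    fields = ident.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"unexpected identifier layout: {ident!r}")
    return fields[1], fields[2], fields[3], description


for entry in split_defline(HEADER):
    gi, db, accession, description = parse_entry(entry)
    print(gi, db, accession, description)
```

Headers that do not follow the gi|... layout (for example the gnl|BL_ORD_ID|... headers you get from some custom databases) make parse_entry raise a ValueError instead of guessing.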

That’s all!

4 Responses to “BLAST: fastacmd”

  1. Woody_SInbad says:

    Hi, I tested fastacmd for retrieving sequences from the NR database and it works perfectly, but why do I get the wrong sequence with a UniProt database?

    • Haktan Suren, PhD says:

      Hi, can you give me a toy example so I can test what’s going on?

      • Woody_SInbad says:

        OK, I used a UniProt accession as the argument. The command line is:

        fastacmd -d /path/to/uniprot/uniprot_sprot.fasta -s "Q197F8"

        This returns:

        >gnl|BL_ORD_ID|2 sp|Q197F5|005L_IIV3 Uncharacterized protein 005L OS=Invertebrate iridescent virus 3 GN=IIV3-005L PE=4 SV=1
        MRYTVLIALQGALLLLLLIDDGQGQSPYPYPGMPCNSSRQCGLGTCVHSRCAHCSSDGTLCSPEDPTMVWPCCPESSCQL
        VVGLPSLVNHYNCLPNQCTDSSQCPGGFGCMTRRSKCELCKADGEACNSPYLDWRKDKECCSGYCHTEARGLEGVCIDPK
        KIFCTPKNPWQLAPYPPSYHQPTTLRPPTSLYDSWLMSGFLVKSTTAPSTQEEEDDY

        which is not what I expected …

        • Haktan Suren, PhD says:

          Thanks for the example. I will check this and let you know if I can figure out what’s going on. By the way, have you tried other UniProt accession IDs? I was wondering whether this is specific to the accession ID above or a more general problem.
