Some people run into trouble trying to set up a standalone version of BLAST using the NCBI instructions. Here a streamlined process will be presented, targeted at Ubuntu.
I assume that you are aware of the paradigms of blast, meaning that there are several executables for searching nucleic acids or proteins and there are different databases you can blast against. If not, you should read up on the available search tools and databases before you attempt to install Blast. NB: throughout this document I am using protein blast and protein input. Changing to nucleotide sequences is trivial, as you just change blastp to blastn and ‘prot’ to ‘nucl’ in the obvious places (and of course you use different queries and target databases, e.g. nt instead of nr).
Without further ado, Blast setup for UNIX.
There are two components for the installation:
- Executables (blastn, blastp etc.)
- Databases (nr, nt etc.)
Both are described below with follow-up examples of usage.
Ad.1 The executables can be downloaded and compiled from here (download the source, run ./configure then make and finally make install in the directory of the untarred file). However, a much easier way to do it under Ubuntu is:
sudo apt-get install ncbi-blast+
This automatically installs everything. In both cases, to check if all went ok, type:
which blastp
If you get a directory such as /usr/local/bin (or /usr/bin when installed through apt-get) then all went well and that’s where your executables are.
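A slightly more thorough check can be sketched as below. The helper name check_tool is hypothetical (it is not part of BLAST), and the list of tools is simply the ones used in this post; adjust it to whatever you installed.

```shell
# Report where an executable lives, or say that it is missing.
# check_tool is a hypothetical helper, not something BLAST provides.
check_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 found at $tool_path"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Check the executables used in this post; keep going if one is missing.
for tool in blastp blastn makeblastdb update_blastdb; do
    check_tool "$tool" || true
done

# Print the installed version, but only if blastp is actually there.
if command -v blastp >/dev/null 2>&1; then
    blastp -version
fi
```

If a tool is reported as missing after a source install, the install prefix is probably not on your PATH.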
Ad.2 First, you need to decide where to store the databases. Do this by setting the environment variable:
export BLASTDB=/path/to/blastdbs/of/your/choosing
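As a minimal sketch of the same step with the directory created up front: the location $HOME/blastdbs is only an assumed example, and an already-set BLASTDB is left untouched.

```shell
# Hypothetical location; replace with the directory of your choosing.
# If BLASTDB is already set, keep the existing value.
BLASTDB="${BLASTDB:-$HOME/blastdbs}"

# Create the directory if it does not exist yet, then export the variable
# so that blastp, makeblastdb etc. started from this shell can see it.
mkdir -p "$BLASTDB"
export BLASTDB
echo "BLAST databases will be stored in $BLASTDB"

# To make the setting permanent for future bash sessions, uncomment:
# echo "export BLASTDB=\"$BLASTDB\"" >> "$HOME/.bashrc"
```

A plain export only lasts for the current shell session, which is why the last line is worth considering.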
Now, we can either use one of the ncbi-curated databases or create our own. We will do both.
A) Downloading and using an ncbi-curated database.
The databases can be downloaded using the update_blastdb script. As an example I will download the non-redundant protein database, which is referred to as ‘nr’:
cd $BLASTDB
sudo update_blastdb --passive --timeout 300 --force --verbose nr
ls *.gz |xargs -n1 tar -xzvf
rm *.gz *.gz.*
The penultimate command extracts all the archives you have downloaded and the last one removes the downloaded archives together with their checksum files (*.gz.md5).
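A slightly more defensive variant of the extract-and-clean-up step, as a sketch: it deletes each archive only after that archive has unpacked successfully, so a failed extraction does not cost you the download. The function name extract_archives is hypothetical.

```shell
# Extract every .tar.gz archive in a directory (default: current directory)
# and delete each archive, plus its .md5 file, only after a successful unpack.
# extract_archives is a hypothetical helper name.
extract_archives() {
    dir=${1:-.}
    for archive in "$dir"/*.tar.gz; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$archive" ] || continue
        if tar -xzf "$archive" -C "$dir"; then
            rm -f "$archive" "$archive.md5"
        else
            echo "failed to extract $archive, keeping it" >&2
        fi
    done
}
```

It would be called as extract_archives "$BLASTDB". Newer versions of update_blastdb also appear to offer a --decompress option that handles this step for you; check update_blastdb --help on your system before relying on it.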
Now you should be able to use your new database by executing (where somesequence.fasta is your sample query):
blastp -db nr -query somesequence.fasta
Done.
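For anything beyond eyeballing the result, tabular output is usually easier to work with. The sketch below assumes the same placeholder file name as above and an arbitrary e-value cut-off of 1e-5; best_hits is a hypothetical helper that keeps the first line per query, relying on BLAST normally listing hits for each query best-first.

```shell
# Keep only the first hit per query from BLAST tabular (-outfmt 6) output.
# Column 1 is the query id; hits are normally listed best-first per query.
# best_hits is a hypothetical helper, reading files or standard input.
best_hits() {
    awk -F '\t' '!seen[$1]++' "$@"
}

# Run the search only if blastp and the placeholder query file exist.
if command -v blastp >/dev/null 2>&1 && [ -f somesequence.fasta ]; then
    # -outfmt 6 gives one tab-separated line per hit;
    # the e-value threshold here is an arbitrary example value.
    blastp -db nr -query somesequence.fasta -outfmt 6 -evalue 1e-5 \
        -out results.tsv
    best_hits results.tsv
fi
```

The default columns of -outfmt 6 are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore.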
B) Creating your own database.
Firstly, put a bunch of fasta protein sequences into a file called sample.fa.
Next, execute the following:
makeblastdb -in sample.fa -dbtype 'prot' -out NewDb
mv NewDb* $BLASTDB/
We have now created a blast protein database from your fasta file, called NewDb. The last line simply moves all the blast files to the database directory.
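If you have no sequences at hand, the whole step can be tried in a scratch directory, as sketched below. The two sequences are placeholders for illustration only, and the database is left in the scratch directory rather than moved to $BLASTDB.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two short protein sequences used purely as placeholders.
cat > sample.fa <<'EOF'
>seq1 placeholder protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
>seq2 placeholder protein
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGK
EOF

# Each FASTA record starts with '>', so this counts the sequences.
nseq=$(grep -c '^>' sample.fa)
echo "sample.fa contains $nseq sequences"

# Build the database only if makeblastdb is installed.
if command -v makeblastdb >/dev/null 2>&1; then
    makeblastdb -in sample.fa -dbtype prot -out NewDb
    ls NewDb.*
fi
```

The ls at the end should show the database files that makeblastdb wrote, all sharing the NewDb prefix given to -out.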
Now you should be able to use your new database by executing (where somesequence.fasta is your sample query):
blastp -db NewDb -query somesequence.fasta
Done.
Afterword
These instructions are the shortest way I could find to get a working stand-alone BLAST application. If you require more info, you can look here.
Hey,
I have a small comment regarding this segment:
A) Downloading and using an ncbi-curated database.
The databases can be downloaded using the update_blastdb script. As an example I will download a non redundant protein database which is referred to as ‘nr’:
cd $BLASTDB
sudo update_blastdb --passive --timeout 300 --force --verbose nr
Here you are not running the script that you mentioned above; you are calling an installed program.
Secondly, please remove sudo, because you do not need root access to download files from FTP to your local PC! If you want to run the script that you downloaded, you need to add execute permission to the “update_blastdb.pl” file with the command “chmod u+x update_blastdb.pl” and then run it with:
./update_blastdb.pl --passive --timeout 300 --force --verbose nr
Also, one more question. Is it possible to run blast with just nr.00 and nr.01, without having the whole database downloaded? I tried to run it, but I got an error saying that nr.02 is missing. Is there a way to tell it that my database is just two nr archives long?
Thanks for sharing this blog; hoping to get a reply soon.
We’ve been using sequenceserver for local blasting – very happy with it.