Installation
Software Dependencies
Here’s what you’ll need to get started. Unless otherwise noted, every program must be available on your PATH in bash (other shells don’t count).
- Git v2.7.0 or higher (🌐/🍺/🐍)
- Curl v2.41.0 or higher (🌐/🍺/🐍)
- Java Runtime v8-v15 (🌐/🍺/🐍)
- Nextflow v20.10.0 or higher (🌐/🐍)
- One or more of the following container engines
  - Docker v20.10.0 or higher (🌐)
  - Podman v3.0 or higher (🌐)
  - Singularity v3.7 or higher (🌐)
- Conda v3.7 or higher (🌐/🍺)
  - Full Anaconda and Miniconda both work
Most of these programs can be installed without root privileges in your home directory using either Homebrew or Conda. Sources legend:
🌐: Distro/Vendor Repos
🍺: Linuxbrew
🐍: Bioconda/conda-forge
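Because every program has to be visible on your PATH, it can help to check for missing ones before starting. The following is a minimal sketch, not part of the pipeline: check_deps is a hypothetical helper name, and it only tests whether each command can be found, not whether its version meets the minimums listed above.

```bash
#!/bin/bash
# Hypothetical helper: print each program that cannot be found on PATH.
# Returns non-zero if at least one program is missing.
check_deps() {
    local missing=0
    local prog
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            missing=1
        fi
    done
    return "$missing"
}

# Core tools from the list above. Container engines and Conda are left out;
# add whichever ones you plan to use.
check_deps git curl java nextflow || echo "Install the programs printed above" >&2
```

Run it from bash so that the PATH being checked is the one the pipeline will see.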
Database Dependencies
You will need a Kraken 2 database and an NCBI BLAST nt database accessible from your compute machine to run the pipeline.
Kraken 2 Database
First, we recommend installing our custom version of kraken2-build, available at https://github.com/ksumngs/kraken2. This custom version of Kraken2 incorporates code from @imichaeldotorg, @karolisr, and our own @millironx that allows for downloading additional libraries, and is more resilient to failure when NCBI changes file locations on their servers. The database generated by our custom build is perfectly compatible with Derrick Wood’s original Kraken2, and vice-versa.
When working with animal samples, we like to use a “complete” unmasked RefSeq Kraken2 database. You can get it with the following, assuming ~/.local/bin is on your PATH (it is on most distros):
#!/bin/bash
# Where Kraken2 gets installed, and where the database is built
KRAKEN2_DIR="$HOME/.local/opt/kraken2"
KRAKEN2_DEFAULT_DB=/kraken2/complete-refseq-unmasked
# Install the custom Kraken2 build and link its programs into ~/.local/bin
git clone https://github.com/ksumngs/kraken2.git
cd kraken2
./install_kraken2.sh "$KRAKEN2_DIR"
ln -s "$KRAKEN2_DIR"/kraken2{,-build,-inspect,lib.pm} "$HOME/.local/bin"
cd
# Download the taxonomy, then each library without masking
kraken2-build --download-taxonomy --db "$KRAKEN2_DEFAULT_DB"
LIBRARIES=(archaea bacteria plasmid viral plant fungi protozoa UniVec_Core plastid mitochondrion invertebrate vertebrate_mammalian vertebrate_other)
for LIB in "${LIBRARIES[@]}"; do
    kraken2-build --download-library "$LIB" --db "$KRAKEN2_DEFAULT_DB" --no-masking
done
# Build the database, then remove the intermediate files
kraken2-build --build --db "$KRAKEN2_DEFAULT_DB" --threads "$(nproc)"
kraken2-build --clean --db "$KRAKEN2_DEFAULT_DB"
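After the build finishes, a quick sanity check can confirm that the database directory is usable. This sketch assumes the standard Kraken 2 layout, in which a built database consists of the files hash.k2d, opts.k2d, and taxo.k2d. The function name kraken2_db_ready is a hypothetical helper, not part of Kraken 2.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the directory in $1 holds the three
# non-empty files that make up a built Kraken 2 database.
kraken2_db_ready() {
    local db="$1"
    local f
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db/$f" ]; then
            echo "missing or empty: $db/$f" >&2
            return 1
        fi
    done
    return 0
}

# Falls back to the path used in this guide when KRAKEN2_DEFAULT_DB is unset.
if kraken2_db_ready "${KRAKEN2_DEFAULT_DB:-/kraken2/complete-refseq-unmasked}"; then
    echo "Kraken 2 database looks complete"
fi
```

The check only looks at file presence and size; it does not validate the database contents.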
For some viruses that don’t have good RefSeq entries (e.g. rotavirus), it is better to use a masked nt Kraken2 database. You will need to have NCBI’s dustmasker in your PATH (it typically comes installed with BLAST). In our experience, using multiple threads when building the nt database tends to slow down the build process and cause it to crash. You can get this database using:
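#!/bin/bash
# Where Kraken2 gets installed, and where the database is built
KRAKEN2_DIR="$HOME/.local/opt/kraken2"
# NOTE: this is the same path as in the RefSeq example; point it at a
# separate directory if you build both databases
KRAKEN2_DEFAULT_DB=/kraken2/complete-refseq-unmasked
# Install the custom Kraken2 build and link its programs into ~/.local/bin
git clone https://github.com/ksumngs/kraken2.git
cd kraken2
./install_kraken2.sh "$KRAKEN2_DIR"
ln -s "$KRAKEN2_DIR"/kraken2{,-build,-inspect,lib.pm} "$HOME/.local/bin"
cd
# Download the taxonomy and the nt library (masked, so dustmasker is required)
kraken2-build --download-taxonomy --db "$KRAKEN2_DEFAULT_DB"
kraken2-build --download-library nt --db "$KRAKEN2_DEFAULT_DB"
# Build single-threaded on purpose, then remove the intermediate files
kraken2-build --build --db "$KRAKEN2_DEFAULT_DB"
kraken2-build --clean --db "$KRAKEN2_DEFAULT_DB"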
NCBI nt BLAST Database
Downloading the BLAST database is technically optional, as its use can be bypassed by passing none to the --blast_target parameter.
To download the database, use NCBI’s update_blastdb.pl script:
#!/bin/bash
cd /blastdb
update_blastdb.pl --decompress nt taxdb
Downloading the taxdb database into the same directory allows BLAST to give the common and scientific names in the BLAST results.
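To confirm that taxdb landed next to the nt database, you can look for its two files. This sketch assumes the usual taxdb file names (taxdb.btd and taxdb.bti) and the /blastdb directory from the example above; blast_taxdb_present is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: succeed if both taxdb files exist and are non-empty
# in the BLAST database directory given as $1.
blast_taxdb_present() {
    local dir="$1"
    [ -s "$dir/taxdb.btd" ] && [ -s "$dir/taxdb.bti" ]
}

if blast_taxdb_present /blastdb; then
    echo "taxdb found: BLAST can report common and scientific names"
else
    echo "taxdb not found in /blastdb" >&2
fi
```

If BLAST itself is installed, running blastdbcmd -db /blastdb/nt -info is another way to check that the nt database can be read.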
Getting the Pipeline
The recommended way to run the pipeline is to pull and run it in a single step:
nextflow run ksumngs/v-met -latest ...
If you need a specific version, use Nextflow’s -r option with the version tag:
nextflow run ksumngs/v-met -r v0.1.0-alpha ...
If you really want to download the pipeline before use, you can run:
nextflow pull ksumngs/v-met
If you want to tweak the code and run it locally, you can clone the repo and run from the code itself:
git clone https://github.com/ksumngs/v-met.git
./v-met/main.nf ...