Installation

Software Dependencies

Here’s what you’ll need to get started. Unless otherwise noted, every program must be on your PATH when running under bash; having it on the PATH of another shell is not enough.

Git v2.7.0 or higher (🌐/🍺/🐍)
Curl v2.41.0 or higher (🌐/🍺/🐍)
Java Runtime v8-v15 (🌐/🍺/🐍)
Nextflow v20.10.0 or higher (🌐/🐍)
One or more of the following container engines
- Docker v20.10.0 or higher (🌐)
- Podman v3.0 or higher (🌐)
- Singularity v3.7 or higher (🌐)
Conda v3.7 or higher (🌐/🍺)
- Full Anaconda and Miniconda both work

Most of these programs can be installed without root privileges in your home directory using either Homebrew (Linuxbrew on Linux) or Conda. The icons next to each program show where it is available:

🌐: Distro/Vendor Repos
🍺: Linuxbrew
🐍: Bioconda/conda-forge
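Before building any databases, it can help to confirm that bash actually finds the programs listed above. The following is a minimal sketch, not part of v-met: the helper names version_ge and report_cmd are hypothetical, only git’s version is actually compared (using git’s "git version X.Y.Z" output), and version comparison relies on `sort -V`, which GNU coreutils provides.

```bash
#!/bin/bash
# Hypothetical preflight sketch: report which listed programs bash can find.
# Minimum versions come from the list above; extend the version checks
# by parsing each tool's own --version output as needed.

# Return 0 if version $1 >= version $2 (relies on `sort -V`)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print whether command $1 is on PATH, and where
report_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1 ($(command -v "$1"))"
  else
    echo "missing: $1"
  fi
}

# Required programs
for cmd in git curl java nextflow; do
  report_cmd "$cmd"
done

# At least one container engine (or Conda) is needed
for engine in docker podman singularity conda; do
  report_cmd "$engine"
done

# Example version check: `git --version` prints "git version X.Y.Z ..."
if command -v git >/dev/null 2>&1; then
  git_version=$(git --version | awk '{print $3}')
  if version_ge "$git_version" 2.7.0; then
    echo "git $git_version meets the v2.7.0 minimum"
  else
    echo "git $git_version is older than v2.7.0"
  fi
fi
```

Using `sort -V` rather than plain string comparison matters because versions like 2.30.0 and 2.7.0 sort incorrectly as text.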

Database Dependencies

You will need a Kraken 2 database and an NCBI BLAST nt database accessible from your compute machine to run the pipeline.

Kraken 2 Database

First, we recommend installing our custom version of kraken2-build, available at https://github.com/ksumngs/kraken2. It incorporates code from @imichaeldotorg, @karolisr, and our own @millironx that allows downloading additional libraries and makes the build more resilient when NCBI moves files on its servers. Databases built with the custom version are fully compatible with Derrick Wood’s original Kraken 2, and vice versa.

When working with animal samples, we prefer a “complete” unmasked RefSeq Kraken 2 database. You can build it with the following script, assuming ~/.local/bin is on your PATH (it is by default on most distros):

#!/bin/bash
# Install location for the Kraken 2 programs, and where the database is built
KRAKEN2_DIR=$HOME/.local/opt/kraken2
KRAKEN2_DEFAULT_DB=/kraken2/complete-refseq-unmasked

# Install the custom Kraken 2 and link its executables into ~/.local/bin
git clone https://github.com/ksumngs/kraken2.git
cd kraken2
./install_kraken2.sh "$KRAKEN2_DIR"
mkdir -p "$HOME/.local/bin"
ln -s "$KRAKEN2_DIR"/kraken2{,-build,-inspect,lib.pm} "$HOME/.local/bin"

# Download the taxonomy and each library, without masking
cd
kraken2-build --download-taxonomy --db "$KRAKEN2_DEFAULT_DB"
LIBRARIES=(archaea bacteria plasmid viral plant fungi protozoa UniVec_Core plastid mitochondrion invertebrate vertebrate_mammalian vertebrate_other)
for LIB in "${LIBRARIES[@]}"; do
  kraken2-build --download-library "$LIB" --db "$KRAKEN2_DEFAULT_DB" --no-masking
done

# Build using all available cores, then remove intermediate files
kraken2-build --build --db "$KRAKEN2_DEFAULT_DB" --threads "$(nproc)"
kraken2-build --clean --db "$KRAKEN2_DEFAULT_DB"
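If ~/.local/bin is not on your PATH, the Kraken 2 programs linked there by the script above will not be found. One way to handle that is sketched below, assuming you paste it into an interactive bash session; running it as a separate script would not change the PATH of your current shell.

```bash
# Create ~/.local/bin if it does not exist yet
mkdir -p "$HOME/.local/bin"

# Prepend ~/.local/bin to PATH for this session, unless it is already there
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;
  *) export PATH="$HOME/.local/bin:$PATH" ;;
esac

# Report whether the shell can now find kraken2-build
command -v kraken2-build || echo "kraken2-build is not on PATH yet"
```

To make the change persistent, the same case block can go in a shell startup file such as ~/.bashrc; which file is read depends on how your shell is launched.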

For some viruses that lack good RefSeq entries (e.g. rotavirus), a masked nt Kraken 2 database works better. Building it requires NCBI’s dustmasker on your PATH (it is typically installed along with BLAST). In our experience, building the nt database with multiple threads tends to slow the build down and can make it crash. You can build this database with the following script:

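#!/bin/bash
KRAKEN2_DIR=$HOME/.local/opt/kraken2
# Build into a different directory than the RefSeq database (example name)
KRAKEN2_DEFAULT_DB=/kraken2/nt-masked

# Install the custom Kraken 2 (skip this if you already installed it above)
git clone https://github.com/ksumngs/kraken2.git
cd kraken2
./install_kraken2.sh "$KRAKEN2_DIR"
mkdir -p "$HOME/.local/bin"
ln -s "$KRAKEN2_DIR"/kraken2{,-build,-inspect,lib.pm} "$HOME/.local/bin"

# Download the taxonomy and the nt library; without --no-masking,
# kraken2-build masks low-complexity sequences using dustmasker
cd
kraken2-build --download-taxonomy --db "$KRAKEN2_DEFAULT_DB"
kraken2-build --download-library nt --db "$KRAKEN2_DEFAULT_DB"

# Build single-threaded (the default), then remove intermediate files
kraken2-build --build --db "$KRAKEN2_DEFAULT_DB"
kraken2-build --clean --db "$KRAKEN2_DEFAULT_DB"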
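Either build can run for a long time, so it is worth confirming it finished before pointing the pipeline at the result. A finished Kraken 2 database directory contains three files: hash.k2d, opts.k2d, and taxo.k2d. The helper below (check_kraken2_db is a hypothetical name, not a Kraken 2 command) only checks that those files exist and are non-empty; it does not validate their contents, for which running kraken2-inspect on the database is more thorough.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if $1 looks like a finished Kraken 2
# database, i.e. it contains non-empty hash.k2d, opts.k2d and taxo.k2d files.
check_kraken2_db() {
  local db="$1" f
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [ ! -s "$db/$f" ]; then
      echo "incomplete database: $db/$f is missing or empty" >&2
      return 1
    fi
  done
  echo "ok: $db"
}

# Usage, with the RefSeq path from the first script:
# check_kraken2_db /kraken2/complete-refseq-unmasked
```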

NCBI nt BLAST Database

Downloading the BLAST database is technically optional: you can skip the BLAST step by passing none to the --blast_target parameter.

To download the database, use NCBI’s update_blastdb.pl script (it ships with BLAST+), for example:

#!/bin/bash
cd /blastdb
update_blastdb.pl --decompress nt taxdb

Downloading taxdb into the same directory lets BLAST report common and scientific names in its results.
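After update_blastdb.pl finishes, a quick check can confirm the expected files landed in the directory. The sketch below is a hypothetical helper (check_blastdb_dir is not part of BLAST or v-met). taxdb.btd and taxdb.bti are the files the taxdb download provides; treating nt.nal, the alias file that ties the nt volumes together, as the marker of a complete nt download is an assumption about how NCBI lays out the multi-volume database. Running blastdbcmd -db nt -info from that directory is a more thorough check.

```bash
#!/bin/bash
# Hypothetical helper: check that a directory holds the files expected after
# `update_blastdb.pl --decompress nt taxdb`.
# ASSUMPTION: nt.nal is present once the multi-volume nt download completes.
check_blastdb_dir() {
  local dir="$1" f missing=0
  for f in nt.nal taxdb.btd taxdb.bti; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the directory from the example above:
# check_blastdb_dir /blastdb && echo "BLAST database files look complete"
```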

Getting the Pipeline

The recommended way to run the pipeline is to pull and run it in a single step:

nextflow run ksumngs/v-met -latest ...

If you need a specific version, use Nextflow’s -r option with the version tag:

nextflow run ksumngs/v-met -r v0.1.0-alpha ...
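To see which version tags exist, one option (plain git, not a Nextflow feature) is to list the repository’s remote tags with git ls-remote --tags. The list_tags helper below is hypothetical; it keeps only the tag names, stripping the refs/tags/ prefix and the ^{} suffix git prints for annotated tags.

```bash
#!/bin/bash
# Hypothetical helper: read `git ls-remote --tags` output on stdin and
# print each tag name once, in version order.
list_tags() {
  awk '{print $2}' | sed -e 's#^refs/tags/##' -e 's#\^{}$##' | sort -uV
}

# Usage:
# git ls-remote --tags https://github.com/ksumngs/v-met.git | list_tags
```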

If you would rather download the pipeline before using it, run:

nextflow pull ksumngs/v-met

To modify the code and run it locally, clone the repository and run the pipeline from your copy:

git clone https://github.com/ksumngs/v-met.git
./v-met/main.nf ...