haplink haplotypes
HapLink.haplotypes
haplink haplotypes [options] reference variants bam
Call haplotypes
Introduction
Calls haplotypes based on the linkage disequilibrium between subconsensus variant sites on long reads. Variant sites are chosen based on having a "PASS" filter in the variants file, and linkage is calculated based on the reads present in the bam file. Note that this means haplotypes can be called on a different set of sequencing reads than the variants were called on (e.g. variant calling using high-accuracy short-read chemistry like Illumina, and haplotype calling using lower-accuracy long-read chemistry like Oxford Nanopore). There is no guarantee that the variants file and the bam file match, so use this feature with caution!
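Linkage between two variant sites can be illustrated with the classic linkage disequilibrium coefficient D = p_AB - p_A * p_B, computed from reads that span both sites. The Python sketch below is illustrative only. HapLink itself is written in Julia, and this is not its implementation or necessarily its exact statistic. The function name linkage_disequilibrium is hypothetical, and the sketch assumes each read has already been reduced to a pair of booleans recording whether it carries the alternate base at each site.

```python
def linkage_disequilibrium(reads):
    """Classic LD coefficient D = p_AB - p_A * p_B between two variant sites.

    `reads` is a list of (alt_at_site_1, alt_at_site_2) boolean pairs, one per
    read that covers BOTH sites (an assumption of this sketch).
    """
    n = len(reads)
    if n == 0:
        raise ValueError("need at least one read spanning both sites")
    p_a = sum(1 for alt1, _ in reads if alt1) / n  # alt frequency at site 1
    p_b = sum(1 for _, alt2 in reads if alt2) / n  # alt frequency at site 2
    p_ab = sum(1 for alt1, alt2 in reads if alt1 and alt2) / n  # both alts on one read
    return p_ab - p_a * p_b


# 4 reads carry both alts, 4 carry neither, 2 carry only one alt each
reads = [(True, True)] * 4 + [(False, False)] * 4 + [(True, False), (False, True)]
print(linkage_disequilibrium(reads))
```

Here D is about 0.15, so the two alternate alleles tend to occur on the same reads. Reads in which the alleles occur independently give D = 0.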
Arguments
reference
: Path to the reference genome to call haplotypes against, in FASTA format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.
variants
: Path to the variants file that will define the variant sites to call haplotypes from. Must be in VCF (not BCF) v4 format. haplink variants generates a compatible file, although output from other tools can also be used.
bam
: Alignment file to call haplotypes from. Can be in SAM or BAM format and does not need to be sorted or indexed, but haplotype calling speed will increase significantly when using a sorted and indexed BAM file (one with a sidecar bai file).
Flags
--simulated-reads
: Use maximum likelihood simulation of long reads based on overlapping short reads.
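When --simulated-reads is set, the --overlap-min and --overlap-max options (under Options) limit which pairs of short reads may be combined into one simulated read. The Python sketch below illustrates only that window check. It does not reproduce HapLink's maximum likelihood read selection. It assumes 0-based, half-open (start, end) alignment intervals, treats a gap between two reads as negative overlap, and treats both bounds as inclusive. The function names overlap and can_combine are hypothetical.

```python
def overlap(a, b):
    """Signed overlap in bases between two reads given as 0-based, half-open
    (start, end) alignment intervals; a negative value is the gap between them."""
    return min(a[1], b[1]) - max(a[0], b[0])


def can_combine(a, b, overlap_min, overlap_max):
    """True if the reads' signed overlap lies within [overlap_min, overlap_max]."""
    if overlap_max <= overlap_min:
        raise ValueError("overlap_max must be greater than overlap_min")
    return overlap_min <= overlap(a, b) <= overlap_max


print(overlap((0, 100), (80, 180)))    # 20 bases shared
print(overlap((0, 100), (110, 200)))   # -10: a 10-base gap
print(can_combine((0, 100), (110, 200), overlap_min=-20, overlap_max=50))  # True
```

With a negative lower bound, reads separated by a small gap can still be paired. A positive lower bound would reject the second pair.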
Options
--outfile=<path>
: The file to write haplotype calls to. If left blank, haplotype calls are written to standard output.
--consensus-frequency=<float>
: The minimum frequency at which a variant must appear to be considered part of the consensus.
--significance=<float>
: The alpha value for statistical significance of haplotype calls.
--depth=<int>
: The minimum number of times a variant combination must be observed within the set of reads to be called a haplotype.
--frequency=<float>
: The minimum proportion of reads covering its position in which a variant combination must be observed for that haplotype to be called.
--overlap-min=<int>
: The minimum number of bases that must overlap for two short reads to be combined into one simulated read. Can be negative, in which case the two reads may be separated by a gap of up to that many bases. Only applies when --simulated-reads is set.
--overlap-max=<int>
: The maximum number of bases that may overlap for two short reads to be combined into one simulated read. Can be negative, in which case the two reads must be separated by a gap of at least that many bases. Must be greater than --overlap-min. Only applies when --simulated-reads is set.
--iterations=<int>
: The number of simulated reads to create before calling haplotypes. Only applies when --simulated-reads is set.
--seed=<int>
: The random seed used for picking short reads to create simulated reads when using maximum likelihood methods. Leaving it unset will use the default Julia RNG and seed. See Julia's documentation on randomness for implementation details. Only applies when --simulated-reads is set.
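The --depth and --frequency options act as two independent cutoffs on each candidate variant combination. The first sets an absolute read count, and the second sets a proportion of the reads covering those positions. The Python sketch below shows how the two cutoffs interact. It assumes both comparisons are inclusive and that every combination is covered by the same number of reads. It is not HapLink's code, and the function filter_haplotypes and the example counts are hypothetical.

```python
def filter_haplotypes(combo_counts, reads_covering, min_depth, min_frequency):
    """Keep variant combinations that meet both a depth and a frequency cutoff.

    combo_counts maps each variant combination to the number of reads showing it;
    reads_covering is the number of reads spanning those sites (assumed to be the
    same for every combination in this sketch).
    """
    if reads_covering <= 0:
        raise ValueError("reads_covering must be positive")
    kept = {}
    for combo, count in combo_counts.items():
        frequency = count / reads_covering
        # analogous to --depth (absolute count) and --frequency (proportion)
        if count >= min_depth and frequency >= min_frequency:
            kept[combo] = frequency
    return kept


counts = {("A", "G"): 60, ("A", "T"): 30, ("C", "T"): 8, ("C", "G"): 2}
print(filter_haplotypes(counts, reads_covering=100, min_depth=10, min_frequency=0.05))
# {('A', 'G'): 0.6, ('A', 'T'): 0.3}
```

The ("C", "T") combination is dropped even though it appears in 8% of reads, because 8 observations fall below the depth cutoff of 10.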