HaplotypeCall
HapLink.HaplotypeCall
— TypeHaplotypeCall{S,T}
A Haplotype
that has had linkage disequilibrium calculations performed along with metadata to verify or refute its validity. Very similar to a VariationCall
, but for Haplotype
s.
Fields
depth::UInt64
: How many sequencing reads contain all theVariation
s inhaplotype
linkage::Float64
: The unweighted linkage disequilibrium coefficient of this callfrequency::Float64
: How often this haplotype occurs compared to the entire read poolsignificance::Float64
: The $χ^2$ statistical significance ($p$-value) of this callhaplotype::Haplotype{S,T}
: The set ofVariation
s that are inherited together in this call
HapLink.HaplotypeCall
— MethodHaplotypeCall(
haplotype::Haplotype{S,T}, reads::AbstractArray{Haplotype{S,T}}
) where {S<:BioSequence,T<:BioSymbol}
HaplotypeCall(
haplotype::AbstractArray{Variation{S,T}}, reads::AbstractArray{Haplotype{S,T}}
) where {S<:BioSequence,T<:BioSymbol}
Calls haplotypes by calculating the linkage disequilibrium of haplotype
within reads
HapLink.depth
— Methoddepth(hc::HaplotypeCall)
Gets the number of times hc
was found
HapLink.frequency
— Methodfrequency(hc::HaplotypeCall)
Gets the relative number of times hc
was found
HapLink.haplotype
— Methodhaplotype(hc::HaplotypeCall)
Gets the underlying Haplotype
of hc
HapLink.ishaplotype
— Methodfunction ishaplotype(
haplotype::Union{AbstractArray{Variation{S,T}}, Haplotype{S,T}},
reads::AbstractArray{Haplotype{S,T}};
frequency::Union{Float64,Nothing}=nothing,
significance::Union{Float64,Nothing}=nothing,
depth::Union{Int,Nothing}=nothing,
) where {S<:BioSequence,T<:BioSymbol}
Determines if a call of haplotype
is supported by the sequences in reads
based upon the provided keyword criteria.
Arguments
haplotype::Union{AbstractArray{Variation}, Haplotype}
: AVector
ofVariation
s or aHaplotype
to search for as a haplotypereads::AbstractArray{Haplotype}
: The reads to search forhaplotype
in
Keywords
frequency::Union{Float64,Nothing}=nothing
: The minimum number of times the entirehaplotype
must appear withinreads
compared to the number of reads to returntrue
significance::Union{Float64,Nothing}=nothing
: The $χ^2$ significance level ($α$) of linkage disequilibrium that a haplotype must surpass to returntrue
depth::Union{Int,Nothing}=nothing
: The minimum number of times the entirehaplotype
must appear withinreads
to returntrue
Extended help
Linkage disequilibrium ($Δ$) is calculated by
\[Δ = P_{reference} - \prod_i P_{ref,i}\]
where
- $P_{reference}$ is the probability of finding a read that contains only reference (consensus) bases
- $P_{ref,i}$ is the probability of finding a read that contains the reference (consensus) base for variant $i$ within a haplotype
and the $χ^2$ statistic is calculated by
\[χ^2 = \frac{ Δ^2 n } { \left(\prod_i^N {P_{ref,i} \left(1 - P_{ref,i}\right)}\right)^\frac{2}{N} }\]
where
- $N$ is the number of
Variation
s inhaplotype
(i.e.,length(haplotype)
) - $n$ is the number of total reads sampled (i.e.
length(reads)
)
The significance is then calculated from the cumulative $χ^2$ distribution function.
HapLink.linkage
— Methodlinkage(hc::HaplotypeCall)
Gets the unweighted linkage disequilibrium coefficient of hc
HapLink.linkage
— Methodlinkage(counts::AbstractArray{<:Integer, N}) where N
Calculates the linkage disequilibrium of a haplotype given its $N$-dimensional contingency matrix, counts
.
counts
is an $N$-dimensional array where the $N$th dimension represents the $N$th variant call locus within a haplotype. findoccurrences
produces such an array.
Extended help
linkage(::AbstractArray{<:Integer, N})
calculates an unweighted linkage disequilibrium as given by Equation (6) of Slatkin (1972).
\[D(1..N) = E\left( \prod_k^N i_k - P_k \right)\]
where
- $N$ is the number of variant loci
- $D(1..N)$ is the linkage disequilibrium of all $N$ variant loci
- $E$ is an operator returning the arithmetic mean of its argument over every read
- $i_k$ is a function that returns $1$ if the $k$-th locus of the given read contains the reference allele, and returns $0$ otherwise.
- $P_k$ is the probability of any read containing the reference allele in the $k$-th locus, i.e. the frequency at which the reference allele is found within the entire read set at the $k$-th locus
HapLink.occurrence_matrix
— Methodoccurrence_matrix(
haplotype::AbstractArray{Variation{S,T}},
reads::AbstractArray{Haplotype{S,T}},
) where {S<:BioSequence,T<:BioSymbol}
occurrence_matrix(
haplotype::Haplotype{S,T},
reads::AbstractArray{S,T}
) where {S<:BioSequence,T<:BioSymbol}
Determine how many times the variants in haplotype
appear in reads
as an $N$- dimensional matrix.
Arguments
haplotype::AbstractArray{Variation}
: AVector
ofVariation
s or aHaplotype
to search for as a haplotypereads::AbstractArray{Haplotype}
: The reads to search forhaplotype
in
Returns
2x2x... Array{Int, length(haplotype)}
: An $N$-dimensional matrix where $N$ is the number of variant positions inreadmatches
. The $1$ index position in the $n$th dimension represents the number of times the $n$th variant position was found to have the reference base called, while the $2$ index position represents the number of times the $n$th variant position was found to have the alternate base called. E.g.first(occurrence_matrix(reads))
gives the number of times the all-reference base haplotype was found inreads
, whileoccurrence_matrix(reads)[end]
gives the number of times the all-alternate base haplotype was found.
HapLink.significance
— Methodsignificance(counts::AbstractArray{<:Integer})
Calculates the $χ^2$ significance of a haplotype given its $N$-dimensional contingency matrix, counts
See the documentation on linkage(::AbstractArray{<:Integer})
or occurrence_matrix
for details on how counts
should be constructed.
HapLink.significance
— Methodsignificance(hc::HaplotypeCall)
Gets the $χ^2$ statistical significance ($p$-value) of hc
HapLink.sumsliced
— Functionsumsliced(A::AbstractArray, dim::Int, pos::Int=1)
Sum all elements that are that can be referenced by pos
in the dim
dimension of A
.
Example
julia> A = reshape(1:8, 2, 2, 2)
2×2×2 reshape(::UnitRange{Int64}, 2, 2, 2) with eltype Int64:
[:, :, 1] =
1 3
2 4
[:, :, 2] =
5 7
6 8
julia> sumsliced(A, 2)
14
julia> sumsliced(A, 2, 2)
22
Heavily inspired by Holy, Tim "Multidimensional algorithms and iteration" https://julialang.org/blog/2016/02/iteration/#filtering_along_a_specified_dimension_exploiting_multiple_indexes