HaplotypeCall
HapLink.HaplotypeCall — TypeHaplotypeCall{S,T}A Haplotype that has had linkage disequilibrium calculations performed along with metadata to verify or refute its validity. Very similar to a VariationCall, but for Haplotypes.
Fields
depth::UInt64: How many sequencing reads contain all theVariations inhaplotypelinkage::Float64: The unweighted linkage disequilibrium coefficient of this callfrequency::Float64: How often this haplotype occurs compared to the entire read poolsignificance::Float64: The $χ^2$ statistical significance ($p$-value) of this callhaplotype::Haplotype{S,T}: The set ofVariations that are inherited together in this call
HapLink.HaplotypeCall — MethodHaplotypeCall(
haplotype::Haplotype{S,T}, reads::AbstractArray{Haplotype{S,T}}
) where {S<:BioSequence,T<:BioSymbol}
HaplotypeCall(
haplotype::AbstractArray{Variation{S,T}}, reads::AbstractArray{Haplotype{S,T}}
) where {S<:BioSequence,T<:BioSymbol}Calls haplotypes by calculating the linkage disequilibrium of haplotype within reads
HapLink.depth — Methoddepth(hc::HaplotypeCall)Gets the number of times hc was found
HapLink.frequency — Methodfrequency(hc::HaplotypeCall)Gets the relative number of times hc was found
HapLink.haplotype — Methodhaplotype(hc::HaplotypeCall)Gets the underlying Haplotype of hc
HapLink.ishaplotype — Methodfunction ishaplotype(
haplotype::Union{AbstractArray{Variation{S,T}}, Haplotype{S,T}},
reads::AbstractArray{Haplotype{S,T}};
frequency::Union{Float64,Nothing}=nothing,
significance::Union{Float64,Nothing}=nothing,
depth::Union{Int,Nothing}=nothing,
) where {S<:BioSequence,T<:BioSymbol}Determines if a call of haplotype is supported by the sequences in reads based upon the provided keyword criteria.
Arguments
haplotype::Union{AbstractArray{Variation}, Haplotype}: AVectorofVariations or aHaplotypeto search for as a haplotypereads::AbstractArray{Haplotype}: The reads to search forhaplotypein
Keywords
frequency::Union{Float64,Nothing}=nothing: The minimum number of times the entirehaplotypemust appear withinreadscompared to the number of reads to returntruesignificance::Union{Float64,Nothing}=nothing: The $χ^2$ significance level ($α$) of linkage disequilibrium that a haplotype must surpass to returntruedepth::Union{Int,Nothing}=nothing: The minimum number of times the entirehaplotypemust appear withinreadsto returntrue
Extended help
Linkage disequilibrium ($Δ$) is calculated by
\[Δ = P_{reference} - \prod_i P_{ref,i}\]
where
- $P_{reference}$ is the probability of finding a read that contains only reference (consensus) bases
- $P_{ref,i}$ is the probability of finding a read that contains the reference (consensus) base for variant $i$ within a haplotype
and the $χ^2$ statistic is calculated by
\[χ^2 = \frac{ Δ^2 n } { \left(\prod_i^N {P_{ref,i} \left(1 - P_{ref,i}\right)}\right)^\frac{2}{N} }\]
where
- $N$ is the number of
Variations inhaplotype(i.e.,length(haplotype)) - $n$ is the number of total reads sampled (i.e.
length(reads))
The significance is then calculated from the cumulative $χ^2$ distribution function.
HapLink.linkage — Methodlinkage(hc::HaplotypeCall)Gets the unweighted linkage disequilibrium coefficient of hc
HapLink.linkage — Methodlinkage(counts::AbstractArray{<:Integer, N}) where NCalculates the linkage disequilibrium of a haplotype given its $N$-dimensional contingency matrix, counts.
counts is an $N$-dimensional array where the $N$th dimension represents the $N$th variant call locus within a haplotype. findoccurrences produces such an array.
Extended help
linkage(::AbstractArray{<:Integer, N}) calculates an unweighted linkage disequilibrium as given by Equation (6) of Slatkin (1972).
\[D(1..N) = E\left( \prod_k^N i_k - P_k \right)\]
where
- $N$ is the number of variant loci
- $D(1..N)$ is the linkage disequilibrium of all $N$ variant loci
- $E$ is an operator returning the arithmetic mean of its argument over every read
- $i_k$ is a function that returns $1$ if the $k$-th locus of the given read contains the reference allele, and returns $0$ otherwise.
- $P_k$ is the probability of any read containing the reference allele in the $k$-th locus, i.e. the frequency at which the reference allele is found within the entire read set at the $k$-th locus
HapLink.occurrence_matrix — Methodoccurrence_matrix(
haplotype::AbstractArray{Variation{S,T}},
reads::AbstractArray{Haplotype{S,T}},
) where {S<:BioSequence,T<:BioSymbol}
occurrence_matrix(
haplotype::Haplotype{S,T},
reads::AbstractArray{S,T}
) where {S<:BioSequence,T<:BioSymbol}Determine how many times the variants in haplotype appear in reads as an $N$- dimensional matrix.
Arguments
haplotype::AbstractArray{Variation}: AVectorofVariations or aHaplotypeto search for as a haplotypereads::AbstractArray{Haplotype}: The reads to search forhaplotypein
Returns
2x2x... Array{Int, length(haplotype)}: An $N$-dimensional matrix where $N$ is the number of variant positions inreadmatches. The $1$ index position in the $n$th dimension represents the number of times the $n$th variant position was found to have the reference base called, while the $2$ index position represents the number of times the $n$th variant position was found to have the alternate base called. E.g.first(occurrence_matrix(reads))gives the number of times the all-reference base haplotype was found inreads, whileoccurrence_matrix(reads)[end]gives the number of times the all-alternate base haplotype was found.
HapLink.significance — Methodsignificance(counts::AbstractArray{<:Integer})Calculates the $χ^2$ significance of a haplotype given its $N$-dimensional contingency matrix, counts
See the documentation on linkage(::AbstractArray{<:Integer}) or occurrence_matrix for details on how counts should be constructed.
HapLink.significance — Methodsignificance(hc::HaplotypeCall)Gets the $χ^2$ statistical significance ($p$-value) of hc
HapLink.sumsliced — Functionsumsliced(A::AbstractArray, dim::Int, pos::Int=1)Sum all elements that are that can be referenced by pos in the dim dimension of A.
Example
julia> A = reshape(1:8, 2, 2, 2)
2×2×2 reshape(::UnitRange{Int64}, 2, 2, 2) with eltype Int64:
[:, :, 1] =
1 3
2 4
[:, :, 2] =
5 7
6 8
julia> sumsliced(A, 2)
14
julia> sumsliced(A, 2, 2)
22Heavily inspired by Holy, Tim "Multidimensional algorithms and iteration" https://julialang.org/blog/2016/02/iteration/#filtering_along_a_specified_dimension_exploiting_multiple_indexes