For advanced beginners (REPL mode)
Julia is an ahead-of-time compiled language. Practically, that means that every time you restart Julia, you have to recompile all the code you were running. Using HapLink on the command line involves up to four different commands. Translation: up to four cases where you lose time to recompiling code that was just running. Surely there's a better way, right? Well, you can stay within a single Julia session by using HapLink's REPL mode.
Julia's latency (aka, Time-to-first-plot or TTFP) is a big deal among Julia programmers. Although there's no "definitive" place to learn about TTFP, Jakob Nissen's blog provides some great explanations and actionable advice for reducing latency.
Installing HapLink ... into a project
Wait, isn't that what the first tutorial was about? Yes and no. Those projects installed HapLink's command line interface to a global environment. We're going to locally install HapLink as a library into a local package environment.
If you want to learn more about package environments, check out the official Julia environments documentation or Julius Krumbiegel's tutorial.
Press ]
to enter Pkg mode, then follow along with these prompts.
(@v1.6) pkg> activate .
Activating project at `~/haplink-tutorial-4`
(haplink-tutorial-4) pkg> add HapLink
Finding the example files
Instead of downloading the example files from the internet, we can reference them directly from the package.
julia> using HapLink
julia> const PACKAGE_DIR = dirname(dirname(pathof(HapLink)))
"/home/runner/work/HapLink.jl/HapLink.jl"
julia> const EXAMPLE_DIR = joinpath(PACKAGE_DIR, "example")
"/home/runner/work/HapLink.jl/HapLink.jl/example"
julia> const EXAMPLE_BAM = joinpath(EXAMPLE_DIR, "sample.bam")
"/home/runner/work/HapLink.jl/HapLink.jl/example/sample.bam"
julia> const EXAMPLE_REFERENCE = joinpath(EXAMPLE_DIR, "reference.fasta")
"/home/runner/work/HapLink.jl/HapLink.jl/example/reference.fasta"
Perfect!
Pileup time
Variants can't exist as a single unit, they have to be supported by a set of reads. We can view those reads in a pileup, which can easily be made by using the pileup
function.
julia> sample_pileup = pileup(EXAMPLE_BAM, EXAMPLE_REFERENCE)
21-element Vector{VariationPileup}: VariationPileup(C13G, 0x0000000000000002, [0.27586206896551724, 0.10344827586206896], [30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS]) VariationPileup(C27G, 0x0000000000000005, [0.7586206896551724, 0.5862068965517241, 0.41379310344827586, 0.2413793103448276, 0.06896551724137931], [30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C37G, 0x0000000000000006, [0.9310344827586207, 0.7586206896551724, 0.5862068965517241, 0.41379310344827586, 0.2413793103448276, 0.06896551724137931], [30.0, 30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C40G, 0x0000000000000006, [1.0, 0.8620689655172413, 0.6896551724137931, 0.5172413793103449, 0.3448275862068966, 0.1724137931034483], [30.0, 30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C15G, 0x0000000000000002, [0.3448275862068966, 0.1724137931034483], [30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS]) VariationPileup(T29A, 0x0000000000000005, [0.4827586206896552, 0.3103448275862069, 0.13793103448275862], [63.0, 63.0, 63.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C52G, 0x0000000000000006, [0.9310344827586207, 0.7586206896551724, 0.5862068965517241, 0.41379310344827586, 0.2413793103448276, 0.06896551724137931], [30.0, 30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C64G, 0x0000000000000004, [1.0, 0.6551724137931034, 0.4482758620689655], [30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C32G, 0x0000000000000006, [0.9310344827586207, 0.7586206896551724, 0.5862068965517241, 0.41379310344827586, 0.2413793103448276, 0.06896551724137931], [30.0, 30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C45G, 0x0000000000000006, [1.0, 0.8620689655172413, 0.6896551724137931, 0.5172413793103449, 0.3448275862068966, 0.1724137931034483], [30.0, 30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) ⋮ VariationPileup(C17G, 0x0000000000000003, [0.41379310344827586, 0.2413793103448276, 0.06896551724137931], [30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(A34T, 0x0000000000000006, [1.0, 0.6551724137931034, 0.3103448275862069], [20.0, 20.0, 20.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C50G, 0x0000000000000006, [1.0, 0.8620689655172413, 0.6896551724137931, 0.5172413793103449, 0.3448275862068966, 0.1724137931034483], [30.0, 30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(Δ63-63, 0x0000000000000004, [0.41379310344827586], [30.0], GenomicFeatures.Strand[STRAND_POS]) VariationPileup(T48A, 0x0000000000000006, [0.9655172413793104, 0.7931034482758621, 0.6206896551724138], [63.0, 63.0, 63.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(C42G, 0x0000000000000006, [0.9310344827586207, 0.7586206896551724, 0.5862068965517241, 0.41379310344827586, 0.2413793103448276, 0.06896551724137931], [30.0, 30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS]) VariationPileup(Δ62-64, 0x0000000000000004, [0.7241379310344828], [30.0], GenomicFeatures.Strand[STRAND_POS]) VariationPileup(C73G, 0x0000000000000002, [0.9655172413793104, 0.7586206896551724], [30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS]) VariationPileup(C26G, 0x0000000000000005, [0.7241379310344828, 0.5517241379310345, 0.3793103448275862, 0.20689655172413793, 0.0], [30.0, 30.0, 30.0, 30.0, 30.0], GenomicFeatures.Strand[STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS, STRAND_POS])
But is it real?
Pileups tell us what the sequences are saying, but they can't tell us if there are any real variants. We can use the call_variant
function to filter a pileup into a set of variants.
julia> all_variant_calls = VariationCall[]
VariationCall[]
julia> for pile in sample_pileup push!(all_variant_calls, call_variant(pile, 0.5; D=0x02, Q=10.0, X=0.01, F=0.05)) end
julia> all_variant_calls
21-element Vector{VariationCall}: VariationCall(C13G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.1896551724137931, 0.3333333333333333) VariationCall(C27G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.4137931034482758, 0.007936507936507936) VariationCall(C37G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C40G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(C15G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.25862068965517243, 0.3333333333333333) VariationCall(T29A, 63.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000003, 0.3103448275862069, 0.05303030303030305) VariationCall(C52G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C64G, 30.0, ["PASS"], 0x0000000000000004, 1.0, 0x0000000000000003, 0.7011494252873564, 0.07936507936507946) VariationCall(C32G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C45G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) ⋮ VariationCall(C17G, 30.0, ["PASS"], 0x0000000000000003, 1.0, 0x0000000000000003, 0.2413793103448276, 0.10000000000000005) VariationCall(A34T, 20.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000003, 0.6551724137931034, 0.033566433566433566) VariationCall(C50G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(Δ63-63, 30.0, ["d2"], 0x0000000000000004, 1.0, 0x0000000000000001, 0.41379310344827586, 0.21212121212121218) VariationCall(T48A, 63.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000003, 0.7931034482758621, 0.033566433566433566) VariationCall(C42G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(Δ62-64, 30.0, ["d2"], 0x0000000000000004, 1.0, 0x0000000000000001, 0.7241379310344828, 0.21212121212121218) VariationCall(C73G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.8620689655172413, 0.3333333333333333) VariationCall(C26G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.3724137931034483, 0.007936507936507936)
We can now remove all variants that did not pass our filters.
julia> filter!(var -> filters(var) == ["PASS"], all_variant_calls)
19-element Vector{VariationCall}: VariationCall(C13G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.1896551724137931, 0.3333333333333333) VariationCall(C27G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.4137931034482758, 0.007936507936507936) VariationCall(C37G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C40G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(C15G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.25862068965517243, 0.3333333333333333) VariationCall(T29A, 63.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000003, 0.3103448275862069, 0.05303030303030305) VariationCall(C52G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C64G, 30.0, ["PASS"], 0x0000000000000004, 1.0, 0x0000000000000003, 0.7011494252873564, 0.07936507936507946) VariationCall(C32G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C45G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(C59G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.6551724137931034, 0.007936507936507936) VariationCall(C54G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5689655172413792, 0.0021645021645021662) VariationCall(C17G, 30.0, ["PASS"], 0x0000000000000003, 1.0, 0x0000000000000003, 0.2413793103448276, 0.10000000000000005) VariationCall(A34T, 20.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000003, 0.6551724137931034, 0.033566433566433566) VariationCall(C50G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(T48A, 63.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000003, 0.7931034482758621, 0.033566433566433566) VariationCall(C42G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C73G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.8620689655172413, 0.3333333333333333) VariationCall(C26G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.3724137931034483, 0.007936507936507936)
Rule of the majority
Filtering variants
We can sort into consensus and subconsensus variants. Note that for this fake set, consensus variants will be taken at a >75% frequency.
julia> consensus_variant_calls = filter(var -> frequency(var) > 0.75, all_variant_calls)
15-element Vector{VariationCall}: VariationCall(C13G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.1896551724137931, 0.3333333333333333) VariationCall(C27G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.4137931034482758, 0.007936507936507936) VariationCall(C37G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C40G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(C15G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.25862068965517243, 0.3333333333333333) VariationCall(C52G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C32G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C45G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(C59G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.6551724137931034, 0.007936507936507936) VariationCall(C54G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5689655172413792, 0.0021645021645021662) VariationCall(C17G, 30.0, ["PASS"], 0x0000000000000003, 1.0, 0x0000000000000003, 0.2413793103448276, 0.10000000000000005) VariationCall(C50G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5977011494252874, 0.0021645021645021662) VariationCall(C42G, 30.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000006, 0.5, 0.0021645021645021662) VariationCall(C73G, 30.0, ["PASS"], 0x0000000000000002, 1.0, 0x0000000000000002, 0.8620689655172413, 0.3333333333333333) VariationCall(C26G, 30.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000005, 0.3724137931034483, 0.007936507936507936)
julia> subconsensus_variant_calls = filter( var -> !(var in consensus_variant_calls), all_variant_calls )
4-element Vector{VariationCall}: VariationCall(T29A, 63.0, ["PASS"], 0x0000000000000005, 1.0, 0x0000000000000003, 0.3103448275862069, 0.05303030303030305) VariationCall(C64G, 30.0, ["PASS"], 0x0000000000000004, 1.0, 0x0000000000000003, 0.7011494252873564, 0.07936507936507946) VariationCall(A34T, 20.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000003, 0.6551724137931034, 0.033566433566433566) VariationCall(T48A, 63.0, ["PASS"], 0x0000000000000006, 1.0, 0x0000000000000003, 0.7931034482758621, 0.033566433566433566)
julia> consensus_variations = variation.(consensus_variant_calls)
15-element Vector{SequenceVariation.Variation{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA}}: C13G C27G C37G C40G C15G C52G C32G C45G C59G C54G C17G C50G C42G C73G C26G
julia> subconsensus_variations = variation.(subconsensus_variant_calls)
4-element Vector{SequenceVariation.Variation{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA}}: T29A C64G A34T T48A
Creating the consensus sequence
We will now need to convert the consensus variations into a consensus sequence. HapLink doesn't have these tools in-and-of itself, so we'll have to reach into its dependency packages.
(haplink-tutorial-4) pkg> add BioAlignments SequenceVariation
Resolving package versions...
Updating `~/haplink-tutorial-4/Project.toml`
[00701ae9] + BioAlignments v3.1.0
[eef6e190] + SequenceVariation v0.2.2
No Changes to `~/haplink-tutorial-4/Manifest.toml`
julia> using BioAlignments, SequenceVariation
julia> reference_sequence = reference(first(consensus_variations))
80nt DNA Sequence: ACAACTTTATCTCTCTCAACTTCTTCCCTTACTATCCTTCACAACAATCCACACATTACTGCACTTTAAACACTTTTTTA
julia> consensus_haplotype = Haplotype(reference_sequence, consensus_variations)
SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 15 edits: C13G C15G C17G C26G C27G C32G C37G C40G C42G C45G C50G C52G C54G C59G C73G
julia> consensus_sequence = reconstruct(consensus_haplotype)
80nt DNA Sequence: ACAACTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGTGCACTTTAAACAGTTTTTTA
julia> consensus_alignment = PairwiseAlignment( AlignedSequence(consensus_sequence, Alignment(HapLink.cigar(consensus_haplotype))), reference_sequence, )
BioAlignments.PairwiseAlignment{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}}: seq: 1 ACAACTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGT 60 |||||||||||| | | |||||||| |||| |||| || | || |||| | | |||| | ref: 1 ACAACTTTATCTCTCTCAACTTCTTCCCTTACTATCCTTCACAACAATCCACACATTACT 60 seq: 61 GCACTTTAAACAGTTTTTTA 80 |||||||||||| ||||||| ref: 61 GCACTTTAAACACTTTTTTA 80
Minority report
We are interested in looking at the subconsensus variants, but they need to be reoriented to be put in terms of the consensus sequence.
julia> map!( var -> translate(var, consensus_alignment), subconsensus_variations, subconsensus_variations, )
4-element Vector{SequenceVariation.Variation{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA}}: T29A C64G A34T T48A
Bringing in the reads
Now that we have a consensus sequence, we can properly import the reads for haplotype calling into HapLink's specialized Pseudoread
class. There is a convenient pseudoreads
function that can directly convert a BAM file for us.
julia> haplotyping_pseudoreads = pseudoreads(EXAMPLE_BAM, consensus_sequence)
10-element Vector{Pseudoread}: Pseudoread(0x0000000000000006, 0x0000000000000023, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: A34T) Pseudoread(0x000000000000000b, 0x0000000000000028, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 0 edit:) Pseudoread(0x0000000000000010, 0x000000000000002d, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 2 edits: T29A A34T) Pseudoread(0x0000000000000015, 0x0000000000000032, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 2 edits: T29A T48A) Pseudoread(0x000000000000001a, 0x0000000000000037, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 3 edits: T29A A34T T48A) Pseudoread(0x000000000000001f, 0x000000000000003c, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: T48A) Pseudoread(0x0000000000000024, 0x0000000000000041, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G) Pseudoread(0x0000000000000029, 0x0000000000000046, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: Δ62-64) Pseudoread(0x000000000000002e, 0x000000000000004b, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G) Pseudoread(0x0000000000000033, 0x0000000000000050, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 2 edits: Δ63-63 C64G)
Cutting the chaff
We only need reads that span every one of our subconsensus variants, so let's get rid of every other read.
(haplink-tutorial-4) pkg> add BioGenerics
Resolving package versions...
Updating `~/haplink-tutorial-4/Project.toml`
[47718e42] + BioGenerics v0.1.2
No Changes to `~/haplink-tutorial-4/Manifest.toml`
julia> using BioGenerics
julia> filter!( var -> leftposition(var) >= leftposition(first(subconsensus_variations)), haplotyping_pseudoreads, )
5-element Vector{Pseudoread}: Pseudoread(0x000000000000001f, 0x000000000000003c, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: T48A) Pseudoread(0x0000000000000024, 0x0000000000000041, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G) Pseudoread(0x0000000000000029, 0x0000000000000046, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: Δ62-64) Pseudoread(0x000000000000002e, 0x000000000000004b, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G) Pseudoread(0x0000000000000033, 0x0000000000000050, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 2 edits: Δ63-63 C64G)
julia> filter!( var -> rightposition(var) >= rightposition(first(subconsensus_variations)), haplotyping_pseudoreads, )
5-element Vector{Pseudoread}: Pseudoread(0x000000000000001f, 0x000000000000003c, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: T48A) Pseudoread(0x0000000000000024, 0x0000000000000041, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G) Pseudoread(0x0000000000000029, 0x0000000000000046, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: Δ62-64) Pseudoread(0x000000000000002e, 0x000000000000004b, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G) Pseudoread(0x0000000000000033, 0x0000000000000050, SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 2 edits: Δ63-63 C64G)
Finding haplotypes...
This one is a bit of a mind-bender, but bear with me.
julia> haplotyping_haploypes = haplotype.(haplotyping_pseudoreads)
5-element Vector{SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA}}: SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: T48A SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: Δ62-64 SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 1 edit: C64G SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA} with 2 edits: Δ63-63 C64G
julia> valid_haplotypes = findset( subconsensus_variations, hap -> ishaplotype(hap, haplotyping_haploypes) )
ERROR: MethodError: no method matching HaplotypeCall(::UInt8, ::Float64, ::Float64, ::Float64, ::SequenceVariation.Haplotype{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, BioSymbols.DNA}) Closest candidates are: HaplotypeCall(::UInt64, ::Float64, ::Float64, ::Float64, ::SequenceVariation.Haplotype{S, T}) where {S<:BioSequences.BioSequence, T<:BioSymbols.BioSymbol} @ HapLink ~/work/HapLink.jl/HapLink.jl/src/haplotypecalling.jl:18
...and deciding if they're real
Just because a haplotype could exist, doesn't mean it should. You can check the statistics of a haplotype by converting it into a HaplotypeCall
.
julia> map(hap -> HaplotypeCall(hap, haplotyping_haploypes), valid_haplotypes);
ERROR: UndefVarError: `valid_haplotypes` not defined
julia> hc = first(ans)
ERROR: MethodError: no method matching iterate(::UndefVarError) Closest candidates are: iterate(::Union{LinRange, StepRangeLen}) @ Base range.jl:880 iterate(::Union{LinRange, StepRangeLen}, ::Integer) @ Base range.jl:880 iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} @ Base dict.jl:698 ...
julia> depth(hc)
ERROR: UndefVarError: `hc` not defined
julia> linkage(hc)
ERROR: UndefVarError: `hc` not defined
julia> significance(hc)
ERROR: UndefVarError: `hc` not defined
Nope! This one sure isn't!
This tutorial only scratches the surface of what's possible with HapLink's REPL mode. Take a look at the full API reference for more details on how to tweak your haplotype calling. Have fun!