The following demonstrates the utility of medaka’s neural network in forming an improved consensus from a pileup of reads.
Results were obtained using the default models provided with medaka. These models were trained using data obtained from E. coli, S. cerevisiae and H. sapiens samples.
Error statistics were calculated with a program from pomoxis, after aligning 100 kb chunks of the consensus to the reference. Reported metrics are median values over all chunks.
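As a rough illustration of the "median over all chunks" summary, the sketch below splits a consensus of a given length into 100 kb windows and takes the median of per-chunk error rates. It does not reproduce the pomoxis program: the helper names (`chunk_bounds`, `chunk_error_rate`, `median_error`) are hypothetical, the error-rate definition is an assumption, and the per-chunk alignment counts would have to come from aligning each chunk to the reference.

```python
from statistics import median

CHUNK_SIZE = 100_000  # 100 kb chunks, as described in the text


def chunk_bounds(length: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Split [0, length) into consecutive half-open windows of chunk_size.

    The final window may be shorter than chunk_size.
    """
    return [(start, min(start + chunk_size, length))
            for start in range(0, length, chunk_size)]


def chunk_error_rate(matches: int, mismatches: int,
                     insertions: int, deletions: int) -> float:
    """Error rate for one aligned chunk.

    Assumption: errors / (matches + errors); pomoxis may define this differently.
    """
    errors = mismatches + insertions + deletions
    return errors / (matches + errors)


def median_error(chunk_counts: list[tuple[int, int, int, int]]) -> float:
    """Median of per-chunk error rates; each tuple is (match, mismatch, ins, del)."""
    return median(chunk_error_rate(*counts) for counts in chunk_counts)
```

Reporting the median rather than the mean keeps a single badly assembled chunk from dominating the summary.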
Comparison of medaka and nanopolish
In this comparison, the medaka E. coli Walkthrough dataset was used. These data were not used to train the model. Basecalling was performed using Guppy v2.2.1; both the older transducer and the newer flip-flop algorithms were used for comparison. Basecalled reads were trimmed using porechop to remove adapters, and assembly was performed using canu v1.8. The assembly was corrected using racon v1.3.1 before being passed to medaka and nanopolish. nanopolish v0.10.1 was run using
The workflow used here includes four iterations of racon. This should not be viewed as optimal for all datasets; see Origin of the draft sequence for further details.
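The polishing stage of this workflow (four rounds of racon, then medaka) can be sketched as a dry-run command builder. This is a minimal sketch under stated assumptions, not the exact commands used for the benchmark: the text does not name the tool that produced the read-to-draft overlaps, so minimap2 is assumed here, all file names (`reads.fastq`, `canu.contigs.fasta`, `overlaps_N.paf`, `racon_N.fasta`, `medaka_out`) are hypothetical, and default tool options are used throughout.

```python
import shlex


def polishing_commands(reads, draft, rounds=4, threads=8, outdir="medaka_out"):
    """Build (argv, stdout_path) pairs for `rounds` of racon followed by medaka.

    Assumptions: minimap2 supplies the overlaps, file names are hypothetical,
    and tools run with default options. stdout_path is None when the tool
    writes its own output files.
    """
    steps = []
    current = draft
    for i in range(1, rounds + 1):
        overlaps = f"overlaps_{i}.paf"
        polished = f"racon_{i}.fasta"
        # Map reads (query) to the current draft (target); minimap2 prints PAF.
        steps.append((["minimap2", "-x", "map-ont", "-t", str(threads),
                       current, reads], overlaps))
        # racon <sequences> <overlaps> <target sequences>; prints polished FASTA.
        steps.append((["racon", "-t", str(threads), reads, overlaps, current],
                      polished))
        # The polished sequence becomes the draft for the next round.
        current = polished
    # medaka_consensus writes the final consensus into outdir.
    steps.append((["medaka_consensus", "-i", reads, "-d", current,
                   "-o", outdir, "-t", str(threads)], None))
    return steps


if __name__ == "__main__":
    # Dry run: print the shell commands instead of executing them.
    for argv, stdout_path in polishing_commands("reads.fastq", "canu.contigs.fasta"):
        redirect = f" > {shlex.quote(stdout_path)}" if stdout_path else ""
        print(shlex.join(argv) + redirect)
```

Printing rather than executing keeps the sketch safe to run without the tools installed; passing each `argv` to `subprocess.run` with `stdout` opened on `stdout_path` would execute the steps. Changing `rounds` is the knob to revisit when four racon iterations are not a good fit for a dataset.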
| | racon (x4) | medaka | nanopolish | racon (x4) | medaka | nanopolish |
|---|---|---|---|---|---|---|
| CPU time (hh:mm) | 00:50 | 00:07 | 49:10 | 00:50 | 00:07 | 50:24 |
For this dataset, with the older transducer basecaller, medaka gives similar results to nanopolish in a fraction of the time. With the flip-flop basecaller, the medaka workflow is seen to be superior to nanopolish. The runtime of medaka can be reduced further by using a GPU: with an NVIDIA GTX1080Ti it is less than one minute.
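Reading the CPU times in the table as hours:minutes, a short calculation puts "a fraction of the time" in numbers. The sketch compares medaka alone with nanopolish alone for each group of columns; the racon time (00:50) is common to both workflows and is left out. The helper name `hhmm_to_minutes` is hypothetical.

```python
def hhmm_to_minutes(value: str) -> int:
    """Convert an 'hh:mm' CPU-time string from the table into minutes."""
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)


# (medaka, nanopolish) CPU times, one pair per group of columns in the table.
column_groups = [("00:07", "49:10"), ("00:07", "50:24")]

for medaka_time, nanopolish_time in column_groups:
    ratio = hhmm_to_minutes(nanopolish_time) / hhmm_to_minutes(medaka_time)
    print(f"nanopolish/medaka CPU-time ratio: {ratio:.0f}x")
# prints "nanopolish/medaka CPU-time ratio: 421x"
# then   "nanopolish/medaka CPU-time ratio: 432x"
```

So on CPU alone, medaka used roughly 1/420 of the nanopolish CPU time in both groups.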
A particular advantage of medaka over other methods is its improved accuracy in recovering homopolymer lengths.
Above the main plot we show homopolymer frequencies from H. sapiens chromosome 1, adapted from Statistical analysis of simple repeats in the human genome.
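To make "homopolymer frequencies" and "recovering homopolymer lengths" concrete, the sketch below run-length encodes a sequence, counts runs by length, and tallies how often a called sequence gets each true run length right. It is an illustration, not medaka's evaluation method: the helper names are hypothetical, and the accuracy function assumes the two sequences differ only in homopolymer lengths (same run-length-compressed bases), so runs can be paired one-to-one without a full alignment.

```python
from collections import Counter
from itertools import groupby


def homopolymer_runs(seq: str) -> list[tuple[str, int]]:
    """Run-length encode a sequence into (base, run length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq.upper())]


def run_length_histogram(seq: str) -> Counter:
    """Count homopolymer runs by length (how many runs of length 1, 2, 3, ...)."""
    return Counter(length for _, length in homopolymer_runs(seq))


def homopolymer_length_accuracy(truth: str, called: str) -> dict[int, tuple[int, int]]:
    """For each true run length, return (correctly called runs, total runs).

    Simplifying assumption: both sequences share the same run-length-compressed
    bases, so runs are paired by position rather than by alignment.
    """
    true_runs = homopolymer_runs(truth)
    called_runs = homopolymer_runs(called)
    if [b for b, _ in true_runs] != [b for b, _ in called_runs]:
        raise ValueError("sequences differ by more than homopolymer lengths")
    tally: dict[int, list[int]] = {}
    for (_, true_len), (_, called_len) in zip(true_runs, called_runs):
        counts = tally.setdefault(true_len, [0, 0])
        counts[0] += int(true_len == called_len)  # correct length recovered
        counts[1] += 1                            # total runs of this true length
    return {length: (c, t) for length, (c, t) in sorted(tally.items())}


# Toy example: the called sequence lengthens the G run and shortens the T run.
print(homopolymer_length_accuracy("AAACGGTTTT", "AAACGGGTTT"))
# prints {1: (1, 1), 2: (0, 1), 3: (1, 1), 4: (0, 1)}
```

Tallying by true run length shows why long homopolymers matter: they are rarer (as the frequency histogram shows) but are where length errors tend to concentrate.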
Evaluation across samples and depths
The comparison below illustrates results at various coverage depths for a collection of additional organisms. Assemblies were performed as above with
canu and racon, using the
Guppy v3.0.3 high accuracy basecaller and