Read Filtering Commands
Read filtering commands can help extract the most out of a set of reads for modified base detection. Read filtering commands affect only the Tombo index file, so filters can be cleared or applied iteratively without re-running the re-squiggle command. Five filters are currently available (genome_locations, raw_signal_matching, q_score, level_coverage and stuck).
Hint
Save a set of filters for later use by copying the Tombo index file:
cp path/to/native/rna/.fast5s.RawGenomeCorrected_000.tombo.index save.native.tombo.index
To restore a saved set of filters after applying further filters, simply replace the index file:
cp save.native.tombo.index path/to/native/rna/.fast5s.RawGenomeCorrected_000.tombo.index
tombo filter genome_locations
The tombo filter genome_locations command filters out reads that fall outside the regions specified with the --include-regions option. These regions can be either whole chromosomes/sequence records or sub-regions within sequence records.
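The region test can be pictured with the minimal sketch below. It is not Tombo's implementation: it assumes regions are written either as a bare record name (whole record) or as name:start-end, treats coordinates as half-open intervals, and keeps a read if its mapped interval overlaps any included region. The exact syntax accepted by --include-regions and the overlap rule Tombo applies should be checked against the command's help text; parse_region and read_in_regions are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: Optional[int]  # None means "the whole sequence record"
    end: Optional[int]


def parse_region(spec: str) -> Region:
    """Parse 'chrom' or 'chrom:start-end' (assumed syntax) into a Region."""
    if ":" not in spec:
        return Region(spec, None, None)
    chrom, coords = spec.rsplit(":", 1)
    start, end = (int(x) for x in coords.split("-"))
    return Region(chrom, start, end)


def read_in_regions(chrom: str, start: int, end: int, regions: list[Region]) -> bool:
    """Return True if the read's mapped interval [start, end) overlaps any region."""
    for r in regions:
        if r.chrom != chrom:
            continue
        # A whole-record region matches any read on that record;
        # otherwise require the two half-open intervals to overlap.
        if r.start is None or (start < r.end and end > r.start):
            return True
    return False


regions = [parse_region(s) for s in ["chrM", "chr20:10000-20000"]]
print(read_in_regions("chr20", 15000, 16000, regions))  # True
print(read_in_regions("chr20", 30000, 31000, regions))  # False
print(read_in_regions("chrM", 0, 500, regions))         # True
```

Under this assumed rule, a read that only partially overlaps a sub-region is kept; a containment rule would instead require start >= r.start and end <= r.end.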
tombo filter raw_signal_matching
The tombo filter raw_signal_matching command filters out reads with poor matching between the observed raw signal and the expected signal levels from the canonical base model. Specify a new threshold to apply with the --signal-matching-score option. These scores are the mean half z-score (the absolute value of the z-score) taken over all bases of a read. A reasonable range for this threshold is approximately 0.5 to 3. Reads with a larger fraction of modified bases deviate more from the canonical model and may therefore require a larger value to process successfully.
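As a minimal sketch of the score described above, assuming each base has one observed signal level and the canonical model supplies an expected mean and standard deviation per base, the mean half z-score and the threshold check could look like this. The function names are hypothetical, and the inclusive comparison (score <= threshold) is an assumption rather than Tombo's documented behaviour.

```python
from typing import Sequence


def mean_half_z_score(
    observed: Sequence[float],
    expected_means: Sequence[float],
    expected_sds: Sequence[float],
) -> float:
    """Mean absolute z-score of observed per-base levels against the model."""
    if not observed or not (len(observed) == len(expected_means) == len(expected_sds)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Half z-score: |observed - expected| / expected standard deviation.
    half_z = [abs(o - m) / s for o, m, s in zip(observed, expected_means, expected_sds)]
    return sum(half_z) / len(half_z)


def passes_signal_matching(score: float, threshold: float) -> bool:
    """Keep the read only if its score does not exceed the threshold (assumed inclusive)."""
    return score <= threshold


score = mean_half_z_score([0.5, -0.5, 2.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
print(score)                               # 1.0
print(passes_signal_matching(score, 1.5))  # True
print(passes_signal_matching(score, 0.5))  # False
```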
tombo filter q_score
The tombo filter q_score command filters out reads with a poor mean basecalling quality score, since a low mean quality score can indicate a low-quality read. Set this threshold with the --q-score option.
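A minimal sketch of a mean quality score filter follows. It assumes the basecall qualities are a FASTQ-style string in Phred+33 encoding and that the mean is the arithmetic mean of the per-base Phred values (rather than, for example, an average of error probabilities); the helper names and the inclusive cutoff are hypothetical.

```python
def mean_q_score(quality: str, phred_offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores decoded from a quality string."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(c) - phred_offset for c in quality) / len(quality)


def passes_q_score(quality: str, min_mean_q: float) -> bool:
    """Keep the read if its mean quality reaches the cutoff (assumed inclusive)."""
    return mean_q_score(quality) >= min_mean_q


# '!' encodes Phred 0 and 'I' encodes Phred 40 in Phred+33.
print(mean_q_score("!!II"))       # 20.0
print(passes_q_score("!!II", 7))  # True
print(passes_q_score("!!!!", 7))  # False
```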
tombo filter level_coverage
The tombo filter level_coverage command aims to filter reads to achieve more even read depth across a genome/transcriptome. This can be useful when estimating canonical and alternative models. This filter may also help make test statistics more comparable across the genome.
This filter is applied by randomly selecting reads, weighted by the approximate coverage at the mapped location of each read. The percentage of reads removed from downstream processing is set with the --percent-to-filter option.
This filter is likely to be most useful for PCR-amplified samples, where duplicate reads are more likely to accumulate at the same locations and cause large spikes in coverage.
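The coverage-weighted random selection described above can be sketched as weighted sampling without replacement. This sketch uses the Efraimidis-Spirakis method (each read gets the key u ** (1 / coverage) for a uniform random u, and the reads with the largest keys are removed), so reads in coverage spikes are removed more often. It is a stand-in technique, not Tombo's code; the function name, the seed parameter and the rounding of the removal count are assumptions.

```python
import random
from typing import Sequence


def select_reads_to_filter(
    read_ids: Sequence[str],
    coverages: Sequence[float],
    percent_to_filter: float,
    seed: int = 0,
) -> set[str]:
    """Choose reads to remove, favouring reads at high-coverage locations.

    Efraimidis-Spirakis weighted sampling without replacement: the k reads
    with the largest keys u ** (1 / w), u ~ Uniform[0, 1), are selected.
    """
    if len(read_ids) != len(coverages):
        raise ValueError("read_ids and coverages must have equal length")
    if any(cov <= 0 for cov in coverages):
        raise ValueError("coverage weights must be positive")
    rng = random.Random(seed)
    # Assumed rounding: truncate to a whole number of reads.
    n_remove = int(len(read_ids) * percent_to_filter / 100)
    keyed = [(rng.random() ** (1.0 / cov), rid) for rid, cov in zip(read_ids, coverages)]
    keyed.sort(reverse=True)
    return {rid for _, rid in keyed[:n_remove]}


ids = [f"read{i}" for i in range(10)]
covs = [1.0] * 5 + [100.0] * 5  # the last five reads sit in a coverage spike
removed = select_reads_to_filter(ids, covs, percent_to_filter=50, seed=1)
print(len(removed))  # 5
```

The weighted keys make each removal proportional to coverage while still leaving some reads in the spike, which smooths rather than flattens the depth profile.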
tombo filter stuck
The tombo filter stuck command aims to remove reads where bases tend to get stuck in the pore for longer durations of time. Such reads are often of poor quality and can thus negatively affect modified base detection.
This filter is based on the number of raw signal observations per genomic base along a read. The filter can be set on any number of percentiles of observations per base. Reasonable values depend strongly on the sample type (DNA or RNA). For example, a reasonable filter for DNA reads would remove reads whose 99th percentile exceeds 200 obs/base or whose maximum exceeds 5,000 obs/base. This filter would be set with the --obs-per-base-filter 99:200 100:5000 option. Larger values should be used for RNA reads.
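The percentile rule can be illustrated with a small sketch that parses percentile:threshold pairs such as 99:200 and flags a read if any requested percentile of its observations per base exceeds its threshold. The linear-interpolation percentile mirrors NumPy's default convention; whether Tombo uses exactly this convention, and the helper names, are assumptions.

```python
import math
from typing import Sequence


def parse_obs_filter(specs: Sequence[str]) -> list[tuple[float, float]]:
    """Parse 'percentile:threshold' strings such as '99:200' into pairs."""
    pairs = []
    for spec in specs:
        pct, thresh = spec.split(":")
        pairs.append((float(pct), float(thresh)))
    return pairs


def percentile(values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile (the same convention as NumPy's default)."""
    s = sorted(values)
    if not s:
        raise ValueError("no observations")
    pos = (len(s) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def is_stuck(obs_per_base: Sequence[float], filters: list[tuple[float, float]]) -> bool:
    """True if any requested percentile exceeds its threshold."""
    return any(percentile(obs_per_base, p) > t for p, t in filters)


filters = parse_obs_filter(["99:200", "100:5000"])
print(is_stuck([10] * 99 + [6000], filters))  # True: the maximum exceeds 5000
print(is_stuck([10] * 100, filters))          # False
```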
tombo filter clear_filters
The tombo filter clear_filters command removes all filters applied to this sample (including those applied during the resquiggle command, though reads that failed before signal-to-sequence assignment will not be included). New filters can then be applied to this set of reads.
All Tombo sub-commands respect the applied filters when parsing reads for processing.
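Because filters only change the index and never the read data, applying and clearing them can be pictured with the toy in-memory model below. It is a hypothetical illustration of the behaviour described in this section, not the real Tombo index format: each indexed read carries a filtered flag, successive filters only ever set flags, clearing resets every flag, and reads that failed before signal-to-sequence assignment are simply never indexed. IndexEntry, ToyIndex and the mean_q field are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class IndexEntry:
    """Hypothetical per-read record; not the real Tombo index format."""
    read_id: str
    mean_q: float
    filtered: bool = False


class ToyIndex:
    """Toy model: filters toggle flags only, so read data is never modified."""

    def __init__(self, entries: Iterable[IndexEntry]):
        # Only reads that reached signal-to-sequence assignment are indexed.
        self.entries = {e.read_id: e for e in entries}

    def apply_filter(self, keep: Callable[[IndexEntry], bool]) -> None:
        # Filters stack: a read that is already filtered stays filtered.
        for e in self.entries.values():
            if not keep(e):
                e.filtered = True

    def clear_filters(self) -> None:
        # Reset every flag, including any set at re-squiggle time.
        for e in self.entries.values():
            e.filtered = False

    def usable_reads(self) -> list[str]:
        return sorted(rid for rid, e in self.entries.items() if not e.filtered)


idx = ToyIndex([IndexEntry("a", 12.0), IndexEntry("b", 5.0), IndexEntry("c", 9.0)])
idx.apply_filter(lambda e: e.mean_q >= 7)
print(idx.usable_reads())  # ['a', 'c']
idx.apply_filter(lambda e: e.mean_q >= 10)
print(idx.usable_reads())  # ['a']
idx.clear_filters()
print(idx.usable_reads())  # ['a', 'b', 'c']
```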