Troubleshooting
It's recommended to run all modkit
commands with the --log-filepath <path-to-file>
option set. When unexpected outputs are produced inspecting this file will often indicate
the reason.
Missing secondary and supplementary alignments in output
As of v0.2.4 secondary and supplementary alignments are supported in adjust-mods
, update-tags
, call-mods
, and (optionally) in extract
.
However, in order to use these alignment records correctly, the MN
tag must be present and correct in the record.
The MN
tag indicates the length of the sequence corresponding to the MM
and ML
tags.
As of dorado v0.5.0 the MN
tag is output when modified base calls are produced.
If the aligner has hard-clipped the sequence, this number will not match the sequence length and the record cannot be used.
Similarly, if the SEQ field is empty (sequence length zero), the record cannot be used.
One way to use supplementary alignments is to specify the -Y
flag when using dorado or minimap2.
For these programs, when -Y
is specified, the sequence will not be hardclipped in supplementary alignments and will be present in secondary alignments.
Other mapping algorithms that are "MM tag-aware" may allow hard-clipping and update the MM
and ML
tags, modkit
will accept these records as long as the MN
tag indicates the correct sequence length.
No rows in modkit pileup
output.
First, check the logfile, there may be many lines with a variant of
record 905b4cb4-15e2-4060-9c98-866d0aff78bb has un-allowed mode
(ImplicitProbModified), use '--force-allow-implicit' or 'modkit update-tags --mode
ambiguous
As suggested, using update or setting the
--force-allow-implicit
flag should produce output from these records.
If the MM tag contains an un-supported mod-code (a limitation that will be removed in a future release), these errors will be logged. To process the modBAM, first convert the tags.
In general, if there are MM/ML tags in the modBAM and the bedMethyl is still empty the log file will contain a line explaining why each record is being skipped.
Not sampling enough reads to estimate threshold.
If you have previously downsampled the modBAM to a specific region, for example with a command like
samtools view -bh input.sorted.bam chr1
modkit will still try and sample reads evenly across the entire genome but fail to find
any aligned to other contigs. To remedy this, either pass the same --region chr1
option
or set the --sampling-frac 1.0
. The former will set modkit
to only use the region
present in the BAM. For example,
modkit pileup input.bam output.bed --region chr1
# or
modkit pileup input.bam output.bed --sample-region chr1
The latter will sample all of the reads in the BAM, which may slow
down the process, so in general the best advice is to either use the same --region <contig>
when using modkit
or don't downsample the BAM ahead of time (and use --region <contig>
in order to get the desired bedMethyl).
The summary
, sample-probs
, and pileup
subcommands all use the same --region
option.
CG positions are missing from output bedMethyl.
When running with --preset traditional
or --cpg
the resultant bedMethyl file will only
contains CG positions. However, it will not include positions for which the pass coverage
is zero (see the column
descriptions). This is to be
expected.