Migrating to Modkit v0.6.0+ pileup.
Addition of --modified-bases is the biggest change in Modkit v0.6.0+.
The option --modified-bases 5mC 5hmC or --modified-bases 6mA is roughly equivalent to using --motif C 0 or --motif A 0, respectively, to capture base modification calls only a reference bases for which you are interested in their modification status.
Base modifications at multiple primary bases can be emitted by adding more options to the command line, such as --modified-bases 5mC 5hmC 6mA.
Removal of --preset traditional
Modkit versions v0.6.0 and v0.6.1 do not have the --preset traditional option.
To produce output that is comparable to technologies where 5hmC will be reported as 5mC using best practices provide the following options:
$ modkit pileup $bam $bed --ref $ref
--modified-bases 5mC 5hmC \ # or --modified-bases C
--combine-mods \
--cpg \
--combine-strands
Keep in mind that this output is not exactly equivalent to what --preset traditional would produce previously, see point (1) below for more details.
Changes to pileup algorithm
--ignoreoption has been removed. This option was often a source of confusion and has been removed in favor of the following alternatives:
- Use
--modified-basesto emit records for only the base modifications you’re interested in. For example, a direct RNA run may have many base modifications on each of the primary sequence bases. If you’re only interested in m6A use--modified-bases m6Aand only get records with this base modification. - Use
--combine-modswhen you want to know if a base is “modified vs unmodified”. This can be necessary when comparing to results where all base modifications may be marked as a single base modification.
- Use
--phasedinstead of--partition-tags. The--phasedflag is a more performant version of--partition-tags HP. Instead of an “ungrouped” bedMethyl table, thecombined.bedmethyl[.gz]table contains counts tabulated from all records. - The
--bedgraphflag has been removed. The recommended way to create a “trace”-like output file is to make a bedMethyl and either pipe it directly intomodkit bedmethyl tobigwigor run thetobigwigcommand following bedMethyl creation. - Duplicate read names are no longer logged.
--max-depthremoved. Modkit v0.6.0 has a new algorithm that doesn’t usemax-depth. If you have a modBAM with very high depth and you don’t want to tabulate counts for this depth it is currently recommended to subsample the modBAM before using Modkit.--edge-filter-ed base modification counts are no longer tabulated in Nnocall, they will be completely ignored from tabulation. If this breaks or changes your consumption of these data, please open a GitHub issue.--modified-basesemploys a simpler threshold evaluation algorithm. Prior to v0.6.0, the threshold evaluation was to take the most likely passing base modification call. This lead to confusion about how to “tune” this parameter to tweak results. When using--modified-bases(strongly recommended) the algorithm is now simply to use the most likely call (the one with the highest probability) and determine if it is pass or fail by whether or not it is less than the threshold for that base or modification. In almost all cases this simpler algorithm produces the same results and is what the user expected in the first place.
Limitations
- As of v.0.6.0
--modified-basesdoes not support duplex base modification calls. To enable usage of base modification calls on duplex reads, pass the--duplexflag. This limitation will be removed in the future. --combine-strandswhen using multiple palindromic motifs is no longer implemented. If this is a crucial step in your workflow, please open a GitHub issue. The new workflow is to run with a single motif and--combine-strandsfor each motif. Modkit v0.6.0 is substantially quicker and lighter on resource requirements such that running multiple times will likely be quicker than the older versions.--invert-edge-filterremoved. This option is still available inmodkit modbam adjust-mods.- In Modkit versions v0.6.0 and v0.6.1
pileupwill saturate depth at 65,535 (maximum for a unsigned 16-bit integer). If your modBAM has a depth greater than this value, it is recommended to use the--high-depthflag so that 65,535 reads will be used at each genomic position. To achieve greater depth, divide the modBAM into subsets with depth <= 65,535, runpileupon each, thenmodbam merge. This limitation will be removed in the future when it can be proven not to have any performance or resource regression.