Merge multiple bedMethyl files

If you generate bedMethyl tables from multiple samples and want to perform a "join" where the counts summed modkit bedmethyl merge will perform an outer join for you. An example command is below:

modkit bedmethyl merge sample_a.bed.gz sample_b.bed.gz \
  -o samples_a_and_b.bed \
  -g genome_sizes.tsv # could be reference.fa.fai

The input bedMethyl files must be bgzipped and tabix indexed.

bgzip sample_a.bed
tabix sample_a.bed.gz

bgzip sample_b.bed
tabix sample_b.bed.gz

The genome_sizes.tsv is a tab-separated file containing the lengths of the contigs in the reference sequence, you can also use a .fai file generated by samtools faidx.

For example taken from bedtools:

The schema should be <chromName><TAB><chromSize>.

For example, Human (hg19):

chr1    249250621
chr2    243199373

Convert bedMethyl to bigWig

You may want to convert a bedMethyl to bigWig for easier visualization on a genome browser. Modkit provides a bedmethyl tobigwig function that can extract a bigWig track from a bedMethyl. The bigWig format is more constrained than bedMethyl in that it can only display a single stream of data (floats). So you can't have a bigWig with 5hmC and 5mC values together, you need to make separate tracks. Here are some annotated examples:

# you can run pileup and stream the output through the new command '-' means output to stdout
modkit pileup ${modbam} - \
  # --mod-code a will make a track of 6mA
  | tee >(modkit bedmethyl tobigwig - ${outdir}/subsample_pileup_a.bw --mod-code a -g ${sizes} --log ${outdir}/bm_a.log --negative-strand-values --suppress-progress) \
  # you can use "bm" or "bedmethyl"
  # --mod-code (or --mod-codes h,m) will combine the 5hmC and 5mC counts into a track
  | tee >(modkit bm tobigwig - ${outdir}/subsample_pileup_hm.bw --mod-codes h,m -g ${sizes} --log ${outdir}/bm_hm.log --negative-strand-values --suppress-progress) \
  # finally pipe out to the pileup
  > ${pileup}

# you can use a file also, if you have conflicting bases (e.g. 6mA and 5mC) - you'll get an error.
modkit bm tobigwig ${pileup} ${outbw} --mod-code m,a -g ${sizes} --negative-strand-values

With this CLI you can capture any number of bigWig tracks whilst making a bedMethyl pileup at the same time. You can also make a bigWig from a pre-computed bedMethyl (and use tabix to get just sections of a pre-made bedMethyl).