Investigating patterns with localise

Once a bedMethyl table has been created, modkit localise uses the pileup to calculate aggregate per-base modification information around genomic features of interest. For example, we can investigate base modification patterns around CTCF binding sites.

5mC patterns at CTCF sites

The input requirements to modkit localise are simple:

  1. BedMethyl table that has been BGZF-compressed (bgzip) and tabix-indexed.
  2. Regions file in BED format (plaintext).
  3. Genome sizes tab-separated file: <chrom>\t<size_in_bp>.
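
The first and third inputs can usually be prepared with standard tools. The sketch below is one possible way to do it and is not part of modkit. It assumes that htslib's `bgzip` and `tabix` are installed, that the bedMethyl file is already coordinate-sorted, and that a `samtools faidx` index (`.fai`) of the reference is available. The function names `prepare_bedmethyl` and `make_genome_sizes` are hypothetical helpers introduced only for illustration.

```shell
# Hypothetical helper: compress and index a bedMethyl table.
# Assumes bgzip and tabix (htslib) are on PATH and the input is coordinate-sorted.
prepare_bedmethyl() {
    bgzip -c "$1" > "$1.gz" && tabix -p bed "$1.gz"
}

# Hypothetical helper: print <chrom>\t<size_in_bp> lines from a .fai index,
# whose first two columns are the sequence name and its length in bases.
make_genome_sizes() {
    cut -f1,2 "$1"
}
```

For example, `prepare_bedmethyl pileup.bed` followed by `make_genome_sizes reference.fa.fai > sizes.tsv`, where the file names are placeholders.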

An example command:

modkit localise ${bedmethyl} --regions ${ctcf} --genome-sizes ${sizes}

The output table has the following schema:

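| column | Name             | Description                                                                                                           | type  |
|--------|------------------|-----------------------------------------------------------------------------------------------------------------------|-------|
| 1      | mod code         | modification code as present in the bedmethyl                                                                         | str   |
| 2      | offset           | distance in base pairs from the center of the genome features, negative values reflect towards the 5' of the genome | int   |
| 3      | n_valid          | number of valid calls at this offset for this modification code                                                       | int   |
| 4      | n_mod            | number of calls for this modification code at this offset                                                             | int   |
| 5      | percent_modified | n_mod / n_valid * 100                                                                                                 | float |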
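
To look at the profile for a single modification code, the table can be filtered with standard tools. The following is a minimal sketch, assuming the output is tab-separated with columns in the order listed above. The helper name `profile_for_code` is hypothetical. It recomputes the percentage as n_mod / n_valid * 100 and skips offsets with no valid calls. A header line, if present, is ignored because its first field does not match the requested code.

```shell
# Hypothetical helper: print "offset<TAB>percent" for one modification code.
# $1 = modification code (e.g. m), $2 = path to the localise output table.
# Assumes tab-separated columns: mod code, offset, n_valid, n_mod, percent_modified.
profile_for_code() {
    awk -F'\t' -v code="$1" '
        # Keep rows for the requested code that have at least one valid call.
        $1 == code && $3 > 0 { printf "%s\t%.2f\n", $2, $4 / $3 * 100 }
    ' "$2"
}
```

For example, `profile_for_code m localise_output.tsv` would list the recomputed 5mC percentage at each offset, where the file name is a placeholder.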

Optionally, the --chart argument can be used to create HTML charts of the modification patterns.