pomoxis package¶
Submodules¶
pomoxis.assess_homopolymers module¶
-
class
pomoxis.assess_homopolymers.AverageScore(cumulative_score=0, count=0)[source]¶ Bases:
object
Keep track of a simple average of inputs.
-
property
average_score¶ Return the average of the accumulated samples.
-
pomoxis.assess_homopolymers.find_homopolymers(seq, min_length, alphabet='ACGT')[source]¶ - Parameters
seq – the sequence to search
min_length – minimum homopolymer (hp) length to find
alphabet – alphabet of bases to look for
- Returns
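As a rough illustration of what such a search involves, the sketch below finds runs of a single base of at least min_length using a regular expression. It is not the pomoxis implementation: the name find_homopolymers_sketch and the (start, end, base) return format are assumptions made for this example only.

```python
import re


def find_homopolymers_sketch(seq, min_length, alphabet="ACGT"):
    """Return (start, end, base) for every run of one base with length >= min_length.

    Hypothetical helper for illustration; the return format is an assumption.
    """
    # One alternative per base, e.g. A{3,}|C{3,}|G{3,}|T{3,}
    pattern = "|".join(
        "{}{{{},}}".format(re.escape(base), min_length) for base in alphabet
    )
    return [(m.start(), m.end(), m.group()[0]) for m in re.finditer(pattern, seq)]


hits = find_homopolymers_sketch("ACGTTTTAGGGCAAAAA", 3)
```

Here hits holds one tuple per run, with end being exclusive, so seq[start:end] recovers the homopolymer itself.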
-
pomoxis.assess_homopolymers.get_fraction_correct(counts)[source]¶ Calculate fraction correct vs length from counts.
pomoxis.bio module¶
-
pomoxis.bio.reverse_complement(seq)[source]¶ Reverse complement sequence.
- Parameters
seq – input sequence string.
- Returns
reverse-complemented string.
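A minimal sketch of the operation, assuming a plain DNA alphabet (upper or lower case) and no ambiguity codes. The function name is hypothetical and this is not necessarily how pomoxis.bio implements it.

```python
# Translation table mapping each base to its complement (assumed alphabet: ACGT/acgt).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_sketch(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


rc = reverse_complement_sketch("AACG")
```

Characters outside the table (for example N) pass through unchanged in this sketch.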
-
pomoxis.bio.shotgun_library(fasta_file, mu, sigma, direction=(1, -1))[source]¶ Generate random fragment sequences of a given input sequence.
- Parameters
seq – input sequence.
mu – mean fragment length.
sigma – standard deviation of fragment length.
direction – tuple representing the direction of output sequences with respect to the input sequence.
- Yields
sequence fragments.
Note
Could be made more efficient using buffers for random samples and handling cases separately.
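The idea can be sketched as follows, under several assumptions that are not taken from pomoxis: the input is an in-memory string rather than a fasta file, fragment lengths are drawn from a normal distribution and clamped to the sequence length, and the n and seed parameters are hypothetical additions that keep the example finite and repeatable.

```python
import random

_COMP = str.maketrans("ACGT", "TGCA")


def shotgun_fragments_sketch(seq, mu, sigma, direction=(1, -1), n=5, seed=0):
    """Yield n random fragments of seq; -1 directions are reverse-complemented."""
    rng = random.Random(seed)
    for _ in range(n):
        # Draw a length from N(mu, sigma) and clamp it to [1, len(seq)].
        length = max(1, min(len(seq), int(rng.gauss(mu, sigma))))
        start = rng.randint(0, len(seq) - length)
        frag = seq[start:start + length]
        if rng.choice(direction) == -1:
            frag = frag.translate(_COMP)[::-1]
        yield frag


fragments = list(shotgun_fragments_sketch("ACGTACGTTTGACCA", mu=5, sigma=2, n=20))
```

Passing direction=(1,) would restrict the output to forward-strand fragments in this sketch.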
pomoxis.catalogue_errors module¶
-
class
pomoxis.catalogue_errors.AlignSeg(rname, qname, pairs, rlen)¶ Bases:
tuple
-
property
pairs¶ Alias for field number 2
-
property
qname¶ Alias for field number 1
-
property
rlen¶ Alias for field number 3
-
property
rname¶ Alias for field number 0
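The "Bases: tuple" and "Alias for field number N" entries are what collections.namedtuple generates. The sketch below rebuilds an equivalent type locally (under the hypothetical name AlignSegSketch, with made-up field values) to show how the named fields map onto tuple positions.

```python
from collections import namedtuple

# Local stand-in with the same field order as the documented signature.
AlignSegSketch = namedtuple("AlignSegSketch", ("rname", "qname", "pairs", "rlen"))

seg = AlignSegSketch(rname="ref1", qname="read1", pairs=[], rlen=100)

# Each property is an alias for the field at the documented index.
rname, qname, pairs, rlen = seg
```

Because the type is a tuple, instances are immutable and can be unpacked or indexed like any other tuple.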
-
class
pomoxis.catalogue_errors.ClassifyErrorTest(methodName='runTest')[source]¶ Bases:
unittest.case.TestCase
-
class
pomoxis.catalogue_errors.Context(p_i, qb, rb)¶ Bases:
tuple
-
property
p_i¶ Alias for field number 0
-
property
qb¶ Alias for field number 1
-
property
rb¶ Alias for field number 2
-
class
pomoxis.catalogue_errors.Error(rp, rname, qp, qname, ref, match, read, counts, klass, aggr_klass)¶ Bases:
tuple
-
property
aggr_klass¶ Alias for field number 9
-
property
counts¶ Alias for field number 7
-
property
klass¶ Alias for field number 8
-
property
match¶ Alias for field number 5
-
property
qname¶ Alias for field number 3
-
property
qp¶ Alias for field number 2
-
property
read¶ Alias for field number 6
-
property
ref¶ Alias for field number 4
-
property
rname¶ Alias for field number 1
-
property
rp¶ Alias for field number 0
-
pomoxis.catalogue_errors.are_adjacent(inds)[source]¶ Check if all int indices in the iterable are consecutive.
- Parameters
inds – iterable of ints
- Returns
bool
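One way to express this check, assuming "consecutive" means that each index is exactly one greater than its predecessor. The function name is hypothetical, and the treatment of empty or single-element input (returning True) is an assumption.

```python
def are_adjacent_sketch(inds):
    """Return True if every index is exactly one more than the previous one."""
    inds = list(inds)
    return all(b - a == 1 for a, b in zip(inds, inds[1:]))


consecutive = are_adjacent_sketch([3, 4, 5])
gapped = are_adjacent_sketch([3, 5, 6])
```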
-
pomoxis.catalogue_errors.classify_error(context, indel_sizes=None)[source]¶ Classify error within an alignment.
- Parameters
context – Context object
indel_sizes – iterable of int, for binning indel sizes. Indels of size >= indel_sizes[0] will not be considered as HP splitting/joining indels.
- Returns
(str reference_context, str match_line, str query_context, dict counts of sub/ins/del within context)
-
pomoxis.catalogue_errors.classify_hp_indel(p_i, key, errors, runs1, seq2)[source]¶ Look for a specific kind of HP indel that splits or joins two HPs
- Parameters
p_i – int, index of error
key – key of error type within errors (should be ‘ins’ or ‘del’)
errors – dict of error positions
runs1 – np.ndarray, RLE encoding of sequence1 (query if deletions join two HPs, or ref if insertions split an HP).
seq2 – iterable of str of sequence2 (ref if deletions join two HPs, or query if insertions split an HP).
- Returns
str classification or None
-
pomoxis.catalogue_errors.classify_hp_sub(p_i, adjacent, errors, match_line, rb_runs, qb_runs, qp_is_hp, rp_is_hp)[source]¶
-
pomoxis.catalogue_errors.get_errors(aln, tree=None)[source]¶ Find positions of errors in an alignment.
- Parameters
aln – iterable of AlignPos objects.
bed_file – path to .bed file of regions to include in analysis.
tree – intervaltree.IntervalTree object of regions to analyse.
- Returns
( [(ri, qi, ‘error_type’, last_ri, last_qi)], aligned_ref_len)
ri, qi: ref and query positions
error_type: ‘D’, ‘I’ or ‘S’
last_ri, last_qi: ref and query positions of the last match
aligned_ref_len: total aligned reference length (taking account of masking tree)
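The shape of the returned error tuples can be illustrated with a simplified scan over aligned pairs. This is a sketch only: it assumes a gap is represented by a position of None, ignores the masking tree entirely, and uses a local stand-in (AlignPosSketch) with the same fields as pomoxis.util.AlignPos.

```python
from collections import namedtuple

# Local stand-in mirroring the documented AlignPos fields.
AlignPosSketch = namedtuple("AlignPosSketch", ("qpos", "qbase", "rpos", "rbase"))


def find_errors_sketch(aln):
    """Return ([(ri, qi, error_type, last_ri, last_qi)], aligned_ref_len)."""
    errors = []
    last_ri = last_qi = None
    aligned_ref_len = 0
    for pos in aln:
        if pos.rpos is not None:
            aligned_ref_len += 1  # every column that consumes a reference base
        if pos.qpos is None:
            kind = "D"  # reference base with no query base (assumed gap encoding)
        elif pos.rpos is None:
            kind = "I"  # query base with no reference base
        elif pos.qbase != pos.rbase:
            kind = "S"  # mismatching bases
        else:
            last_ri, last_qi = pos.rpos, pos.qpos  # remember the last match
            continue
        errors.append((pos.rpos, pos.qpos, kind, last_ri, last_qi))
    return errors, aligned_ref_len


aln = [
    AlignPosSketch(0, "A", 0, "A"),
    AlignPosSketch(1, "C", 1, "G"),
    AlignPosSketch(None, "-", 2, "T"),
    AlignPosSketch(2, "T", None, "-"),
    AlignPosSketch(3, "A", 3, "A"),
]
errors, aligned_ref_len = find_errors_sketch(aln)
```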
-
pomoxis.catalogue_errors.get_run(i, runs)[source]¶ Find run to which the i’th element belongs.
- Parameters
i – int, element index within input to rle.
- Returns
int, element index within runs to which i belongs.
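A small sketch of the lookup, assuming the run-length encoding is available as a plain list of run lengths. The structure pomoxis actually passes as runs is not described here, so the helper names and the list-of-lengths representation are assumptions.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def rle_lengths_sketch(seq):
    """Run lengths of seq, e.g. 'AAACCG' -> [3, 2, 1]."""
    return [len(list(group)) for _, group in groupby(seq)]


def get_run_sketch(i, run_lengths):
    """Index of the run that contains element i of the original sequence."""
    ends = list(accumulate(run_lengths))  # exclusive end offset of each run
    return bisect_right(ends, i)


lengths = rle_lengths_sketch("AAACCG")
run_of_each = [get_run_sketch(i, lengths) for i in range(6)]
```

Using the cumulative end offsets with a binary search keeps each lookup logarithmic in the number of runs.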
-
pomoxis.catalogue_errors.plot_summary(df, outdir, prefix, ref_len)[source]¶ Create a plot showing Q-scores as the largest remaining error klass is removed.
-
pomoxis.catalogue_errors.preprocess_error(p, aln, search_by_q, offset=10)[source]¶ - Parameters
p – int, position (ref position, or query position)
aln – iterable of AlignPos objects.
search_by_q – bool, whether to search by query position (typically done if qi is None).
- Returns
Context object
pomoxis.common_errors_from_bam module¶
pomoxis.coverage_from_bam module¶
pomoxis.find_indels module¶
pomoxis.qscores_from_summary module¶
pomoxis.stats_from_bam module¶
Tabulate some simple alignment stats from sam.
pomoxis.subsample_bam module¶
pomoxis.summary_from_stats module¶
Tabulate some simple alignment stats from sam.
pomoxis.util module¶
-
class
pomoxis.util.AlignPos(qpos, qbase, rpos, rbase)¶ Bases:
tuple
-
property
qbase¶ Alias for field number 1
-
property
qpos¶ Alias for field number 0
-
property
rbase¶ Alias for field number 3
-
property
rpos¶ Alias for field number 2
-
class
pomoxis.util.FastxWrite(fname, mode='w', width=80, force_q=False, mock_q=10)[source]¶ Bases:
object
-
class
pomoxis.util.Region(ref_name, start, end)¶ Bases:
tuple
-
property
end¶ Alias for field number 2
-
property
ref_name¶ Alias for field number 0
-
property
start¶ Alias for field number 1
-
class
pomoxis.util.SeqLen(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]¶ Bases:
argparse.Action
Parse a sequence length from str such as 4.8mb or from fastx.
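A custom argparse.Action of this kind might look like the sketch below. It handles only the string form (such as 4.8mb), not the fastx case, and the names SeqLenSketch, parse_seq_len, the --genome_size option and the set of accepted suffixes are all assumptions for illustration.

```python
import argparse
import re

# Assumed decimal multipliers for the unit suffixes.
_UNITS = {"": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9}


def parse_seq_len(text):
    """Convert strings such as '4.8mb' or '1000' to an integer number of bases."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([kmg]b)?\s*", text.lower())
    if match is None:
        raise ValueError("cannot parse sequence length: {!r}".format(text))
    return int(round(float(match.group(1)) * _UNITS[match.group(2) or ""]))


class SeqLenSketch(argparse.Action):
    """Store the parsed length on the namespace instead of the raw string."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, parse_seq_len(values))


parser = argparse.ArgumentParser()
parser.add_argument("--genome_size", action=SeqLenSketch)
args = parser.parse_args(["--genome_size", "4.8mb"])
```

Subclassing argparse.Action and overriding __call__ is the standard hook for this kind of conversion when the type= argument alone is not enough.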
-
pomoxis.util.cat(files, output, chunks=10485760)[source]¶ Concatenate a set of files.
- Parameters
files – input filenames.
output – output filename.
chunks – buffer size for file copy.
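A sketch of buffered concatenation using the standard library. Whether pomoxis.util.cat is implemented this way is an assumption; the function name is hypothetical and only the parameter names and default follow the signature above.

```python
import shutil


def cat_sketch(files, output, chunks=10 * 1024 * 1024):
    """Append the bytes of each input file to output, copying chunks bytes at a time."""
    with open(output, "wb") as out_fh:
        for fname in files:
            with open(fname, "rb") as in_fh:
                # copyfileobj reads and writes in blocks of the given buffer size.
                shutil.copyfileobj(in_fh, out_fh, chunks)
```

The default of 10 * 1024 * 1024 equals the 10485760 bytes shown in the signature.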
-
pomoxis.util.chunks(iterable, n)[source]¶ Generate fixed length chunks of an iterable.
- Parameters
iterable – input sequence.
n – chunk size.
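A common way to write such a generator is shown below. It assumes that a final, shorter chunk is yielded when the input length is not a multiple of n, and that chunks are returned as tuples; both are assumptions about behaviour, not statements about pomoxis.

```python
from itertools import islice


def chunks_sketch(iterable, n):
    """Yield successive tuples of up to n items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, n))
        if not chunk:
            return  # input exhausted
        yield chunk


pieces = list(chunks_sketch(range(7), 3))
```

Working from a single iterator means the input is consumed lazily and never needs to be held in memory as a whole.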
-
pomoxis.util.get_pairs(aln)[source]¶ Return generator of pairs.
- Parameters
aln – pysam.AlignedSegment object.
- Returns
generator of AlignPos objects.
-
pomoxis.util.get_trimmed_pairs(aln)[source]¶ Trim aligned pairs to the alignment.
- Parameters
aln – pysam.AlignedSegment object
- Yields
pairs
-
pomoxis.util.intervaltrees_from_bed(path_to_bed)[source]¶ Create dict of intervaltrees from a .bed file, indexed by chrom.
- Parameters
path_to_bed – str, path to .bed file.
- Returns
{ str chrom: intervaltree.IntervalTree obj }.
-
pomoxis.util.parse_regions(regions, ref_lengths=None)[source]¶ Parse region strings into Region objects.
- Parameters
regions – iterable of str
ref_lengths – {str ref_names: int ref_lengths}, if provided Region.end will default to the reference length instead of None.
>>> parse_regions(['Ecoli'])[0]
Region(ref_name='Ecoli', start=0, end=None)
>>> parse_regions(['Ecoli:1000-2000'])[0]
Region(ref_name='Ecoli', start=1000, end=2000)
>>> parse_regions(['Ecoli:-1000'])[0]
Region(ref_name='Ecoli', start=0, end=1000)
>>> parse_regions(['Ecoli:500-'])[0]
Region(ref_name='Ecoli', start=500, end=None)
>>> parse_regions(['Ecoli'], ref_lengths={'Ecoli':4800000})[0]
Region(ref_name='Ecoli', start=0, end=4800000)
>>> parse_regions(['NC_000921.1:10000-20000'])[0]
Region(ref_name='NC_000921.1', start=10000, end=20000)
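The doctests above pin down the string format well enough to sketch a parser that reproduces them. The code below is an independent sketch, not the pomoxis source: the names RegionSketch and parse_regions_sketch are hypothetical, and behaviour for inputs not covered by the doctests (for example a start with no dash) is an assumption.

```python
from collections import namedtuple

# Local stand-in with the same fields as the documented Region tuple.
RegionSketch = namedtuple("RegionSketch", ("ref_name", "start", "end"))


def parse_regions_sketch(regions, ref_lengths=None):
    """Parse 'name', 'name:start-end', 'name:-end' or 'name:start-' strings."""
    parsed = []
    for region in regions:
        start, end = 0, None
        if ":" in region:
            # Split on the last colon so the name may itself contain dots.
            ref_name, bounds = region.rsplit(":", 1)
            start_str, _, end_str = bounds.partition("-")
            if start_str:
                start = int(start_str)
            if end_str:
                end = int(end_str)
        else:
            ref_name = region
        if end is None and ref_lengths is not None:
            end = ref_lengths.get(ref_name)  # default to the reference length
        parsed.append(RegionSketch(ref_name, start, end))
    return parsed


example = parse_regions_sketch(["Ecoli:500-"], ref_lengths={"Ecoli": 4800000})[0]
```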
-
pomoxis.util.reverse_bed()[source]¶ Convert bed-file coordinates to coordinates on the reverse strand.
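For a single interval the coordinate conversion amounts to mirroring around the reference length. The sketch below assumes the usual BED convention of 0-based, half-open intervals; the helper name and its arguments are hypothetical and not part of pomoxis.

```python
def reverse_interval_sketch(start, end, ref_len):
    """Map the half-open interval [start, end) onto the reverse strand."""
    # The last base of the forward interval becomes the first base on the reverse strand.
    return ref_len - end, ref_len - start


mirrored = reverse_interval_sketch(0, 10, 100)
```

Applying the function twice returns the original interval, which is a quick sanity check for any such conversion.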
-
pomoxis.util.split_fastx(fname, output, chunksize=10000)[source]¶ Split records in a fasta/q into fixed lengths.
- Parameters
fname – input filename.
output – output filename.
chunksize – (maximum) length of output records.
Module contents¶
-
pomoxis.get_prog_path(prog)[source]¶ Get the absolute path of bundled executables.
- Parameters
prog – programme (file) name.
- Returns
absolute path to executable.