pomoxis package¶
Submodules¶
pomoxis.assess_homopolymers module¶
-
class
pomoxis.assess_homopolymers.
AverageScore
(cumulative_score=0, count=0)[source]¶ Bases:
object
Keep track of a simple average of inputs.
-
property
average_score
¶ Return the average of the accumulated samples.
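A minimal sketch of how such a running average might be kept, for illustration only. The class name `RunningAverage` and the `add_sample` method are hypothetical and are not taken from pomoxis; only the constructor arguments and the `average_score` property mirror the signature above.

```python
class RunningAverage:
    """Illustrative stand-in for AverageScore (not the pomoxis implementation)."""

    def __init__(self, cumulative_score=0, count=0):
        self.cumulative_score = cumulative_score
        self.count = count

    def add_sample(self, score):
        # Hypothetical method: accumulate one more input.
        self.cumulative_score += score
        self.count += 1

    @property
    def average_score(self):
        # Assumption: report 0.0 instead of dividing by zero when empty.
        if self.count == 0:
            return 0.0
        return self.cumulative_score / self.count


avg = RunningAverage()
for score in (1, 2, 3, 6):
    avg.add_sample(score)
print(avg.average_score)
```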
-
pomoxis.assess_homopolymers.
find_homopolymers
(seq, min_length, alphabet='ACGT')[source]¶ - Parameters
seq – the sequence to search
min_length – minimum homopolymer (hp) length to find
alphabet – alphabet of bases to look for
- Returns
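The return value is not documented above. As an illustration only, a homopolymer search could be sketched with a regular-expression backreference. The function name `find_homopolymers_sketch` and its `(start, end, base)` return format are assumptions, not the pomoxis behaviour.

```python
import re


def find_homopolymers_sketch(seq, min_length, alphabet="ACGT"):
    """Return (start, end, base) for each run of >= min_length identical bases.

    Hypothetical helper; the return format is an assumption.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # ([ACGT]) captures one base; \1{n,} then requires at least n more copies.
    pattern = re.compile(
        r"([{}])\1{{{},}}".format(re.escape(alphabet), min_length - 1)
    )
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


runs = find_homopolymers_sketch("ACGTTTTACCCGA", 3)
print(runs)
```

The end coordinate is exclusive, so `seq[start:end]` recovers the run.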
-
pomoxis.assess_homopolymers.
get_fraction_correct
(counts)[source]¶ Calculate fraction correct vs length from counts.
pomoxis.bio module¶
-
pomoxis.bio.
reverse_complement
(seq)[source]¶ Reverse complement sequence.
- Parameters
seq – input sequence string.
- Returns
reverse-complemented string.
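A minimal sketch of reverse complementation using a translation table. The name `reverse_complement_sketch` is hypothetical, and passing characters outside `ACGT`/`acgt` through unchanged is an assumption rather than documented pomoxis behaviour.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_sketch(seq):
    # Complement every base, then reverse the string.
    # Assumption: other characters (e.g. N) are left as they are.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch("AACG"))
```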
-
pomoxis.bio.
shotgun_library
(fasta_file, mu, sigma, direction=(1, -1))[source]¶ Generate random fragment sequences of a given input sequence.
- Parameters
fasta_file – FASTA file containing the input sequence.
mu – mean fragment length.
sigma – standard deviation of fragment length.
direction – tuple representing the direction of output sequences with respect to the input sequence.
- Yields
sequence fragments.
Note
Could be made more efficient using buffers for random samples and handling cases separately.
pomoxis.catalogue_errors module¶
-
class
pomoxis.catalogue_errors.
AlignSeg
(rname, qname, pairs, rlen)¶ Bases:
tuple
-
property
pairs
¶ Alias for field number 2
-
property
qname
¶ Alias for field number 1
-
property
rlen
¶ Alias for field number 3
-
property
rname
¶ Alias for field number 0
-
class
pomoxis.catalogue_errors.
ClassifyErrorTest
(methodName='runTest')[source]¶ Bases:
unittest.case.TestCase
-
class
pomoxis.catalogue_errors.
Context
(p_i, qb, rb)¶ Bases:
tuple
-
property
p_i
¶ Alias for field number 0
-
property
qb
¶ Alias for field number 1
-
property
rb
¶ Alias for field number 2
-
class
pomoxis.catalogue_errors.
Error
(rp, rname, qp, qname, ref, match, read, counts, klass, aggr_klass)¶ Bases:
tuple
-
property
aggr_klass
¶ Alias for field number 9
-
property
counts
¶ Alias for field number 7
-
property
klass
¶ Alias for field number 8
-
property
match
¶ Alias for field number 5
-
property
qname
¶ Alias for field number 3
-
property
qp
¶ Alias for field number 2
-
property
read
¶ Alias for field number 6
-
property
ref
¶ Alias for field number 4
-
property
rname
¶ Alias for field number 1
-
property
rp
¶ Alias for field number 0
-
pomoxis.catalogue_errors.
are_adjacent
(inds)[source]¶ Check if all int indices in the iterable are consecutive.
- Parameters
inds – iterable of ints
- Returns
bool
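A minimal sketch of such a check. The name `are_adjacent_sketch` is hypothetical; treating "consecutive" as strictly ascending by one, and returning True for empty or single-element input, are assumptions.

```python
def are_adjacent_sketch(inds):
    inds = list(inds)
    # Every neighbouring pair must differ by exactly one.
    return all(b - a == 1 for a, b in zip(inds, inds[1:]))


print(are_adjacent_sketch([3, 4, 5]), are_adjacent_sketch([3, 5]))
```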
-
pomoxis.catalogue_errors.
classify_error
(context, indel_sizes=None)[source]¶ Classify error within an alignment.
- Parameters
context – Context object
indel_sizes – iterable of int, for binning indel sizes. Indels >= indel_sizes[0] will not be considered as HP splitting/joining indels.
- Returns
(str reference_context, str match_line, str query_context, dict counts of sub/ins/del within context)
-
pomoxis.catalogue_errors.
classify_hp_indel
(p_i, key, errors, runs1, seq2)[source]¶ Look for a specific kind of HP indel that splits or joins two HPs.
- Parameters
p_i – int, index of error
key – key of error type within errors (should be ‘ins’ or ‘del’)
errors – dict of error positions
runs1 – np.ndarray, RLE encoding of sequence1 (query if deletions join two HPs, or ref if insertions split an HP).
seq2 – iterable of str of sequence2 (ref if deletions join two HPs, or query if insertions split an HP).
- Returns
str classification or None
-
pomoxis.catalogue_errors.
classify_hp_sub
(p_i, adjacent, errors, match_line, rb_runs, qb_runs, qp_is_hp, rp_is_hp)[source]¶
-
pomoxis.catalogue_errors.
get_errors
(aln, tree=None)[source]¶ Find positions of errors in an alignment.
- Parameters
aln – iterable of AlignPos objects.
bed_file – path to .bed file of regions to include in analysis.
tree – intervaltree.IntervalTree object of regions to analyse.
- Returns
( [(ri, qi, ‘error_type’, last_ri, last_qi)], aligned_ref_len), where:
ri, qi – ref and query positions.
error_type – ‘D’, ‘I’ or ‘S’.
last_ri, last_qi – ref and query positions of the last match.
aligned_ref_len – total aligned reference length (taking account of masking tree).
-
pomoxis.catalogue_errors.
get_run
(i, runs)[source]¶ Find run to which the i’th element belongs.
- Parameters
i – int, element index within input to rle.
- Returns
int, element index within runs to which i belongs.
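To illustrate the lookup, the sketch below run-length encodes a string and then finds the run containing a given position. The layout of `runs` is not documented above, so representing it as a list of `(length, value)` pairs is an assumption, and both function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def rle_sketch(seq):
    # Run-length encode seq as a list of (length, value) pairs.
    return [(len(list(group)), value) for value, group in groupby(seq)]


def get_run_sketch(i, runs):
    # ends[k] is the exclusive end position of run k in the original input.
    ends = list(accumulate(length for length, _ in runs))
    if not ends or not 0 <= i < ends[-1]:
        raise IndexError("position {} is outside the encoded input".format(i))
    # The first run whose end lies beyond i is the run containing i.
    return bisect_right(ends, i)


runs = rle_sketch("AAACCG")
print(runs, get_run_sketch(4, runs))
```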
-
pomoxis.catalogue_errors.
plot_summary
(df, outdir, prefix, ref_len)[source]¶ Create a plot showing Q-scores as the largest remaining error klass is removed.
-
pomoxis.catalogue_errors.
preprocess_error
(p, aln, search_by_q, offset=10)[source]¶ - Parameters
p – int, position (ref position, or query position)
aln – iterable of AlignPos objects.
search_by_q – bool, whether to search by query position (typically done if qi is None).
- Returns
Context object
pomoxis.common_errors_from_bam module¶
pomoxis.coverage_from_bam module¶
pomoxis.find_indels module¶
pomoxis.qscores_from_summary module¶
pomoxis.stats_from_bam module¶
Tabulate some simple alignment stats from sam.
pomoxis.subsample_bam module¶
pomoxis.summary_from_stats module¶
Tabulate some simple alignment stats from sam.
pomoxis.util module¶
-
class
pomoxis.util.
AlignPos
(qpos, qbase, rpos, rbase)¶ Bases:
tuple
-
property
qbase
¶ Alias for field number 1
-
property
qpos
¶ Alias for field number 0
-
property
rbase
¶ Alias for field number 3
-
property
rpos
¶ Alias for field number 2
-
class
pomoxis.util.
FastxWrite
(fname, mode='w', width=80, force_q=False, mock_q=10)[source]¶ Bases:
object
-
class
pomoxis.util.
Region
(ref_name, start, end)¶ Bases:
tuple
-
property
end
¶ Alias for field number 2
-
property
ref_name
¶ Alias for field number 0
-
property
start
¶ Alias for field number 1
-
class
pomoxis.util.
SeqLen
(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]¶ Bases:
argparse.Action
Parse a sequence length from str such as 4.8mb or from fastx.
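A custom `argparse.Action` for strings such as 4.8mb might look like the sketch below. The class name `SeqLenSketch`, the accepted suffixes, and the decimal multipliers are assumptions; reading the length from a fastx file, which the description also mentions, is left out.

```python
import argparse
import re

# Assumption: decimal multipliers (1 kb = 1000 bases).
_MULTIPLIERS = {"": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9}


class SeqLenSketch(argparse.Action):
    """Hypothetical action converting '4.8mb' style strings to an int."""

    def __call__(self, parser, namespace, values, option_string=None):
        match = re.fullmatch(
            r"([0-9]*\.?[0-9]+)\s*([kmg]b)?", values.strip().lower()
        )
        if match is None:
            parser.error("cannot parse sequence length: {!r}".format(values))
        number, suffix = match.group(1), match.group(2) or ""
        # round() guards against floating point error before truncation.
        length = int(round(float(number) * _MULTIPLIERS[suffix]))
        setattr(namespace, self.dest, length)


parser = argparse.ArgumentParser()
parser.add_argument("--length", action=SeqLenSketch)
args = parser.parse_args(["--length", "4.8mb"])
print(args.length)
```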
-
pomoxis.util.
cat
(files, output, chunks=10485760)[source]¶ Concatenate a set of files.
- Parameters
files – input filenames.
output – output filename.
chunks – buffer size for file copy.
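Buffered concatenation of this kind can be sketched with `shutil.copyfileobj`. The function name `cat_sketch` is hypothetical, and opening files in binary mode is an assumption; the default buffer size matches the 10485760 bytes in the signature above.

```python
import os
import shutil
import tempfile


def cat_sketch(files, output, chunks=10 * 1024 * 1024):
    with open(output, "wb") as out_handle:
        for fname in files:
            with open(fname, "rb") as in_handle:
                # Copy in blocks of `chunks` bytes to bound memory use.
                shutil.copyfileobj(in_handle, out_handle, chunks)


# Usage on two small temporary files.
with tempfile.TemporaryDirectory() as tmp:
    inputs = []
    for name, text in (("a.txt", b"ACGT\n"), ("b.txt", b"TTGA\n")):
        path = os.path.join(tmp, name)
        with open(path, "wb") as handle:
            handle.write(text)
        inputs.append(path)
    out_path = os.path.join(tmp, "joined.txt")
    cat_sketch(inputs, out_path)
    with open(out_path, "rb") as handle:
        combined = handle.read()
```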
-
pomoxis.util.
chunks
(iterable, n)[source]¶ Generate fixed length chunks of an iterable.
- Parameters
iterable – input sequence.
n – chunk size.
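A minimal sketch of chunking an arbitrary iterable with `itertools.islice`. The name `chunks_sketch` is hypothetical; yielding tuples, and letting the final chunk be shorter than `n`, are assumptions.

```python
from itertools import islice


def chunks_sketch(iterable, n):
    iterator = iter(iterable)
    while True:
        # Take up to n items; an empty tuple means the input is exhausted.
        chunk = tuple(islice(iterator, n))
        if not chunk:
            return
        yield chunk


print(list(chunks_sketch("ACGTAC", 4)))
```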
-
pomoxis.util.
get_pairs
(aln)[source]¶ Return generator of pairs.
- Parameters
aln – pysam.AlignedSegment object.
- Returns
generator of AlignPos objects.
-
pomoxis.util.
get_trimmed_pairs
(aln)[source]¶ Trim aligned pairs to the alignment.
- Parameters
aln – pysam.AlignedSegment object
- Yields
pairs
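One reading of the trimming step is sketched below on a plain list of `AlignPos` tuples rather than a `pysam.AlignedSegment`. Dropping leading and trailing pairs in which either position is None is an assumption about what trimming means here, and `trim_pairs_sketch` is a hypothetical name.

```python
from collections import namedtuple

AlignPos = namedtuple("AlignPos", ("qpos", "qbase", "rpos", "rbase"))


def trim_pairs_sketch(pairs):
    pairs = list(pairs)

    def unaligned(pair):
        return pair.qpos is None or pair.rpos is None

    # Skip unaligned pairs at the start.
    start = 0
    while start < len(pairs) and unaligned(pairs[start]):
        start += 1
    # Skip unaligned pairs at the end; internal indels are kept.
    end = len(pairs)
    while end > start and unaligned(pairs[end - 1]):
        end -= 1
    return pairs[start:end]


example = [
    AlignPos(0, "A", None, None),   # leading unaligned query base
    AlignPos(1, "C", 10, "C"),
    AlignPos(2, "G", None, None),   # internal insertion, kept
    AlignPos(3, "T", 11, "T"),
    AlignPos(4, "A", None, None),   # trailing unaligned query base
]
trimmed = trim_pairs_sketch(example)
```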
-
pomoxis.util.
intervaltrees_from_bed
(path_to_bed)[source]¶ Create dict of intervaltrees from a .bed file, indexed by chrom.
- Parameters
path_to_bed – str, path to .bed file.
- Returns
{ str chrom: intervaltree.IntervalTree obj }.
-
pomoxis.util.
parse_regions
(regions, ref_lengths=None)[source]¶ Parse region strings into Region objects.
- Parameters
regions – iterable of str
ref_lengths – {str ref_names: int ref_lengths}, if provided Region.end will default to the reference length instead of None.
>>> parse_regions(['Ecoli'])[0]
Region(ref_name='Ecoli', start=0, end=None)
>>> parse_regions(['Ecoli:1000-2000'])[0]
Region(ref_name='Ecoli', start=1000, end=2000)
>>> parse_regions(['Ecoli:-1000'])[0]
Region(ref_name='Ecoli', start=0, end=1000)
>>> parse_regions(['Ecoli:500-'])[0]
Region(ref_name='Ecoli', start=500, end=None)
>>> parse_regions(['Ecoli'], ref_lengths={'Ecoli':4800000})[0]
Region(ref_name='Ecoli', start=0, end=4800000)
>>> parse_regions(['NC_000921.1:10000-20000'])[0]
Region(ref_name='NC_000921.1', start=10000, end=20000)
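The documented examples can be reproduced with a small parser such as the sketch below. It is written only to match those examples; the name `parse_regions_sketch` is hypothetical, and behaviour for inputs not covered by the examples (for instance a reference name containing a colon) is an assumption.

```python
from collections import namedtuple

Region = namedtuple("Region", ("ref_name", "start", "end"))


def parse_regions_sketch(regions, ref_lengths=None):
    parsed = []
    for region in regions:
        # Split "name:start-end"; the part after the colon is optional.
        ref_name, sep, bounds = region.partition(":")
        start, end = 0, None
        if sep:
            start_str, _, end_str = bounds.partition("-")
            start = int(start_str) if start_str else 0
            end = int(end_str) if end_str else None
        # Fall back to the reference length when no end was given.
        if end is None and ref_lengths is not None:
            end = ref_lengths[ref_name]
        parsed.append(Region(ref_name, start, end))
    return parsed


print(parse_regions_sketch(["Ecoli:500-"])[0])
```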
-
pomoxis.util.
reverse_bed
()[source]¶ Convert bed-file coordinates to coordinates on the reverse strand.
-
pomoxis.util.
split_fastx
(fname, output, chunksize=10000)[source]¶ Split records in a fasta/q into fixed lengths.
- Parameters
fname – input filename.
output – output filename.
chunksize – (maximum) length of output records.
Module contents¶
-
pomoxis.
get_prog_path
(prog)[source]¶ Get the absolute path of bundled executables.
- Parameters
prog – programme (file) name.
- Returns
absolute path to executable.