Fast5 Examples

The following code snippets demonstrate basic IO using key features of the API.

Read Files

The library provides the Fast5 class which extends h5py.File with methods for acquiring common datasets and attributes from files without requiring knowledge of the file structure. To read a file and obtain a useful summary:

from fast5_research import Fast5

filename='my.fast5'

with Fast5(filename) as fh:
    raw = fh.get_read(raw=True)
    summary = fh.summary()
print('Raw is {} samples long.'.format(len(raw)))
print('Summary {}.'.format(summary))

Note that in this example the raw data will be provided in pA (picoamps).
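
The scaled values are related to the integer ADC counts stored on disk by the usual channel scaling (offset, range and digitisation, as used in the writing example below). A minimal sketch of that conversion, using hypothetical values rather than metadata read from a file:

import numpy as np

# Hypothetical channel metadata; in practice these are read from the file.
digitisation, offset, rng = 8192.0, 0.0, 1500.0
adc_counts = np.array([1200, 1210, 1195], dtype=np.int16)

# Convert ADC counts to picoamps.
raw_pa = (adc_counts + offset) * rng / digitisation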

The library also allows writing of files that are conformant with Oxford Nanopore Technologies' software. Certain metadata are required, and the library will check that they are present:

from uuid import uuid4

import numpy as np
from fast5_research import Fast5

filename='my_new.fast5'
mean, stdv, n = 40.0, 2.0, 10000
raw_data = np.random.laplace(mean, stdv/np.sqrt(2), int(n))

# example of how to digitize data
start, stop = int(min(raw_data - 1)), int(max(raw_data + 1))
rng = stop - start
digitisation = 8192.0
bins = np.arange(start, stop, rng / digitisation)
# np.int16 is required; the library will refuse to write anything else
raw_data = np.digitize(raw_data, bins).astype(np.int16)

# The following are required meta data
channel_id = {
    'digitisation': digitisation,
    'offset': 0,
    'range': rng,
    'sampling_rate': 4000,
    'channel_number': 1,
    }
read_id = {
    'start_time': 0,
    'duration': len(raw_data),
    'read_number': 1,
    'start_mux': 1,
    'read_id': str(uuid4()),
    'scaling_used': 1,
    'median_before': 0,
}
tracking_id = {
    'exp_start_time': '1970-01-01T00:00:00Z',
    'run_id': str(uuid4()).replace('-',''),
    'flow_cell_id': 'FAH00000',
}
context_tags = {}

with Fast5.New(filename, 'w', tracking_id=tracking_id, context_tags=context_tags, channel_id=channel_id) as h:
    h.set_raw(raw_data, meta=read_id, read_number=1)
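
As a quick check, a file written in this way can be read back with the same Fast5 class used in the reading example above. This is a minimal sketch reusing only the calls already shown (summary() is omitted since its output depends on which datasets are present):

with Fast5(filename) as fh:
    recovered = fh.get_read(raw=True)
print('Recovered {} samples.'.format(len(recovered)))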

Bulk Files

The library exposes data within bulk .fast5 files through the BulkFast5 class:

from fast5_research import BulkFast5

filename = 'my_bulk.fast5'
channel = 100
samples = [1000, 100000]

with BulkFast5(filename) as fh:
    raw = fh.get_raw(channel, raw_indices=samples)
    multiplexer_changes = fh.get_mux_changes_in_window(
        channel, raw_indices=samples)

The BulkFast5 class caches many intermediate results in memory to speed up repeated calls to the same methods.
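
Because of this caching it is usually best to keep a single handle open and issue repeated queries against it rather than reopening the file. A minimal sketch, assuming the get_raw call shown above and a hypothetical list of channels of interest:

channels = [100, 101, 102]  # hypothetical channels of interest
window = [1000, 100000]

with BulkFast5(filename) as fh:
    # Repeated calls on one open handle benefit from the in-memory caching.
    raw_by_channel = {c: fh.get_raw(c, raw_indices=window) for c in channels}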