Fast5 Examples ============== The following code snippets demonstrate basic IO using key features of the API. Read Files ---------- The library provides the `Fast5` class which extends `h5py.File` with methods for acquiring common datasets and attributes from files without requiring knowledge of the file structure. To read a file and obtain a useful summary: .. code-block:: python from fast5_research import Fast5 filename='my.fast5' with Fast5(filename) as fh: raw = fh.get_read(raw=True) summary = fh.summary() print('Raw is {} samples long.'.format(len(raw))) print('Summary {}.'.format(summary)) Note that in this example the raw data will be provided in pA s. The library also allows writing of files which are conformant with Oxford Nanopore Technologies' software. Certain meta data are needed, which the library will enforce are present: .. code-block:: python import numpy as np from fast5_research import Fast5 filename='my_new.fast5' mean, stdv, n = 40.0, 2.0, 10000 raw_data = np.random.laplace(mean, stdv/np.sqrt(2), int(dwell)) # example of how to digitize data start, stop = int(min(raw_data - 1)), int(max(raw_data + 1)) rng = stop - start digitisation = 8192.0 bins = np.arange(start, stop, rng / digitisation) # np.int16 is required, the library will refuse to write anything other raw_data = np.digitize(raw_data, bins).astype(np.int16) # The following are required meta data channel_id = { 'digitisation': digitisation, 'offset': 0, 'range': rng, 'sampling_rate': 4000, 'channel_number': 1, } read_id = { 'start_time': 0, 'duration': len(raw_data), 'read_number': 1, 'start_mux': 1, 'read_id': str(uuid4()), 'scaling_used': 1, 'median_before': 0, } tracking_id = { 'exp_start_time': '1970-01-01T00:00:00Z', 'run_id': str(uuid4()).replace('-',''), 'flow_cell_id': 'FAH00000', } context_tags = {} with Fast5.New(filename, 'w', tracking_id=tracking_id, context_tags=context_tags, channel_id=channel_id) as h: h.set_raw(raw_data, meta=read_id, read_number=1) Bulk Files ---------- The library exposes data within bulk `.fast5` files through the `BulkFast5` class: .. code-block:: python from fast5_research import BulkFast5 filename = 'my_bulk.fast5' channel = 100 samples = [1000, 100000] with BulkFast5(filename) as fh: raw = fh.get_raw(channel, raw_indices=samples) multiplexer_changes = get_mux_changes_in_window( channel, raw_indices=samples) The `BulkFast5` class provides in-memory caching of many intermediate results, to optimize repeated calls to the same methods.