Blog posts and additional documentation

Hashtable and filtering

The basic inexact-matching approach used by the hashtable code is described in this blog post:

A test data set (soil metagenomics, 88 million reads, 10 GB) is here:

The filter-exact script can be used as a starting point.

Illumina read abundance profiles

khmer can be used to look at systematic variations in k-mer statistics across Illumina reads; see, for example, this blog post:

The fasta-to-abundance-hist and abundance-hist-by-position scripts can be used to generate the k-mer abundance profile data, after loading all the k-mer counts into a .kh file:

# first, load all the k-mer counts:
scripts/load-into-counting.py -k 20 -x 1e7 25k.kh data/25k.fq.gz

# then, build the '.freq' file that contains all of the counts by position
python sandbox/fasta-to-abundance-hist.py 25k.kh data/25k.fq.gz

# sum across positions.
python sandbox/abundance-hist-by-position.py data/25k.fq.gz.freq > out.dist
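To make the idea of a per-position abundance profile concrete, here is a minimal pure-Python sketch. It is not the code in the sandbox scripts and does not reproduce the '.freq' file format. It uses an exact in-memory counter rather than khmer's hashtable, and the function names (`count_kmers`, `abundance_by_position`, `sum_across_positions`) are made up for this illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield (position, k-mer) pairs for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def count_kmers(reads, k):
    """Exact k-mer counts over all reads (a stand-in for the .kh table)."""
    counts = Counter()
    for read in reads:
        for _, kmer in kmers(read, k):
            counts[kmer] += 1
    return counts


def abundance_by_position(reads, counts, k):
    """Map each read position to a histogram {abundance: number of k-mers}."""
    profile = defaultdict(Counter)
    for read in reads:
        for pos, kmer in kmers(read, k):
            profile[pos][counts[kmer]] += 1
    return profile


def sum_across_positions(profile):
    """Collapse the per-position histograms into one overall histogram."""
    total = Counter()
    for hist in profile.values():
        total.update(hist)
    return total


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTTT"]  # toy reads, not real data
    counts = count_kmers(reads, 3)
    profile = abundance_by_position(reads, counts, 3)
    for pos in sorted(profile):
        print(pos, dict(profile[pos]))
```

In this toy input the two reads share their first two 3-mers and then diverge, so the abundance drops toward the end of the read. A systematic drop of that kind at later positions is the sort of signal the profile is meant to expose.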

The hashtable method ‘dump_kmers_by_abundance’ can be used to dump high-abundance k-mers, but there is not yet a script that does this.

You can assess high/low abundance k-mer distributions with the hi-lo-abundance-by-position script:

scripts/load-into-counting.py -k 20 25k.kh data/25k.fq.gz
python sandbox/hi-lo-abundance-by-position.py 25k.kh data/25k.fq.gz

This will produce two output files, <filename>.pos.abund=1 and <filename>.pos.abund=255.
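As a rough illustration of what a high/low split by position could look like, the sketch below counts, at each read position, the k-mers seen exactly once and the k-mers whose count has reached a ceiling. It assumes that the 1 and 255 in the file names refer to those two abundance values, and that 255 is the ceiling of a saturating 8-bit counter. Both are assumptions, and the function `hi_lo_by_position` is hypothetical rather than part of khmer.

```python
from collections import Counter

MAX_COUNT = 255  # assumed ceiling of a saturating 8-bit counter


def hi_lo_by_position(reads, k, low=1, high=MAX_COUNT):
    """Return (lo_profile, hi_profile), each mapping position -> number of k-mers.

    Counts are capped at `high` to mimic a saturating counter, so any k-mer
    seen `high` or more times is treated as high abundance.
    """
    # First pass: exact k-mer counts over all reads.
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1

    # Second pass: tally low- and high-abundance k-mers at each position.
    lo_profile, hi_profile = Counter(), Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            n = min(counts[read[i:i + k]], high)
            if n == low:
                lo_profile[i] += 1
            elif n == high:
                hi_profile[i] += 1
    return lo_profile, hi_profile


if __name__ == "__main__":
    # Toy reads with a lowered ceiling so the effect is visible on tiny input.
    reads = ["AAAT"] * 3 + ["AAAC"]
    lo, hi = hi_lo_by_position(reads, k=3, high=3)
    print("low:", dict(lo))
    print("high:", dict(hi))
```

The single-copy k-mer in the toy input sits at the last position of the one divergent read. In real data, low-abundance k-mers often come from sequencing errors, which is why their position in the read is of interest.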
