Command line scripts, scripts/, and sandbox/


This document applies through khmer/oxli 2.0/3.0 (see Roadmap to v3.0, v4.0, v5.0) - we will revisit when the Python API falls under semantic versioning for oxli 4.0.

khmer has two conflicting goals: first, we want to provide a reliable piece of software to our users; and second, we want to be flexible and enable exploration of new algorithms and programs. To this end, we've split our command line scripts across two directories, scripts/ and sandbox/. The former is the staid, boring, reliable code; the latter is a place for exploration.

As a result, we are committed to high test coverage, stringent code review, and Semantic Versioning for files in scripts/, but explicitly not committed to this for files and functionality implemented in sandbox/. So, putting a file into scripts/ is a big deal, especially since it increases our maintenance burden for the indefinite future.

We've roughed out the following process for moving scripts into scripts/:

  • Command line scripts start in sandbox/;
  • Once their utility is proven (in a paper, for example), we can propose to move them into scripts/;
  • There's a procedure for moving scripts from sandbox/ into scripts/.

Read on!

Sandbox script requirements and suggestions

All scripts in sandbox/ must:

  • be importable (enforced by test_import_all)
  • be mentioned in sandbox/README.rst
  • have a hash-bang line (#! /usr/bin/env python) at the top
  • be command-line executable (chmod a+x)
  • have a Copyright message (see below)
  • have lowercase names
  • use '-' as a word separator, rather than '_' or CamelCase
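Several of the requirements above are mechanical and can be checked automatically. The following is a minimal sketch, not part of khmer: the helper name check_sandbox_script and the assumption that script names end in .py are ours. It checks only the naming, hash-bang, and executable-bit rules.

```python
import os
import re
import stat

# Hypothetical helper, not part of khmer. Assumes script names end in ".py".
# Lowercase words (letters/digits) joined by '-' only: no '_' and no CamelCase.
NAME_RE = re.compile(r'^[a-z0-9]+(-[a-z0-9]+)*\.py$')


def check_sandbox_script(path):
    """Return a list of human-readable problems found for one script."""
    problems = []

    name = os.path.basename(path)
    if not NAME_RE.match(name):
        problems.append('name must be lowercase with "-" separators: ' + name)

    # The hash-bang must be the very first line of the file.
    with open(path) as handle:
        first_line = handle.readline()
    if not first_line.startswith('#!') or 'python' not in first_line:
        problems.append('missing hash-bang line (#! /usr/bin/env python)')

    # Inspect the permission bits rather than trying to run the script.
    if not os.stat(path).st_mode & stat.S_IXUSR:
        problems.append('not executable (try chmod a+x)')

    return problems
```

An empty list means the script passes these three checks; the remaining requirements (README entry, copyright message, importability) still need to be verified separately.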

All new scripts being added to sandbox/ should:

  • have decent automated tests
  • be used in a protocol (see khmer-protocols) or a recipe (see khmer-recipes)
  • be pep8 clean and pylint clean-ish (see make pep8 and make diff_pylint).

Standard and reserved command line options for scripts/

The following options are reserved; their short option flags cannot be redefined in any official script.

  • -h|--help - this must always print out a descriptive usage statement
  • -v|--version - this must always print out the khmer version number
  • -x|--max-tablesize - this must always specify the approximate table size for storing sketches of k-mer hashes
  • -N|--n_tables - this must always specify the number of tables for storing sketches of k-mer hashes
  • -M|--max-memory-usage - this must always specify the maximum amount of memory to be consumed for storing sketches of k-mer hashes
  • -U|--unique-kmers - this must always specify the approximate number of unique k-mers in the data set
  • -k|--ksize - this must always specify the k-mer size to use
  • -q|--quiet - this must always indicate that the script's diagnostic output should be minimized or altogether eliminated
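To make the reserved flags concrete, here is a minimal sketch of a parser that declares them using only the standard argparse module. This is not khmer's own argument-building code: the function name build_parser, the defaults, the help strings, and the version string are placeholders we assume for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser declaring the reserved options with plain argparse."""
    parser = argparse.ArgumentParser(description='Hypothetical example script.')
    # -h|--help is added automatically by argparse.
    parser.add_argument('-v', '--version', action='version',
                        version='khmer VERSION-PLACEHOLDER')
    parser.add_argument('-k', '--ksize', type=int, default=32,
                        help='k-mer size to use')
    parser.add_argument('-N', '--n_tables', type=int, default=4,
                        help='number of tables to use')
    parser.add_argument('-x', '--max-tablesize', type=float, default=1e6,
                        help='approximate size of each table')
    parser.add_argument('-M', '--max-memory-usage', type=float,
                        help='maximum memory to use for the tables')
    parser.add_argument('-U', '--unique-kmers', type=float, default=0,
                        help='approximate number of unique k-mers in the data')
    parser.add_argument('-q', '--quiet', action='store_true',
                        help='minimize diagnostic output')
    return parser
```

For example, build_parser().parse_args(['-k', '21', '-q']) yields ksize=21 and quiet=True, with the other options at their placeholder defaults. A script that needs a different option should pick a short flag not listed above.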

Additionally, all scripts in scripts/ should have the following options.

  • -h|--help
  • -v|--version
  • -f|--force - if applicable, override any sanity checks that may prevent the script from running

If an option uses type=argparse.FileType('w'), you also need to specify a metavar for the documentation and help formatting. Example:

parser.add_argument('-R', '--report', metavar='report_filename',
                    type=argparse.FileType('w'))

A checklist for moving a script into the scripts/ directory from sandbox/

Copy or paste this checklist into the PR, in addition to the normal development/PR checklist:

- [ ] most or all lines of code are covered by automated tests (see output of ``make diff-cover``)
- [ ] ``make diff_pylint`` is clean
- [ ] the script has been updated with a ``get_parser()`` and added to doc/user/scripts.txt
- [ ] argparse help text exists, with an epilog docstring, with examples and options
- [ ] standard command line options are implemented
- [ ] version and citation information is output to STDERR
- [ ] support '-' (STDIN) as an input file, if appropriate
- [ ] support designation of an output file (including STDOUT), if appropriate
- [ ] script reads and writes sequences in compressed format
- [ ] runtime diagnostic information (progress, etc.) is output to STDERR
- [ ] script has been removed from sandbox/README.rst
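
The input/output items in the checklist can be sketched as follows. This is an illustration only, not khmer's actual I/O layer: the helper names open_input, open_output, and copy_records are hypothetical, and compression is detected here by a .gz filename suffix, which is a simplifying assumption.

```python
import gzip
import sys


def open_input(filename):
    """Open a text input; '-' means STDIN, a '.gz' suffix means gzip."""
    if filename == '-':
        return sys.stdin
    if filename.endswith('.gz'):
        return gzip.open(filename, 'rt')
    return open(filename, 'r')


def open_output(filename, compress=False):
    """Open a text output; '-' means STDOUT."""
    if filename == '-':
        return sys.stdout
    if compress:
        return gzip.open(filename, 'wt')
    return open(filename, 'w')


def copy_records(infile, outfile, report_every=100000):
    """Copy lines from infile to outfile, reporting progress on STDERR."""
    count = 0
    for count, line in enumerate(infile, 1):
        outfile.write(line)
        if count % report_every == 0:
            # Progress goes to STDERR so it never mixes with data on STDOUT.
            print('... processed {} lines'.format(count), file=sys.stderr)
    return count
```

Keeping diagnostics on STDERR is what makes '-' usable for both input and output: the script can sit in the middle of a shell pipeline without corrupting the sequence stream.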