Command line scripts, scripts/, and sandbox/

Note

This document applies through khmer/oxli 2.0/3.0 (see Roadmap to v3.0, v4.0, v5.0); we will revisit it when the Python API falls under semantic versioning in oxli 4.0.

khmer has two conflicting goals: first, we want to provide reliable software to our users; and second, we want to stay flexible and enable exploration of new algorithms and programs. To balance these goals, we've split our command line scripts across two directories, scripts/ and sandbox/. The former holds the staid, boring, reliable code; the latter is a place for exploration.

As a result, we are committed to high test coverage, stringent code review, and Semantic Versioning for files in scripts/, and we are explicitly not committed to any of these for files and functionality in sandbox/. Putting a file into scripts/ is therefore a big deal, especially since it adds to our maintenance burden for the indefinite future.

We've roughed out the following process for moving scripts into scripts/:

  • Command line scripts start in sandbox/;
  • Once their utility is proven (in a paper, for example), we can propose to move them into scripts/;
  • There's a procedure for moving scripts from sandbox/ into scripts/ (see the checklist below).

Read on!

Sandbox script requirements and suggestions

All scripts in sandbox/ must:

  • be importable (enforced by test_import_all in test_sandbox_scripts.py)
  • be mentioned in sandbox/README.rst
  • have a hash-bang line (#! /usr/bin/env python) at the top
  • be command-line executable (chmod a+x)
  • have a Copyright message (see below)
  • have lowercase names
  • use '-' as a word separator, rather than '_' or CamelCase
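Several of the mechanical requirements above can be checked automatically. The following is a minimal sketch, not part of khmer: the function name `check_sandbox_script` and the assumption that script names end in `.py` are hypothetical. It checks the naming rules, the hash-bang line, and the `chmod a+x` execute bits; it does not check importability, the README entry, or the copyright message.

```python
import os
import re
import stat

# Assumed naming rule: lowercase words joined by '-', ending in '.py'.
NAME_RE = re.compile(r'^[a-z0-9]+(-[a-z0-9]+)*\.py$')


def check_sandbox_script(path):
    """Return a list of problems found in the (hypothetical) script at `path`."""
    problems = []
    name = os.path.basename(path)
    if not NAME_RE.match(name):
        problems.append('name must be lowercase, with "-" between words')

    with open(path) as handle:
        first_line = handle.readline()
    if not first_line.startswith('#!'):
        problems.append('missing hash-bang line')

    # chmod a+x sets the user, group, and other execute bits.
    needed = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if (os.stat(path).st_mode & needed) != needed:
        problems.append('not executable by everyone (chmod a+x)')
    return problems
```

A check like this could run over every file in sandbox/ from a test, so that violations show up in CI rather than in review.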

All new scripts being added to sandbox/ should:

  • have decent automated tests
  • be used in a protocol (see khmer-protocols) or a recipe (see khmer-recipes)
  • be pep8 clean and pylint clean-ish (see make pep8 and make diff_pylint).
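Because sandbox script names use '-', a plain `import` statement cannot load them, which matters both for importability and for writing tests. One way to load such a file is through `importlib`; the sketch below assumes a hypothetical helper name `load_script` and is not how khmer's own `test_import_all` is implemented.

```python
import importlib.util
import os


def load_script(path):
    """Import a script whose filename contains '-' and return the module."""
    # 'my-script.py' -> 'my_script', a valid module name.
    base = os.path.splitext(os.path.basename(path))[0]
    module_name = base.replace('-', '_')
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    # Executes the top-level code only; an `if __name__ == '__main__':`
    # guard in the script keeps its main() from running on import.
    spec.loader.exec_module(module)
    return module
```

A test can then call the loaded module's functions directly (for example a hypothetical `main(argv)`), which is usually easier to check than spawning a subprocess.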

Command line standard options for scripts/

All scripts in scripts/ should have the following options, wherever they apply:

  • --version - should always apply
  • --help - should always apply
  • --force - override any sanity checks that may prevent the script from running
  • --loadgraph and --savegraph - where appropriate (see khmer_args.py)
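The sketch below shows the shape of these standard options using plain `argparse`. It is an illustration under assumptions: the version string and help texts are hypothetical, and real khmer scripts should take --loadgraph and --savegraph from khmer_args.py rather than redefining them as done here.

```python
import argparse

__version__ = '0.0-example'  # hypothetical version string


def get_parser():
    parser = argparse.ArgumentParser(
        description='Hypothetical script showing the standard options.')
    # --help/-h is added by argparse automatically.
    parser.add_argument('--version', action='version',
                        version='%(prog)s ' + __version__)
    parser.add_argument('--force', action='store_true',
                        help='override sanity checks that would stop the run')
    # Stand-ins for the khmer_args.py options of the same names.
    parser.add_argument('--loadgraph', metavar='filename',
                        help='load a precomputed graph from this file')
    parser.add_argument('--savegraph', metavar='filename',
                        help='save the graph to this file')
    return parser
```

`--version` prints the version and exits with status 0, and `--force` defaults to False, so sanity checks stay on unless a user explicitly asks otherwise.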

If an option uses type=argparse.FileType('w'), you also need to specify a metavar for the documentation and help formatting. For example:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-R', '--report', metavar='report_filename',
                    type=argparse.FileType('w'))
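To see what metavar changes in the help output, compare two otherwise identical parsers. This minimal sketch uses a hypothetical program name, `example`. Without metavar, argparse derives the placeholder from the option's destination name in upper case.

```python
import argparse

without_metavar = argparse.ArgumentParser(prog='example')
without_metavar.add_argument('-R', '--report', type=argparse.FileType('w'))

with_metavar = argparse.ArgumentParser(prog='example')
with_metavar.add_argument('-R', '--report', metavar='report_filename',
                          type=argparse.FileType('w'))

print(without_metavar.format_usage())  # usage: example [-h] [-R REPORT]
print(with_metavar.format_usage())     # usage: example [-h] [-R report_filename]
```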

A checklist for moving a script into the scripts/ directory from sandbox/

Copy and paste this checklist into the PR, in addition to the normal development/PR checklist:

- [ ] most or all lines of code are covered by automated tests (see output of ``make diff-cover``)
- [ ] ``make diff_pylint`` is clean
- [ ] the script has been updated with a ``get_parser()`` and added to doc/user/scripts.txt
- [ ] argparse help text exists, including an epilog docstring with examples and options
- [ ] standard command line options are implemented
- [ ] version and citation information is output to STDERR (``khmer_args.info(...)``)
- [ ] support '-' (STDIN) as an input file, if appropriate
- [ ] support designation of an output file (including STDOUT), if appropriate
- [ ] script reads and writes sequences in compressed format
- [ ] runtime diagnostic information (progress, etc.) is output to STDERR
- [ ] script has been removed from sandbox/README.rst
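Several checklist items (a ``get_parser()`` with an epilog, '-' for STDIN, a selectable output file defaulting to STDOUT, compressed input, and diagnostics on STDERR) fit together as in the following sketch. Everything in it is hypothetical: the script name `count-records.py` and the helper `open_sequence_input` are illustrative, gzip detection uses the standard-library `gzip` module rather than whatever khmer itself uses for compressed files, and it does not call ``khmer_args.info(...)``.

```python
#! /usr/bin/env python
"""count-records.py: hypothetical example of the scripts/ conventions."""
import argparse
import gzip
import io
import sys
import textwrap

GZIP_MAGIC = b'\x1f\x8b'  # the first two bytes of every gzip stream


def open_sequence_input(path):
    """Open `path` as text; '-' means STDIN; gzip input is auto-detected."""
    if path == '-':
        raw = sys.stdin.buffer
        # peek() looks ahead without consuming bytes from the pipe.
        if raw.peek(2)[:2] == GZIP_MAGIC:
            return gzip.open(raw, 'rt')
        return io.TextIOWrapper(raw)
    with open(path, 'rb') as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    return gzip.open(path, 'rt') if is_gzip else open(path)


def get_parser():
    epilog = textwrap.dedent("""\
        Examples:

            count-records.py reads.fa.gz -o counts.txt
            gunzip -c reads.fa.gz | count-records.py -
        """)
    parser = argparse.ArgumentParser(
        description='Count FASTA records in a plain or gzipped file.',
        epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('input_filename',
                        help="sequence file to read, or '-' for STDIN")
    parser.add_argument('-o', '--output', metavar='output_filename',
                        type=argparse.FileType('w'), default=sys.stdout,
                        help='where to write the count (default: STDOUT)')
    return parser


def main(argv=None):
    args = get_parser().parse_args(argv)
    count = 0
    with open_sequence_input(args.input_filename) as stream:
        for line in stream:
            if line.startswith('>'):
                count += 1
                if count % 100000 == 0:
                    # Progress goes to STDERR so STDOUT stays clean for data.
                    print('... read {} records'.format(count), file=sys.stderr)
    print(count, file=args.output)
    args.output.flush()
    return count
```

A real script would also end with `if __name__ == '__main__': main()`; keeping the logic in `main(argv)` means tests can call it with an argument list instead of running a subprocess.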