Guidelines for continued development

The khmer Getting Started documentation is intended for first-time contributors, and is designed to make that first contribution as easy and as painless as possible. For those with an interest in making continued contributions (or those with an obligation to maintain and contribute to the codebase), this document describes the coding guidelines we follow, as well as some tips that will improve the development process for everyone involved.

Beyond your first contribution

If your first contribution to khmer has been accepted and merged into the codebase, you may be wondering what to do next. You can poke around at additional “low-hanging fruit” issues, but if you’d like to sink your teeth into something more substantial, here are a few suggestions.

  • If you’re knowledgeable in C++ and/or Python and/or documentation and/or biology, we’d love to attract further contributions to khmer. Please visit the issues list and browse about and find something interesting looking.
  • One general thing we’d like to do is increase our test coverage. You can go find test coverage information at Codecov.io by scrolling to the bottom and clicking on individual files; or, ask us on khmer-project@idyll.org for suggestions.
  • Ask us! Ask khmer-project@idyll.org for suggestions on what to do next. We can suggest particularly ripe low-hanging fruit, or find some other issues that suit your interests and background.
  • You can also help other people out by watching for new issues or looking at pull requests. Remember to be nice and polite!

Programming languages

All Python code in khmer must run correctly in both Python version 2 and 3.

For C++ code, any feature in C++11 is fine to use. Specifically we support features found in GCC 4.8.2. Our automated tests use gcc 4.8.4 on linux. See https://github.com/dib-lab/khmer/issues/598 for an in-depth discussion. Please do not use features from C++14 or newer.

Code style standards

All plain-text files should have line widths of 80 characters or less unless that is not supported for the particular file format.

For C++, we use Todd Hoff’s coding standard, and astyle -A10 / “One True Brace Style” indentation and bracing. Note: @CTB needs Emacs settings that work for this.

Vim users may want to set the ARTISTIC_STYLE_OPTIONS shell variable to “-A10 –max-code-length=80” and run `:%!astyle` to reformat. The four space indentation can be set with:

set expandtab
set shiftwidth=4
set softtabstop=4

For Python, PEP 8 is our standard. The `pep8` and `autopep8` Makefile targets are helpful.

Code, scripts, and documentation must have their spelling checked.

Python-based codespell can be applied to multiple files easily. codespell can be installed via the following:

pip install codespell

To run codespell over only what has been changed on the branch my-branch:

git diff master..my-branch > diff_file
codespell diff_file

To run codespell over a single file:

codespell path/to/file

To make codespell fix the issues it finds automatically:

codespell -w path/to/file

Please note that as codespell works off of a listing of possible misspellings it may not catch all errors. If you find a spelling error that is not caught by codespell feel free to open a pull request at the project page to add it to the dictionary.

Vim users can run:

:setlocal spell spelllang=en_us

Use ]s and [s to navigate between misspellings and z= to suggest a correctly spelled word. zg will add a word as a good word.

GNU aspell can also be used to check the spelling in a single file:

aspell check --mode ccpp $filename

Cython Style

Cython code can become messy very quickly, and as such, we have guidelines for style and structure.

When wrapping code from liboxli:

  • extern definition should begin with Cp; for example, CpHashtable wraps oxli::Hashtable.
  • If the extension class wrapping the liboxli class stores it as a pointer, it should be named _this. If it wraps a stack object directly, it should be named _obj.

For imports,

  • libc cimports next,
  • then libcpp imports and cimports.
  • followed by cimports
  • and finally, regular imports.

Generally,

  • Pure C methods should be underscore prefixed.
  • cpdef methods do not need to be underscore prefixed.

Resolving merge conflicts

It is possible that when you do a git pull you will get a “merge conflict” – This is what happens when something changed in the branch you’re pulling in in the same place you made a change in your local copy. This frequently happens in the ChangeLog file.

Git will complain loudly about merges and tell you specifically in which files they occurred. If you open the file, you’ll see something vaguely like this in the place where the merge occurred:

<<<<<<< HEAD
Changes made on the branch that is being merged into. In most cases,
this is the branch that you have currently checked out
=======
Changes made on the branch that is being merged in, almost certainly
master.
>>>>>>> abcde1234

Though there are a variety of tools to assist with resolving merge conflicts they can be quite complicated at first glance and it is usually easy enough to manually resolve the conflict.

To resolve the conflict you simply have to manually ‘meld’ the changes together and remove the merge markers.

After this you’ll have to add and commit the merge just like any other set of changes. It’s also recommended that you run tests.

Virtual environments

The khmer package, like many software packages, relies on other third-party software. Some of this software has been bundled together with khmer and is compiled when you invoke make on the command line. But some of the software khmer depends on is distributed as Python packages separately from khmer.

Python virtual environments were designed to isolate a stable development environment for a particular project. This makes it possible to maintain different versions of a Python package for different projects on your computer.

The installation instructions in the Getting Started docs install the virtualenv command on your computer. After completing those instructions, you can create a virtual environment with the command:

virtualenv -p python2 env/

(You can substitute python3 for python2 if Python version 3 is installed on your system.) This command will create a new directory env/ containing your new virtual environment. The command:

source env/bin/activate

will activate the virtual environment. Now any Python packages that you install with pip or make install-dep will be installed into your isolated virtual environment.

Note that any time you create a new terminal session, using the virtual environment requires that you re-activate it.

Pull request cleanup (commit squashing)

Submitters are invited to reduce the numbers of commits in their pull requests either via git rebase -i dib/master or this recipe:

git pull # make sure the local is up to date
git pull dib master # get up to date
# fix any merge conflicts
git status # sanity check
git diff dib/master # does the diff look correct? (no merge markers)
git reset --soft dib/master # un-commit the differences from dib/master
git status # sanity check
git commit --all # package all differences in one commit
git status # sanity check
git push # should fail
git push --force # override what's in GitHub's copy of the branch/pull request

CPython Checklist

Here’s a checklist for new CPython types with future-proofing for Python 3:

- [ ] the CPython object name is of the form `khmer_${OBJECTNAME}_Object`
- [ ] Named struct with `PyObject_HEAD` macro
- [ ] `static PyTypeObject khmer_${OBJECTNAME}_Type` with the following
  entries
   - [ ] `PyVarObject_HEAD_INIT(NULL, 0)` as the object init (this includes
     the `ob_size` field).
   - [ ] all fields should have their name in a comment for readability
   - [ ] The `tp_name` filed is a dotted name with both the module name and
     the name of the type within the module. Example: `khmer.ReadAligner`
   - [ ] Deallocator defined and cast to `(destructor)` in tp_dealloc
     - [ ] The object's deallocator must be
       `Py_TYPE(obj)->tp_free((PyObject*)obj);`
   - [ ] Do _not_ define a `tp_getattr`
   - [ ] BONUS: write methods to present the state of the object via
     `tp_str` & `tp_repr`
   - [ ] _Do_ pass in the array of methods in `tp_methods`
   - [ ] _Do_ define a new method in `tp_new`
- [ ] PyMethodDef arrays contain doc strings
   - [ ] Methods are cast to `PyCFunctions`s
- [ ] Type methods use their type Object in the method signature.
- [ ] Type creation method decrements the reference to self
  (`Py_DECREF(self);`) before each error-path exit (`return NULL;`)
- [ ] No factory methods. Example: `khmer_new_readaligner`
- [ ] Type object is passed to `PyType_Ready` and its return code is checked
  in `MOD_INIT()`
- [ ] The reference count for the type object is incremented before adding
  it to the module: `Py_INCREF(&khmer_${OBJECTNAME}_Type);`.
comments powered by Disqus