.. vim: set filetype=rst

==============================
khmer's command-line interface
==============================

The simplest way to use khmer's functionality is through the command
line scripts, located in the scripts/ directory of the khmer
distribution.  Below is our documentation for these scripts.  Note
that all scripts can be given :option:`-h` which will print out
a list of arguments taken by that script.

Many scripts take :option:`-x` and :option:`-N` parameters, which drive khmer's
memory usage. These parameters depend on details of your data set; for more information
on how to choose them, see :doc:`choosing-table-sizes`.

You can also override the default values of :option:`--ksize`/:option:`-k`,
:option:`--n_tables`/:option:`-N`, and :option:`--min-tablesize`/:option:`-x` with
the environment variables `KHMER_KSIZE`, `KHMER_N_TABLES`, and
`KHMER_MIN_TABLESIZE` respectively.

1. :ref:`scripts-counting`
2. :ref:`scripts-partitioning`
3. :ref:`scripts-diginorm`
4. :ref:`scripts-read-handling`

.. note::
 
   Almost all scripts take in either FASTA and FASTQ format, and
   output the same.  Some scripts may only recognize FASTQ if the file
   ending is '.fq' or '.fastq', at least for now.

   Files ending with '.gz' will be treated as gzipped files, and
   files ending with '.bz2' will be treated as bzip2'd files.

.. _scripts-counting:

k-mer counting and abundance filtering
======================================

.. autoprogram:: load-into-counting:get_parser()
        :prog: load-into-counting.py

.. autoprogram:: abundance-dist:get_parser()
        :prog: abundance-dist.py

.. autoprogram:: abundance-dist-single:get_parser()
        :prog: abundance-dist-single.py

.. autoprogram:: filter-abund:get_parser()
        :prog: filter-abund.py

.. autoprogram:: filter-abund-single:get_parser()
        :prog: filter-abund-single.py

.. autoprogram:: trim-low-abund:get_parser()
        :prog: trim-low-abund.py

.. autoprogram:: count-median:get_parser()
        :prog: count-median.py

.. autoprogram:: count-overlap:get_parser()
        :prog: count-overlap.py

.. _scripts-partitioning:

Partitioning
============

.. autoprogram:: do-partition:get_parser()
        :prog: do-partition.py

.. autoprogram:: load-graph:get_parser()
        :prog: load-graph.py

See :program:`extract-partitions.py` for a complete workflow.

.. autoprogram:: partition-graph:get_parser()
        :prog: partition-graph.py

See 'Artifact removal' to understand the stoptags argument.

.. autoprogram:: merge-partitions:get_parser()
        :prog: merge-partition.py

.. autoprogram:: annotate-partitions:get_parser()
        :prog: annotate-partitions.py

.. autoprogram:: extract-partitions:get_parser()
        :prog: extract-partitions.py
 
Artifact removal
----------------

The following scripts are specialized scripts for finding and removing
highly-connected k-mers (HCKs).  See :doc:`partitioning-big-data`.

.. autoprogram:: make-initial-stoptags:get_parser()
        :prog: make-initial-stoptags.py

.. autoprogram:: find-knots:get_parser()
        :prog: find-knots.py

.. autoprogram:: filter-stoptags:get_parser()
        :prog: filter-stoptags.py

.. _scripts-diginorm:

Digital normalization
=====================

.. autoprogram:: normalize-by-median:get_parser()
        :prog: normalize-by-median.py

.. _scripts-read-handling:

Read handling: interleaving, splitting, etc.
============================================

.. autoprogram:: extract-long-sequences:get_parser()
        :prog: extract-long-sequences.py

.. autoprogram:: extract-paired-reads:get_parser()
        :prog: extract-paired-reads.py

.. autoprogram:: fastq-to-fasta:get_parser()
        :prog: fastq-to-fasta.py

.. autoprogram:: interleave-reads:get_parser()
        :prog: interleave-reads.py

.. autoprogram:: readstats:get_parser()
        :prog: readstats.py

.. autoprogram:: sample-reads-randomly:get_parser()
        :prog: sample-reads-randomly.py

.. autoprogram:: split-paired-reads:get_parser()
        :prog: split-paired-reads.py