Len
is the field's size, in bytes, and Off
is the field's zero-based
byte offset in the file/section.The header is in the format below, in file offset order. There is no magic string.
Field | Length | Value |
---|---|---|
Version | 1 | 0x04 (SAVED_FORMAT_VERSION ) |
File Type | 1 | 0x01 (SAVED_COUNTING_HT ) |
Use Bigcount | 1 | 1 if bigcounts is used, else 0 |
K-size | 1 | k-mer length, 1 <= k <= 32 |
Number of Tables | 1 | Number of Count-min Sketch tables |
All formats shall have the "magic string" OXLI
as their first bytes, after
any external compression/encoding (e.g. gzip encapsulation) is removed. Note
that this makes them incompatible with older versions of khmer.
(a.k.a CountingHash
, a Count-min Sketch)
The header is in the format below, again in the order of file offset.
Field | Len | Off | Value |
---|---|---|---|
Magic string | 4 | 0 | OXLI (SAVED_SIGNATURE ) |
Version | 1 | 4 | 0x04 (SAVED_FORMAT_VERSION ) |
File Type | 1 | 5 | 0x01 (SAVED_COUNTING_HT ) |
Use Bigcount | 1 | 6 | 0x01 if bigcounts is used, else 0x00 |
K-size | 4 | 7 | k-mer length, ht._ksize . [uint32_t ] |
Number of Tables | 1 | 11 | Number of Count-min Sketch tables,
ht._n_tables . [uint8_t ] |
Occupied Bins | 8 | 12 | Number of occupied bins |
Then follows the Countgraph's tables. For each table:
Field | Len | Off | Value |
---|---|---|---|
Table size | 8 | 0 | Length of this table, ht._tablesizes[i] .
[uint64_t ] |
Bins | N | 8 | This table's bins, length given by previous
field. [uint8_t ] |
Then follows a single value, the [uint64_t
] number of kmer: count
pairs. Then follows the Bigcount map, if this number is greater than zero. For
each kmer:
Field | Len | Off | Value |
---|---|---|---|
Kmer | 8 | 0 | Kmer's hash [HashIntoType/uint64_t ]. |
Count | 2 | 8 | Kmer's count [uint16_t ]. |
(a.k.a HashBits
, a Bloom Filter)
The header is in the format below, again in the order of file offset. Value macro definitions are given in parenthesis
Field | Len | Off | Value |
---|---|---|---|
Magic string | 4 | 0 | OXLI (SAVED_SIGNATURE ) |
Version | 1 | 4 | 0x04 (SAVED_FORMAT_VERSION ) |
File Type | 1 | 5 | 0x02 (SAVED_HASHBITS ) |
K-size | 4 | 6 | k-mer length, ht._ksize . [unsigned int ] |
Number of Tables | 1 | 10 | Number of Nodegraph tables. ht._n_tables .
[uint8_t ] |
Occupied Bins | 8 | 11 | Number of occupied bins |
Then follows the Nodegraph's tables. For each table:
Field | Len | Off | Value |
---|---|---|---|
Table size | 8 | 0 | Length of table, in bits (uint64_t ). |
Bins | N/8+1 | 8 | This table's bytes, length given by previous
field, divided by 8, plus 1 (uint8_t ). |