v1.1.alpha07: SQL overhaul, multiplicity and fixes
23 May 2011, by Erick
Work continues on tuning up the core code.
The biggest change is that guppy now makes actual SQLite databases rather than a collection of SQLite commands.
This means database building is much faster.
For those of you compiling the code, you will now need godi-sqlite3.
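Because the output is now a real SQLite database file, it can be opened by any SQLite client without first replaying a script of commands. A minimal sketch in Python, using only the standard sqlite3 module: it lists whatever tables a database contains by querying SQLite's built-in sqlite_master catalog. The table created in the demo is a placeholder for illustration and is not guppy's actual schema.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def demo_tables():
    # Stand-in database built in memory. The table name and columns are
    # hypothetical placeholders, NOT the schema that guppy writes.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE example_placements (name TEXT, edge INTEGER)")
        conn.commit()
        return list_tables(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo_tables())
```

To inspect a real file, pass sqlite3.connect(path) for the database path you gave guppy in place of the in-memory connection.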
We have also finished full support for multiplicity of placements (i.e. more than one sequence name per placement).
Multiplicity is also supported in the database code.
There is also a guppy redup command for re-adding duplicate sequences to placefiles generated from deduplicated sequence files.
Deduplication will make your pipeline much faster, and it’s easy with seqmagick (the guppy redup documentation has some details).
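The idea behind deduplicating and then re-adding duplicates can be sketched in a few lines of Python. This is only an illustration of the concept: the function names, the in-memory data structures, and the choice of the first-seen name as the representative are all assumptions, and it does not reproduce the file formats that seqmagick or guppy redup actually read and write.

```python
def deduplicate(records):
    """Collapse identical sequences.

    records: list of (name, sequence) pairs.
    Returns (unique_records, dup_map), where dup_map maps each
    representative name to every name that shares its sequence.
    """
    first_name_for = {}  # sequence -> representative (first-seen) name
    dup_map = {}
    unique = []
    for name, seq in records:
        if seq in first_name_for:
            dup_map[first_name_for[seq]].append(name)
        else:
            first_name_for[seq] = name
            dup_map[name] = [name]
            unique.append((name, seq))
    return unique, dup_map


def reduplicate(placed_names, dup_map):
    """Expand each placed representative back to all of its original names."""
    return {rep: dup_map.get(rep, [rep]) for rep in placed_names}


if __name__ == "__main__":
    records = [("a", "ACGT"), ("b", "ACGT"), ("c", "TTGA")]
    unique, dup_map = deduplicate(records)
    print(unique)
    print(reduplicate([name for name, _ in unique], dup_map))
```

Only the unique records need to be placed, which is where the speedup comes from; a placement that ends up with several names is exactly the multiplicity case described above.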
Also
- fixed all of the sequence parsers to be tail-recursive, so parsing large files no longer causes segfaults.
- better consistency of output flags across all guppy commands.
- renamed the --normal flag for guppy kr to --gaussian to avoid confusion with normalization.
- shuffling for guppy kr is now much more memory efficient, and fixed bug that was throwing off significance estimation.
- guppy pca now defaults to scaling eigenvalues to percent variance.
- Re-added in JTT which had been mysteriously dropped.
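
For the guppy pca item above, scaling eigenvalues to percent variance usually means dividing each eigenvalue by the total and multiplying by 100. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and normalizing by the sum of the eigenvalues passed in is an assumption about what guppy does.

```python
def percent_variance(eigenvalues):
    """Scale eigenvalues so they express percent of total variance."""
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("eigenvalues sum to zero; cannot scale")
    # Assumption: the total is the sum of the eigenvalues supplied here.
    return [100.0 * ev / total for ev in eigenvalues]


if __name__ == "__main__":
    print(percent_variance([3.0, 1.0]))
```

With this scaling, each value can be read directly as the share of variance explained by that principal component.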