Commandline Tools (CLT)

Version 0.3.2: March 8, 2016

Fixed a bug with installation on Windows due to old file paths lingering in changeo.egg-info/SOURCES.txt.

Updated license from CC BY-NC-SA 3.0 to CC BY-NC-SA 4.0.

MakeDb:

Updated the igblast subcommand to correctly parse records with indels. IgBLAST must now be run with the argument -outfmt "7 std qseq sseq btop".
Changed the names of the FWR and CDR output columns added with --regions to <region>_IMGT.
Added V_BTOP and J_BTOP output when the --scores flag is specified to the igblast subcommand.
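The btop field requested above is BLAST's traceback operations string. As we understand the format, a run of digits counts identical positions, and every other token is a two-character query/subject pair, with '-' marking a gap on that side. The Python sketch below (parse_btop is a hypothetical name, not the MakeDb parser) shows how indels become visible in that string:

```python
import re

# Hypothetical helper for illustration only; not Change-O's implementation.
# Tokens are either a run of digits (identical stretch) or a two-character
# pair: query character followed by subject character, '-' marking a gap.
_BTOP_TOKEN = re.compile(r"(\d+)|(..)")


def parse_btop(btop):
    """Summarise a BLAST traceback operations (BTOP) string."""
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for run, pair in _BTOP_TOKEN.findall(btop):
        if run:
            counts["match"] += int(run)
        elif pair[1] == "-":
            counts["insertion"] += 1   # base present in query, gap in subject
        elif pair[0] == "-":
            counts["deletion"] += 1    # gap in query, base present in subject
        else:
            counts["mismatch"] += 1    # two different bases aligned
    return counts
```

For example, "10A-C-5-T3" is read as 18 identical positions, two query insertions and one deletion relative to the germline.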

CreateGermlines:

Fixed a bug producing incorrect values in the SEQUENCE field of the log file.

Version 0.3.1: December 18, 2015

MakeDb:

Fixed bug wherein the imgt subcommand was not properly recognizing an extracted folder as input to the -i argument.

Version 0.3.0: December 4, 2015

Conversion to a proper Python package which uses pip and setuptools for installation.

The package now requires Python 3.4. Python 2.7 is no longer supported.

The required dependency versions have been bumped to numpy 1.9, scipy 0.14, pandas 0.16 and biopython 1.65.

AnalyzeAa:

This tool was removed. Its functionality has been migrated to the alakazam R package.

DbCore:

Divided DbCore functionality into separate modules: Defaults, Distance, IO, Multiprocessing and Receptor.

DefineClones:

Added --sf flag to specify sequence field to be used to calculate distance between sequences.
Fixed a bug wherein sequences with missing data in grouping columns were being assigned to a single group and clustered. Sequences with missing grouping variables now fail.
Fixed bug where sequences with "None" junctions were grouped together.
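The new failure rule for missing grouping data can be pictured as a partition step before clustering. In this Python sketch the grouping key (V_CALL, J_CALL, JUNCTION_LENGTH) is an assumption chosen for illustration, and group_records is a hypothetical helper rather than DefineClones code:

```python
from collections import defaultdict


# Hypothetical helper; the default grouping fields are an assumption.
def group_records(records, fields=("V_CALL", "J_CALL", "JUNCTION_LENGTH")):
    """Partition records into grouping buckets plus a list of failed records."""
    groups = defaultdict(list)
    failed = []
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        # Missing, empty or literal "None" values cannot be grouped reliably,
        # so the record fails instead of landing in a shared catch-all group.
        if any(v is None or str(v).strip() in ("", "None") for v in key):
            failed.append(rec)
        else:
            groups[key].append(rec)
    return dict(groups), failed
```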

GapRecords:

This tool was removed in favor of adding IMGT gapping support to igblast subcommand of MakeDb.

IgCore:

Removed IgCore in favor of a dependency on pRESTO >= 0.5.0.

MakeDb:

Updated IgBLAST parser to create an IMGT gapped sequence and infer the junction region as defined by IMGT.
Added the --regions flag which adds extra columns containing FWR and CDR regions as defined by IMGT.
Added support to imgt subcommand for the new IMGT/HighV-QUEST compression scheme (.txz files).
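IMGT/HighV-QUEST results can therefore arrive as an extracted folder, a .zip archive or an xz-compressed tar (.txz). The sketch below shows one way to detect the container type using only the Python standard library. list_imgt_files is a hypothetical name, and this is not the detection logic MakeDb actually uses:

```python
import os
import tarfile
import zipfile


# Hypothetical helper; container detection here is an illustrative assumption.
def list_imgt_files(path):
    """Return member file names from an IMGT result folder, .txz or .zip."""
    if os.path.isdir(path):
        return sorted(os.listdir(path))
    if path.endswith(".txz"):
        # .txz is a tar archive compressed with xz (LZMA)
        with tarfile.open(path, mode="r:xz") as tar:
            return sorted(m.name for m in tar.getmembers() if m.isfile())
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return sorted(n for n in zf.namelist() if not n.endswith("/"))
    raise ValueError(f"Unrecognized IMGT input: {path}")
```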

Version 0.2.5: August 25, 2015

CreateGermlines:

Removed default '-r' repository and added informative error messages when invalid germline repositories are provided.
Updated '-r' flag to take list of folders and/or fasta files with germlines.

Version 0.2.4: August 19, 2015

MakeDb:

Fixed a bug wherein N1 and N2 region indexing was off by one nucleotide for the igblast subcommand (leading to incorrect SEQUENCE_VDJ values).

ParseDb:

Fixed a bug wherein specifying the -f argument to the index subcommand would cause an error.

Version 0.2.3: July 22, 2015

DefineClones:

Fixed a typo in the default normalization setting of the bygroup subcommand, which was being interpreted as 'none' rather than 'len'.
Changed the 'hs5f' model of the bygroup subcommand to use the centered -log10 of the targeting probability.
Added the --sym argument to the bygroup subcommand which determines how asymmetric distances are handled.
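A centered -log10 distance is generally asymmetric (A->B != B->A), which is why a symmetrization rule is needed. The Python sketch below assumes centering means dividing by the mean off-diagonal distance (so distances average to one) and offers "avg" and "min" as the combination rules; both choices, and the helper names, are assumptions for illustration rather than the DefineClones implementation:

```python
import math


# Hypothetical helpers; the centering and combination rules are assumptions.
def centered_distances(prob):
    """Convert pairwise targeting probabilities into centered -log10 distances.

    prob[i][j] is taken as the probability of mutating sequence i into j.
    Diagonal entries are ignored and returned as 0.
    """
    n = len(prob)
    raw = [[-math.log10(prob[i][j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    off_diag = [raw[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(off_diag) / len(off_diag)
    # Dividing by the mean centers the off-diagonal distances at one
    return [[raw[i][j] / mean for j in range(n)] for i in range(n)]


def symmetrize(dist, method="avg"):
    """Combine A->B and B->A distances into one symmetric value."""
    combine = {"avg": lambda a, b: (a + b) / 2, "min": min}[method]
    n = len(dist)
    return [[combine(dist[i][j], dist[j][i]) for j in range(n)] for i in range(n)]
```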

Version 0.2.2: July 8, 2015

CreateGermlines:

Germline creation now works for IgBLAST output parsed with MakeDb. The argument --sf SEQUENCE_VDJ must be provided to generate germlines from IgBLAST output. The same reference database used for the IgBLAST alignment must be specified with the -r flag.
Fixed a bug with determination of N1 and N2 region positions.

MakeDb:

Combined the -z and -f flags of the imgt subcommand into a single flag, -i, which autodetects the input type.
Added requirement that IgBLAST input be generated using the -outfmt "7 std qseq" argument to igblastn.
Modified SEQUENCE_VDJ output from IgBLAST parser to include gaps inserted during alignment.
Added correction for IgBLAST alignments where V/D, D/J or V/J segments are assigned overlapping positions.
Corrected N1_LENGTH and N2_LENGTH calculation from IgBLAST output.
Added the --scores flag which adds extra columns containing alignment scores from IMGT and IgBLAST output.
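The N1 and N2 lengths are the nucleotides strictly between adjacent segments, which is where an off-by-one error is easy to make. Assuming 1-based, inclusive coordinates, the arithmetic looks like the sketch below. Argument names are illustrative, and clamping overlaps to zero is a simplification that does not reproduce MakeDb's overlap correction:

```python
# Hypothetical helper; argument names and the clamping rule are assumptions.
def n_lengths(v_start, v_len, d_start, d_len, j_start):
    """Return (N1, N2) lengths from 1-based, inclusive segment coordinates."""
    v_end = v_start + v_len - 1   # last V nucleotide
    d_end = d_start + d_len - 1   # last D nucleotide
    n1 = d_start - v_end - 1      # nucleotides strictly between V and D
    n2 = j_start - d_end - 1      # nucleotides strictly between D and J
    # Overlapping assignments would give negative lengths; clamp at zero here.
    return max(n1, 0), max(n2, 0)
```

For example, a V segment ending at position 296 and a D segment starting at 300 leave an N1 region of three nucleotides (297 to 299).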

Version 0.2.1: June 18, 2015

DefineClones:

Removed mouse 3-mer model, 'm3n'.

Version 0.2.0: June 17, 2015

Initial public prerelease.

Output files were added to the usage documentation of all scripts.

General code cleanup.

DbCore:

Updated loading of database files to convert column names to uppercase.
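Normalising the header once at load time lets downstream code rely on names such as V_CALL regardless of how the file was written. A minimal Python sketch using the standard csv module, assuming tab-delimited input (read_db is a hypothetical name):

```python
import csv


# Hypothetical helper illustrating header normalisation on load.
def read_db(handle):
    """Read a tab-delimited database, converting column names to uppercase."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is not None:
        # Uppercase the header once so every row uses e.g. V_CALL, not v_call
        reader.fieldnames = [name.upper() for name in reader.fieldnames]
    return list(reader)
```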

AnalyzeAa:

Fixed a bug where junctions less than one codon long would lead to a division by zero error.
Added --failed flag to create database with records that fail analysis.
Added --sf flag to specify sequence field to be analyzed.

CreateGermlines:

Fixed a bug where germline sequences could not be created for light chains.

DefineClones:

Added a human 1-mer model, 'hs1f', which uses the substitution rates from Yaari et al, 2013.
Changed default model to 'hs1f' and default normalization to length for bygroup subcommand.
Added --link argument which allows for specification of single, complete, or average linkage during clonal clustering (default single).
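The three linkage options differ only in how the distance between two clusters is measured: the closest pair of members (single), the farthest pair (complete) or the mean over all pairs (average). The plain-Python agglomerative sketch below makes that difference concrete. The cutoff parameter and the helper name are assumptions for illustration, not the DefineClones code:

```python
# Hypothetical helper; a naive agglomerative clustering for illustration.
def cluster(dist, threshold, link="single"):
    """Cluster items given a symmetric distance matrix.

    Clusters are merged while the closest pair, measured under the chosen
    linkage, is at or below `threshold`. Returns one label per item.
    """
    linkage = {
        "single": min,                            # nearest members
        "complete": max,                          # farthest members
        "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
    }[link]
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        return linkage([dist[i][j] for i in a for j in b])

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage
        d, x, y = min(
            (between(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[x].extend(clusters.pop(y))  # y > x, so index x is unaffected

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

With points at 0, 1, 2 and 10 and a cutoff of 1, single linkage chains the first three points together, while complete linkage stops after merging 0 and 1.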

GapRecords:

Fixed a bug wherein non-standard sequence fields could not be aligned.

MakeDb:

Fixed bug where the allele 'TRGVA*01' was not recognized as a valid allele.

ParseDb:

Added rename subcommand to ParseDb which renames fields.

Version 0.2.0.beta-2015-05-31: May 31, 2015

Minor changes to a few output file names and log field entries.

ParseDb:

Added index subcommand to ParseDb which adds a numeric index field.

Version 0.2.0.beta-2015-05-05: May 05, 2015

Prerelease for review.

alakazam R package

Version 0.2.3: February 10, 2016

General:

Fixed a bug wherein the package would not build on R < 3.2.0 due to changes in base::nchar().
Changed R dependency to R >= 3.1.2.

Version 0.2.2: January 29, 2016

General:

Updated license from CC BY-NC-SA 3.0 to CC BY-NC-SA 4.0.
Internal changes to conform to CRAN policies.

Amino Acid Analysis:

Fixed bug where arguments for the aliphatic() function were not being passed through the ellipsis argument of aminoAcidProperties().
Improved amino acid analysis vignette.
Added a check for the correctness of amino acid sequences to aminoAcidProperties().
Renamed AA_TRANS to ABBREV_AA.

Diversity:

Added evenness and bootstrap standard deviation to rarefyDiversity() output.
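For orientation, the Hill diversity of order q over clone frequencies p_i is (sum p_i^q)^(1/(1-q)), with the q = 1 case taken as the exponential of Shannon entropy. The Python sketch below assumes evenness is diversity at order q divided by richness (order 0). That definition is an assumption here, and the function names are hypothetical rather than alakazam's API:

```python
import math


# Hypothetical helpers; the evenness definition D_q / D_0 is an assumption.
def hill_diversity(counts, q):
    """Hill diversity of order q for a list of clone sizes."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # The limit as q -> 1 is the exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


def evenness(counts, q):
    """Diversity at order q relative to richness (order 0)."""
    return hill_diversity(counts, q) / hill_diversity(counts, 0)
```

Perfectly uniform clone sizes give an evenness of one at every order; skewed sizes pull it toward zero as q grows.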

Lineage:

Added ExampleTrees data with example output from buildPhylipLineage().

Version 0.2.1: December 18, 2015

General:

Removed plyr dependency.
Added dplyr, lazyeval and stringi dependencies.
Added strict requirement for igraph version >= 1.0.0.
Renamed getDNADistMatrix() and getAADistMatrix() to getDNAMatrix() and getAAMatrix(), respectively.
Added getSeqMatrix() which calculates a pairwise distance matrix for a set of sequences.
Modified default plot sizing to be more appropriate for export to PDF figures with 7-8 inch width.
Added multiggplot() function for performing multiple panel plots.

Amino Acid Analysis:

Migrated amino acid property analysis from Change-O CLT to alakazam. This includes the new functions gravy(), bulk(), aliphatic(), polar(), charge(), countPatterns() and aminoAcidProperties().
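As an example of what one of these properties measures, GRAVY (grand average of hydropathy) is the mean hydropathy score over the residues of a sequence. The Python sketch below uses the Kyte and Doolittle scale and skips characters outside the 20 standard amino acids; both are assumptions, and alakazam's gravy() may use a different scale or handling:

```python
# Kyte & Doolittle (1982) hydropathy index; scale choice is an assumption here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


# Hypothetical Python counterpart for illustration, not alakazam's code.
def gravy(aa_seq):
    """Grand average of hydropathy: mean score over recognised residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in aa_seq.upper() if aa in KYTE_DOOLITTLE]
    if not scores:
        return float("nan")  # nothing to average, e.g. an empty junction
    return sum(scores) / len(scores)
```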

Annotation:

Added support for unusual TCR gene names, such as 'TRGVA*01'.
Added removal of 'D' label (gene duplication) from gene names when parsed with getSegment(), getAllele(), getGene() and getFamily(). May be disabled by providing the argument strip_d=FALSE.
Added countGenes() to tabulate V(D)J allele, gene and family usage.
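The 'D' label marks a duplicated gene, as in IGHV1-69D or IGKV1D-39, and stripping it lets duplicates count as the same gene. The Python sketch below mirrors the idea behind getAllele(), getGene() and getFamily() with a strip_d switch. The regular expressions and the Python names are simplified assumptions, not alakazam's own patterns:

```python
import re

# Hypothetical, simplified patterns for illustration only.
_ALLELE = re.compile(r"(?:IG[HKL]|TR[ABDG])[VDJ][A-Z0-9/\-]*\*\d+")
# A 'D' directly after a digit and before '-', '*' or the end marks a duplicate
_DUP_D = re.compile(r"(?<=\d)D(?=[-*]|$)")


def get_allele(call, strip_d=True):
    match = _ALLELE.search(call)
    if match is None:
        return None
    allele = match.group(0)
    return _DUP_D.sub("", allele) if strip_d else allele


def get_gene(call, strip_d=True):
    allele = get_allele(call, strip_d)
    return allele.split("*")[0] if allele else None


def get_family(call, strip_d=True):
    gene = get_gene(call, strip_d)
    return gene.split("-")[0] if gene else None
```

Because the lookbehind requires a digit, the segment letter in D genes such as IGHD3-10 is left untouched, and unusual names such as TRGVA*01 still parse.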

Diversity:

Added several functions related to analysis of clone size distributions, including countClones(), estimateAbundance() and plotAbundance().
Renamed resampleDiversity() to rarefyDiversity() and changed many of the internals. Bootstrapping is now performed on an inferred complete relative abundance distribution.
Added support for inclusion of copy number in clone size determination within rarefyDiversity() and testDiversity().
Diversity scores and confidence intervals within rarefyDiversity() and testDiversity() are now calculated using the mean and standard deviation of the bootstrap realizations, rather than the median and upper/lower quantiles.
Added ability to add counts to the legend in plotDiversityCurve().
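Summarising bootstrap realizations by their mean and standard deviation can be sketched as follows. For simplicity this Python version resamples directly from the observed clone sizes; it does not infer a complete relative abundance distribution first, as rarefyDiversity() now does. All names and defaults are hypothetical:

```python
import math
import random
import statistics
from collections import Counter


# Hypothetical helpers; resampling from observed sizes is a simplification.
def hill(counts, q):
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))


def bootstrap_diversity(counts, q, n_boot=200, depth=None, seed=0):
    """Return (mean, sd) of Hill diversity over bootstrap realizations."""
    rng = random.Random(seed)
    depth = depth or sum(counts)
    clones = list(range(len(counts)))
    values = []
    for _ in range(n_boot):
        # Draw `depth` sequences with probability proportional to clone size
        draw = rng.choices(clones, weights=counts, k=depth)
        values.append(hill(list(Counter(draw).values()), q))
    return statistics.mean(values), statistics.stdev(values)
```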

Version 0.2.0: June 15, 2015

Initial public release.

General:

Added citations for the citation("alakazam") command.

Version 0.2.0.beta-2015-05-30: May 30, 2015

Lineage:

Added more error checking to buildPhylipLineage().

Version 0.2.0.beta-2015-05-26: May 26, 2015

Lineage:

Fixed issue where buildPhylipLineage() would hang on R 3.2 due to R change request PR#15508.

Version 0.2.0.beta-2015-05-05: May 05, 2015

Prerelease for review.

shazam R package

Version 0.1.2: February 20, 2016

General:

Renamed package from shm to shazam.
Internal changes to conform to CRAN policies.
Compressed and moved example database to the data object InfluenzaDb.
Fixed several bugs where functions would not work properly when passed a dplyr::tbl_df object instead of a data.frame.
Changed R dependency to R >= 3.1.2.
Added stringi dependency.

Distance Profiling:

Fixed a bug wherein distToNearest() did not return the nearest neighbor with a non-zero distance.

Targeting Models:

Performance improvements to createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
Modified color scheme in plotMutability().
Fixed errors in the targeting models vignette.

Mutation Profiling:

Added the MutationDefinition objects MUTATIONS_CHARGE, MUTATIONS_HYDROPATHY, MUTATIONS_POLARITY providing alternate approaches to defining replacement and silent annotations to mutations when calling calcDBObservedMutations() and calcDBExpectedMutations().
Fixed a few bugs where column names, region definitions or mutation models were not being recognized properly when non-default values were used.
Made the behavior of regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
calcDBObservedMutations() now also returns separate R and S mutations when regionDefinition=NULL; older versions reported the sum of R and S mutations. The function adds the columns OBSERVED_SEQ_R and OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and MU_FREQ_SEQ_S when frequency=TRUE.
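With regionDefinition=NULL the whole sequence acts as one region, so the R/S split reduces to classifying every observed substitution. This Python sketch applies each substitution alone to its germline codon and calls it replacement (R) if the amino acid changes, otherwise silent (S). Frequencies divide by the number of informative (unambiguous) positions. The function name and the handling of ambiguous codons are assumptions, and shazam's treatment of codons with several mutations may differ:

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


# Hypothetical helper; not shazam's calcObservedMutations().
def observed_mutations(seq, germ, frequency=False):
    """Count R and S mutations over the whole aligned sequence."""
    counts = {"R": 0, "S": 0}
    informative = 0
    n = min(len(seq), len(germ)) // 3 * 3
    for start in range(0, n, 3):
        g_codon = germ[start:start + 3].upper()
        s_codon = seq[start:start + 3].upper()
        if g_codon not in CODON_TABLE:
            continue  # germline codon has gaps or N: skip the whole codon
        for pos in range(3):
            s = s_codon[pos]
            if s not in "ACGT":
                continue  # ambiguous or gapped observed base is uninformative
            informative += 1
            if s == g_codon[pos]:
                continue
            # Apply only this substitution to the germline codon
            mutant = g_codon[:pos] + s + g_codon[pos + 1:]
            same_aa = CODON_TABLE[mutant] == CODON_TABLE[g_codon]
            counts["S" if same_aa else "R"] += 1
    if frequency:
        return {k: (v / informative if informative else 0.0) for k, v in counts.items()}
    return counts
```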

Version 0.1.1: December 18, 2015

General:

Swapped dependency on doSNOW for doParallel.
Swapped dependency on plyr for dplyr.
Swapped dependency on reshape2 for tidyr.
Documentation clean up.

Distance Profiling:

Changed the underlying method of calcTargetingDistance to use the negative log10 of the probability, which is then centered at one by dividing by the mean distance.
Added symmetry parameter to distToNearest to change behavior of how asymmetric distances (A->B != B->A) are combined to get distance between A and B.
Updated error handling in distToNearest to issue a warning and return NA when an unrecognized character is in the sequence.
Fixed bug in 'aa' model in distToNearest that was calculating distance incorrectly when normalizing by length.
Changed behavior to return nearest nonzero distance neighbor.
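Returning the nearest non-zero neighbor means exact duplicates, which sit at distance zero, do not mask the closest distinct sequence. A minimal Python sketch over a precomputed distance matrix (the helper name is hypothetical, and None stands in for R's NA):

```python
# Hypothetical helper illustrating the "nearest non-zero neighbor" rule.
def dist_to_nearest(dist):
    """For each sequence, the smallest non-zero distance to any other one."""
    nearest = []
    for i, row in enumerate(dist):
        # Skip self-comparison and zero distances (identical sequences)
        others = [d for j, d in enumerate(row) if j != i and d > 0]
        nearest.append(min(others) if others else None)
    return nearest
```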

Mutation Profiling:

Renamed calcDBClonalConsensus to collapseByClone. Also renamed the argument collapseByClone to expandedDb.
Fixed a (major) bug in calcExpectedMutations. Previously, the targeting calculation was incorrect and resulted in incorrect expected mutation frequencies. Note that this also resulted in incorrect BASELINe Selection (Sigma) values.
Changed denominator in calcObservedMutations to be based on informative (unambiguous) positions only.
Added nonTerminalOnly parameter to calcDBClonalConsensus indicating whether to consider mutations at leaves or not (defaults to false).

Selection Analysis:

Updated groupBaseline so that weighted convolution is performed when regrouping a Baseline object (i.e., grouping previously grouped PDFs).
Added "imbalance" test statistic to the Baseline selection calculation.
Extended the Baseline object to include binomK, binomN and binomP. Similar to numbOfSeqs, each of these is a matrix; they contain the binomial inputs for each sequence and region.

Targeting Models:

Added minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
Added minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
Added returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
Added returnSource parameter to createMutabilityMatrix. If TRUE, the code will return a data frame indicating whether each 5-mer mutability is observed or inferred.
In createSubstitutionMatrix and createMutabilityMatrix, fixed a bug when multipleMutation is set to "ignore".
Changed inference procedure for the 5-mer substitution model.
Added inference procedure for 5-mers without enough observed mutations in the mutability model.
Fixed a bug in background 5-mer count for the RS model.
Fixed a bug in IMGT gap handling in createMutabilityMatrix.
Fixed a bug that occurred when sequences were in lowercase.
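The threshold-plus-inference pattern, together with a returnSource-style label, can be illustrated with a deliberately simple rule. This Python sketch thresholds on mutations observed at the 5-mer itself (simpler than minNumSeqMutations) and fills poorly observed 5-mers with the mean rate of well-observed 5-mers that share the same central 3-mer. That fallback is an assumption chosen for illustration and is not shazam's inference procedure:

```python
from collections import defaultdict
from statistics import mean


# Hypothetical helper; the inference rule below is an illustrative assumption.
def mutability_with_source(mut_counts, bg_counts, min_mutations):
    """Estimate 5-mer mutability and label each estimate's source.

    mut_counts[k]: mutations observed at the centre of 5-mer k
    bg_counts[k]:  times 5-mer k was seen as a possible mutation target
    """
    observed = {
        k: mut_counts.get(k, 0) / bg_counts[k]
        for k in bg_counts
        if bg_counts[k] > 0 and mut_counts.get(k, 0) >= min_mutations
    }
    # Pool well-observed rates by central 3-mer for use as a fallback
    by_center = defaultdict(list)
    for k, rate in observed.items():
        by_center[k[1:4]].append(rate)
    result = {}
    for k in bg_counts:
        if k in observed:
            result[k] = (observed[k], "observed")
        elif by_center[k[1:4]]:
            result[k] = (mean(by_center[k[1:4]]), "inferred")
        else:
            result[k] = (None, "missing")
    return result
```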

Version 0.1.0: June 18, 2015

Initial public release.

General:

Restructured the S4 class documentation.
Fixed bug wherein example Influenza.tab file did not load on Mac OS X.
Added citations for citation("shazam") command.
Added dependency on data.table >= 1.9.4 to fix a bug that occurred with earlier versions of data.table.

Distance Profiling:

Added a human 1-mer substitution matrix, HS1FDistance, based on the Yaari et al, 2013 data.
Set the hs1f as the default distance model for distToNearest().
Added conversion of sequences to uppercase in distToNearest().
Fixed a bug wherein unrecognized (including lowercase) characters would lead to silently returning a distance of 0 to the nearest neighbor. Unrecognized characters will now raise an error.
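Uppercasing first and then rejecting unknown characters turns a silent zero distance into an explicit failure. In this Python sketch of a nucleotide Hamming distance, the accepted alphabet and the choice to skip N and gap positions are assumptions, not the exact rules of distToNearest():

```python
# Hypothetical alphabet and helper for illustration only.
VALID = set("ACGTN.-")


def hamming(a, b):
    """Count mismatches between two equal-length nucleotide sequences.

    Sequences are uppercased first; any character outside VALID raises
    an error instead of being silently treated as a match.
    """
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    bad = (set(a) | set(b)) - VALID
    if bad:
        raise ValueError(f"unrecognized characters: {''.join(sorted(bad))}")
    # Positions where either sequence has N or a gap are not counted
    return sum(x != y for x, y in zip(a, b) if x in "ACGT" and y in "ACGT")
```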

Mutation Profiling:

Fixed bug in calcDBClonalConsensus() so that the function now works correctly when called with the argument collapseByClone=FALSE.
Added the frequency argument to calcObservedMutations() and calcDBObservedMutations(), which enables the return of mutation frequencies rather than the default of mutation counts.

Targeting Models:

Removed M3NModel and all options for using said model.
Fixed bug in createSubstitutionMatrix() and createMutabilityMatrix() where IMGT gaps were not being handled.

Version 0.1.0.beta-2015-05-30: May 30, 2015

General:

Added more error checking.

Targeting Models:

Updated the targeting model workflow to include a clonal consensus step.

Version 0.1.0.beta-2015-05-11: May 11, 2015

Targeting Models:

Added the U5NModel, which is a uniform 5-mer model.
Improvements to plotMutability() output.
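A uniform 5-mer model is a useful null baseline because every 5-mer is equally likely to be targeted and each center base is equally likely to mutate to any of the other three. The Python sketch below builds such a model, assuming mutabilities normalised to sum to one. The representation is hypothetical and not shazam's U5NModel object:

```python
from itertools import product


# Hypothetical representation of a uniform 5-mer targeting model.
def uniform_5mer_model(alphabet="ACGT"):
    """Equal mutability for all 5-mers; equal substitution to other bases."""
    kmers = ["".join(p) for p in product(alphabet, repeat=5)]
    mutability = {k: 1 / len(kmers) for k in kmers}  # sums to one
    substitution = {
        # The center base (index 2) cannot substitute to itself
        k: {b: (0.0 if b == k[2] else 1 / 3) for b in alphabet}
        for k in kmers
    }
    return mutability, substitution
```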

Version 0.1.0.beta-2015-05-05: May 05, 2015

Prerelease for review.

tigger R package

See the TIgGER website.