Commandline Tools (CLT)
Version 0.3.2: March 8, 2016
Fixed a bug with installation on Windows due to old file paths lingering in changeo.egg-info/SOURCES.txt.
Updated license from CC BY-NC-SA 3.0 to CC BY-NC-SA 4.0.
MakeDb:
- Updated igblast subcommand to correctly parse records with indels. IgBLAST must now be run with the argument -outfmt "7 std qseq sseq btop".
- Changed the names of the FWR and CDR output columns added with --regions to <region>_IMGT.
- Added V_BTOP and J_BTOP output when the --scores flag is specified to the igblast subcommand.
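BTOP (BLAST traceback operations) output encodes an alignment, including indels, as a compact string. A number is a run of identical positions. A character pair gives the query residue followed by the subject residue, and '-' marks a gap. The Python sketch below decodes such a string into run-length operations. It is not MakeDb's parser: the function name parse_btop and the CIGAR-like operation codes are hypothetical choices for illustration.

```python
import re
from typing import List, Tuple

# A BTOP token is either a run of digits (that many identical positions)
# or a pair of characters: the query residue followed by the subject residue.
# A '-' in the pair marks a gap on that side of the alignment.
_BTOP_TOKEN = re.compile(r"(\d+)|([A-Za-z*-]{2})")


def parse_btop(btop: str) -> List[Tuple[str, int]]:
    """Decode a BTOP string into run-length encoded operations.

    The operation codes are an illustrative, CIGAR-like choice (hypothetical):
    '=' match, 'X' mismatch, 'I' query-only residue (gap in subject),
    'D' subject-only residue (gap in query).
    """
    ops: List[Tuple[str, int]] = []
    pos = 0
    while pos < len(btop):
        m = _BTOP_TOKEN.match(btop, pos)
        if m is None:
            raise ValueError(f"invalid BTOP token at position {pos}: {btop!r}")
        if m.group(1) is not None:
            op, n = "=", int(m.group(1))
        else:
            query, subject = m.group(2)
            if query == "-":
                op = "D"
            elif subject == "-":
                op = "I"
            else:
                op = "X"
            n = 1
        # Merge consecutive operations of the same kind (e.g. a multi-base gap).
        if ops and ops[-1][0] == op:
            ops[-1] = (op, ops[-1][1] + n)
        else:
            ops.append((op, n))
        pos = m.end()
    return ops
```

For example, parse_btop("7AG39") yields [("=", 7), ("X", 1), ("=", 39)]: seven matches, one A/G mismatch, then 39 matches.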
CreateGermlines:
- Fixed a bug producing incorrect values in the SEQUENCE field of the log file.
Version 0.3.1: December 18, 2015
MakeDb:
- Fixed bug wherein the imgt subcommand was not properly recognizing an extracted folder as input to the -i argument.
Version 0.3.0: December 4, 2015
Conversion to a proper Python package which uses pip and setuptools for installation.
The package now requires Python 3.4. Python 2.7 is no longer supported.
The required dependency versions have been bumped to numpy 1.9, scipy 0.14, pandas 0.16 and biopython 1.65.
AnalyzeAa:
- This tool was removed. Its functionality has been migrated to the alakazam R package.
DbCore:
- Divided DbCore functionality into separate modules: Defaults, Distance, IO, Multiprocessing and Receptor.
DefineClones:
- Added --sf flag to specify the sequence field used to calculate the distance between sequences.
- Fixed a bug wherein sequences with missing data in grouping columns were being assigned to a single group and clustered. Sequences with missing grouping variables will now be failed.
- Fixed a bug where sequences with "None" junctions were grouped together.
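The grouping fix amounts to refusing to pool records whose grouping values are missing. Here is a minimal sketch. It assumes that records are dicts keyed by Change-O column names and that grouping uses V_CALL, J_CALL and JUNCTION_LENGTH, which is an assumption rather than a documented fact. The name group_records and the set of missing-value markers are hypothetical.

```python
from collections import defaultdict

# Assumed grouping columns; check the DefineClones documentation for the
# fields your version actually groups on.
GROUP_FIELDS = ("V_CALL", "J_CALL", "JUNCTION_LENGTH")
# Hypothetical set of values treated as "missing", including the string "None".
MISSING = {None, "", "None", "NA"}


def group_records(records, fields=GROUP_FIELDS):
    """Partition records by their grouping fields.

    Records with any missing grouping value are returned as failed instead
    of being pooled together into a single "missing" group.
    """
    groups = defaultdict(list)
    failed = []
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if any(v in MISSING for v in key):
            failed.append(rec)
        else:
            groups[key].append(rec)
    return dict(groups), failed
```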
GapRecords:
- This tool was removed in favor of adding IMGT gapping support to igblast subcommand of MakeDb.
IgCore:
- Removed IgCore in favor of a dependency on pRESTO >= 0.5.0.
MakeDb:
- Updated IgBLAST parser to create an IMGT gapped sequence and infer the junction region as defined by IMGT.
- Added the --regions flag, which adds extra columns containing FWR and CDR regions as defined by IMGT.
- Added support to the imgt subcommand for the new IMGT/HighV-QUEST compression scheme (.txz files).
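The IMGT-defined regions reported by --regions follow from fixed positions in an IMGT-gapped sequence. The sketch below assumes the standard IMGT unique numbering boundaries for the V region and slices FWR1 through FWR3. The helper split_imgt_regions is hypothetical and not part of MakeDb. It does not attempt CDR3 or FWR4.

```python
from typing import Dict

# IMGT unique numbering boundaries for the V region, converted from codon
# positions (FWR1 1-26, CDR1 27-38, FWR2 39-55, CDR2 56-65, FWR3 66-104)
# to 1-based, inclusive nucleotide positions in an IMGT-gapped sequence.
IMGT_V_REGIONS = {
    "FWR1": (1, 78),
    "CDR1": (79, 114),
    "FWR2": (115, 165),
    "CDR2": (166, 195),
    "FWR3": (196, 312),
}


def split_imgt_regions(gapped_seq: str) -> Dict[str, str]:
    """Slice an IMGT-gapped V(D)J sequence into FWR1-FWR3 and CDR1-CDR2.

    CDR3 and FWR4 are omitted because their extent depends on the junction
    length rather than on fixed positions.
    """
    return {name: gapped_seq[start - 1:end]
            for name, (start, end) in IMGT_V_REGIONS.items()}
```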
Version 0.2.5: August 25, 2015
CreateGermlines:
- Removed default '-r' repository and added informative error messages when invalid germline repositories are provided.
- Updated '-r' flag to take a list of folders and/or fasta files with germlines.
Version 0.2.4: August 19, 2015
MakeDb:
- Fixed a bug wherein N1 and N2 region indexing was off by one nucleotide for the igblast subcommand (leading to incorrect SEQUENCE_VDJ values).
ParseDb:
- Fixed a bug wherein specifying the -f argument to the index subcommand would cause an error.
Version 0.2.3: July 22, 2015
DefineClones:
- Fixed a typo in the default normalization setting of the bygroup subcommand, which was being interpreted as 'none' rather than 'len'.
- Changed the 'hs5f' model of the bygroup subcommand to be centered -log10 of the targeting probability.
- Added the --sym argument to the bygroup subcommand, which determines how asymmetric distances are handled.
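The distance transform and the --sym argument can be illustrated with two small functions. This sketch makes two assumptions. First, "centered" means dividing by the mean distance, as described for calcTargetingDistance in the shazam 0.1.1 notes. Second, --sym offers either the average or the minimum of the two directed distances. Both function names are hypothetical.

```python
import math
from typing import List


def centered_neglog10(probs: List[float]) -> List[float]:
    """Turn targeting probabilities into distances centered at one.

    Each probability p in (0, 1) becomes -log10(p). The distances are then
    divided by their mean, so the average distance is 1.
    """
    dists = [-math.log10(p) for p in probs]
    mean = sum(dists) / len(dists)
    if mean == 0:
        raise ValueError("all probabilities are 1; distances cannot be centered")
    return [d / mean for d in dists]


def combine_asymmetric(d_ab: float, d_ba: float, sym: str = "avg") -> float:
    """Combine the directed distances A->B and B->A into one value.

    The 'avg' and 'min' modes are assumed options for illustration.
    """
    if sym == "avg":
        return (d_ab + d_ba) / 2.0
    if sym == "min":
        return min(d_ab, d_ba)
    raise ValueError(f"unknown symmetry mode: {sym!r}")
```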
Version 0.2.2: July 8, 2015
CreateGermlines:
- Germline creation now works for IgBLAST output parsed with MakeDb. The argument --sf SEQUENCE_VDJ must be provided to generate germlines from IgBLAST output. The same reference database used for the IgBLAST alignment must be specified with the -r flag.
- Fixed a bug with determination of N1 and N2 region positions.
MakeDb:
- Combined the -z and -f flags of the imgt subcommand into a single flag, -i, which autodetects the input type.
- Added requirement that IgBLAST input be generated using the -outfmt "7 std qseq" argument to igblastn.
- Modified SEQUENCE_VDJ output from IgBLAST parser to include gaps inserted during alignment.
- Added correction for IgBLAST alignments where V/D, D/J or V/J segments are assigned overlapping positions.
- Corrected N1_LENGTH and N2_LENGTH calculation from IgBLAST output.
- Added the --scores flag, which adds extra columns containing alignment scores from IMGT and IgBLAST output.
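The -i input can be autodetected by inspecting the path itself rather than relying on separate flags. Below is a sketch that uses only the Python standard library. The name detect_input_type and its return labels are hypothetical. The tar branch anticipates the .txz support added in version 0.3.0.

```python
import os
import tarfile
import zipfile


def detect_input_type(path: str) -> str:
    """Guess how an IMGT/HighV-QUEST result was supplied.

    Returns 'folder' for an extracted directory, 'zip' for a zip archive and
    'tar' for a tar archive (including .txz, an xz-compressed tar).
    """
    if os.path.isdir(path):
        return "folder"
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    # Check zip before tar; each test reads the file's structure, not its extension.
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    raise ValueError(f"unrecognized input type: {path}")
```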
Version 0.2.1: June 18, 2015
DefineClones:
- Removed mouse 3-mer model, 'm3n'.
Version 0.2.0: June 17, 2015
Initial public prerelease.
Output files were added to the usage documentation of all scripts.
General code cleanup.
DbCore:
- Updated loading of database files to convert column names to uppercase.
AnalyzeAa:
- Fixed a bug where junctions less than one codon long would lead to a division by zero error.
- Added --failed flag to create a database with records that fail analysis.
- Added --sf flag to specify the sequence field to be analyzed.
CreateGermlines:
- Fixed a bug where germline sequences could not be created for light chains.
DefineClones:
- Added a human 1-mer model, 'hs1f', which uses the substitution rates from Yaari et al., 2013.
- Changed default model to 'hs1f' and default normalization to length for bygroup subcommand.
- Added --link argument, which allows specification of single, complete, or average linkage during clonal clustering (default single).
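To show what the three --link options change, here is a naive agglomerative clustering sketch cut at a distance threshold. It runs in O(n^3) time, uses a caller-supplied distance function, and is not DefineClones' implementation. The name cluster_by_linkage is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_by_linkage(items: Sequence[T], dist: Callable[[T, T], float],
                       threshold: float, link: str = "single") -> List[List[int]]:
    """Naive agglomerative clustering that stops at a distance threshold.

    The link argument selects how the distance between two clusters is
    derived from the pairwise item distances: 'single' (minimum), 'complete'
    (maximum) or 'average' (mean). Returns clusters as sorted index lists.
    """
    if link not in ("single", "complete", "average"):
        raise ValueError(f"unknown linkage: {link!r}")

    def cluster_distance(a: List[int], b: List[int]) -> float:
        ds = [dist(items[i], items[j]) for i in a for j in b]
        if link == "single":
            return min(ds)
        if link == "complete":
            return max(ds)
        return sum(ds) / len(ds)

    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        best_d, best_x, best_y = None, -1, -1
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best_d is None or d < best_d:
                    best_d, best_x, best_y = d, x, y
        if best_d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[best_x].extend(clusters[best_y])
        del clusters[best_y]  # best_y > best_x, so best_x stays valid
    return sorted(sorted(c) for c in clusters)
```

Take the points [0, 1, 2, 10] with absolute difference as the distance and a threshold of 1. Single linkage chains 0, 1 and 2 into one cluster. Complete linkage stops after merging 0 and 1, because the farthest pair in {0, 1, 2} is 2 apart.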
GapRecords:
- Fixed a bug wherein non-standard sequence fields could not be aligned.
MakeDb:
- Fixed bug where the allele 'TRGVA*01' was not recognized as a valid allele.
ParseDb:
- Added rename subcommand to ParseDb which renames fields.
Version 0.2.0.beta-2015-05-31: May 31, 2015
Minor changes to a few output file names and log field entries.
ParseDb:
- Added index subcommand to ParseDb which adds a numeric index field.
Version 0.2.0.beta-2015-05-05: May 05, 2015
Prerelease for review.
alakazam R package
Version 0.2.3: February 10, 2016
General:
- Fixed a bug wherein the package would not build on R < 3.2.0 due to changes in base::nchar().
- Changed R dependency to R >= 3.1.2.
Version 0.2.2: January 29, 2016
General:
- Updated license from CC BY-NC-SA 3.0 to CC BY-NC-SA 4.0.
- Internal changes to conform to CRAN policies.
Amino Acid Analysis:
- Fixed bug where arguments for the aliphatic() function were not being passed through the ellipsis argument of aminoAcidProperties().
- Improved amino acid analysis vignette.
- Added check for correctness of amino acid sequences to aminoAcidProperties().
- Renamed AA_TRANS to ABBREV_AA.
Diversity:
- Added evenness and bootstrap standard deviation to rarefyDiversity() output.
Lineage:
- Added ExampleTrees data with example output from buildPhylipLineage().
Version 0.2.1: December 18, 2015
General:
- Removed plyr dependency.
- Added dplyr, lazyeval and stringi dependencies.
- Added strict requirement for igraph version >= 1.0.0.
- Renamed getDNADistMatrix() and getAADistMatrix() to getDNAMatrix() and getAAMatrix(), respectively.
- Added getSeqMatrix(), which calculates a pairwise distance matrix for a set of sequences.
- Modified default plot sizing to be more appropriate for export to PDF figures with 7-8 inch width.
- Added multiggplot() function for performing multiple panel plots.
Amino Acid Analysis:
- Migrated amino acid property analysis from Change-O CLT to alakazam. Includes the new functions gravy(), bulk(), aliphatic(), polar(), charge(), countPatterns() and aminoAcidProperties().
Annotation:
- Added support for unusual TCR gene names, such as 'TRGVA*01'.
- Added removal of 'D' label (gene duplication) from gene names when parsed with getSegment(), getAllele(), getGene() and getFamily(). May be disabled by providing the argument strip_d=FALSE.
- Added countGenes() to tabulate V(D)J allele, gene and family usage.
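The 'D' stripping can be approximated with a regular expression. The pattern below is an assumption, not alakazam's actual implementation: it removes a 'D' that directly follows a digit and precedes '-', '*' or the end of the name. strip_duplicate_label is a hypothetical Python stand-in for the R functions. Its strip_d parameter mirrors the documented argument.

```python
import re

# Hypothetical pattern: a 'D' directly after a digit and before '-', '*' or
# the end of the name, as in IGKV1D-39*01 or IGHV1-69D*01. The 'D' in gene
# names such as IGHD3-10 follows a letter and is left alone.
_DUPLICATE_D = re.compile(r"(?<=\d)D(?=[-*]|$)")


def strip_duplicate_label(gene: str, strip_d: bool = True) -> str:
    """Remove the gene-duplication 'D' label from an IMGT gene/allele name."""
    return _DUPLICATE_D.sub("", gene) if strip_d else gene
```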
Diversity:
- Added several functions related to analysis of clone size distributions, including countClones(), estimateAbundance() and plotAbundance().
- Renamed resampleDiversity() to rarefyDiversity() and changed many of the internals. Bootstrapping is now performed on an inferred complete relative abundance distribution.
- Added support for inclusion of copy number in clone size determination within rarefyDiversity() and testDiversity().
- Diversity scores and confidence intervals within rarefyDiversity() and testDiversity() are now calculated using the mean and standard deviation of the bootstrap realizations, rather than the median and upper/lower quantiles.
- Added ability to add counts to the legend in plotDiversityCurve().
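The switch to reporting the mean and standard deviation of bootstrap realizations can be sketched with Hill numbers, the diversity index of order q. For simplicity, this sketch resamples the observed clone sizes rather than an inferred complete abundance distribution, so it does not reproduce rarefyDiversity(). The names hill_diversity and bootstrap_diversity are hypothetical.

```python
import math
import random
import statistics
from collections import Counter
from typing import List, Tuple


def hill_diversity(counts: List[int], q: float) -> float:
    """Hill number of order q for a vector of clone sizes."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


def bootstrap_diversity(counts: List[int], q: float, n_boot: int = 200,
                        seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of diversity over bootstrap resamples.

    Simplification: draws from the observed clone sizes, not from an
    inferred complete abundance distribution. n_boot must be at least 2.
    """
    rng = random.Random(seed)
    clones = range(len(counts))
    total = sum(counts)
    scores = []
    for _ in range(n_boot):
        # Resample 'total' members, weighting each clone by its observed size.
        draw = Counter(rng.choices(clones, weights=counts, k=total))
        scores.append(hill_diversity(list(draw.values()), q))
    return statistics.mean(scores), statistics.stdev(scores)
```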
Version 0.2.0: June 15, 2015
Initial public release.
General:
- Added citations for the citation("alakazam") command.
Version 0.2.0.beta-2015-05-30: May 30, 2015
Lineage:
- Added more error checking to buildPhylipLineage().
Version 0.2.0.beta-2015-05-26: May 26, 2015
Lineage:
- Fixed issue where buildPhylipLineage() would hang on R 3.2 due to R change request PR#15508.
Version 0.2.0.beta-2015-05-05: May 05, 2015
Prerelease for review.
shazam R package
Version 0.1.2: February 20, 2016
General:
- Renamed package from shm to shazam.
- Internal changes to conform to CRAN policies.
- Compressed and moved example database to the data object InfluenzaDb.
- Fixed several bugs where functions would not work properly when passed a dplyr::tbl_df object instead of a data.frame.
- Changed R dependency to R >= 3.1.2.
- Added stringi dependency.
Distance Profiling:
- Fixed a bug wherein distToNearest() did not return the nearest neighbor with a non-zero distance.
Targeting Models:
- Performance improvements to createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
- Modified color scheme in plotMutability().
- Fixed errors in the targeting models vignette.
Mutation Profiling:
- Added the MutationDefinition objects MUTATIONS_CHARGE, MUTATIONS_HYDROPATHY and MUTATIONS_POLARITY, providing alternate approaches to defining replacement and silent annotations of mutations when calling calcDBObservedMutations() and calcDBExpectedMutations().
- Fixed a few bugs where column names, region definitions or mutation models were not being recognized properly when non-default values were used.
- Made the behavior of regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly. calcDBObservedMutations() also returns separate R and S mutations when regionDefinition=NULL; older versions reported the sum of R and S mutations. The function will add the columns OBSERVED_SEQ_R and OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and MU_FREQ_SEQ_S when frequency=TRUE.
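With regionDefinition=NULL, observed replacement (R) and silent (S) mutations are counted over the whole sequence. The Python sketch below shows one way to do this. The frequency denominator counts only informative (unambiguous) positions, in line with the 0.1.1 change below. Each mutated position is classified independently against its germline codon, which is only one of several possible conventions. The name observed_rs is hypothetical, and this is not shazam's code.

```python
from itertools import product
from typing import Dict

# Standard genetic code, built from the TCAG codon ordering ('*' = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def observed_rs(seq: str, germ: str, frequency: bool = False) -> Dict[str, float]:
    """Count replacement (R) and silent (S) mutations over the whole sequence.

    Each mutated position is classified by substituting the observed base
    into the germline codon. Positions where either base is not A/C/G/T are
    uninformative and excluded from the frequency denominator. A trailing
    partial codon is ignored.
    """
    seq, germ = seq.upper(), germ.upper()
    if len(seq) != len(germ):
        raise ValueError("sequence and germline must be aligned to equal length")
    r = s = informative = 0
    for start in range(0, len(germ) - len(germ) % 3, 3):
        g_codon = germ[start:start + 3]
        for k in range(3):
            g, o = g_codon[k], seq[start + k]
            if g not in "ACGT" or o not in "ACGT":
                continue
            informative += 1
            if g == o:
                continue
            ref_aa = CODON_TABLE.get(g_codon)
            mut_aa = CODON_TABLE.get(g_codon[:k] + o + g_codon[k + 1:])
            if ref_aa is None or mut_aa is None:
                continue  # germline codon contains an ambiguous base
            if ref_aa == mut_aa:
                s += 1
            else:
                r += 1
    if frequency:
        denom = informative or 1
        return {"R": r / denom, "S": s / denom}
    return {"R": r, "S": s}
```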
Version 0.1.1: December 18, 2015
General:
- Swapped dependency on doSNOW for doParallel.
- Swapped dependency on plyr for dplyr.
- Swapped dependency on reshape2 for tidyr.
- Documentation clean up.
Distance Profiling:
- Changed underlying method of calcTargetingDistance to be negative log10 of the probability that is then centered at one by dividing by the mean distance.
- Added symmetry parameter to distToNearest to change how asymmetric distances (A->B != B->A) are combined to get the distance between A and B.
- Updated error handling in distToNearest to issue a warning and return NA when an unrecognized character is in the sequence.
- Fixed bug in 'aa' model in distToNearest that was calculating distance incorrectly when normalizing by length.
- Changed behavior to return nearest nonzero distance neighbor.
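The nearest non-zero neighbor rule and the NA-on-unrecognized-character behavior can be sketched with a plain Hamming distance. The real distToNearest models use substitution-model distances and length normalization. This sketch uses the hypothetical name dist_to_nearest, restricts the alphabet to A/C/G/T as a simplifying assumption, and returns None for NA.

```python
import warnings
from typing import List, Optional

VALID = set("ACGT")  # assumed alphabet for this sketch


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def dist_to_nearest(seqs: List[str]) -> List[Optional[int]]:
    """Distance from each sequence to its nearest neighbor with a non-zero distance.

    Sequences containing characters outside A/C/G/T get None (NA) and a
    warning, and are not used as neighbors. Identical sequences (distance 0)
    are not counted as neighbors.
    """
    upper = [s.upper() for s in seqs]
    ok = [set(s) <= VALID for s in upper]
    result: List[Optional[int]] = []
    for i, a in enumerate(upper):
        if not ok[i]:
            warnings.warn(f"unrecognized character in sequence {i}; returning NA")
            result.append(None)
            continue
        dists = [hamming(a, b) for j, b in enumerate(upper) if j != i and ok[j]]
        nonzero = [d for d in dists if d > 0]
        result.append(min(nonzero) if nonzero else None)
    return result
```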
Mutation Profiling:
- Renamed calcDBClonalConsensus to collapseByClone. Also renamed the argument collapseByClone to expandedDb.
- Fixed a (major) bug in calcExpectedMutations. Previously, the targeting calculation was incorrect and resulted in incorrect expected mutation frequencies. Note that this also resulted in incorrect BASELINe Selection (Sigma) values.
- Changed denominator in calcObservedMutations to be based on informative (unambiguous) positions only.
- Added nonTerminalOnly parameter to calcDBClonalConsensus indicating whether to consider mutations at leaves or not (defaults to false).
Selection Analysis:
- Updated groupBaseline. Now, when regrouping a Baseline object (i.e., grouping previously grouped PDFs), weighted convolution is performed.
- Added "imbalance" test statistic to the Baseline selection calculation.
- Extended the Baseline object to include binomK, binomN and binomP. Similar to numbOfSeqs, each of these is a matrix. They contain binomial inputs for each sequence and region.
Targeting Models:
- Added minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
- Added minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
- Added returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
- Added returnSource parameter to createMutabilityMatrix. If TRUE, the function will return a data frame indicating whether each 5-mer mutability is observed or inferred.
- In createSubstitutionMatrix and createMutabilityMatrix, fixed a bug when multipleMutation is set to "ignore".
- Changed inference procedure for the 5-mer substitution model.
- Added inference procedure for 5-mers without enough observed mutations in the mutability model.
- Fixed a bug in background 5-mer count for the RS model.
- Fixed a bug in IMGT gap handling in createMutabilityMatrix.
- Fixed a bug that occurred when sequences were in lowercase.
Version 0.1.0: June 18, 2015
Initial public release.
General:
- Restructured the S4 class documentation.
- Fixed bug wherein the example Influenza.tab file did not load on Mac OS X.
- Added citations for the citation("shazam") command.
- Added dependency on data.table >= 1.9.4 to fix a bug that occurred with earlier versions of data.table.
Distance Profiling:
- Added a human 1-mer substitution matrix, HS1FDistance, based on the Yaari et al., 2013 data.
- Set hs1f as the default distance model for distToNearest().
- Added conversion of sequences to uppercase in distToNearest().
- Fixed a bug wherein unrecognized (including lowercase) characters would lead to silently returning a distance of 0 to the nearest neighbor. Unrecognized characters will now raise an error.
Mutation Profiling:
- Fixed bug in calcDBClonalConsensus() so that the function now works correctly when called with the argument collapseByClone=FALSE.
- Added the frequency argument to calcObservedMutations() and calcDBObservedMutations(), which enables return of mutation frequencies rather than the default of mutation counts.
Targeting Models:
- Removed M3NModel and all options for using said model.
- Fixed bug in createSubstitutionMatrix() and createMutabilityMatrix() where IMGT gaps were not being handled.
Version 0.1.0.beta-2015-05-30: May 30, 2015
General:
- Added more error checking.
Targeting Models:
- Updated the targeting model workflow to include a clonal consensus step.
Version 0.1.0.beta-2015-05-11: May 11, 2015
Targeting Models:
- Added the U5NModel, which is a uniform 5-mer model.
- Improvements to plotMutability() output.
Version 0.1.0.beta-2015-05-05: May 05, 2015
Prerelease for review.