The clonal diversity of the repertoire can be analyzed using the general form of the diversity index, as proposed by Hill in:
Hill, M. Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427-432 (1973).
This index is coupled with resampling strategies to correct for variations in sequencing depth. This package provides two approaches to assessing diversity:
1. resampleDiversity
2. testDiversity
A small example Change-O tab-delimited database file is included in the alakazam package. Diversity calculation requires the CLONE field (column) to be present in the Change-O file, as well as an additional grouping column. In this example we will use the grouping columns SAMPLE and ISOTYPE.
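To make the column requirement concrete, the following minimal sketch builds a toy table and tabulates clone sizes per group. The table and its values are hypothetical stand-ins, not the demo file shipped with the package, and the tabulation uses base R only rather than any alakazam function.

```r
# Toy stand-in for a Change-O table (hypothetical values, not the demo file)
toy_db <- data.frame(
    SEQUENCE_ID = paste0("seq", 1:8),
    CLONE       = c(1, 1, 1, 2, 3, 3, 4, 5),
    SAMPLE      = c("A", "A", "A", "A", "B", "B", "B", "B"),
    stringsAsFactors = FALSE
)

# Check that the clone column and the grouping column are both present
stopifnot(all(c("CLONE", "SAMPLE") %in% names(toy_db)))

# Clone size distribution per group: number of sequences in each clone
clone_sizes <- table(toy_db$SAMPLE, toy_db$CLONE)
print(clone_sizes)
```

Each row of the resulting table is the clone size distribution of one group, which is the quantity the diversity calculations operate on.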
library(alakazam)
## Loading required package: ggplot2
## Loading required package: plyr
# Load Change-O file
file <- system.file("extdata", "changeo_demo.tab", package="alakazam")
df <- readChangeoDb(file)
The function resampleDiversity performs uniform resampling of the input sequences, without replacement by default, and recalculates the clone size distribution and the diversity with each resampling realization. Diversity (D) is calculated over a range of diversity orders (q) to generate a smooth curve.
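For reference, the general diversity index of Hill (1973) is commonly written as below, where p_i is the relative abundance of clone i (here, the fraction of sequences assigned to it), S is the number of clones, and q is the diversity order. This is the standard textbook form; the notation is an assumption of this note rather than something taken from the package documentation.

```latex
{}^{q}D = \left( \sum_{i=1}^{S} p_i^{\,q} \right)^{1/(1-q)}, \qquad
{}^{1}D = \lim_{q \to 1} {}^{q}D = \exp\!\left( -\sum_{i=1}^{S} p_i \ln p_i \right)
```

At q=0 the index reduces to the number of clones (richness), at q=1 it is the exponential of the Shannon entropy, and at q=2 it is the inverse Simpson index. Larger values of q give more weight to abundant clones.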
# Compare diversity curve across values in the "SAMPLE" column
# q ranges from 0 (min_q=0) to 32 (max_q=32) in 0.05 increments (step_q=0.05)
# A 95% confidence interval will be calculated (ci=0.95)
# 2000 resampling realizations are performed (nboot=2000)
sample_div <- resampleDiversity(df, "SAMPLE", min_q=0, max_q=32, step_q=0.05,
                                ci=0.95, nboot=2000)
# Compare diversity curve across values in the "ISOTYPE" column
# Analysis is restricted to ISOTYPE values with at least 30 sequences by min_n=30
# Excluded groups are indicated by a warning message
isotype_div <- resampleDiversity(df, "ISOTYPE", min_n=30, min_q=0, max_q=32,
                                 step_q=0.05, ci=0.95, nboot=2000)
## Warning in resampleDiversity(df, "ISOTYPE", min_n = 30, min_q = 0, max_q =
## 32, : Not all groups passed min_n=30 threshold. Excluded: IgD
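The resampling procedure described for resampleDiversity can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names (hill_diversity, resample_curve), the toy clone labels, and the use of a percentile interval for the confidence bounds are all hypothetical choices made for this example.

```r
# Hill diversity of order q from a vector of clone counts
hill_diversity <- function(counts, q) {
    p <- counts[counts > 0] / sum(counts)
    if (isTRUE(all.equal(q, 1))) {
        exp(-sum(p * log(p)))      # limit at q = 1: exponential of Shannon entropy
    } else {
        sum(p^q)^(1 / (1 - q))
    }
}

# Resample n sequences, recompute clone sizes, and evaluate D at each q
resample_curve <- function(clones, n, q_values, nboot = 200, ci = 0.95,
                           replace = FALSE) {
    boot <- vapply(seq_len(nboot), function(b) {
        counts <- as.numeric(table(sample(clones, n, replace = replace)))
        vapply(q_values, function(q) hill_diversity(counts, q), numeric(1))
    }, numeric(length(q_values)))
    # One row per diversity order, one column per resampling realization
    boot <- matrix(boot, nrow = length(q_values))
    alpha <- (1 - ci) / 2
    # Assumption: percentile interval; the package may summarize differently
    data.frame(q = q_values,
               D = rowMeans(boot),
               D_lower = apply(boot, 1, quantile, probs = alpha),
               D_upper = apply(boot, 1, quantile, probs = 1 - alpha))
}

set.seed(1)
# Hypothetical clone labels: two expanded clones and forty singletons
toy_clones <- c(rep("c1", 40), rep("c2", 20), paste0("s", 1:40))
curve <- resample_curve(toy_clones, n = 50, q_values = seq(0, 4, by = 0.5))
print(head(curve))
```

Because every group is resampled to the same number of sequences n, curves from groups with different sequencing depths become comparable. The mean curve never increases with q, since each individual realization is non-increasing in q.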
# Plot a log-log (log_q=TRUE, log_d=TRUE) plot of sample diversity
# Indicate number of sequences resampled from each group in the title
sample_main <- paste0("Sample diversity (n=", sample_div@n, ")")
p1 <- plotDiversityCurve(sample_div, main_title=sample_main,
                         legend_title="Sample", log_q=TRUE, log_d=TRUE)
# Plot isotype diversity using default set of Ig isotype colors
isotype_main <- paste0("Isotype diversity (n=", isotype_div@n, ")")
p2 <- plotDiversityCurve(isotype_div, colors=IG_COLORS, main_title=isotype_main,
                         legend_title="Isotype", log_q=TRUE, log_d=TRUE)
The function testDiversity performs resampling and diversity calculation in the same manner as resampleDiversity, but only for a single diversity order. Significance testing across groups is performed using the delta of the bootstrap distributions between groups.
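One plausible reading of "the delta of the bootstrap distributions" is sketched below in base R: compute the diversity of each group over many resampling realizations, take the pairwise differences, and ask how often the difference crosses zero. The helper names (diversity_q2, boot_diversity, delta_test), the toy groups, and the two-sided p-value rule are assumptions made for illustration; the package's exact procedure may differ.

```r
# Inverse Simpson index (Hill diversity at q = 2) from a vector of clone labels
diversity_q2 <- function(clones) {
    p <- as.numeric(table(clones)) / length(clones)
    1 / sum(p^2)
}

# Bootstrap distribution of diversity for one group, resampling n sequences
# without replacement in each realization
boot_diversity <- function(clones, n, nboot) {
    vapply(seq_len(nboot),
           function(b) diversity_q2(sample(clones, n, replace = FALSE)),
           numeric(1))
}

# Assumed rule: two-sided p-value from the share of deltas on either side of zero
delta_test <- function(d1, d2) {
    delta <- d1 - d2
    p_below <- mean(delta <= 0)
    p_above <- mean(delta >= 0)
    list(pvalue = min(1, 2 * min(p_below, p_above)),
         delta_median = median(delta),
         delta_mean = mean(delta))
}

set.seed(1)
# Hypothetical groups of 100 sequences each
group_a <- paste0("a", rep(1:50, each = 2))        # 50 clones of size 2
group_b <- c(rep("b1", 60), paste0("b", 2:41))     # one dominant clone plus singletons
d_a <- boot_diversity(group_a, n = 50, nboot = 500)
d_b <- boot_diversity(group_b, n = 50, nboot = 500)
result <- delta_test(d_a, d_b)
print(result)
```

In this toy case the two bootstrap distributions do not overlap, so no delta falls on the other side of zero and the estimated p-value is 0, which should be read as "below the resolution of the number of realizations" rather than as exactly zero.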
# Test diversity at q=0 (species richness) across values in the "SAMPLE" column
# 2000 bootstrap realizations are performed (nboot=2000)
sample_test <- testDiversity(df, 0, "SAMPLE", nboot=2000)
sample_test
## An object of class "DiversityTest"
## Slot "tests":
##           test pvalue delta_median delta_mad delta_mean delta_sd
## 1 RL01 != RL02  0.425            4    4.4478      4.234 4.701521
##
## Slot "summary":
##      group median    mad    mean       sd
## RL01  RL01     59 2.9652 58.5895 3.073876
## RL02  RL02     54 4.4478 54.3555 3.626940
##
## Slot "groups":
## [1] "RL01" "RL02"
##
## Slot "q":
## [1] 0
##
## Slot "n":
## [1] 100
##
## Slot "nboot":
## [1] 2000
# Test diversity at q=2 (inverse Simpson index) across values in the "ISOTYPE" column
# Analysis is restricted to ISOTYPE values with at least 30 sequences by min_n=30
# Excluded groups are indicated by a warning message
isotype_test <- testDiversity(df, 2, "ISOTYPE", min_n=30, nboot=2000)
## Warning in testDiversity(df, 2, "ISOTYPE", min_n = 30, nboot = 2000): Not
## all groups passed min_n=30 threshold. Excluded: IgD
isotype_test
## An object of class "DiversityTest"
## Slot "tests":
##         test pvalue delta_median delta_mad delta_mean delta_sd
## 1 IgA != IgG      0     9.593185  2.898800   9.683691 2.857496
## 2 IgA != IgM      0    22.818547  4.334606  22.721867 4.270961
## 3 IgG != IgM      0    32.350908  3.403138  32.405558 3.405566
##
## Slot "summary":
##     group    median       mad      mean        sd
## IgA   IgA 13.736264 2.8002404 13.819548 2.7031102
## IgG   IgG  4.019293 0.9048857  4.135858 0.9432202
## IgM   IgM 36.764706 3.4067096 36.541415 3.2633647
##
## Slot "groups":
## [1] "IgA" "IgG" "IgM"
##
## Slot "q":
## [1] 2
##
## Slot "n":
## [1] 50
##
## Slot "nboot":
## [1] 2000