Introduction
pRESTO is a toolkit for processing raw reads from high-throughput sequencing of lymphocyte repertoires.

Dramatic improvements in high-throughput sequencing technologies now enable large-scale characterization of lymphocyte receptor repertoires, defined as the collection of trans-membrane antigen-receptor proteins located on the surface of T and B lymphocytes. The REpertoire Sequencing TOolkit (pRESTO) is a suite of utilities that handles all stages of sequence processing prior to germline segment assignment. pRESTO is designed to handle either single reads or paired-end reads. It includes features for quality control, primer masking, annotation of reads with sequence-embedded barcodes, generation of single-molecule consensus sequences, assembly of paired-end reads and identification of duplicate sequences. Numerous options for sequence sorting, sampling and conversion operations are also included.
pRESTO Tools
-
AlignSets
Performs multiple alignment on sets of sequences sharing the same annotation.
-
AssemblePairs
Assembles paired-end reads into a complete sequence.
-
BuildConsensus
Constructs a consensus sequence from sets of sequences sharing the same annotation.
-
ClusterSets
Clusters groups of sequences sharing an annotation into sub-clusters.
-
CollapseSeq
Removes duplicate sequences.
-
ConvertHeaders
Converts sequence headers into the pRESTO annotation format.
-
EstimateError
Generates an estimate of the sequencing error rates for a data set using UID read group information.
-
FilterSeq
Filters sequences to high-quality reads using a variety of criteria.
-
MaskPrimers
Trims or masks primer and barcode sequences in multiplexed runs and annotates reads accordingly.
-
PairSeq
Uniformly sorts paired-end read files and copies annotations between mate-pairs.
-
ParseHeaders
Manipulates sequence annotations.
-
ParseLog
Converts the log output of pRESTO scripts into data tables.
-
SplitSeq
Performs sampling, sorting and subsetting of sequence files.
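Several of the tools above operate on "sets of sequences sharing the same annotation". The Python sketch below illustrates that idea only; it is not pRESTO's implementation and does not call pRESTO. It assumes headers of the form `ID|FIELD=value`, uses a hypothetical `BARCODE` field, and replaces the toolkit's consensus logic with a plain majority vote over equal-length reads. All function names are made up for illustration.

```python
from collections import Counter, defaultdict


def parse_header(header):
    """Split an 'ID|KEY=VALUE|...' header into (id, {key: value}).

    The header layout is an assumption made for this sketch.
    """
    seq_id, *fields = header.split("|")
    annotations = dict(field.split("=", 1) for field in fields)
    return seq_id, annotations


def consensus(seqs):
    """Majority-vote consensus over equal-length sequences (simplified)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def group_consensus(records, field):
    """Build one consensus per set of reads sharing the same annotation value."""
    groups = defaultdict(list)
    for header, seq in records:
        _, annotations = parse_header(header)
        groups[annotations[field]].append(seq)
    return {value: consensus(seqs) for value, seqs in groups.items()}


def collapse(seqs_by_group):
    """Count how many groups produced each distinct sequence."""
    return Counter(seqs_by_group.values())


# Toy data: three reads share one barcode, two barcodes have a single read.
records = [
    ("read1|BARCODE=AAAC", "ACGTACGT"),
    ("read2|BARCODE=AAAC", "ACGTACGT"),
    ("read3|BARCODE=AAAC", "ACGAACGT"),
    ("read4|BARCODE=GGTT", "ACGTACGT"),
    ("read5|BARCODE=CCAT", "TTGTACGA"),
]

by_barcode = group_consensus(records, "BARCODE")
unique = collapse(by_barcode)
```

In this toy data the single mismatch in `read3` is outvoted within its barcode group, and the two groups that yield the same consensus are then counted as one distinct sequence. This loosely mirrors the roles of BuildConsensus and CollapseSeq described above.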
Citation
pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires
Vander Heiden JA*, Yaari G*, Uduman M, Stern JNH, O'Connor KC, Hafler DA, Vigneault F, Kleinstein SH
Bioinformatics 2014; doi: 10.1093/bioinformatics/btu138
Additional Ig Repertoire Tools
-
BASELINe
Bayesian estimation of antigen-driven selection.
-
Change-O
Clonal assignment, lineage reconstruction, diversity analysis, mutation profiling and selection analysis.
-
TIgGER
Personal genotyping assignment and novel polymorphism detection.
-
S5F
A 5-mer microsequence context model of somatic hypermutation targeting and substitution rates.