Sequence/reads file: FASTA/FASTQ (compressed or uncompressed), with no spaces, parentheses or other special symbols in the file name.
If the reads have already been de-multiplexed into separate files (one file per sample), you can pack them into a single .zip or .tar.gz archive and use it as input.
Paired-end reads file (optional): FASTQ (compressed or uncompressed)
CDR3 region pattern:
Custom pattern in regular expression (regex) format to extract the CDR3 region from the TCR sequences.
If you choose a species, the internal pre-defined patterns take priority over the custom one.
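
As an illustration of what a custom CDR3 pattern might look like, here is a minimal Python sketch. The pattern itself is hypothetical and is not one of the tool's internal patterns: it assumes the CDR3 is flanked by a Cys codon on the 5' side and a Phe-Gly codon pair on the 3' side, and that the region of interest is returned by the first capture group. The regex dialect accepted by the tool may differ from Python's `re`.

```python
import re

# Hypothetical pattern, for illustration only: a Cys codon (TGT/TGC),
# a captured run of 4-30 whole codons, then a Phe codon (TTT/TTC)
# followed by a Gly codon (GGN).
CDR3_PATTERN = re.compile(r"TG[TC]((?:[ACGT]{3}){4,30}?)TT[TC]GG[ACGT]")


def extract_cdr3(read):
    """Return the captured CDR3 nucleotides, or None if the pattern fails."""
    match = CDR3_PATTERN.search(read.upper())
    return match.group(1) if match else None


# Made-up read: flank + TGT + five codons + TTT GGA + flank
read = "AAACCCTGTGCCAGCAGCTTAGGGTTTGGACCC"
print(extract_cdr3(read))
```

Testing a pattern like this on a handful of known reads before uploading is a cheap way to check that it captures the intended region.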

Choose species: Uses specific parameters and reference sequences for the selected species.

Amplicon data: It is very important to specify all primer and tag sequences in 5'->3' orientation.
Shortening primer sequences to 7-9 nt can increase the number of retrieved sequences (e.g. GAGTGTCAT instead of GAGTGTCATTTCTCCAACGGGA).
Unique Molecular Identifier (UMI) sequences must be enclosed in parentheses.
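
The following Python sketch illustrates the two ideas above: shortening a primer to its first nucleotides and marking the UMI with parentheses. The exact notation the tool expects inside the parentheses is not stated here, so the run of `N` used below is an assumption, and `shorten_primer` and `spec_to_regex` are hypothetical helper names, not part of the tool.

```python
import re

# Only N is expanded in this sketch; other IUPAC codes are left untouched.
IUPAC = {"N": "[ACGT]"}


def shorten_primer(primer, length=9):
    """Keep the first `length` nucleotides (the 5' end) of a primer."""
    return primer[:length]


def spec_to_regex(spec):
    """Turn a primer spec with a parenthesised UMI into a compiled regex.

    Assumption: the UMI is written as a run of N inside parentheses,
    e.g. "(NNNNNNNN)GAGTGTCAT". The parentheses become a capture group.
    """
    parts = []
    for ch in spec.upper():
        parts.append(ch if ch in "()" else IUPAC.get(ch, ch))
    return re.compile("".join(parts))


primer = shorten_primer("GAGTGTCATTTCTCCAACGGGA")  # first 9 nt
pattern = spec_to_regex("(NNNNNNNN)" + primer)
match = pattern.search("ACGTACGTGAGTGTCATTTCTCC")  # made-up read
print(primer, match.group(1) if match else None)
```

A shorter primer tolerates sequencing errors in the discarded part of the primer, which is one plausible reason more reads are retrieved.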

Alleles file (optional): FASTA format, max. 2 MB.

Additional options:

Check reverse complementary: Checks the reverse complement of a sequence if the CDR3 region is not recognized in the original orientation (this will double the analysis time).
Number of reads to process: Maximum number of sequences/reads to be processed.
Number of UMIs to process: Maximum number of UMIs to be processed.
Remove singletons: Removes singletons (CDR3 variants with coverage 1).
Cluster CDR3 within UMIs: If at least half of the sequences within the same UMI cluster are similar (up to 2 substitutions), the consensus sequence is retrieved as a unique variant.
Cluster CDR3 errors: Groups together CDR3 sequences that differ by 1 or 2 substitutions. Use this option carefully, because clusters may include several real CDR3 variants.
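
The options above can be sketched in Python as follows. This is not the tool's implementation: all function names are hypothetical, the clustering is a simple greedy scheme chosen for illustration, and the most frequent read in a UMI group is used as a stand-in for a true consensus sequence.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement, tried when the original orientation fails."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of substitutions between equal-length sequences, else None."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def remove_singletons(counts):
    """Drop CDR3 variants whose coverage is 1."""
    return Counter({seq: n for seq, n in counts.items() if n > 1})


def umi_consensus(seqs, max_subs=2):
    """Return the most frequent read of a UMI group if at least half of
    the reads are within max_subs substitutions of it, else None."""
    if not seqs:
        return None
    top = Counter(seqs).most_common(1)[0][0]
    similar = 0
    for s in seqs:
        d = hamming(s, top)
        if d is not None and d <= max_subs:
            similar += 1
    return top if 2 * similar >= len(seqs) else None


def cluster_errors(counts, max_subs=2):
    """Greedy clustering: visit variants from most to least abundant and
    merge each into the first earlier seed within max_subs substitutions."""
    clusters = Counter()
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for seed in clusters:
            d = hamming(seq, seed)
            if d is not None and d <= max_subs:
                clusters[seed] += n
                break
        else:
            clusters[seq] = n
    return clusters


# Made-up coverage table
counts = Counter({"TGTGCCAGC": 10, "TGTGCCAGT": 1, "TGTAAAGGG": 3, "TGTGCC": 1})
print(remove_singletons(counts))
print(cluster_errors(counts))
```

In this sketch the one-substitution variant is absorbed by its abundant neighbour, which shows why error clustering can also hide a real, rare CDR3 variant that happens to be close to a dominant one.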


