Amplicon Sequencing Data Set Examples:
Here you can download published amplicon sequencing data sets and use them to test AmpliSAT tools.
- Bai, Y., Ni, M., Cooper, B., Wei, Y. & Fury, W. Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads. BMC Genomics 15, 325 (2014). This data set contains genomic sequences from the exon 2 and exon 3 regions of the class I HLA-A and HLA-B loci in five human cell lines, sequenced with Illumina MiSeq paired-end 2×250 cycles (EBI accession number PRJEB4744).
  Real allele sequences were assigned by Sanger sequencing in 2 independent laboratories.
  To make the data compatible with the AmpliSAS input format, tag sequences were added at the primer ends in each sample file, and all samples were merged into a single file.
  Files:
  Suggested analysis parameters:
  - Technology: Illumina
  - Minimum amplicon depth: 500 reads
  - Minimum amplicon frequency (filtering): 10%
  - Discard chimeras: yes
- Biedrzycka et al. Reliable genotyping of a co-amplifying gene family can be achieved using ultra-deep sequencing: a case study of MHC class I diversity in the sedge warbler Acrocephalus schoenobaenus. Birds in the genus Acrocephalus have dozens of co-amplifying MHC class I genes. This study aimed to resolve complex genotypes in the sedge warbler (Acrocephalus schoenobaenus) using ultra-deep sequencing.
  The data consist of merged Illumina MiSeq paired-end reads (2×300) from exon 3 of MHC class I genes in 24 individuals, each sequenced independently in 3 replicates.
  These data were used to evaluate the repeatability of four genotyping approaches (AmpliSAS versus those implemented in AmpliLEGACY) and the effect of amplicon coverage.
  Files:
  - Replicate 1 merged reads file. (Raw R1 data+Raw R2 data)
  - Replicate 2 merged reads file. (Raw R1 data+Raw R2 data)
  - Replicate 3 merged reads file. (Raw R1 data+Raw R2 data)
  - Replicate 1 amplicon data file.
  - Replicate 2 amplicon data file.
  - Replicate 3 amplicon data file.
  - Original genotyping results (only the number of alleles is shown for the different genotyping methods and amplicon coverages).
  Suggested analysis parameters:
  - Technology: Illumina
  - Maximum number of alleles per amplicon: 60
  - Minimum amplicon depth: 5000 reads
  - Exact length required: yes
  - Minimum dominant frequency: 10%
  - Minimum amplicon frequency (filtering): 0.4%
  - Discard chimeras: yes
- Stutz, W. E. & Bolnick, D. I. Stepwise Threshold Clustering: A New Method for Genotyping MHC Loci Using Next-Generation Sequencing Technology. PLoS One 9, e100587 (2014). This data set consists of genomic sequences of the exon 2 region of MHC class IIb loci from 301 samples of a non-model organism, the threespine stickleback (Gasterosteus aculeatus), sequenced with 454 GS FLX Titanium technology.
  These data had previously been analyzed with the Stepwise Threshold Clustering (STC) genotyping algorithm, and the original raw SFF file is available from NCBI (accession number SRR1177032).
  Files:
  Suggested analysis parameters:
  - Technology: 454
  - Minimum amplicon depth: 500 reads
  - Exact length required: yes
  - Minimum dominant frequency: 22%
  - Minimum amplicon frequency (filtering): 4.5%
  - Discard chimeras: yes
- Herdegen, M., Babik, W. & Radwan, J. Selective pressures on MHC class II genes in the guppy (Poecilia reticulata) as inferred by hierarchical analysis of population structure. J. Evol. Biol. 27, 2347–2359 (2014). Data obtained by sequencing MHC class II (exon 2) in 13 guppy (Poecilia reticulata) individuals on the Illumina MiSeq and Ion Torrent PGM platforms (2 experimental replicates).
  Alleles were assigned by manual curation of de-multiplexed sequences, without clustering, using the empirical threshold method (Radwan et al. 2012; Promerová et al. 2013). Using a representative sample of sequences, the authors determined that the lower threshold, below which the vast majority of variants could be explained as 1-2 bp substitution artefacts, was 3%, and that the upper threshold, above which such artefacts are not found, was 12%. During genotyping, sequences with indels were removed first, and then variants with frequencies below the 3% threshold were removed. The remaining variants were screened case by case for chimeras and for 1-2 bp substitutions of more common variants; such variants were removed unless they constituted >12% of the reads within an amplicon (see Herdegen et al. 2014 for details).
  Illumina MiSeq files:
  Ion Torrent PGM files:
  - Sequence File (replicate 1).
  - Sequence File (replicate 2).
  - Amplicon data file.
  - Alleles file.
  - Original genotyping results (2 spreadsheets, 1 per replicate).
  Suggested analysis parameters:
  - Technology: Illumina or IonTorrent
  - Minimum amplicon depth: 500 reads
  - Exact length required: yes
  - Minimum dominant frequency: 12%
  - Minimum amplicon frequency (filtering): 3%
  - Discard chimeras: yes
- Ferrandiz-Rovira, M., Bigot, T., Allainé, D., Callait-Cardinal, M.-P. & Cohas, A. Large-scale genotyping of highly polymorphic loci by next-generation sequencing: how to overcome the challenges to reliably genotype individuals? Heredity (Edinb). 114, 485–93 (2015). This data set comprises 454 sequencing data from 144 Alpine marmots (Marmota marmota) for 2 MHC class I loci (UB and UD) and 2 MHC class II loci (DRB1 and DRB2).
  (Data available at GitHub repository: alFinder program example).
  Files:
  - Sequence File.
  - DRB1 amplicon data file.
  - DRB2 amplicon data file.
  - UB amplicon data file.
  - UD amplicon data file.
  - Alleles file.
  - Original genotyping results.
  Suggested analysis parameters:
  - Technology: 454
  - Minimum amplicon depth: 12 reads
  - Exact length required: yes
  - Minimum dominant frequency: 25%
  - Minimum amplicon frequency (filtering): 15%
  - Discard chimeras: yes
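The two-threshold filtering described for the Herdegen et al. (2014) guppy data set can be illustrated with a short Python sketch. This is not AmpliSAS code or the authors' procedure, which relied on manual case-by-case curation. It is a minimal approximation under stated assumptions: variants are given as read counts for a single amplicon, sequences with indels have already been removed, chimera screening is omitted, and a variant is treated as a possible artefact only when it differs by 1-2 substitutions from a more abundant variant that has already been accepted. The names `filter_variants` and `hamming` are hypothetical. The 3% and 12% defaults are the thresholds quoted in the text.

```python
from typing import Dict

LOWER_THRESHOLD = 0.03  # variants below this frequency are discarded
UPPER_THRESHOLD = 0.12  # artefact-like variants above this frequency are kept


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def filter_variants(
    counts: Dict[str, int],
    lower: float = LOWER_THRESHOLD,
    upper: float = UPPER_THRESHOLD,
) -> Dict[str, int]:
    """Return the variants of one amplicon that pass both thresholds.

    Assumption: `counts` maps each variant sequence to its read count,
    and frequencies are computed against all reads in the amplicon.
    """
    total = sum(counts.values())
    if total == 0:
        return {}

    # Step 1: drop variants below the lower frequency threshold.
    kept = {seq: n for seq, n in counts.items() if n / total >= lower}

    # Step 2: visit variants from most to least abundant, so each one is
    # only compared with more common variants that were already accepted.
    alleles: Dict[str, int] = {}
    for seq in sorted(kept, key=kept.get, reverse=True):
        freq = kept[seq] / total
        artefact_like = any(
            len(seq) == len(parent) and 1 <= hamming(seq, parent) <= 2
            for parent in alleles
        )
        # Artefact-like variants survive only above the upper threshold.
        if artefact_like and freq <= upper:
            continue
        alleles[seq] = kept[seq]
    return alleles


# Toy amplicon with 1000 reads (made-up sequences, for illustration only).
example = {
    "ACGTACGT": 500,  # dominant variant
    "TTGTACCA": 300,  # 4 substitutions away from the dominant variant
    "ACGAACGT": 150,  # 1 substitution away, but above 12%
    "ACGTACGA": 40,   # 1 substitution away and below 12%
    "GGGGGGGG": 10,   # below 3%
}
print(filter_variants(example))
# {'ACGTACGT': 500, 'TTGTACCA': 300, 'ACGAACGT': 150}
```

In this toy amplicon the 1% variant is removed by the lower threshold, the 4% variant is removed as a likely substitution artefact of the dominant sequence, and the 15% variant is kept even though it is one substitution away from the dominant sequence, because it exceeds the upper threshold.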
Disclaimer
Your use of any of these tools is at your own risk. We make no representation or warranty and assume no liability or responsibility for the data or the results posted (whether as to their accuracy, completeness, quality or otherwise). Access to these data is free of charge for ordinary use in the course of research. By visiting the site, you accept our use of cookies and you accept that your data and results will be stored on our server.