Tuesday, 10 March 2015

EXOME SEQUENCING MACHINES: ILLUMINA

The coding region (i.e. the part of the DNA that encodes proteins, commonly known as the exome) represents just 2% of the entire human genome, but it harbours more than 85% of all known disease-causing mutations in humans. Exome sequencing is therefore a very cost-effective approach for investigating genetic diseases, especially when the cause (i.e. the gene) has not yet been discovered.

Several companies produce sequencers capable of performing exome sequencing. Each machine comes with a list of add-on kits and services for (1) preparing the sequencing library, (2) performing target enrichment and (3) running the bioinformatic analysis of the data.

Today the leader in this field is probably Illumina, but strong rivals such as Life Technologies, Pacific Biosciences, Complete Genomics and others also offer state-of-the-art technology.

Illumina offers three different models of sequencers and three different enrichment kits. The optimal solution depends largely on the purpose of testing (research vs. clinical) and on the number of samples that need to be tested (research labs typically need machines with a higher output, whereas diagnostic labs are usually fine with a basic output level).

To sequence from dozens to hundreds of exomes per run, Illumina recommends its flagship sequencer, the mighty HiSeq 2500 System. To sequence up to 12 exomes per run, the NextSeq 500 Sequencing System, presented as the first "high-throughput desktop" sequencer, is your machine. If you own a small diagnostic lab and need to run just one exome at a time (but maybe several multi-gene panels), then you could opt for the smaller MiSeq.
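The rule of thumb above can be written down as a small lookup. This is only a sketch of the guidance as summarized in this post: the function name `suggest_sequencer` is made up for illustration, and the hard cut-offs (1, 12) are a simplification of "one exome", "up to 12" and "dozens to hundreds", not official Illumina specifications.

```python
def suggest_sequencer(exomes_per_run: int) -> str:
    """Map a planned number of exomes per run to the instrument named in the post.

    The thresholds are an assumption: a simplified reading of the
    guidance quoted above, not vendor specifications.
    """
    if exomes_per_run < 1:
        raise ValueError("exomes_per_run must be at least 1")
    if exomes_per_run == 1:
        return "MiSeq"        # one exome at a time, small diagnostic lab
    if exomes_per_run <= 12:
        return "NextSeq 500"  # up to 12 exomes per run
    return "HiSeq 2500"       # dozens to hundreds of exomes per run


if __name__ == "__main__":
    for n in (1, 8, 96):
        print(n, "exome(s) per run:", suggest_sequencer(n))
```

In practice the choice also depends on budget, turnaround time and the required coverage, which this sketch deliberately ignores.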

To perform exome sequencing, Illumina offers three different enrichment and sequencing kits, which can be used on all three of its sequencers. For clinical purposes there is the TruSight One kit, which covers the coding region of 4,813 disease-associated genes selected from HGMD, OMIM and GeneTests.org. In total this kit covers about 12 million bases (12 Mb). For research purposes there are two additional kits: the Nextera Rapid Capture Exome and the Nextera Rapid Capture Expanded Exome. These two kits are designed to cover 37 Mb of coding exons and 62 Mb of coding exons plus regulatory regions, respectively. They cover the majority of the sequences contained in major databases such as RefSeq, ENSEMBL and GENCODE, among others. For all three kits the quantity of DNA needed is very low (50 ng per sample is enough) and the total preparation time is reported to be as low as 30 hours for up to 96 samples.
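To get a feeling for what these target sizes mean in terms of sequencing output, here is a back-of-the-envelope sketch. The formula (target size times mean coverage, divided by the fraction of reads that land on target) is the usual rough estimate. The dictionary keys, the function name `required_yield_gb`, the 100x coverage and the 70% on-target rate are illustrative assumptions, not figures published by Illumina.

```python
# Target sizes in megabases, taken from the kit descriptions above.
KIT_TARGET_MB = {
    "clinical_panel": 12,         # 4,813 disease-associated genes
    "exome": 37,                  # coding exons
    "exome_plus_regulatory": 62,  # coding exons plus regulatory regions
}


def required_yield_gb(target_mb: float, mean_coverage: float,
                      on_target_fraction: float) -> float:
    """Rough sequencing yield (in gigabases) needed for one sample.

    yield = target size x mean coverage / on-target fraction
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_mb * mean_coverage / on_target_fraction / 1000.0


if __name__ == "__main__":
    # Assumed values for illustration only: 100x mean coverage, 70% on target.
    for kit, size_mb in KIT_TARGET_MB.items():
        gb = required_yield_gb(size_mb, mean_coverage=100, on_target_fraction=0.7)
        print(f"{kit}: about {gb:.1f} Gb per sample")
```

Dividing a run's total output by this per-sample figure gives a first estimate of how many samples fit in one run.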

Illumina's offering is completed by tools for bioinformatic analysis. These tools are organized as apps, accessible on a highly secured "cloud" system called BaseSpace. For variant calling, both the Burrows-Wheeler Aligner with the Genome Analysis Toolkit (BWA/GATK) and the Illumina Isaac pipeline take FASTQ data as input and produce results in VCF format. The Illumina Annotation Service then makes these data usable for non-bioinformaticians as well. For an even deeper level of analysis, Illumina also offers the possibility to filter variants and analyze them based on clinical/phenotypic information. The tool that enables this step is called NextBio Research, and it is also available in a free version for academic, government or non-profit institutions.
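For readers who have never opened a VCF file: it is a tab-separated text format in which header lines start with "#" and each remaining line describes one variant (chromosome, position, ID, reference allele, alternate allele, quality, filter status, and so on). The sketch below reads the first seven columns and keeps only the variants marked PASS. It is a minimal illustration and not how BaseSpace or any Illumina tool works internally; the two example records are invented, and the names `Variant` and `parse_vcf` are made up for this example.

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    filter_status: str


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping headers and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Fixed VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, ...
        chrom, pos, _id, ref, alt, qual, flt = line.split("\t")[:7]
        # A "." in the QUAL column means the value is missing.
        quality = float("nan") if qual == "." else float(qual)
        yield Variant(chrom, int(pos), ref, alt, quality, flt)


# Invented records, for illustration only.
EXAMPLE_VCF = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t12345\t.\tA\tG\t99.0\tPASS\t.
chr2\t67890\t.\tC\tT\t12.5\tLowQual\t.
"""

if __name__ == "__main__":
    for v in parse_vcf(EXAMPLE_VCF.splitlines()):
        if v.filter_status == "PASS":
            print(v.chrom, v.pos, f"{v.ref}>{v.alt}", v.qual)
```

For real data, a dedicated library is a better choice than hand-written parsing, since real VCF files also carry per-sample genotype columns and structured INFO fields.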