Our poll “The hottest molbio topics: the next few years” is now over, and I have invited well-known bloggers (who are also experts in the fields in question) to discuss the results and put them in context for our readers [See Which will be the hottest topic in molecular biology in a few years? The results].
As you may remember, last week David García from You'd Prefer An Argonaute started off the “The hottest molbio topics” Series with his post entitled "The Allure of Regulatory RNAs", a great article on a field that has grown exponentially over the last few years.
This week, we've invited Dr Keith Robison to contribute to this Series.
Keith spent 10 years at Millennium Pharmaceuticals working with various genomics and proteomics technologies and 2 years at Codon Devices, working on a variety of protein and metabolic engineering projects as well as monitoring a high-throughput gene synthesis facility. He is currently at Infinity Pharmaceuticals.
Also, Keith blogs over at Omics! Omics!.
So, why did people vote for “New DNA sequencing technologies and postgenomics” in our poll? What makes this field so attractive? What are the basics of this fast-growing area of research?
This is what Keith has to say about it:
A scientific spectator's guide to next-generation sequencing
The Vancouver Winter Olympics are over but still fresh in mind. Somehow, due to a combination of work, power failures and American television executive myopia, I didn’t see any of my previous favorite event (long track speedskating) nor the women’s figure skating long program, but I did find a new favorite race event (any cross-country ski event). The Winter Games come along only every 4 years but for the most part the events don’t change, so it’s easy to pick up on them each time. However, in order to understand the judged events, it is important to understand how each performance is scored. If you don’t, you may find yourself announcing platinum medals to runners-up.
A crowded competitive arena more specific to biologists is the ongoing race to make obsolete whatever sequencing instruments are currently in use. These are also kno

So, what I’ll try to do here is give you a scorecard of how to grade the players. It’s not as systematic as the new figure skating rules, but there are a lot of angles to look at and I’ll try to touch on the key ones.
First, what are you trying to do? These instruments kick out huge numbers of sequences and have been applied to a wide variety of scientific problems: a complete census of all applications would run many pages and by the time you finished reading it, another application would have been published!
Nevertheless, there are a few major classes of applications. Genome sequencing can be roughly divided into de novo and resequencing, though in the middle are “reference-guided assemblies” (which refers to assembling a new genome using a related one as a rough guide). “Targeted resequencing” uses PCR or hybridization to defined sequences, in order to focus on small portions of a genome, such as coding exons (“exome”).
RNA-Seq, Digital Gene Expression (DGE) profiling and new versions of SAGE all attempt to look at the transcriptome and provide expression information with a sensitivity and precision superior to microarrays. RNA-Seq can also be used to identify new transcript isoforms or mutations in known transcriptomes or to work out novel transcriptomes.
Methyl-Seq reads out the methylation status of the genome. ChIP-Seq is a good example of functional genomics sequencing, in which some functional property (in this case transcription factor binding) is converted to short sequencing tags [See Analyzing the genome-wide chromatin landscape: ChIP-Seq]. There are easily a few dozen XXX-Seq flavors of functional genomics sequencing already. As we’ll see below, applications often change the point weighting for each attribute.

Cost can, however, also be looked at on three other somewhat intertwined metrics: cost per run, cost per sample and cost per nucleotide. Many of these instruments cost U$10K-U$20K in reagents per run. This delivers a ton of data, but limits the number of experiments you can do. The number of samples per run depends on your application. Sequencing a human-sized genome still requires many runs, but many other applications require only a slice of one run. Options for sample multiplexing with embedded sequence barcodes or subdivisions of the instrument surface (via gaskets or separate channels) enable reducing the cost per sample but do nothing for the cost per experiment – though you might find a friend to share with or be able through a service provider or core lab to share with strangers. Finally, there is the cost per base, generally expressed in a cost per human genome sequenced at approximately 40X coverage. To show one example of how these trade off, the new PacBio machine has a great cost per sample (~U$100) and per run (you can run just one sample) but a poor cost per human genome – you’d need around 12,000 of those runs to sequence a human genome (~U$120K). In contrast, one can buy a human genome on the open market for U$50K and sub U$10K genomes will probably be generally available this year.
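To make the three cost metrics concrete, here is a minimal Python sketch of the arithmetic. The function names, the 8-sample multiplexing level, the ~3.1 Gb genome size and the 10 Gb per-run yield are my own illustrative assumptions, not figures from any vendor; only the U$10K run cost, the 40X coverage and the PacBio figures come from the text above. Note that multiplying the two PacBio figures as written (12,000 runs at ~U$100 each) gives about U$1.2M rather than ~U$120K, so one of the quoted numbers may be off by a factor of ten; the sketch only does the arithmetic and does not try to decide which.

```python
import math

def cost_per_sample(run_cost, samples_per_run):
    """Reagent cost attributed to each sample when a run is shared equally."""
    return run_cost / samples_per_run

def runs_needed(genome_size, coverage, bases_per_run):
    """Whole runs needed to reach a target average coverage of a genome."""
    return math.ceil(genome_size * coverage / bases_per_run)

def cost_per_genome(run_cost, n_runs):
    """Total reagent cost for a genome that takes n_runs runs."""
    return run_cost * n_runs

# Multiplexing lowers cost per sample but not cost per run.
# 8 barcoded samples per run is a hypothetical choice.
print(cost_per_sample(10_000, 8))          # 1250.0

# Runs for a human-sized genome at 40X.
# Genome size (~3.1 Gb) and per-run yield (10 Gb) are assumptions.
HUMAN_GENOME = 3_100_000_000
print(runs_needed(HUMAN_GENOME, 40, 10_000_000_000))  # 13

# The PacBio example as quoted: ~U$100 per run, ~12,000 runs.
print(cost_per_genome(100, 12_000))        # 1200000
```

The point of separating the three functions is that each metric answers a different question: how much you must spend to do anything at all (per run), how much each experiment costs once you share a run (per sample), and how much a complete genome costs (per base, scaled up).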
Different instruments have different end-to-end times due to their technologies. SMRT is again a huge winner, with end-to-end times under a day. 454, which was the first instrument of this class on the market, requires less than a day per run but the sample preparation can add a few days to that. On the other hand, instruments such as the ABI SOLiD and Illumina GA series may have runs lasting almost 2 weeks, plus more days for sample prep. For most research projects, you are not in such a hurry that this is a problem. However, if you are trying to monitor a new microbial epidemic or pick the right toxic chemotherapy for a cancer patient, only very short runtimes will work.
Sample prep is also a key issue – and one with which I don’t (yet) have hands-on experience. Each manufacturer and its academic fans will take shots at the other systems’ schemes. This is one attraction of some of the newest systems, such as the one from PacBio, in which sample prep is supposedly very easy. There is also now an active community of companies selling alternative sample preparation methods.
Most of these instruments are characterized by very short read lengths in comparison to the 700-1000+ base reads commonly seen with Sanger technology. There has been improvement in getting longer reads, but many are quite short. 454 has demonstrated high quality reads to around 700 bases and low quality reads to 1000, and was the king in this regard. Illumina is generally available up to 90 bases and the top centers push that to 120 or 150, whereas ABI SOLiD tops out at 50. Complete Genomics uses really short reads to sequence human genomes, though there is a clever trick there. Polonator uses a similar trick, though not as extensively. Helicos is somewhere in SOLiD’s neighborhood. The new king, though, is PacBio, with reads routinely in excess of a kilobase and often 3-5 kb long. That crown, however, may not be worn long – Life Technologies has announced a technology which in theory has reads limited only by the length of the DNA template. Length is critical to genome sequencing and RNA-Seq experiments, but really short reads in huge numbers are what counts for DGE/SAGE and many of the functional tag sequencing methods. Technologies with really long reads tend not to give as many, and with all of them you can always choose a much shorter run to enable the machine to be turned over to another job sooner – if your application doesn’t need long reads.
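The trade-off between read length and read count can be sketched with the usual average-coverage relation, coverage = (number of reads × read length) / genome size. The Python below is only an illustration: the function name and the ~3.1 Gb human genome size are my assumptions, the read lengths are the round figures quoted above, and real coverage is never perfectly even, so treat the results as lower bounds.

```python
def reads_needed(genome_size, coverage, read_length):
    """Minimum number of reads for a target average coverage.

    Rearranges coverage = reads * read_length / genome_size and rounds up,
    using integer arithmetic to avoid floating-point surprises.
    """
    total_bases = genome_size * coverage
    return -(-total_bases // read_length)  # ceiling division

HUMAN_GENOME = 3_100_000_000  # assumed ~3.1 Gb

# Read lengths are the approximate figures quoted in the text.
for platform, length in [("SOLiD", 50), ("Illumina", 90),
                         ("454", 700), ("PacBio", 1000)]:
    print(platform, reads_needed(HUMAN_GENOME, 40, length))
```

For tag-counting applications such as DGE/SAGE the relation is irrelevant, since there each read is one count regardless of its length, which is why short reads in huge numbers win for those.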

But, that brings up accuracy. Keeping tabs on accuracy is particularly challenging since nobody talks about it unless they are proud of their numbers. SOLiD is pushing an error rate of 10^-6 (phred 60) and is now claiming that sample prep induced errors are starting to dominate actual sequencing errors. Other systems are worse, perhaps routinely delivering phred 20 or less. Getting lots of reads helps with consensus building; the random errors average out (any non-random errors might not, so this isn’t a panacea).
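For readers unfamiliar with phred scores, the conversion is Q = -10 × log10(p), where p is the per-base error probability, so phred 60 is an error rate of 10^-6 and phred 20 is 1 in 100. The Python sketch below shows the conversion and a deliberately simple model of consensus building. The function names are mine, and the consensus model is an assumption: it treats errors as independent and counts a position as wrong only if more than half the reads covering it are wrong, which is pessimistic (wrong reads need not agree with each other) and, as noted above, says nothing about non-random errors.

```python
import math

def phred_to_error(q):
    """Per-base error probability for a phred quality score."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality score for a per-base error probability."""
    return -10 * math.log10(p)

def majority_wrong(p, n):
    """Probability that more than half of n independent reads are wrong
    at one position, given per-read error probability p (binomial tail)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(phred_to_error(60))   # error rate claimed for SOLiD in the text
print(phred_to_error(20))   # a phred 20 system

# How fast random errors average out with depth, at phred 20 per read.
for depth in (1, 3, 5, 11):
    print(depth, majority_wrong(0.01, depth))
```

Even in this crude model, three-fold coverage at phred 20 already pushes the chance of a wrong majority call to about 3 in 10,000, which is why raw read counts can compensate for mediocre per-read accuracy when the errors really are random.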
There is also the question of access. If you don’t have the cash to buy an instrument, nearly all of the established systems can be accessed via a service provider. Some of these providers are for-profit institutions while others are university core labs. It pays to shop around, as there is a wide variety of pricing offered, though unfortunately there is not (yet?) a New York Sequencing Exchange to enable facile matching of scientists and providers. An interesting variant on this strategy is Complete Genomics: they perform human genome sequencing as an end-to-end service and do not sell their technology.
Unlike the Olympics, the sequencing game is constantly changing. Several of the systems mentioned above are not yet released, and there are many companies working on either evolutionary or revolutionary new sequencing instruments.
So, going back to the original question that drove this post: why is this all so exciting? As suggested above, it can be viewed as a gigantic scientific competition, but there are some other key reasons why advances in sequencing technology are capturing such attention. First, the pace of advance is dizzying. Rarely has a scientific field sustained such a technological rush for so long. Second, in the field of genomics these new sequencers are game-changing. What previously took dedicated science factories years to do can now be performed by small labs; with the availability of outsourced sequencing facilities you can truly sequence a genome from the comfort of your couch. A genome of great interest no longer needs to wait in line and be assigned a priority in a grueling competition. Finally and perhaps most importantly, these instruments are offering opportunities far beyond traditional genomics, even perhaps to perform experiments that have nothing to do with genomics. Our ability to explore many facets of living systems is being expanded by these advances, allowing us to contemplate experiments which before were pure fantasy. Crank your imagination up! The machines are no longer the bottleneck!

