Our poll “The hottest molbio topics: the next few years” is now over, and I have invited well-known bloggers (and experts in the fields in question) to discuss and contextualize the results for our readers in a more engaging way [See Which will be the hottest topic in molecular biology in a few years? The results].
As you may remember, last week David García from You'd Prefer An Argonaute started off the “The hottest molbio topics” Series with his post entitled "The Allure of Regulatory RNAs", a great article on a field that has grown exponentially over the last few years.
This week, we've invited Dr Keith Robison to contribute to this Series.
Keith spent 10 years at Millennium Pharmaceuticals working with various genomics and proteomics technologies and 2 years at Codon Devices, working on a variety of protein and metabolic engineering projects as well as monitoring a high-throughput gene synthesis facility. He is currently at Infinity Pharmaceuticals.
Also, Keith blogs over at Omics! Omics!.
So, why did people vote for “New DNA sequencing technologies and postgenomics” in our poll? What makes this field so attractive? What are the basics of this fast-growing area of research?
This is what Keith has to say about it:
A scientific spectator's guide to next-generation sequencing
The Vancouver Winter Olympics are over but still fresh in mind. Somehow, between work, power failures, and American television executives' myopia, I didn't see any of my previous favorite event (long-track speedskating) or the women's figure skating long program, but I did find a new favorite race event (any cross-country ski event). The Winter Games come along only every four years, but for the most part the events don't change, so it's easy to pick up on them each time. To understand the judged events, however, you need to understand how each performance is scored. If you don't, you may find yourself announcing platinum medals to runners-up.
A crowded competitive arena more specific to biologists is the ongoing race to make obsolete whatever sequencing instruments are currently in use. These are also known as second-generation and third-generation instruments, though nobody can agree on the rules for declaring a system third generation. With several players in the market and new entrants leaping in, it can get confusing. On top of that, many of the established players have multiple instruments for sale. If you are trying to buy an instrument or purchase services, it's important to understand the field. Most critically, no instrument is clearly “best” for all applications – each has its strengths and weaknesses.
So, what I’ll try to do here is give you a scorecard of how to grade the players. It’s not as systematic as the new figure skating rules, but there are a lot of angles to look at and I’ll try to touch on the key ones.
First, what are you trying to do? These instruments kick out huge numbers of sequences and have been applied to a wide variety of scientific problems: a complete census of all applications would run many pages and by the time you finished reading it, another application would have been published!
Nevertheless, there are a few major classes of applications. Genome sequencing can be roughly divided into de novo sequencing and resequencing, with “reference-guided assembly” (assembling a new genome using a related one as a rough guide) sitting in the middle. “Targeted resequencing” uses PCR or hybridization to defined sequences in order to focus on small portions of a genome, such as the coding exons (the “exome”).
RNA-Seq, Digital Gene Expression (DGE) profiling and new versions of SAGE all attempt to look at the transcriptome and provide expression information with a sensitivity and precision superior to microarrays. RNA-Seq can also be used to identify new transcript isoforms or mutations in known transcriptomes or to work out novel transcriptomes.
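To make “digital” expression concrete: in its simplest form, you count the reads (tags) mapping to each transcript and normalize for transcript length and sequencing depth. Below is a minimal sketch of one common normalization, RPKM (reads per kilobase of transcript per million mapped reads); the gene names, lengths, and counts are invented for illustration, and real pipelines add many refinements.

```python
# Minimal sketch: normalizing raw read counts into RPKM expression values.
# Gene names, transcript lengths, and counts are invented for illustration.

def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    per_kb = read_count / (transcript_length_bp / 1_000)
    return per_kb / (total_mapped_reads / 1_000_000)

counts = {"geneA": 5_000, "geneB": 150}    # reads mapped to each transcript
lengths = {"geneA": 2_500, "geneB": 800}   # transcript lengths in bp
total_mapped = 10_000_000                  # total mapped reads in the run

for gene, n in counts.items():
    print(f"{gene}: {rpkm(n, lengths[gene], total_mapped):.1f} RPKM")
# geneA: 200.0 RPKM, geneB: 18.8 RPKM
```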
Methyl-Seq reads out the methylation status of the genome. ChIP-Seq is a good example of functional genomics sequencing, in which some functional property (in this case transcription factor binding) is converted to short sequencing tags [See Analyzing the genome-wide chromatin landscape: ChIP-Seq]. There are easily a few dozen XXX-Seq flavors of functional genomics sequencing already. As we’ll see below, applications often change the point weighting for each attribute.
On sheer cost, there are three parameters to consider. First, there is the instrument cost – and not just the machine itself but all the required sample prep hardware and, in particular, some beefy computers to deal with all the data. Instruments tend to run from U$250K to U$1M, with easily U$100K-U$1M in additional accessories and computers. A recent trend has been for the established players to announce scaled-down versions of their top-end machines, though none of these seem likely to go below the U$250K mark. One reason I’m excited about the new Pacific Biosciences SMRT system is that it appears to be nearly complete out of the U$700K box (as you’ll see, I’m excited about all of the systems, making me the Scott Hamilton of sequencing), though there’s still a need for a compute farm to slurp the biology out of all that data. Another emerging system creating a lot of buzz around baseline cost is Ion Torrent’s, both because their technology does not require sophisticated optics (unlike all of the other systems described here) and because the instrument itself is proposed to sell for only U$50K. Ion Torrent is also promising that upgrading will no longer require major modifications (or replacement) of the instrument, but rather will all be contained in their consumables kit.
Cost can, however, also be looked at through three other somewhat intertwined metrics: cost per run, cost per sample and cost per nucleotide. Many of these instruments cost U$10K-U$20K in reagents per run. This delivers a ton of data but limits the number of experiments you can do. The number of samples per run depends on your application. Sequencing a human-sized genome still requires many runs, but many other applications require only a slice of one run. Options for sample multiplexing with embedded sequence barcodes, or subdivisions of the instrument surface (via gaskets or separate channels), reduce the cost per sample but do nothing for the cost per experiment – though you might find a friend to share with, or be able to share with strangers through a service provider or core lab. Finally, there is the cost per base, generally expressed as the cost of a human genome sequenced at approximately 40X coverage. To show how these trade off, the new PacBio machine has a great cost per sample (~U$100) and per run (you can run just one sample) but a poor cost per human genome – you’d need around 12,000 of those runs to sequence a human genome (~U$1.2M). In contrast, one can buy a human genome on the open market for U$50K, and sub-U$10K genomes will probably be generally available this year.
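To see how these cost metrics pull against one another, here is a back-of-the-envelope sketch using rough figures like those quoted above; the numbers are illustrative approximations, not vendor quotes.

```python
# Back-of-the-envelope cost comparison; all figures are rough
# illustrations in the spirit of the numbers quoted above.

def cost_per_genome(cost_per_run, runs_per_genome):
    """Total reagent cost to reach ~40X coverage of a human genome."""
    return cost_per_run * runs_per_genome

# Hypothetical single-sample instrument: cheap run, many runs needed.
many_cheap_runs = cost_per_genome(cost_per_run=100, runs_per_genome=12_000)

# Hypothetical high-throughput instrument: pricey run, few runs needed.
few_big_runs = cost_per_genome(cost_per_run=15_000, runs_per_genome=3)

print(f"~${many_cheap_runs:,} per genome")  # ~$1,200,000
print(f"~${few_big_runs:,} per genome")     # ~$45,000
```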
Different instruments involve different end-to-end times due to their technologies. SMRT is again a huge winner, with end-to-end times under a day. 454, the first instrument of this class on the market, requires less than a day per run, but the sample preparation can add a few days to that. On the other hand, instruments such as the ABI SOLiD and Illumina GA series (see here) may have runs lasting almost two weeks, plus more days for sample prep. For most research projects you are not in such a hurry that this is a problem. But if you are trying to monitor a new microbial epidemic or pick the right toxic chemotherapy for a cancer patient, only very short runtimes will work.
Sample prep is also a key issue – and one I don’t (yet) have hands-on experience with. Each manufacturer and their academic fans will take shots at the other systems’ schemes. This is one attraction of some of the newest systems, such as PacBio’s, in which sample prep is supposedly very easy. There is also now an active community of companies selling alternative sample preparation methods.
Most of these instruments are characterized by very short read lengths in comparison to the 700-1,000+ base reads commonly seen with Sanger technology. There has been progress toward longer reads, but many are still quite short. 454 has demonstrated high-quality reads to around 700 bases and lower-quality reads to 1,000, and was long the king in this regard. Illumina is generally available up to 90 bases, and the top centers push that to 120 or 150, whereas ABI SOLiD tops out at 50. Complete Genomics uses really short reads to sequence human genomes, though there is a clever trick there. Polonator uses a similar trick, though not as extensively. Helicos is somewhere in SOLiD’s neighborhood. The new king, though, is PacBio, with reads routinely in excess of a kilobase and often 3-5 kb long. That crown, however, may not be worn long – Life Technologies has announced a technology whose reads are, in theory, limited only by the length of the DNA template. Length is critical for genome sequencing and RNA-Seq experiments, but really short reads in huge numbers are what counts for DGE/SAGE and many of the functional tag sequencing methods. Technologies with really long reads tend not to give as many of them, and with all of them you can choose a much shorter run – if your application doesn’t need long reads – to turn the machine over to the next job sooner.
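The trade-off between read length and read count comes down to simple arithmetic: average coverage is (number of reads × read length) / genome size. A quick sketch with invented run parameters:

```python
# Coverage arithmetic: total bases sequenced divided by genome size.
# Run parameters below are invented for illustration.

def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average fold coverage across the genome."""
    return num_reads * read_length_bp / genome_size_bp

HUMAN_GENOME = 3_000_000_000  # ~3 Gb haploid human genome

# A billion 50-base reads vs. ten million 1 kb reads:
print(f"{mean_coverage(1_000_000_000, 50, HUMAN_GENOME):.1f}X")  # 16.7X
print(f"{mean_coverage(10_000_000, 1_000, HUMAN_GENOME):.1f}X")  # 3.3X
```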
To make up for the very short actual reads, there are tricks for getting multiple such reads from the same DNA fragment, a strategy generally known as paired reads and more specifically mate-pairs or paired-ends. As suggested by the word “pair”, to date these have involved getting two reads. Illumina actually reads both ends of the input molecule, reflected in nomenclatures such as 2x100. Generally the two reads are nearly symmetric, but some clever folks have schemes which use a short read to tag a fragment and a long read from the other end to get the useful information. Systems such as the Polonator and Complete Genomics use a series of enzyme digestions to create multiple such tags – the sequencing equivalent of double-double and double-double-double jump combinations. There are losses, but in theory this paired tagging could go on for many cycles, unlike skaters who run out of precious momentum. Helicos (“dark fill”) and PacBio (“strobe sequencing”), which both work with single molecules, can generate multiple tags from the same molecule. In addition to ameliorating the issue of short read lengths, all of these multiple-tag sequences provide critical linkage information for resolving repeats in both de novo genome sequencing and resequencing, as well as a greater ability to ascertain exon combinations in RNA-Seq.
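A toy sketch of why pairs carry that linkage information: once one read of a pair is placed on the reference, its mate must land roughly an insert-length away, so the implied fragment span can anchor reads within repeats or flag misassemblies. All coordinates and library parameters below are invented.

```python
# Toy sketch: the long-range constraint carried by a read pair.
# If both reads of a 2x100 pair map to a reference, the implied fragment
# span should match the library's insert size; a large deviation hints at
# a misassembly, a structural variant, or a read stuck in the wrong
# repeat copy. All coordinates and parameters are invented.

def implied_span(fwd_read_start, rev_read_end):
    """Outermost distance covered by a forward/reverse read pair."""
    return rev_read_end - fwd_read_start

EXPECTED_INSERT = 500  # library insert size in bp
TOLERANCE = 100        # allowed deviation before we get suspicious

pairs = [(10_000, 10_480), (52_000, 58_900)]  # (fwd start, rev end)
for start, end in pairs:
    span = implied_span(start, end)
    verdict = "consistent" if abs(span - EXPECTED_INSERT) <= TOLERANCE else "suspect"
    print(f"pair spans {span} bp: {verdict}")
```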
But that brings up accuracy. Keeping tabs on accuracy is particularly challenging, since nobody talks about it unless they are proud of their numbers. SOLiD is pushing an error rate of 10^-6 (phred 60) and is now claiming that sample-prep-induced errors are starting to dominate actual sequencing errors. Other systems are worse, perhaps routinely delivering phred 20 or less. Getting lots of reads helps with consensus building; the random errors average out (any non-random errors might not, so this isn’t a panacea).
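For reference, phred scores are log-scaled error probabilities: Q = -10·log10(p), so phred 20 means a 1-in-100 chance a base call is wrong and phred 60 means 1 in a million. The sketch below also shows, under an independence assumption, why consensus across several reads suppresses random (but not systematic) errors.

```python
from math import comb  # Python 3.8+

# Phred scale: Q = -10 * log10(p), with p the per-base error probability.
def phred_to_error(q):
    return 10 ** (-q / 10)

print(phred_to_error(20))  # 0.01  -> 1 miscall per 100 bases
print(phred_to_error(60))  # 1e-06 -> 1 miscall per million bases

# Consensus from 5 independent reads at phred 20: a rough upper bound on
# the miscall rate is the chance that 3 or more reads err at the same
# position (treating errors as independent; systematic errors violate this).
p = phred_to_error(20)
consensus_err = sum(comb(5, k) * p**k * (1 - p) ** (5 - k) for k in range(3, 6))
print(f"{consensus_err:.1e}")  # ~9.9e-06: far better than any single read
```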
There is also the question of access. If you don’t have the cash to buy an instrument, nearly all of the established systems can be accessed via a service provider. Some of these providers are for-profit institutions while others are university core labs. It pays to shop around, as there is a wide variety of pricing offered, though unfortunately there is not (yet?) a New York Sequencing Exchange to enable facile matching of scientists and providers. An interesting variant on this strategy is Complete Genomics: they perform human genome sequencing as an end-to-end service and do not sell their technology.
Unlike the Olympics, the sequencing game is constantly changing. Several of the systems mentioned above are not yet released, and there are many companies working on either evolutionary or revolutionary new sequencing instruments.
So, going back to the original question that drove this post: why is this all so exciting? As suggested above, it can be viewed as a gigantic scientific competition, but there are other key reasons why progress in sequencing technology is capturing such attention. First, the pace of advance is dizzying. Rarely has a scientific field sustained such a technological rush for so long. Second, in the field of genomics these new sequencers are game-changing. What previously took dedicated science factories years to do can now be performed by small labs; with the availability of outsourced sequencing facilities you can truly sequence a genome from the comfort of your couch. A genome of great interest no longer needs to wait in line and be assigned a priority in a grueling competition. Finally, and perhaps most importantly, these instruments offer opportunities far beyond traditional genomics, perhaps even to perform experiments that have nothing to do with genomics. Our ability to explore many facets of living systems is being expanded by these advances, allowing us to contemplate experiments which before were pure fantasy. Crank your imagination up! The machines are no longer the bottleneck!