Two recent papers published in Science are not only landmarks in their fields, but also feature our BluePippin automated DNA size selection platform. We’re honored to be included in these important publications.
The first paper, “Design and synthesis of a minimal bacterial genome,” comes from Clyde Hutchison, Craig Venter, and collaborators. The team built a bacterial genome containing just 473 genes determined to be necessary for life. The achievement followed a robust testing process in which each gene in the Mycoplasma mycoides genome was systematically altered to determine whether it was necessary for the organism’s survival. After stripping out all non-essential DNA, the scientists were left with a 531 kb genome. Interestingly, the function of nearly a third of all genes included in the final genome has not been determined. The team used BluePippin’s High-Pass protocol with the PacBio RS II for de novo assembly of the artificial genome.
The second paper comes from scientists at the University of Washington, the McDonnell Genome Institute, and other organizations. “Long-read sequence assembly of the gorilla genome” used PacBio sequencing to improve assembly quality by 150x compared to previous drafts of the gorilla genome, closing 93 percent of gaps and adding a significant amount of new sequence. Scientists got the best view yet of structural variation, ancestral evolution, and genetic diversity within our primate cousin, and created a valuable resource that will allow the community to make even more discoveries, especially about the differences between humans and closely related primates. BluePippin helped the scientists maximize read lengths by removing smaller fragments prior to sequencing.
Taken together, these publications get us a few steps closer to understanding life at its most basic level, as well as what makes us human. We’re eager to see how the research community will build on these great advances in the future.
It’s that time of year again — the time our kids look at us, shake their heads, and ask, “There’s a day to celebrate DNA? Seriously?”
But for those of us in the industry, DNA Day is a big deal. April 25th was chosen to honor major milestones in our understanding of DNA (Watson and Crick’s publication on the double helix structure and the completion of the Human Genome Project), and for the community it’s a great day to reflect on the remarkable advances in this field. Here, we consider a few areas where progress is particularly impressive.
Diversity of DNA: Never in history have we had such a clear view of the genetics of organisms from extremophiles to extinct species, and everything in between. Cheap sequencing allows scientists to go far beyond model organisms, exploring genomes all across the tree of life. RAD-based sequencing approaches have made it more affordable to do massive-scale genotyping of non-model organisms as well. In addition, we’re engaged in the largest-scale studies the field has ever seen, with multiple efforts aiming to recruit 1 million people in cohorts that were until recently inconceivable.
How DNA functions: At last year’s inaugural Festival of Genomics, we listened with great interest as Harvard’s Ting Wu described compelling work to understand the function of DNA based on its folding patterns. Conventional wisdom had long suggested that unwieldy DNA strands scrunch themselves up however they can, but Wu and other scientists have shown that the folding pattern is instead precisely selected, with a significant impact on the downstream functions of that DNA. Findings like this remind us that we’re still at the beginning of the story of DNA, with many more chapters to go before we can truly say we understand it. DNA is also being put to entirely new uses: in a recent paper we really enjoyed, scientists demonstrated that they could encode, encrypt, and extract short messages inserted into synthetic DNA.
How we treat DNA: Today, we think of treating DNA as a component of the NGS pipeline, with lots of effort to improve sample prep for everything from FFPE DNA samples to museum samples or precious clinical samples. But down the road, we may literally treat our DNA, using tools like CRISPR to edit out genetic problems from living people as a standard clinical treatment.
We hope you’ll be doing something fun to celebrate DNA Day this year. Follow along on Twitter with #DNADay16 to see how the community’s making it a special event.
It’s a study that would make John le Carré proud. DARPA-funded MIT scientists published results of a new method for encrypting messages in synthetic DNA for highly secure communication. It popped up on our radar because our BluePippin automated size selection instrument was used during sequence analysis.
In the PLoS One paper, “Multiplexed Sequence Encoding: A Framework for DNA Communication,” authors Bijan Zakeri, Peter Carr, and Timothy Lu describe new approaches to encoding, encrypting, and fragmenting messages across multiple plasmids. “With synthesis and sequencing speeds rising, and costs rapidly declining, DNA is an intriguing option for the transfer and storage of digital information,” they write.
The team designed QWERTY-style keyboards to easily convert English words into nucleic acids, assigning codons in a way that minimizes homopolymers in the resulting DNA sequence. They note that users could shuffle the codon assignments to suit their own preferences or to increase the security of the message.
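For readers who like to see the idea in code, here is a minimal Python sketch of a character-to-codon mapping. It is not the authors’ keyboard: the symbol set, the codon order, and every function name below are our own assumptions for illustration. The sketch only uses codons with no two identical neighboring bases, which leaves 36 codons and guarantees that no homopolymer run in the encoded sequence is longer than two bases.

```python
from itertools import product

# Hypothetical symbol set; the paper's actual keyboard layout is not reproduced here.
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,"


def build_codon_table(symbols=SYMBOLS):
    """Assign each symbol a codon that has no two identical adjacent bases."""
    codons = [
        "".join(c)
        for c in product("ACGT", repeat=3)
        if c[0] != c[1] and c[1] != c[2]
    ]  # 4 * 3 * 3 = 36 usable codons
    if len(symbols) > len(codons):
        raise ValueError("more symbols than available codons")
    return dict(zip(symbols, codons))


def encode(message, table):
    """Convert text to DNA, one codon per character."""
    return "".join(table[ch] for ch in message.upper())


def decode(sequence, table):
    """Convert DNA back to text by reading three bases at a time."""
    reverse = {codon: symbol for symbol, codon in table.items()}
    return "".join(reverse[sequence[i:i + 3]] for i in range(0, len(sequence), 3))


def longest_homopolymer(sequence):
    """Return the length of the longest run of a single repeated base."""
    longest = run = 0
    previous = None
    for base in sequence:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


table = build_codon_table()
dna = encode("HELLO WORLD", table)
print(dna, longest_homopolymer(dna), decode(dna, table))
```

Shuffling the order of `SYMBOLS` before building the table would give a different, private mapping, which is the spirit of the codon-shuffling option the authors mention.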
Next, they created what they call a “secret-sharing system” that encrypts the message and splits it across several DNA molecules, requiring the recipient to use a combination key to reveal the message. “This approach can add an additional layer of protection for a communication and also provide opportunities to explore introducing tiers of complexity within a communication that is afforded by the unique makeup of DNA as a chemical polymer for information storage,” the scientists write. (In a step we really enjoyed, the team also took the opportunity to encode decoy messages into the DNA.)
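The general flavor of splitting a message so that every piece is needed to read it can be sketched in a few lines of Python. To be clear, this is a generic XOR-based split that we have swapped in for illustration, not the scheme described in the paper, and the two-bits-per-base mapping and all function names are our own assumptions. Unlike a codon keyboard, this simple mapping makes no attempt to avoid homopolymers.

```python
import secrets
from functools import reduce

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(message, n_shares):
    """Split a message into n shares; all n are required to recover it."""
    if n_shares < 2:
        raise ValueError("need at least two shares")
    pads = [secrets.token_bytes(len(message)) for _ in range(n_shares - 1)]
    final_share = reduce(xor_bytes, pads, message)
    return pads + [final_share]


def combine_shares(shares):
    """Recover the message by XOR-ing every share together."""
    return reduce(xor_bytes, shares)


def bytes_to_dna(data):
    """Write each byte as four bases, two bits per base."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(sequence):
    """Read four bases at a time back into bytes."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


shares = split_secret(b"HELLO WORLD", 3)
strands = [bytes_to_dna(share) for share in shares]
recovered = combine_shares([dna_to_bytes(strand) for strand in strands])
print(strands, recovered)
```

Each strand on its own is indistinguishable from random sequence; only someone holding all three can rebuild the message.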
For the final part of the process, the team came up with a new approach to extract the original message. “We investigated a new method that allows for the multiplexed sequencing of multiple DNA molecules with a common primer, where regions within distinct DNA molecules that have matching information can be identified from a single sequencing reaction via chromatogram patterning,” they report. They validated the whole process by encoding watermarks, messages, and a combination key into six synthetic DNA strands, honoring the cryptography field by using an important World War II communication.
The scientists note that this work demonstrates proof of concept, and that they plan to follow up with additional innovations in future efforts.
Since RAD-seq was first developed, we’ve seen a number of new versions and approaches from an enthusiastic scientific community. The latest was recently published in PLoS One and demonstrates a RAD-based method suitable for analyzing degraded DNA, an essential step for studying samples stored in museum and other collections.
“Hybridization Capture Using RAD Probes (hyRAD), a New Tool for Performing Genomic Analyses on Collection Specimens” comes from lead authors Tomasz Suchan and Camille Pitteloud at the University of Lausanne and their collaborators in Russia, Poland, and the UK. The project was launched to overcome the challenges of using traditional RAD-seq methods, which require longer DNA fragments than are typically available in museum samples. “Museum collections … have not necessarily ensured optimal conditions for DNA preservation,” the authors write. “As a result, many museum specimens yield highly fragmented DNA — even for relatively recently collected samples, limiting their use for molecular ecology, conservation genetics, phylogeographic and phylogenetic studies.”
Their solution is a method called hyRAD, for hybridization RAD, which starts by using double-digest RAD-seq to produce DNA fragments from fresh samples of the species of interest. Those fragments then become capture probes for use with the degraded DNA samples. “Our method thus combines the simplicity and relatively low cost of developing RAD-sequencing libraries with the power and accuracy of hybridization-capture methods,” the team reports. “This enables the effective use of low quality DNA and limits the problems caused by sequence polymorphisms at the restriction site.”
The scientists tested this protocol on eight samples of Lycaena helle butterflies, followed by a validation project on 49 samples of the Palearctic grasshopper Oedaleus decorus. As with other RAD methods, the team used Pippin Prep for size selection prior to sequencing.
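The double-digest and size-selection steps described above can be mimicked in silico with a short Python sketch. Everything specific here is an assumption on our part rather than a detail from the paper: the enzyme pair (EcoRI cutting G^AATTC and MseI cutting T^TAA), the size window, and the function names. The sketch keeps only fragments flanked by one cut from each enzyme and falling inside the chosen size range.

```python
def find_cuts(sequence, site, offset, label):
    """Return (cut_position, enzyme_label) for every occurrence of a site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append((start + offset, label))
        start = sequence.find(site, start + 1)
    return cuts


def double_digest(sequence, min_len, max_len,
                  site_a="GAATTC", offset_a=1,   # assumed enzyme A: EcoRI
                  site_b="TTAA", offset_b=1):    # assumed enzyme B: MseI
    """Digest with two enzymes and size-select the resulting fragments."""
    cuts = sorted(
        find_cuts(sequence, site_a, offset_a, "A")
        + find_cuts(sequence, site_b, offset_b, "B")
    )
    fragments = []
    for (left, left_enzyme), (right, right_enzyme) in zip(cuts, cuts[1:]):
        length = right - left
        # Keep fragments with a different enzyme at each end, within the window.
        if left_enzyme != right_enzyme and min_len <= length <= max_len:
            fragments.append(sequence[left:right])
    return fragments


toy_genome = (
    "ACGTACGTACGT" + "GAATTC" + "ACGTACGTAC" * 3
    + "TTAA" + "CCGGCCGG" + "TTAA" + "GG"
)
print(double_digest(toy_genome, min_len=20, max_len=60))
```

In hyRAD, fragments like these from fresh samples would go on to serve as capture probes; the sketch stops at the fragment list and does not model hybridization.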
“Not relying on the presence of restriction site, the method presented here should be also useful for broader phylogenetic scales, allowing sequencing homologous loci from more divergent taxa, which would not be possible to retrieve using classical RAD-seq approaches,” the scientists conclude.
AGBT is behind us, which means the Sage Science team is officially back to the land of fleece and flannel. We had a great time at the conference and especially enjoyed seeing all the attendees making the most of our selfie sticks on the dance floor at the closing party!
The final AGBT sessions were every bit as interesting as the rest of the meeting. Nick Loman’s talk describing the use of Oxford Nanopore MinIONs during the recent Ebola epidemic in West Africa was an amazing glimpse of the kind of field-based sequencing we’ve dreamed of for a long time. His observation that the weak link in the system was the need for a constant Internet connection (required for the sequencer’s base-calling software) underscores the basic logistical challenges we face in achieving our ultimate goal of being able to sequence anything, anywhere, anytime.
The presentation from HudsonAlpha’s Shawn Levy continued the trend of 10x Genomics data, one of the major themes of this year’s AGBT. His emphasis on the importance of phasing and of finding complex events and structural variants mirrors a growing recognition in the community that short-read data will have to be supplemented by other data sources — be it long-read sequencing, Hi-C data, synthetic long reads, genome maps, or something else — for maximum benefit. Dovetail Genomics, which uses a Hi-C approach, was mentioned in several talks at the conference and really seems to be gathering steam in the field.
Normally we’d have a whole year to rest up for the next AGBT, but this September the organization will host its inaugural precision health meeting. We’re eager to see the speaker list and agenda, and maybe even get to experience the Scottsdale, Ariz., meeting in person!