A team of scientists from the Icahn School of Medicine at Mount Sinai, Weill Cornell Medical College, Cold Spring Harbor Laboratory, European Molecular Biology Laboratory, and other institutions published the first analysis of a diploid human genome produced by combining single-molecule technologies.
Lead authors Matthew Pendleton, Robert Sebra, Andy Pang, and Ajay Ummat, along with their colleagues, report that integrating results from different technology platforms led to significant improvements in contiguity, with scaffold N50 values of nearly 30 Mb. The high-quality assembly also allowed the team to find complex structural variants that can’t be detected in assemblies produced from short-read data.
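Since scaffold N50 is the headline contiguity metric here, a quick refresher on how it’s computed may be useful. The sketch below uses toy scaffold lengths for illustration, not data from the paper:

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    # Walk the scaffolds from largest to smallest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0

# Toy example: four scaffolds totaling 100 units.
print(n50([40, 30, 20, 10]))  # -> 30
```

In words: sort the scaffold lengths from largest to smallest and walk down the list until half the total assembly size is covered; the scaffold length at that point is the N50, so longer scaffolds pull the value up.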
The scientists used SMRT® Sequencing from Pacific Biosciences as well as genome maps from BioNano Genomics. “Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality,” they report.
In the study, which sequenced the well-characterized NA12878 genome, the scientists used BluePippin to perform size selection prior to SMRT Sequencing. By removing DNA fragments smaller than 7 kb, the team generated extraordinarily long reads on the PacBio platform. “Without selection, smaller 2,000–7,000 bp molecules dominate the zero-mode waveguide loading distribution, decreasing the subread length” that can be achieved with the sequencer, the authors write in the supplementary materials.
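To illustrate why discarding short fragments shifts the loading distribution toward longer molecules, here is a toy simulation of size selection. The 7 kb cutoff comes from the study, but the lognormal fragment-length distribution and its parameters are illustrative assumptions, not values measured from the paper’s library:

```python
import random

random.seed(1)

# Toy model of an unselected library: fragment lengths drawn from a
# lognormal distribution, so shorter molecules dominate.
# (Illustrative parameters only, not data from the paper.)
fragments = [random.lognormvariate(8.8, 0.7) for _ in range(100_000)]

# BluePippin-style size selection: discard everything under 7 kb.
selected = [f for f in fragments if f >= 7_000]

mean_before = sum(fragments) / len(fragments)
mean_after = sum(selected) / len(selected)
print(f"mean before: {mean_before:,.0f} bp, after: {mean_after:,.0f} bp")
```

Because the small molecules that would otherwise dominate zero-mode waveguide loading are removed before sequencing, the average fragment offered to the instrument is longer, which is exactly the effect the authors exploit to push up subread lengths.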
For more, check out the full paper here: “Assembly and diploid architecture of an individual human genome via single-molecule technologies.”
Last week we attended the first-ever Festival of Genomics, a new series of meetings taking place in Boston, San Mateo, and London. This conference was held in Boston’s biggest convention center and took a music-festival approach, with four stages of concurrent sessions in addition to plenty of other activities.
The Sage Science team was out in force, and we participated in many of those activities. Our CSO Chris Boles gave a talk in the Tech Forum, sharing details of a new product in development that we’re calling the SageHLS. Built to help scientists generate ultra-long DNA fragments for the new breed of technologies that need them — from optical mapping to single-molecule sequencing — the SageHLS will also help streamline the library prep process. More details will be available later this year.
Another element of the circus-like atmosphere was Race the Helix, a fundraising event for the Greenwood Genetic Center in which teams have 20 minutes on a treadmill to run as far as they can. Our own Alex Vira suited up and ran with the PacBio team, taking an impressive second place in a field of competing teams. We’re proud to have helped raise money for a good cause!
Some 1,200 people registered for the conference, and the plenary talks were frequently standing room only. Great presentations came from Ting Wu, Craig Venter, Heidi Rehm, Diana Bianchi, and a host of others. We really enjoyed the concurrent session focused on long-read sequencing that included Mike Snyder, Chad Nusbaum, Dick McCombie, and a few other terrific speakers. One of the most distinctive parts of the event was an evening play about clinical genomics featuring a number of brave scientists, including Eric Green, Andy Faucett, and others. Who stole the show? Naturally, it was George Church in the role of God.
The festival heads west to San Mateo this fall, with a winter performance in London. We look forward to seeing how the organizers from Front Line Genomics continue to innovate at this fun meeting!
If you haven’t listened yet to the Mendelspod interview with Bobby Sebra from the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai, we can’t recommend it highly enough. And that’s not because we happen to be sponsoring this podcast series on DNA sequencing — it’s because Sebra offers up some really interesting perspectives on a range of topics.
For example, he talks with Mendelspod’s Theral Timpson about the institute’s Resilience Project, which is just now kicking into high gear. Sebra outlines efforts to scale up the sequencing facility to meet the needs of this massive project, which aims to scan the DNA of healthy people to find naturally occurring biological mechanisms that might help them escape the effects of disease-causing variants.
Sebra is the institute’s director of technology development, so of course the interview includes great information about his view of the different sequencing platforms and how he chooses which platform to use for which project (for example, short reads for resequencing, and long reads for reference-quality genomes). His take is that scientists get the best results by using multiple platforms to generate complementary data.
Our favorite part was the discussion of sample prep, which Sebra notes is becoming a bigger challenge for genomic scientists as long-read and single-molecule platforms demand ever-larger DNA fragments. “The quality of your input material needs to be better,” Sebra says, calling for novel methods in DNA extraction and processing. While his team can currently make a 20 kb to 50 kb library given sufficient input material, he says the dream is being able to make these extremely large-fragment libraries from vanishingly small input.
Sebra covers several other compelling topics in the 27-minute podcast, such as his response to the accusation that the genomics revolution has fallen flat, what’s exciting in clinical genomics, the need for single-cell sequencing, and his experience with data from BioNano Genomics, 10X Genomics, and Oxford Nanopore. Be sure to check it out.
And if you missed the first installment in the series, here’s the podcast with Rod Wing at the Arizona Genomics Institute.
This week we’re traveling to Baltimore for the annual East Coast user group meeting for Pacific Biosciences customers. We’re a sponsor of the event and look forward to the great scientific presentations these meetings have become known for. Click here to check out the latest resources and protocols for size-selecting PacBio libraries using Sage instruments.
This year, for the first time, there will be a half-day sample prep workshop, including talks on handling ultra-long DNA fragments, among other topics. We’re eager to see how PacBio users have made the most of BluePippin and SageELF, our automated DNA size selection platforms for long DNA fragments, in their research pipelines.
Held at the University of Maryland’s campus on June 17, the meeting will feature speakers from the National Institute of Standards and Technology, the United States Army Medical Research Institute of Infectious Diseases, Baylor College of Medicine, Cold Spring Harbor Laboratory, and many others. We’re particularly looking forward to talks on fusion isoforms in breast cancer, de novo metagenomics, and diagnostic assays.
PacBio customers tend to be intrepid when it comes to trying new protocols and coming up with new methods, especially regarding sample prep. Each year at this meeting we’ve gotten a new glimpse into low-input sequencing or other technical achievements, so we anticipate great presentations from people who continue to push the envelope with long-read sequencing.
If you’ll be at the event, we hope you stop by our table and check out BluePippin and SageELF. If not, we’ll be tweeting and blogging, so stay tuned!
Regular readers of the Sage Science blog know that we can never resist a good methods paper. We enjoyed this publication from the phyloinformatics group at RIKEN detailing a modified protocol for generating high-quality mate-pair libraries while significantly reducing costs.
Lead author Kaori Tatsumi and colleagues focused on Illumina’s Nextera Mate Pair Sample Prep Kit, a workhorse for this application. They rely on mate-pair sequencing to improve the contiguity of de novo assemblies, particularly for non-model organisms. The Nextera kit “has significantly reduced the difficulty, preparation time and cost of preparing mate-pair libraries,” the scientists report, but “there remain opportunities to improve the efficiency and reduce costs for this preparation technique.”
The team tried several tactics to achieve this goal. They started by reducing the amount of enzyme used in the workflow, and also concocted their own homemade buffer. Tests showed that these changes worked at least as well as the original protocol. “The use of a reduced volume of enzyme and self-made buffer allows for a higher number of tagmentation reactions to be attempted using variable conditions,” the scientists write. “In particular, this modification, which yields larger DNA molecules, is advantageous in preparing mate-pair libraries spanning large distances (>10 kb), which tends to be hindered by low yields.”
They also swapped the manufacturer’s recommended order of the strand displacement step and the size selection step, sizing DNA first on the BluePippin and then performing strand displacement. “Reversing the order of strand displacement and size selection enables significantly smaller volumes of strand displacement polymerase and buffer for reduced amounts of size-selected DNA, which allows for a larger number of preparations (up to 3-fold) than the standard protocol,” Tatsumi et al. found. The revised process still yielded enough library volume for the rest of the pipeline.
Finally, they added more shearing steps to improve the accuracy with which reads are recognized as mates, and adjusted the number of sequencing cycles on an Illumina HiSeq. Compared with libraries prepared according to the original protocol, their method produced much longer scaffolds and a higher percentage of genes completely covered in the assembly. The revised protocol also costs significantly less than the standard preparation.
Kudos to the RIKEN team for their hard work on a cool new method!