Hamid Ashrafi is working to breed blueberries that are larger, tastier, longer-lasting on the shelf, and amenable to mechanical harvest. As an assistant professor at North Carolina State University, Ashrafi is bringing genomic tools to the school's long-running blueberry breeding program, integrating classical breeding with modern breeding methods.
Blueberries present a real challenge for genome sequencing and assembly: they occur naturally as diploid, tetraploid, and even hexaploid species. A draft genome assembly exists, though it isn’t publicly available, and Ashrafi and his colleagues at the Kannapolis campus are trying to improve it with newer technologies from PacBio, 10x Genomics, Dovetail Genomics, and BioNano Genomics. He is also studying the plant’s transcriptome, which has not been extensively characterized before.
Ashrafi relies on core facilities to perform the sequencing, but prefers to handle sample prep in his own lab, both to shorten turnaround time and to train students and postdocs. For size selection, he chose the BluePippin and SageELF automated platforms from Sage Science because they can handle the large fragments needed for long-read sequencing libraries. Recently, he has been using the new 30 kb protocol for PacBio libraries and has been using the SageELF to pool fractions for Iso-Seq analysis.
The SageELF, which separates an entire sample by size into 12 contiguous fractions, is a good fit for genome and transcriptome sequencing with PacBio. “It reduces the amount of work that you do,” Ashrafi says. “When you make one library, you can fractionate all of it. You can define which fractions you want and combine them, and you only have to run it one time.”
For example, he might pool fractions into groups of 10-20 kb, 20-30 kb, and 30+ kb for genome sequencing so that the downstream data represent the whole blueberry tissue sample. For Iso-Seq analysis of gene expression, Ashrafi likes to combine fractions into a few bins, which boosts library yield for deeper sequencing coverage. “Instead of running Iso-Seq for each of the fractions,” Ashrafi says, “you can combine fractions and have enough DNA to run more SMRT Cells.”
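To make the pooling idea concrete, here is a minimal Python sketch that assigns SageELF fractions to the three genome-sequencing pools mentioned above. The fraction sizes are hypothetical placeholders, not measured values from Ashrafi's lab, and treating each pool's lower bound as inclusive is our own assumption.

```python
# Hypothetical nominal sizes (kb) for the 12 fractions of one SageELF run;
# real values depend on the cassette, protocol, and sample.
FRACTION_SIZES_KB = [48, 40, 34, 30, 27, 24, 21, 18, 15, 12, 9, 6]

# Pools from the text: 10-20 kb, 20-30 kb, and 30+ kb.
# Assumption: lower bound inclusive, upper bound exclusive.
POOLS = [("10-20 kb", 10, 20), ("20-30 kb", 20, 30), ("30+ kb", 30, float("inf"))]


def pool_fractions(sizes_kb, pools):
    """Map each pool name to the 1-based fraction numbers whose size falls in it."""
    assignment = {name: [] for name, _, _ in pools}
    for number, size in enumerate(sizes_kb, start=1):
        for name, low, high in pools:
            if low <= size < high:
                assignment[name].append(number)
                break  # each fraction goes into at most one pool
    return assignment  # fractions below 10 kb are simply left out


if __name__ == "__main__":
    for name, fractions in pool_fractions(FRACTION_SIZES_KB, POOLS).items():
        print(f"{name}: fractions {fractions}")
```

With these made-up sizes, the two smallest fractions fall outside every pool and stay unused, which is exactly the kind of choice the SageELF leaves to the user.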
Now that he’s become an expert in size selection for long-read sequencing, Ashrafi says his next step is to begin deploying BluePippin for short-read libraries as well.
We released our SageELF instrument two years ago, and seeing how scientists have adopted it for various NGS pipelines has been a wonderful journey. If you haven’t noticed these great uses, we’ll get you up to speed with this quick recap.
SageELF is a unique size-selection platform for scientists who need something more sophisticated than the traditional options. It takes a DNA sample and separates it by size into 12 contiguous fractions; the high yield makes the instrument an especially nice fit for precious samples. Users can then advance the optimally sized fraction for analysis, or pool multiple fractions for a more customized approach. SageELF can also resolve large DNA thanks to its built-in pulsed-field electrophoresis technology.
One of the first applications we saw was in mate-pair sequencing, driven by experts like Darren Heavens at The Genome Analysis Centre. He led a team that developed a new protocol for generating long mate-pair (LMP) libraries using the SageELF (check it out in BioTechniques or read our blog post). The method saves time and money and reduces the amount of input DNA needed.
“Using the SageELF streamlines the library construction process, allowing LMP libraries >10 kb to be constructed in under 2 days with <10 µg input material,” the TGAC scientists reported. “For many genome projects, multiple insert size LMP libraries are required, and the ability to construct up to 12 discretely sized libraries for a combined reagent cost of $1270 compared with the reagent cost of $715 for a single insert size LMP library highlights the potential cost savings.”
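The cost comparison in that quote is easy to unpack. This short Python sketch uses only the two reagent figures the TGAC team reported and assumes, as a simplification of our own, that the $1270 SageELF cost stays fixed however many of the 12 fractions are made into libraries.

```python
import math

# Reagent costs quoted by the TGAC team (USD).
SAGEELF_BATCH_COST = 1270.0  # up to 12 discretely sized LMP libraries
SINGLE_LIBRARY_COST = 715.0  # one insert-size LMP library made separately


def separate_cost(n_libraries):
    """Cost of building n insert-size libraries one at a time."""
    return n_libraries * SINGLE_LIBRARY_COST


def break_even(batch_cost=SAGEELF_BATCH_COST, single_cost=SINGLE_LIBRARY_COST):
    """Smallest n for which n separate libraries cost more than one batch.

    Assumes the batch cost does not depend on how many fractions are used.
    """
    return math.floor(batch_cost / single_cost) + 1


if __name__ == "__main__":
    print(f"Per-library cost with all 12 fractions: ${SAGEELF_BATCH_COST / 12:.2f}")
    print(f"Batch is cheaper from {break_even()} insert sizes upward")
    print(f"Savings for 12 insert sizes: ${separate_cost(12) - SAGEELF_BATCH_COST:.2f}")
```

Under those assumptions, the SageELF route works out to about $105.83 per library when all 12 fractions are used, and it is already cheaper once a project needs two insert sizes (2 × $715 = $1430).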
Heavens also came up with a method to analyze copy number variation more reliably with SageELF. His team separates PCR products with the instrument, and then sequences the largest fraction to determine the highest copy numbers present in the sample. “That gives us the true copy number,” he says. “The duplicated genes themselves are so similar that if you don’t have the full-length fragment, they just collapse down in the assembly.”
More recently, we’ve seen adoption of the SageELF among PacBio users working with the Iso-Seq method. The contiguous fractions allow size fractions to be pooled prior to sequencing, which helps scientists build the ideal library for their full-length isoform studies.
Other labs are just getting started with their SageELF instruments, and we can’t wait to see the creative uses they discover for it!
It’s June, and you know what that means: genomics conference season is back! The Sage team will be attending several events this month and we hope to see you at least once.
We kick off next week with PacBio’s annual East Coast User Group Meeting, held in Baltimore June 8th on the University of Maryland campus. We look forward to this event each year because it’s a great glimpse of the cutting-edge science happening around long-read sequencing. This year, there will be a half-day sample prep workshop before the general meeting, and we couldn’t be more excited if we tried. (Hey, we’re sample prep people. Don’t judge.) PacBio users are doing all sorts of cool things in this area, from lowering input requirements to incorporating our SageELF and pooling size fractions for customized pipelines — it’ll be great to see what they’ve accomplished now. If you’re attending the meeting, be sure to track us down and ask about the Iso-Seq method promo we’re launching at the event.
Next up is the annual meeting of the American Society for Microbiology, taking place June 16-20 in our hometown of Boston and featuring an opening keynote from Bill Gates. When we’re not glued to the stage learning about the growing Zika epidemic, we’ll be camped out in booth #306 in the exhibit hall. Stop by and we’ll be happy to discuss your microbial research and help you consider whether automated DNA size selection would make a difference in your work.
Finally, the month wraps up with the second Boston-based Festival of Genomics, June 27-29. This new series of festivals has been such a great surprise: cool science and a different approach, perhaps most obvious this year from the fact that registration is free for everyone. (Last year it was most obvious from the treadmill placed at the entrance; if you missed it, check out this blog post with our favorite Sage photo ever.) We’ll be in booth #214 in the lab zone, happy to field questions or make suggestions about your DNA sequencing workflow.
Assuming we survive it all, we’re awfully glad that July starts with a holiday!
More and more scientists are using their PacBio systems for transcriptome studies, generating full-length isoforms with the Iso-Seq method. The number of novel transcripts discovered and the implications for alternative splicing are a not-so-subtle reminder that we still have a lot to learn about gene expression.
Full-length RNA transcripts can be up to 10 kb long, with the largest proportion typically falling in the 3-5 kb range. SMRT sequencing can read each transcript from beginning to end, but shorter molecules tend to be overrepresented in the data, so PacBio recommends binning transcripts into four size ranges for comprehensive isoform surveys: 0.8-2 kb, 2-3 kb, 3-5 kb, and 5-10 kb.
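As a quick illustration of the four-bin scheme, the Python sketch below sorts transcript lengths into PacBio's recommended ranges. The example lengths are made up, and since the published ranges share their endpoints, treating each lower bound as inclusive and each upper bound as exclusive (with exactly 10 kb kept in the top bin) is our assumption.

```python
import bisect
from collections import Counter

# Iso-Seq size bins (kb) as listed in the text. Assumption: each lower
# bound is inclusive, each upper bound exclusive, except that exactly
# 10 kb is kept in the top bin.
BIN_EDGES_KB = [0.8, 2.0, 3.0, 5.0, 10.0]
BIN_LABELS = ["0.8-2 kb", "2-3 kb", "3-5 kb", "5-10 kb"]


def assign_bin(length_kb):
    """Return the bin label for a transcript length, or None if outside 0.8-10 kb."""
    if length_kb < BIN_EDGES_KB[0] or length_kb > BIN_EDGES_KB[-1]:
        return None
    if length_kb == BIN_EDGES_KB[-1]:
        return BIN_LABELS[-1]
    # bisect_right counts the edges <= length; subtract 1 to get the bin index.
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES_KB, length_kb) - 1]


def bin_counts(lengths_kb):
    """Count transcripts per bin; out-of-range lengths are counted under None."""
    return Counter(assign_bin(length) for length in lengths_kb)


if __name__ == "__main__":
    # Hypothetical transcript lengths in kb, for illustration only.
    example = [0.5, 1.2, 2.4, 3.1, 3.8, 4.6, 6.0, 9.5, 12.0]
    for label, count in bin_counts(example).items():
        print(label, count)
```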
PacBio provides two template preparation protocols that feature our DNA size selection instruments: a BluePippin guide and a SageELF guide. The BluePippin collects a single size fraction from each of four samples per run, while the SageELF collects 12 contiguous size fractions from a sample and can process one or two samples per run.
Here are a few details to illustrate the differences between the platforms.
BluePippin:
- Also validated for use with long-read and Roche/NimbleGen SeqCap template protocols
- Simplified workflow to collect one fraction from each sample
- Size cut-offs are more accurate and reproducible
- Requires >5 µg of starting DNA
SageELF:
- Final library size bins have more continuous overlap, improving bioinformatics analysis
- More user flexibility for combining the size bins
- Unused fractions can be recovered and saved
- Requires 3-5 µg of starting DNA
We’re pleased that these two platforms have been helpful in the Iso-Seq workflow. A newly released 10-40 kb fractionation protocol for the SageELF should make it even more useful for long-range pipelines.
If you’re a PacBio user interested in trying out the SageELF for Iso-Seq size selection, let us know.
The Sage Science R&D team has been hard at work on our newest tool, to be released later this year. The HLS platform, which we first described at the Festival of Genomics meeting in Boston last year, is our answer to the growing need to generate high molecular weight DNA fragments directly from blood or cell suspensions for long-range sequencing.
As the sequencing community shifts its focus from short reads to long-range information, whether from single-molecule long reads or synthetic long reads, the pressure is on for sample prep processes to adjust accordingly. Sample prep pipelines that work for 200-base fragments simply can’t scale to handle 50 kb fragments. We believe new approaches are needed to enable workflows with high molecular weight DNA, and that’s where the HLS platform comes in.
Here’s how it works: we load samples into a gel, where we perform cell lysis, enzyme processing, and contaminant removal. Because these steps are driven by electrophoresis, they run much faster than they would in a conventional agarose gel, while the megabase-scale DNA is large enough to remain trapped in the agarose. After purification, the DNA is lightly cleaved so that it can be retrieved from the gel in an automated elution process.
To see how the HLS prototype performs, check out this poster describing an experiment with human cultured cells and goat whole blood. DNA fragments extracted with the HLS were often tens of kilobases, or even megabases, long.