Darren Heavens has witnessed a fascinating transition at The Genome Analysis Centre as the Norwich, UK-based institute shifted from data-generation mode to data-analysis mode. When the center launched more than five years ago, there was a fairly even split between laboratory-based scientists and bioinformaticians, Heavens says; today, there are about 15 laboratory scientists and nearly 70 bioinformaticians. The focus is on generating great data that lets the bioinformatics experts perform the highest-quality analyses.
Heavens, a team leader in the Platforms and Pipelines group, spends a lot of his time figuring out how to make the data produced at TGAC more amenable to bioinformatics crunching. One of the newest weapons in his arsenal is the SageELF, an automated system that produces 12 contiguous fractions from a DNA sample.
His prior experience with Sage Science instruments came from the BluePippin, which he began using for size selection of NGS libraries after a TGAC bioinformatician presented data on the variability of insert sizes in libraries he was trying to assemble. “He did the data analysis and found that BluePippin sizing improved his outputs no end,” Heavens recalls.
So it was a no-brainer for Heavens to try out the new SageELF, which he’s been using for a few months now. “It’s great because it gives us the chance to make multiple libraries from one sample,” he says, noting that this helps keep reagent and other costs in check. For experiments requiring a very specific insert size, Heavens likes to run a sample on the SageELF and map the fractions to assembly data to determine which best meets the criteria before going ahead with the rest of the experiment.
His team uses the instrument for long mate-pair NGS projects, restriction-digest sequencing, and sequencing projects focused on copy number variation. For CNVs, Heavens and his colleagues came up with a protocol using SageELF to separate PCR products; they then sequence the largest fraction to get an accurate view of the highest copy numbers present in the sample. “That gives us the true copy number,” he says. “The duplicated genes themselves are so similar that if you don’t have the full-length fragment, they just collapse down in the assembly.” The protocol, which they developed for a project for one client, was so successful that several other clients have now come to TGAC asking for the same method for their samples, Heavens says.
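The collapse Heavens describes can be sketched with a toy simulation (illustrative only — the sequences, fragment lengths, and SNP positions below are invented for the example, not TGAC's actual data or protocol). Two gene copies that differ at just two sites look identical across most short fragments, so an assembler would merge them into one contig; only a full-length fragment spanning both diagnostic sites keeps the copies distinct.

```python
import random

random.seed(42)

# Two hypothetical 1 kb "gene copies" that differ at only two sites,
# roughly 800 bp apart (positions 100 and 900).
flip = {"A": "C", "C": "G", "G": "T", "T": "A"}
copy_a = "".join(random.choice("ACGT") for _ in range(1000))
b = list(copy_a)
b[100], b[900] = flip[b[100]], flip[b[900]]  # the two diagnostic SNPs
copy_b = "".join(b)

def shared_fraction(frag_len, step=50):
    """Fraction of copy_b fragments indistinguishable from copy_a's --
    fragments that an assembler would collapse into one sequence."""
    frags_a = {copy_a[i:i + frag_len]
               for i in range(0, 1001 - frag_len, step)}
    frags_b = [copy_b[i:i + frag_len]
               for i in range(0, 1001 - frag_len, step)]
    return sum(f in frags_a for f in frags_b) / len(frags_b)

print(shared_fraction(200))   # most 200 bp fragments are indistinguishable
print(shared_fraction(1000))  # the full-length fragment spans both SNPs: 0.0
```

With short fragments, the majority carry neither diagnostic site and collapse; at full length, every fragment contains both SNPs and the two copies stay separate, which is the intuition behind sequencing the largest SageELF fraction.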
The biggest advantage of SageELF compared to other fractionation methods is its recovery, according to Heavens. His team gets 40% to 45% recovery from input material with the platform, while “with a manual approach you’d be lucky to get 10% to 15% recovery,” he says. “For us that’s a big plus.” He notes that scientists working with precious samples might find SageELF particularly useful for making the most of input DNA.
Heavens says setup and training were simple and straightforward, and that his team is now running the SageELF at or near capacity, which equates to two runs per day of two cassettes each. Since each cassette yields 12 fractions, that’s 48 fractions each day that the TGAC team could potentially use for sequencing. “It has opened up so many avenues for us,” Heavens says.
In a new BMC Genomics paper, scientists from Baylor College of Medicine describe a new method for accurate, affordable interrogation of structural variants across the human genome. We’re delighted to see that automated DNA size selection tools from Sage Science contributed to this important approach.
In the paper, lead authors Min Wang and Christine Beck, along with collaborators from Baylor’s genome center, cite the need for a method like this based on the difficulties of using next-gen sequencing for structural variant analysis. Short-read technologies generally produce reads too short to span the variants, making it impossible to align and assemble those regions accurately. Long-read technology has shown great promise, but has been too expensive for large-scale, genome-wide analyses, the authors note.
So they developed a target-capture approach to enrich for structural variants at particular chromosomal locations. With oligo capture, they target specific insert sizes using the Pippin Prep for fragments up to 1 kb and BluePippin for anything larger. After library prep is completed, the selected DNA is sequenced on a PacBio instrument. The process is known as the PacBio-LITS (large-insert targeted capture-sequencing) method and is especially noteworthy because it’s the first report of targeted sequencing for libraries with insert sizes greater than 1 kb.
In this method, size selection is essential to the success of the pipeline. “Manual gel-extraction methods involving agarose gel electrophoresis can be used, but we have chosen Sage Science’s Pippin and BluePippin platforms to perform target size selection for improved accuracy and sample recovery,” the authors write, adding that they use “range mode” to preserve DNA complexity from the sample.
The Baylor team presents data from a study of three samples from patients with Potocki–Lupski syndrome. Scientists used PacBio-LITS to analyze structural rearrangements associated with the disease, looking particularly at breakpoint junctions of low-copy repeats (LCRs). “We successfully identified previously determined breakpoint junctions … and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints,” the authors write.
The team posits that beyond structural variation, this new method could also be useful for validating indels and phasing haplotypes.
This week was a busy and educational one for the Sage Science team — we got to attend both the Association of Biomolecular Resource Facilities meeting in St. Louis and the Experimental Biology meeting here in Boston. We had booths at both exhibit halls, and we thank the many scientists who stopped by to learn more about our newest products, SageELF for protein fractionation and the PippinHT high-throughput automated DNA size selection instrument.
ABRF is an event for technology lovers and always gives us a chance to hang out with the savvy scientists who run core labs, vet new instruments, and develop meticulous methods to keep experiments operating smoothly. Standards were in the spotlight this year: in one session, Sarah Munro from the National Institute of Standards and Technology gave a talk; her work for the External RNA Controls Consortium as well as the newer Genome in a Bottle consortium has been very impressive. In another session, members of the ABRF team that performed a valuable study of next-gen and third-gen sequencing platforms presented their findings. If you missed their paper, check it out here. We also enjoyed the talk from Vanderbilt’s Daniel Liebler, who spoke about proteomics and cancer and the need to understand protein interactions. His report that mRNA levels don’t accurately predict protein expression was intriguing, but it was sobering to hear him say that funding for proteomics — a field that will be critical for precision medicine and other clinical advances — has dwindled.
If you weren’t at ABRF, check out the poster we presented there: “The ELF preparative electrophoresis system for size-based proteome fractionation.” It shows data from an E. coli protein extract, using SageELF to automatically separate and collect 12 contiguous size fractions in a short period of time. The SageELF can be used for automated 1D gel fractionation of proteins to increase the sensitivity of peptide detection in complex mixtures; it’s a great alternative to labor-intensive SDS-PAGE gels.
While Sage staffers were living it up in St. Louis, those of us at the Experimental Biology conference were getting a crash course in the latest and greatest in biochemistry. We attended the meeting with particular interest in the American Society for Biochemistry and Molecular Biology (ASBMB), one of the groups represented at the conference.
The award lectures were truly fantastic. Jack Dixon from the University of California, San Diego, spoke about how novel kinases are involved in phosphorylating secreted proteins, and Kathleen Matthews from Rice University, speaking about protein biochemistry, earned appreciation from graduate students with her call for stronger mentoring to improve research success. Some attendees told us that this year’s ASBMB program was one of the best ever. We just wish we’d had more time to absorb all of the great science in the extensive poster hall.
Now it’s back to the office, where we’ll be able to put everything we’ve learned to work!
We couldn’t help noticing that “long reads” kept popping up in presentations and posters at AGBT, and we certainly weren’t alone. Aside from longtime long-read provider Pacific Biosciences and synthetic long-read service Moleculo, acquired by Illumina in 2012, new companies such as 10X Genomics and Dovetail Genomics were touting the value of this kind of information at AGBT.
We’re already seeing sessions on long-read sequencing on the agendas of other upcoming conferences, leading to our theory that 2015 will go down in sequencing history as the Year of Long Reads. It’s no wonder demand for this kind of data is soaring: after years of using short-read sequencers to analyze genomes, scientists are just now realizing how much information about structural variants, haplotype phasing, and other long-range, clinically relevant elements is inaccessible with short reads alone.
There are a couple of different approaches to long-read data. Single-molecule sequencing platforms, like those available through PacBio and Oxford Nanopore Technologies, generate truly long reads on their own. Users of both platforms have presented individual reads running well into tens of kilobases, a far cry from the few hundred bases we’re used to from Illumina and Ion Torrent sequencers. Assembling those long reads can lead to megabase-plus contigs.
But since the vast majority of sequencing data currently available has been produced with short-read technologies, there’s also a huge appetite for bolt-on products that can pull long-range information out of short-read data. Like their older sibling Moleculo, upstarts 10X Genomics and Dovetail Genomics focus on altering library prep in a short-read workflow to allow analytical tools to connect the sequence data into much longer blocks. These synthetic long reads have been shown to elucidate larger elements like structural variants without switching sequencing platforms.
Both approaches suggest an exciting trend that will let us get more out of each genome we sequence. Here at Sage Science, we’re pleased to report that our BluePippin automated DNA size selection platform can be used with either of these approaches to maximize the length of reads generated or synthesized. For an example of how BluePippin works with synthetic reads, check out this blog post; learn more about BluePippin with long-read sequencing in these app notes. And check back soon for new info on how the PippinHT can be used with long-read workflows too!
The Sage team has attended AGBT for years, and the 2015 meeting reminded us just how lucky we are to be part of this amazing community. For those of us who remember the first Marco conference in 2000, it is truly awe-inspiring to see just 15 years later that genomics is being used to treat, and even cure, patients around the world. We were humbled by the rapid and remarkable advances this community has enabled.
Some of our favorite talks this year focused on the human microbiome. Michael Fischbach from the University of California, San Francisco, spoke about naturally occurring molecules produced by the microbes that live in and on us. So many of these natural products are antibiotics that Fischbach joked the organisms had made an end-run around the FDA, finding a way to get these molecules into our systems without regulatory approval or a physician’s prescription. He noted that there’s still a lot to learn about the molecules that our microbes are synthesizing — it seems certain that discovering this information could have a major impact on how we view human health.
Rob Knight from the University of California, San Diego, presented work showing changes in the microbiome from infancy onward; the profile evolves until age 2.5, at which point it has matured into the same profile seen in adults. He told attendees that despite the inability of genome-wide association studies to turn up reliably predictive genetic markers of obesity, analyzing the microbiome can reveal whether a person is lean or obese with 90 percent accuracy. Clearly, there’s a lot of uncharted territory in how our microbes are contributing to — or in some cases completely defining — various phenotypes.
There was also strong clinical content at AGBT, with impressive presentations describing how sequencing was used to diagnose patients or to suggest treatment options that are not the standard of care for a given condition. Steve McCarroll from Harvard Medical School gave a talk about how a collection of blood samples for a schizophrenia study led to the unexpected discovery of markers indicating early stages of blood cancer, long before the cancer could be diagnosed with traditional methods.
We can’t review all of the amazing talks and posters here, but suffice it to say, it was really great to witness the innovation, intelligence, and ingenuity driving the genomics community. Many thanks to the scientists who stopped by our suite to learn more about Sage Science, and we’re already looking forward to next year’s AGBT.