In a highly accessed paper in BMC Medical Genomics, scientists from McGill University and EMBL tested several steps to find the most robust pipeline for discovering small non-coding RNAs (sncRNAs) that might be useful as biomarkers. As part of this effort, they evaluated several sizing options for microRNAs.
“Biomarker discovery: quantification of microRNAs and other small non-coding RNAs using next generation sequencing” comes from lead author Juan Pablo Lopez, senior author Carl Ernst, and several collaborators. The team sequenced 45 samples with Illumina platforms and validated the sequence data with qRT-PCR. “Our results show that good quality sequencing libraries can be prepared from small amounts of total RNA and that varying degradation levels in the samples do not have a significant effect on the overall quantification of sncRNAs via NGS,” the authors report.
Size selection of small RNAs has long been a challenge in workflows like these, as microRNAs and other sncRNAs tend to be very close in size to the adapters and other artifacts that must be removed to get the best results. In this study, scientists compared the Pippin Prep from Sage Science to Novex TBE PAGE gels and AMPure XP beads. Noting that their goal was to evaluate the pros and cons of each technique rather than to choose the best, they report, “We were able to obtain good quality sequencing libraries for all samples, but nonetheless, we found significant differences across purification methods.”
One of those differences was library yield. “The four libraries purified using [Pippin] also showed single peaks corresponding to miRNAs, but these libraries contained more than 50 times more product after purification, as compared to the Novex gel method,” the team writes.
After sequencing, the team assessed reads produced from each sizing method, as well as from a control library with no purification step. “Libraries prepared using [Pippin] gave the highest number of total reads with an average of 11.8 M reads per sample, while with the others we obtained only an average of 8.8 M (Novex), 9.1 M (AMPure) and 8.5 M (no purification),” Lopez et al. write. They add that Pippin sizing also identified more distinct miRNAs than any other protocol, and had the highest specificity to miRNAs.
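To put those read counts in relative terms, here is a minimal sketch that computes how many more reads the Pippin libraries averaged compared with each alternative. The figures are the averages quoted above; the dictionary and function names are ours, chosen for illustration, and the calculation says nothing about statistical significance.

```python
# Average total reads per sample, in millions, as quoted from Lopez et al.
avg_reads_millions = {
    "Pippin": 11.8,
    "Novex": 8.8,
    "AMPure": 9.1,
    "No purification": 8.5,
}


def percent_gain_over(baseline: str, method: str = "Pippin") -> float:
    """Percent more reads that `method` averaged compared with `baseline`."""
    ratio = avg_reads_millions[method] / avg_reads_millions[baseline]
    return round((ratio - 1) * 100, 1)


for name in avg_reads_millions:
    if name != "Pippin":
        print(f"Pippin vs {name}: +{percent_gain_over(name)}% reads")
```

By this simple comparison of averages, the Pippin libraries yielded roughly 30 to 39 percent more reads per sample than the other three approaches.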
Based on that, the team recommends Pippin Prep for medium-size projects. Our PippinHT was released after this project was completed and is a good option for scientists interested in the same high-quality, automated approach with significantly higher throughput.
We’re packing our bags for Baltimore, land of crab cakes and this year’s annual meeting of the American Society of Human Genetics. With some 6,500 scientists expected to attend, the conference is one of the largest in the field — and with that comes a remarkable array of talks, posters, educational workshops, and much more.
The meeting will kick off with a splash: a presidential symposium featuring Francis Collins, David Hunter, Naomi Wray, and Marylyn Ritchie. The speakers will talk about precision medicine, large-scale genomic studies, and integration of electronic health records for clinical impact. ASHG always does a great job honoring the field’s best and brightest; this year, awards will be given to Emmanuelle Charpentier, Kay Davies, Jennifer Doudna, Leonid Kruglyak, and Hunt Willard, among many others.
If you’re attending the meeting, we hope you’ll have a chance to check out poster #1936, “An integrated method for extraction of high-molecular-weight DNA and preparation of genomic sequencing libraries using agarose gels” (Wed, Oct. 7th, 5 pm – 7 pm, clinical genetic testing section). From our R&D team, the poster presents information on a tool under development that we think will be particularly helpful for scientists and clinical researchers in genomics. The HLS enables fully automated, rapid purification of high molecular weight DNA directly from blood or cells. This HMW DNA (>50 Kb) is increasingly important for long-read sequencing and other applications. The poster shows how we accomplish this, as well as some milestones, such as the recovery of DNA fragments as large as 800 Kb.
The Sage Science team will be exhibiting in booth #1016, where we’d be happy to talk to you about how automated DNA size selection can help you produce better results for your genome sequencing projects.
If you’re looking for out-there ideas in genomics, there’s no better place to start than with Chris Mason’s lab at Weill Cornell Medical College. We were delighted that Mason was featured in the latest podcast from Mendelspod and its host Theral Timpson. From swabbing subway stations to tracking gene expression in astronauts, this podcast is truly riveting.
Mason’s lab achieved celebrity status in its hometown of New York City when the staff kicked off Pathomap, an effort to survey the microbes in places like subway stations. “We tried to get a complete molecular map of the city,” Mason tells Timpson. It was a “big discovery effort to build a baseline microbiome” — and one that produced “an interesting, inspiring, and … in some cases controversial bit of research.” After sequencing hundreds of samples, the team found that half of the DNA collected didn’t match any known organism.
Another lab project involves a longitudinal study of identical twins: one on land, and one spending a year in space. Mason’s team collected data for six months before the astronaut’s flight, is in the middle of collecting 12 months of data from space, and will continue for six months after the twin returns. RNA analysis has already proven interesting. “The expression changes dramatically as soon as you get into space,” says Mason, who is particularly intrigued by the epitranscriptomic changes that his team is tracking.
Speaking of space, Mason is working to launch an Oxford Nanopore sequencer; currently, his collaborators are testing it in zero-gravity simulators here on Earth. He tells Timpson that data from the MinION with the latest chemistry is promising, showing lower error rates and less GC bias than earlier versions. “It’s pretty compelling,” he says.
But Mason is not one to choose a single technology: he encourages scientists to validate data with orthogonal platforms whenever possible. “Every technology has a little bit of a blind spot,” he says.
The wide-ranging interview with Mason also covers synthetic biology, a theory that biologists are where physicists were in the late 18th century, commentary on long-read solutions such as PacBio and 10X, and his goal of engineering microbiomes to make it possible for humans to travel in space or colonize other planets.
PacBio users have been regularly serving up new microbial genome assemblies, and we’re glad to see that they’re using our BluePippin automated DNA size selection instrument to get the best results.
These are just some of the genome announcements published in the last few months:
A pathogen affecting economically important crops, such as melons and gourds, which had not previously been sequenced. Scientists present a draft sequence containing seven contigs and many phage or prophage elements.
Clostridium sporogenes DSM 795T
Researchers published the first whole genome sequence of this bacterium, a nontoxigenic relative of Clostridium botulinum. The genome was finished into a single contig of about 4 Mb and contains dozens of identical sequence copies greater than 1,000 bases.
Sedimenticola thiotaurini SIP-G1
A member of a group of sulfur-oxidizing bacteria, Sedimenticola thiotaurini strain SIP-G1 was sequenced and presented as a closed genome assembly. Scientists identified pathways not found in other members of this genus.
Microcystis aeruginosa NIES-2549
Scientists sequenced and annotated Microcystis aeruginosa NIES-2549, a freshwater cyanobacterium. The genome is almost 4.3 Mb and was sequenced to help understand the species’ ability to produce hepatotoxic cyanotoxins, which cause major environmental damage.
Escherichia coli O96:H19
This E. coli strain was responsible for a foodborne outbreak in Milan last year in which the organism’s pathogenicity was far more severe than usual. The published genome sequence is fully closed and allows scientists to study its acquired virulence.
In a Biotechniques paper this month, scientists from The Genome Analysis Centre describe a new method for mate-pair sequencing that saves time and money while decreasing the amount of input DNA required. The method is based on SageELF, which automatically generates 12 contiguous fractions of DNA from a single sample.
Led by Darren Heavens, the authors report that length and quantity of input DNA have been problematic factors in the preparation of long mate-pair (LMP) libraries for next-gen sequencing. To address that issue, they adjusted the sample prep protocol to use SageELF instead of conventional gel-based sizing, and then chose the fraction that best met their target fragment length.
“Using the SageELF streamlines the library construction process, allowing LMP libraries >10 kb to be constructed in under 2 days with <10 µg input material,” the scientists write. “For many genome projects, multiple insert size LMP libraries are required, and the ability to construct up to 12 discretely sized libraries for a combined reagent cost of $1270 compared with the reagent cost of $715 for a single insert size LMP library highlights the potential cost savings.”
The protocol was developed to optimize the Nextera-based long mate-pair kit for library construction. In addition to the initial round of size selection with SageELF, the scientists conduct another sizing step on the BluePippin prior to Illumina sequencing to ensure selection of DNA fragments best suited for the platform.
The protocol pays off by saving time and money in library prep, as well as by reducing the need for larger volumes of input DNA. It also leads to better sequencing results. “Accurately determining the size and span of the inserts for mate pair libraries simplifies the scaffolding problem, enabling the assembly of longer, more precise sequences with fewer non-determined bases (runs of N bases), empowering all subsequent downstream analysis,” the scientists report.
Check out the full paper: “A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost.”
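For a rough sense of the savings implied by the reagent costs quoted from the paper ($1270 for up to 12 libraries versus $715 for a single library), the arithmetic can be sketched as below. This is a back-of-the-envelope illustration, not the authors' analysis. It rests on two assumptions of ours: that the $1270 figure stays fixed however many of the 12 fractions are actually used, and that the conventional approach costs $715 for each additional insert size. The function names are hypothetical.

```python
# Reagent costs in US dollars, as quoted from the paper.
SAGEELF_REAGENT_COST = 1270        # combined cost for up to 12 libraries
SINGLE_LIBRARY_REAGENT_COST = 715  # one single-insert-size LMP library
MAX_SAGEELF_LIBRARIES = 12


def sageelf_cost_per_library(n_libraries: int) -> float:
    """Per-library cost, ASSUMING the $1270 is fixed regardless of n."""
    if not 1 <= n_libraries <= MAX_SAGEELF_LIBRARIES:
        raise ValueError("n_libraries must be between 1 and 12")
    return SAGEELF_REAGENT_COST / n_libraries


def conventional_total_cost(n_libraries: int) -> int:
    """Total cost, ASSUMING each extra insert size costs another $715."""
    return SINGLE_LIBRARY_REAGENT_COST * n_libraries


def break_even_library_count():
    """Smallest library count at which the conventional route costs more."""
    for n in range(1, MAX_SAGEELF_LIBRARIES + 1):
        if conventional_total_cost(n) > SAGEELF_REAGENT_COST:
            return n
    return None


print(round(sageelf_cost_per_library(12), 2))
print(conventional_total_cost(12))
print(break_even_library_count())
```

Under those assumptions, a project that uses all 12 fractions pays about $106 per library, and the combined approach comes out ahead as soon as two insert sizes are needed.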
And for more on the TGAC team, check out this brief profile.