What’s this concoction? The promise and power of metagenomics

by Dr Daniel Swan, NGS Services Manager, NCIMB

At NCIMB, we routinely identify bacteria that have been found in pharmaceutical, food and other industrial production environments. Accurate identification may help the companies involved trace the source of microbial contamination, and confirm whether repeat incidents arise from the same issue. Often these jobs arrive as pure cultures on plates, and we can pick off colonies to isolate the single species present.

However, sometimes we’re faced with a more challenging question. For example, we were recently asked to identify the components of a slime from inside a bioreactor. We may also be asked to analyse microbial populations from settings within the natural environment such as subsea structures.

Within bioreactors, unwanted slimes are generally the result of microbial contamination. There could be one contaminant, or a complex mixed community which has formed a biofilm – a three-dimensional conglomeration that includes not only a mixed community of microbial cells, but also polysaccharides, proteins and lipids. Plating out this kind of slime to isolate individual species for identification can be a very time-consuming and ultimately frustrating process, and there is no guarantee that all the species present in mixed environmental samples will grow under laboratory conditions.

It’s for projects like this that we turn to metagenomics. This approach is also known by other names – microbial community analysis and microbiome profiling are just two. The varied terminology can be confusing, but whichever term is used, the principles and advantages of the method remain the same.

For the more straightforward identification of pure bacterial cultures, we use Sanger sequencing. This process involves sequencing a section of one gene, known as the 16S gene (or D2 LSU for fungal identification). The section is normally the first 500bp, but we can also sequence the whole gene if better taxonomic resolution is required. However, this does require us to have a colony or pure culture to work from in the first place and, as already described, obtaining those colonies from biofilms can be challenging.

Metagenomics does away with this intermediate culturing stage, and moves straight from sample to DNA extraction. The quantities of DNA extracted are often very small, especially from samples taken in cleanroom environments or from seawater, for instance. However, these small quantities of DNA are amplified using polymerase chain reaction (PCR) to create sufficient amounts of the sequences of interest. As with Sanger sequencing, metagenomic analysis targets the 16S gene. Most of the differences between the 16S genes of different bacterial species are concentrated in what are referred to as the variable regions, of which there are nine. As most metagenomic sequencing is performed on Illumina sequencers, which have a maximum read length of 300bp, a variable region, or combination of adjacent variable regions, is chosen for PCR amplification.

The first round of PCR amplifies the majority of 16S sequences present in the starting material to produce a mixed PCR product that represents the mix of bacterial species in your sample. Each fragment of DNA is subsequently sequenced and analysed independently, allowing us to look at each sequence generated and attach a label to it. This is a taxonomic label – in other words a species identifier. This labelling process allows the number of times the sequence occurs to be assessed, and in this way we can understand not just what species are present, but also the abundance of each.
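The counting step described above can be illustrated with a minimal Python sketch. It assumes each sequenced read has already been given a taxonomic label by some upstream classifier; the function name `relative_abundance` and the example labels are hypothetical and are not taken from NCIMB's pipeline.

```python
from collections import Counter


def relative_abundance(labels):
    """Return {taxon: fraction of reads} from a list of per-read labels."""
    counts = Counter(labels)          # how many reads carry each label
    total = sum(counts.values())      # total number of labelled reads
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical labels, one per sequenced read (illustration only).
read_labels = (
    ["Pseudomonas"] * 5
    + ["Bacillus"] * 3
    + ["Staphylococcus"] * 2
)

for taxon, fraction in sorted(relative_abundance(read_labels).items()):
    print(f"{taxon}: {fraction:.0%}")
```

In practice the labels would come from comparing each read against a reference database, and read counts are only a proxy for how abundant an organism really is in the sample.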

These two measures open the door to other analyses that can help us to gain a much deeper understanding of the situation being studied. The richness and diversity of the species present can be calculated. We can look at sampled sites over time, or across treatments, and determine which species are changing between sample groups – effectively looking for differential abundance. We can also group samples by their statistical properties, allowing us to classify samples against known types in order to assess where they might have come from.
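As a rough illustration of how such measures can be derived from per-taxon read counts, the sketch below computes richness (the number of taxa observed) and the Shannon index. The Shannon index is used here as an assumption, being one widely used diversity measure; the article does not say which measures are applied in practice, and the counts are invented for the example.

```python
import math


def richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total             # proportion of reads for this taxon
            h -= p * math.log(p)
    return h


# Hypothetical read counts per taxon for two samples (illustration only).
even_sample = {"Pseudomonas": 50, "Bacillus": 50}
skewed_sample = {"Pseudomonas": 99, "Bacillus": 1}

print(richness(even_sample), round(shannon_index(even_sample), 3))
print(richness(skewed_sample), round(shannon_index(skewed_sample), 3))
```

Both samples have the same richness, but the evenly split sample scores higher on the Shannon index, which is why richness and diversity are usually reported together.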

This method can be applied to many different scenarios – from a contaminated bioreactor or wastewater monitoring, to monitoring of oilfield systems, and analysing the bacteria in your gut. It can be used to analyse the mycobiome (all the fungi present), for invasive species testing (via the CO1 gene instead of the 16S gene), for understanding food spoilage, and even for testing for adulterated meat in the supply chain.

Metagenomics really is a game-changing technology and, whether 16S or other marker genes are used, it can deliver an unparalleled insight into a system under study. However, the process needs to be carefully controlled because of the confounding factors that can enter the experimental system. Running negative controls and replicates is advised, and even the selection of the variable region can have unexpected impacts on your downstream data. NCIMB can help you navigate these challenges with our decades of microbiology experience.
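To show one simple way a negative control might be used, the sketch below flags taxa that appear in both a sample and its negative control so that they can be reviewed by hand. The function name, the `min_control_reads` threshold and the counts are all hypothetical; this is an assumption-laden illustration rather than a description of NCIMB's procedure, and dedicated statistical tools exist for this task.

```python
def flag_possible_contaminants(sample_counts, control_counts, min_control_reads=1):
    """Return taxa found in the sample that also reach a read threshold
    in the negative control, sorted alphabetically for stable output."""
    return sorted(
        taxon
        for taxon, n in control_counts.items()
        if n >= min_control_reads and sample_counts.get(taxon, 0) > 0
    )


# Hypothetical read counts (illustration only).
sample = {"Pseudomonas": 900, "Ralstonia": 40, "Bacillus": 60}
negative_control = {"Ralstonia": 25, "Cutibacterium": 5}

print(flag_possible_contaminants(sample, negative_control))
```

A flagged taxon is not necessarily a contaminant; it may genuinely be present in the sample, so the output is a prompt for closer inspection rather than an automatic filter.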

If this sounds like something you’d like to explore with us, please contact Dr Daniel Swan (e: enquiries@ncimb.com; t: +44 (0) 1224 711100) or visit our metagenomics page for more information.

About the author

Dr Daniel Swan joined NCIMB in 2017. Daniel is an experienced bioinformatician who has been generating and analysing DNA sequence data since 1995. He is responsible for developing and maintaining the analysis infrastructure that supports NCIMB’s NGS platforms as well as providing NGS and bioinformatics consultation for R&D partners. He is passionate about sequencing all of the reference strains in the NCIMB collection.