Metagenomics, metataxonomics or 16S amplicon sequencing? An introduction to microbiome analysis.

by Julie MacKinnon, Microbiome Services Manager

High throughput sequencing of the 16S ribosomal RNA gene has become increasingly popular in recent years. One of the most high-profile applications of this technology has been in analysing and understanding the human gut microbiome, and this is a service that we provide at NCIMB to support our customers’ research and development work. Analysing stool samples may not sound like the most pleasant occupation, but the research being undertaken in this area, for example into the relationship between the microbiome, nutrition and human health, is so fascinating that it has become a surprisingly attractive area of work!

However, while many people now think of the term “microbiome” as being synonymous with the human gut microbiome, the definition is in fact much broader than that, and really applies to any population of microorganisms inhabiting a specific environment.

Consequently, in addition to sequencing the human gut microbiome, we have received samples for high throughput 16S sequencing from a highly diverse selection of industries and environments, including soils, oil and gas production facilities and raw materials for food production. This reflects the fact that in many different industry sectors, people are realising that an understanding of microscopic communities can yield information that can be used to inform decision making – relating to, for example, corrosion prevention, food quality or soil health.

When we receive enquiries, people often ask if we undertake “16S sequencing”, but several other phrases are sometimes used to describe this kind of analysis. For example, some of the most commonly used terms are 16S metagenomics, next generation sequencing and amplicon sequencing. Not all the customers who come to us for this kind of analysis are themselves microbiologists or molecular biologists, and the terminology can quickly become confusing, so this post aims to demystify some of the most commonly used terms!

16S Sequencing

The term “16S sequencing” simply refers to DNA sequencing of the 16S ribosomal RNA gene. This gene is approximately 1500 base pairs long. Ribosomes exist within all cells, and their function is to translate the instructions encoded within DNA to assemble proteins. In bacteria, ribosomes are composed of a large subunit and a small subunit. 16S ribosomal RNA is part of the small subunit, and the gene that encodes it, 16S rDNA, is often sequenced to identify bacteria. At NCIMB we undertake 16S rDNA sequencing by two different methods for different purposes: Sanger sequencing and high throughput/next generation sequencing.

Hierarchy diagram: sequencing technology divides into Sanger sequencing (typically used for sequencing a small section of the genome) and next generation technology (can be used to sequence large quantities of data very quickly). Beneath Sanger sequencing are 16S sequencing (for identification of bacteria), D2 LSU sequencing (for fungal identification) and ITS sequencing (for fungal identification). Beneath next generation technology are strain characterisation (in-depth knowledge of a single strain) and metagenomics (understanding microbial communities).
Sanger vs next generation sequencing and some examples of their application

Sanger sequencing

This method, also known as the “chain termination method”, was developed by Frederick Sanger in 1977. At NCIMB, we use this sequencing method to identify individual isolates of bacteria, by sequencing either the first 500 base pairs of the 16S ribosomal RNA gene, or the full gene. This analysis is often requested by pharmaceutical manufacturers carrying out environmental monitoring of their production facilities, or food and drink manufacturers who have isolated a contaminant from a production line.

High throughput sequencing/ next generation sequencing

High throughput sequencing, also known as “next generation sequencing”, is a more recently developed approach than Sanger sequencing that offers a much higher throughput, and it is the technique that has revolutionised microbiome research. This higher throughput, achieved through massively parallel sequencing technology, has been applied both to sequencing the whole genomes of individual organisms and to sequencing to understand the make-up of microbial communities.

Metagenomics, 16S metagenomics and metataxonomics

Strictly speaking, “metagenomics” is the analysis of the whole genomes of all organisms within a sample, and this can be undertaken using high throughput sequencing. However, the taxonomic make-up of bacteria within microbiome samples can be studied by sequencing just a section of the genome: part of 16S rDNA, the same gene that is sequenced for identification of bacterial isolates. Sometimes this is referred to as 16S metagenomics, or metataxonomics. Although the target is the 16S gene, the sections of the gene that are typically sequenced for this purpose, the “amplicons”, are a little different from those used for identification of isolates.

Amplicon sequencing

An amplicon is a section of DNA or RNA that is amplified – most commonly by polymerase chain reaction (PCR) – an analytical technique that the general population became very familiar with during the COVID pandemic! Before sequencing can be undertaken, the genetic material must be amplified, to create a sufficient quantity to work with. PCR is used to target and amplify the section of interest. When we undertake high throughput 16S sequencing to reveal the taxonomic make-up of microbiome samples, we can look at two different regions of the 16S gene, or “amplicons”:

  • V1/V2 (variable regions 1 and 2)
  • V3/V4 (variable regions 3 and 4).

The 16S ribosomal gene includes nine variable regions interspersed between conserved regions. For comparison, the first 500 base pairs sequenced for the purpose of isolate identification include V1, V2 and part of V3.

Simplified representation of the 16S ribosomal gene: this diagram shows nine variable regions labelled V1-V9 interspersed between conserved regions shown in grey.
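
To make the idea of an amplicon more concrete, the short Python sketch below imitates what PCR does on paper: it finds a forward primer and a reverse primer in a template and returns the stretch of sequence between them. The template and both primers are invented for illustration and are deliberately tiny; they are not real 16S sequences and not the primers used for any particular service, and real primers often contain degenerate bases that this exact-match sketch does not handle.

```python
from typing import Optional

# Toy in-silico PCR. Every sequence below is made up for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the part of the template bounded by the two primers, or None.

    The forward primer matches the template strand directly. The reverse
    primer anneals to the opposite strand, so we look for its reverse
    complement on the template, downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template: conserved flank + variable region + conserved flank.
variable_region = "ATATCGCGTTAACC"
template = "TTGACCTACGGGAGGCAGCAG" + variable_region + "GGATTAGATACCCTAGTCAA"

forward_primer = "CCTACGGG"         # hypothetical, lies in the first conserved flank
reverse_primer = "TAGGGTATCTAATCC"  # hypothetical, written 5'->3' on the opposite strand

amplicon = extract_amplicon(template, forward_primer, reverse_primer)
print(amplicon)
```

The primers sit in the conserved flanks, which is why one primer pair can work across many different bacteria, while the variable region captured between them is what differs from one organism to the next.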

What the results look like

As you might expect, microbiome sequencing produces large amounts of data. However, we present the results of high throughput 16S sequencing in three different easy-to-understand formats: tables, bar plots and sunburst plots. The results show the complexity of the make-up of the samples at genus level, and the tables give the relative abundance of each genus within an individual sample and the combined abundance across a group of samples.

Kingdom | Phylum | Class | Order | Family | Genus | Sample 1 abundance | Sample 1 relative abundance | Sample 2 abundance | Sample 2 relative abundance
Bacteria | Firmicutes | Clostridia | Clostridiales | Clostridiaceae | Clostridium sensu stricto 1 | 351 | 2.43% | 940 | 4.99%
Extract from an example results table. Full table shows total combined abundance of all OTUs
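
For readers who like to see the arithmetic, here is a minimal Python sketch of how the two kinds of figure in such a table relate to each other. The genus names and counts are hypothetical and are not taken from any real report; the sketch only assumes that relative abundance is a genus's count expressed as a percentage of the sample total, and that combined abundance is the sum of counts across samples.

```python
# Hypothetical counts per genus; not taken from any real report.
counts = {
    "Sample 1": {"Genus A": 600, "Genus B": 300, "Genus C": 100},
    "Sample 2": {"Genus A": 150, "Genus B": 50, "Genus C": 300},
}


def relative_abundance(sample_counts):
    """Convert raw counts for one sample into percentages of that sample's total."""
    total = sum(sample_counts.values())
    return {genus: 100 * n / total for genus, n in sample_counts.items()}


def combined_abundance(all_counts):
    """Sum the raw counts for each genus across every sample."""
    combined = {}
    for sample_counts in all_counts.values():
        for genus, n in sample_counts.items():
            combined[genus] = combined.get(genus, 0) + n
    return combined


per_sample = {name: relative_abundance(c) for name, c in counts.items()}
combined = combined_abundance(counts)

for name, percentages in per_sample.items():
    for genus, pct in percentages.items():
        print(f"{name}  {genus}  {counts[name][genus]:>4}  {pct:5.2f}%")
print("Combined:", combined)
```

Because each percentage is calculated against its own sample total, the same raw count can correspond to quite different relative abundances in two samples.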

Of course, some environmental samples may include genera for which no sequence data yet exist, but the tabulated data also present the kingdom, phylum, class, order and family, as far as the published data allow.
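
One way to picture this fallback is the small Python sketch below, which labels an organism by the most specific rank that could be assigned. The function name and the way missing ranks are stored (as None) are assumptions made for illustration, not a description of any particular analysis pipeline; the lineage is the one from the table extract above, with the genus left unassigned as a hypothetical case.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def deepest_classified(lineage):
    """Return (rank, name) for the most specific rank that has a name.

    `lineage` maps rank names to taxon names; ranks that could not be
    assigned are either missing or set to None.
    """
    for rank in reversed(RANKS):
        name = lineage.get(rank)
        if name:
            return rank, name
    return None, "Unclassified"


# Lineage from the example table extract, with the genus left unassigned
# (a hypothetical case).
partial = {
    "kingdom": "Bacteria",
    "phylum": "Firmicutes",
    "class": "Clostridia",
    "order": "Clostridiales",
    "family": "Clostridiaceae",
    "genus": None,
}

rank, name = deepest_classified(partial)
print(f"Unknown genus (classified to {rank}: {name})")
```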

The bar plots allow for an easy visual comparison of the difference in genera between samples, for example between samples taken from different locations, or of changes that have occurred over time.

This stacked bar plot shows the relative abundance of genera found in two samples. Each genus is represented by a different coloured layer within the bar. The x axis denotes the two samples, Sample 1 and Sample 2, and the y axis shows abundance as a percentage. The plot shows the abundance of 20 genera, four of which are named as “unknown genus”, and serves to illustrate an example of two samples containing 20 genera in different proportions. It is plotted using the data contained in the table above.
Example of a bar plot. The relative abundance of each genus is shown as a percentage of the total operational taxonomic units present in the sample.

Interactive sunburst plots can also be provided, and this format is useful for visualising the hierarchical relationships between organisms present in the sample analysed.

Sunburst plots show hierarchical taxonomic data in the form of concentric circles. The central ring represents the kingdom present, in this case Bacteria, and moving outwards from the centre the rings represent the phyla, classes, orders, families and genera present in the sample. Each ring is divided into coloured sections proportionate to the abundance of, for example, each phylum present: in this example, Firmicutes account for more than 75%, followed by Actinobacteria, Bacteroidota and Verrucomicrobiota.
Example of a sunburst plot: the innermost ring shows the kingdom, the outermost shows genera.
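
The rings of a sunburst plot can be thought of as the same counts added up at each level of the hierarchy. The Python sketch below shows that roll-up under simplifying assumptions: the lineages are shortened to four levels, and every name and number is a placeholder invented for illustration rather than data from a real sample.

```python
from collections import defaultdict

# Hypothetical genus-level counts, keyed by lineage from the centre outwards.
# Names and numbers are placeholders, and the lineages are shortened.
genus_counts = {
    ("Bacteria", "Phylum 1", "Class 1a", "Genus i"): 500,
    ("Bacteria", "Phylum 1", "Class 1a", "Genus ii"): 200,
    ("Bacteria", "Phylum 1", "Class 1b", "Genus iii"): 100,
    ("Bacteria", "Phylum 2", "Class 2a", "Genus iv"): 200,
}


def ring_totals(lineage_counts):
    """Sum counts for every lineage prefix; prefixes of length n form ring n."""
    totals = defaultdict(int)
    for lineage, n in lineage_counts.items():
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += n
    return dict(totals)


totals = ring_totals(genus_counts)
grand_total = totals[("Bacteria",)]

# Print ring by ring, from the centre (ring 1) outwards.
for prefix in sorted(totals, key=lambda p: (len(p), p)):
    share = 100 * totals[prefix] / grand_total
    print(f"ring {len(prefix)}: {' > '.join(prefix)}  {share:.0f}%")
```

Each segment's size is simply the sum of the segments that sit outside it, which is why the plot makes it easy to see at a glance which phylum, class or family a dominant genus belongs to.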

For more information about NCIMB’s high throughput 16S sequencing services, visit our 16S metagenomics webpage. We also offer a wider range of gut microbiome services.
