Strain comparison and differentiation can be valuable in a range of circumstances. NCIMB’s Identification Services Manager, Vikki Warren, looks at recent developments in the methods available for doing this and explains why whole genome sequencing is NCIMB’s preferred approach.
Environmental monitoring is an important part of quality control in pharmaceutical manufacturing environments. When it comes to investigating incursions or process contaminants, comparing them with previous isolates and tracing their source, strain comparison and differentiation can provide valuable information over and above species-level identification. If several different strains are present in a manufacturing facility (which could easily be the case with many of the more common environmental isolates), investigation results may be misleading or inconclusive unless strain typing and comparison have been undertaken.
Strain comparison can also be very important in the industrial use of bacteria. For example, companies may specify the presence of a particular strain of bacteria in their products, or use a specific strain within a patented process. In these circumstances, testing may be required to ensure that there is no strain drift and that the correct strain is being used or is present in the product. In the case of patented processes that involve bacteria, it is also vital that the strain can be accurately identified if the patent is contested or infringement is suspected.
At NCIMB, we identify bacteria by sequencing the first 500 bp of the 16S rRNA gene and searching for matches in the MicroSEQ database. 16S sequencing is quite a flexible approach: sequencing the full 16S gene, rather than just the first 500 bp, provides additional data that is sometimes required for an accurate species-level identification, and that data can also highlight differences between bacterial isolates. Sometimes this additional information is enough to clarify whether environmental isolates identified as the same species are in fact the same strain, and therefore likely to have arisen from the same source. However, full 16S gene sequencing results are not always conclusive for this purpose.
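To make "highlighting differences between isolates" concrete, here is a minimal sketch that compares two 16S sequences assumed to have already been aligned to the same length (the alignment itself and the MicroSEQ database search are not shown). It reports percent identity over non-gap columns and the positions where the isolates differ. The function names and toy sequences are illustrative only; this is not a description of how MicroSEQ scores matches.

```python
from __future__ import annotations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences, ignoring gap columns ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """1-based alignment columns where both sequences have a base and the bases differ."""
    return [
        i + 1
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a != "-" and b != "-"
    ]


# Toy 20-base fragments, not real isolate data.
isolate_1 = "AGAGTTTGATCCTGGCTCAG"
isolate_2 = "AGAGTTTGATCATGGCTCAG"
print(percent_identity(isolate_1, isolate_2))  # 95.0
print(differing_positions(isolate_1, isolate_2))  # [12]
```

A single differing column is the kind of signal that can separate two isolates of the same species, but as the text notes, 16S data alone is often too conserved to be conclusive.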
Multilocus sequence typing (MLST)
Another approach that we have used in the past is MLST. The method was first proposed more than 20 years ago, and much of the initial interest in it focused on its value for tracking pathogen outbreaks. However, it can be applied to other situations such as environmental monitoring: at NCIMB, for example, we have successfully used it to compare several different isolates of Staphylococcus epidermidis, a common environmental organism.
MLST characterises isolates using the sequences of internal fragments of seven essential, single-copy housekeeping genes, i.e. genes required for processes that are essential for cell operation. Each of the seven fragments is approximately 450 to 500 base pairs long, around the same length as the section of the 16S gene that we sequence for identification purposes. Different strains of the same bacterial species show enough variation within these housekeeping genes, i.e. at each of the seven loci, to create a vast number of distinct allelic profiles: each distinct sequence at a locus is assigned an allele number, and the combination of the seven allele numbers defines the strain's sequence type (ST). As with 16S rDNA sequencing, the development of this technique has been underpinned by online data sharing. Central databases have been established to which users can submit strain information and new allele sequences, and strains are identified by comparing the allelic profiles obtained with previously published data.
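The profile-to-sequence-type step described above can be sketched as a simple table lookup. This is a minimal illustration under stated assumptions: the locus names (`locA` to `locG`), allele numbers and ST numbers below are hypothetical placeholders, not taken from any published MLST scheme or database.

```python
from __future__ import annotations

# Hypothetical seven-locus scheme: locus names, allele numbers and STs are
# placeholders, not taken from any published MLST database.
LOCI = ("locA", "locB", "locC", "locD", "locE", "locF", "locG")

# Hypothetical profile table: seven allele numbers -> sequence type (ST).
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 2, 1, 1, 3, 1, 1): 2,
    (4, 2, 1, 5, 3, 1, 7): 3,
}


def allelic_profile(alleles: dict[str, int]) -> tuple[int, ...]:
    """Order a locus -> allele-number mapping into the scheme's seven-number profile."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise KeyError(f"no allele call for: {', '.join(missing)}")
    return tuple(alleles[locus] for locus in LOCI)


def sequence_type(alleles: dict[str, int]) -> int | None:
    """Look up the ST; None means the profile is not in the table (possibly a novel ST)."""
    return PROFILES.get(allelic_profile(alleles))


def differing_loci(alleles_a: dict[str, int], alleles_b: dict[str, int]) -> list[str]:
    """Loci at which two isolates carry different allele numbers."""
    profile_a = allelic_profile(alleles_a)
    profile_b = allelic_profile(alleles_b)
    return [locus for locus, a, b in zip(LOCI, profile_a, profile_b) if a != b]


isolate_x = dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 1)))
isolate_y = dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 2)))
print(sequence_type(isolate_x))  # 2
print(sequence_type(isolate_y))  # None
print(differing_loci(isolate_x, isolate_y))  # ['locG']
```

Comparing allele numbers rather than raw sequences is what makes MLST results easy to share between laboratories: two isolates with the same seven numbers share an ST, however the sequences were obtained.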
However, one of the issues with MLST is the cost of the sequencing required. It is generally undertaken by Sanger sequencing, the same technology used for 16S identification of microbial isolates. While Sanger sequencing is well suited to sequencing part or all of a single gene, the need to sequence seven distinct loci for every isolate makes MLST a more expensive option.
Whole genome sequencing
In recent years, advances in sequencing technology have made what is known as “next generation sequencing” much more affordable and widely available. This has paved the way for whole genome sequencing of bacterial isolates to become a more cost-effective means than MLST of comparing isolates of the same species, and it is now our preferred approach for a number of reasons.
In addition to potentially lower costs, another benefit of whole genome sequencing is that no MLST scheme is required: the whole genome sequences of isolates can simply be compared with each other to identify differences. Several different analyses can be run to compare the sequences. For example, average nucleotide identity (ANI) quantifies the genetic distance between two genomes and so indicates how closely related they are. Results are given as a percentage that can be used to indicate whether two genomes belong to the same genus, species or strain. We also run analyses that identify the genes shared between genomes and highlight those that are not, and we can look for small nucleotide variations between genomes.
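ANI is usually computed by aligning genome fragments (for example with BLAST or MUMmer). The sketch below swaps in a simpler, alignment-free estimate: the Mash distance D = -(1/k) ln(2J/(1+J)), where J is the Jaccard index of the two genomes' canonical k-mer sets, with 100(1 - D) reported as an approximate ANI. The approximation is reasonable only for closely related genomes, and this version builds exact k-mer sets rather than MinHash sketches, so it only suits small inputs. It also includes a shared/unique gene split, assuming gene names have already been assigned by annotation. The function and gene names are hypothetical, a species boundary of around 95% ANI is commonly cited, and none of this should be read as the pipeline NCIMB actually runs.

```python
from __future__ import annotations

import math

# Complement table for the four unambiguous bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Set of canonical k-mers: the lexicographic minimum of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers: set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not set(kmer) <= _BASES:
            continue  # skip k-mers containing N or other ambiguity codes
        rev_comp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, rev_comp))
    return kmers


def mash_style_ani(genome_a: str, genome_b: str, k: int = 21) -> float:
    """Approximate ANI (%) from the k-mer Jaccard index via the Mash distance formula."""
    kmers_a = canonical_kmers(genome_a, k)
    kmers_b = canonical_kmers(genome_b, k)
    union = kmers_a | kmers_b
    if not union:
        raise ValueError("no valid k-mers: are the sequences shorter than k?")
    jaccard = len(kmers_a & kmers_b) / len(union)
    if jaccard == 0:
        return 0.0  # no shared k-mers; the Mash distance is undefined here
    mash_distance = -math.log(2 * jaccard / (1 + jaccard)) / k
    return max(0.0, 100.0 * (1.0 - mash_distance))


def gene_presence_absence(genes_a: set[str], genes_b: set[str]) -> dict[str, list[str]]:
    """Split two annotated gene sets into shared genes and genes unique to each genome."""
    return {
        "shared": sorted(genes_a & genes_b),
        "only_a": sorted(genes_a - genes_b),
        "only_b": sorted(genes_b - genes_a),
    }


# Hypothetical gene names, for illustration only.
print(gene_presence_absence({"geneX", "geneY"}, {"geneY", "geneZ"}))
# {'shared': ['geneY'], 'only_a': ['geneX'], 'only_b': ['geneZ']}
```

In practice the genomes would be read from assembly files, and on toy-length sequences the ANI value is not biologically meaningful; the point is only to show how a shared-content measure becomes a percentage.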
Although many MLST schemes have been, and continue to be, uploaded and shared publicly, schemes for environmental isolates may still be harder to find, because the method's initial focus was on tracking pathogens. Another factor to consider is that whole genome comparison is not limited to the seven housekeeping genes of an MLST scheme, so any strain differences are more likely to be detected; in other words, the more data you have for comparison, the better. And because the whole genome data includes the seven MLST loci, it can also be used for MLST analysis if required, which may be useful if, for example, MLST data exists for previous isolates.
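Because the seven MLST loci are contained in the whole genome data, allele calls can be read directly from an assembly. The sketch below does this by exact string matching (on both strands) against a hypothetical allele database; `allele_db`, the locus names and the sequences are made up for illustration. Published tools generally use alignment instead, so that they can also flag novel or partially assembled alleles, which this simplified version simply reports as None.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (output in upper case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def call_alleles(genome: str, allele_db: dict[str, dict[int, str]]) -> dict[str, int | None]:
    """For each locus, return the first allele number whose sequence occurs exactly
    on either strand of the genome, or None if no known allele matches."""
    forward = genome.upper()
    reverse = reverse_complement(genome)
    calls: dict[str, int | None] = {}
    for locus, alleles in allele_db.items():
        calls[locus] = None
        for allele_number, allele_seq in alleles.items():
            allele_seq = allele_seq.upper()
            if allele_seq in forward or allele_seq in reverse:
                calls[locus] = allele_number
                break
    return calls


# Hypothetical toy allele database and genome; real MLST alleles are ~450-500 bp.
toy_db = {
    "locA": {1: "ACGTAC", 2: "ACGTTC"},
    "locB": {1: "CCAGTA", 2: "CCAGTT"},
    "locC": {1: "GATTACA"},
}
toy_genome = "TTTTACGTTCGGGGTACTGGTTTT"
print(call_alleles(toy_genome, toy_db))  # {'locA': 2, 'locB': 1, 'locC': None}
```

Here locB allele 1 is found on the reverse strand, which is why both orientations of the assembly are searched.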
It is a relatively straightforward but very flexible approach, as the whole genome data is also available for any further bioinformatics analysis that might be required, such as screening for functional genes of interest. It is also more consistent: exactly the same method is used for every species, so there are no complicating factors arising from different primer sets or from the variable quality of the sequences obtained with different primers.
A more cost-effective approach
It might at first seem counterintuitive that sequencing more of the genome provides a more cost-effective approach to strain typing, but it is really a reflection of the different technologies now available for routine analysis, how they work, and the relationship between the number of samples and the cost per sample.
With next generation sequencing, the cost per sample is much lower if you run many samples at the same time, so the total cost of sequencing 20 samples, for example, may not be much more than that of sequencing five. This can be helpful during an investigation, as many more isolates can be included for only a small additional cost, and the cost of sequencing is much less likely to become a limiting factor. In contrast, the total cost of MLST rises roughly linearly with the number of isolates, and companies may feel they need to be more selective about the strains they choose for comparison. For large numbers of samples, whole genome sequencing will also be much quicker, because all samples are sequenced in parallel.
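The cost argument can be illustrated with a simple model under assumed, made-up prices. A sequencing run is modelled as a fixed cost shared by up to a set number of samples, plus a per-sample preparation cost, while Sanger-based MLST is modelled as a fixed cost per locus per isolate. The parameters `run_cost`, `per_sample_cost`, `samples_per_run` and `cost_per_locus` are hypothetical placeholders, not real NCIMB or instrument prices.

```python
import math


def wgs_total_cost(n_samples: int, run_cost: float, per_sample_cost: float,
                   samples_per_run: int) -> float:
    """Batched sequencing: each run has a fixed cost shared by up to samples_per_run samples."""
    runs = math.ceil(n_samples / samples_per_run)
    return runs * run_cost + n_samples * per_sample_cost


def mlst_total_cost(n_isolates: int, cost_per_locus: float, loci: int = 7) -> float:
    """Sanger-based MLST: cost grows linearly with isolates x loci."""
    return n_isolates * loci * cost_per_locus


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    for n in (5, 20):
        wgs = wgs_total_cost(n, run_cost=1000, per_sample_cost=20, samples_per_run=24)
        mlst = mlst_total_cost(n, cost_per_locus=15)
        print(f"{n} isolates: WGS {wgs} ({wgs / n:.0f} each), MLST {mlst} ({mlst / n:.0f} each)")
```

With these placeholder numbers, five isolates cost 1100 by whole genome sequencing against 525 by MLST, but twenty isolates cost 1400 against 2100: the per-isolate cost of the batched run falls from 220 to 70 while MLST stays at 105. Where the crossover falls depends entirely on real prices and batch sizes.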
In conclusion, a whole genome sequencing approach to strain comparison has a number of advantages over MLST: it is quicker and cheaper per isolate, allows more isolates to be included in investigations, and is more likely to identify any strain differences.
About the author
Vikki Warren leads a team of scientists responsible for delivering NCIMB’s sequencing and identification services as well as sequencing new deposits to the UK’s National Collection of Industrial Food and Marine Bacteria. She has extensive experience of using the well-established Sanger sequencing method for identification of bacterial and fungal environmental isolates, and has been central in the development of NCIMB’s new microbial profiling and whole genome sequencing services, which are based on next generation sequencing with the Illumina platform. Vikki holds a BSc (Hons) degree in Applied Biosciences and Management, and an MSc in Instrumental Analytical Techniques; DNA Analysis, Proteomics and Metabolomics from the Robert Gordon University in Aberdeen.