Microbes Rule the Waves: Is there a perfect variable region for 18S rRNA gene sequencing?

Monday, 9 October 2017

When investigating eukaryotic gene sequences, the 18S-rRNA gene is often used as a metabarcoding marker. However, the gene contains several variable regions that can be targeted for sequencing (V1-V9, excluding V6). Tanabe et al. (2016) investigated whether any of these regions is more suitable than the others for Massively Parallel Sequencing (MPS, also known as High-Throughput Sequencing (HTS)).

The authors grouped the variable sites into three regions commonly used in microbiology research: V1-3, V4-5 and V7-9. Using data collected from online sequence databases, they aimed to investigate the number of sequences deposited in the International Nucleotide Sequence Databases (INSDs), the amplification success rate and the amplicon sequence variability. Sea surface water samples were then collected from the Sea of Okhotsk to test, for each of the 3 variable regions, the difference in taxonomic composition between amplified sequences and template sequences, the OTU accumulation curve, the OTU taxonomic composition and the taxonomic identification power.
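
For readers unfamiliar with OTU accumulation curves, here is a minimal Python sketch of the idea: shuffle the reads, then count how many distinct OTUs have been seen as more reads are sampled. This is not the authors' pipeline; the function name `otu_accumulation`, the `step` parameter and the toy read labels are all made up for illustration.

```python
import random

def otu_accumulation(read_otus, step=100, seed=0):
    """Return (reads_sampled, distinct_otus_seen) points for one random read order.

    read_otus: one OTU label per read (hypothetical input format).
    """
    rng = random.Random(seed)
    shuffled = list(read_otus)
    rng.shuffle(shuffled)

    seen = set()
    curve = []
    for i, otu in enumerate(shuffled, start=1):
        seen.add(otu)
        # Record a point every `step` reads, and always at the final read.
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve

# Toy data, invented for illustration: 100 reads assigned to 4 OTUs.
reads = ["OTU_A"] * 50 + ["OTU_B"] * 30 + ["OTU_C"] * 15 + ["OTU_D"] * 5
curve = otu_accumulation(reads, step=25)
for n_reads, n_otus in curve:
    print(n_reads, n_otus)
```

A curve that flattens out suggests that sequencing more reads would uncover few new OTUs, whereas a curve that is still climbing suggests the community has not been fully sampled. In practice the shuffle is usually repeated many times and averaged.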

The results show that the V1-3 region has the highest variability, making it the most suitable region for environmental monitoring (based on the use of Roche 454 Pyrosequencing). This region also has significantly higher identification power than V4-5; however, there was no significant difference between V1-3 and V7-9. The authors theorise that the V1-3 region will become more useful once more sequences are deposited into the databases, though they do not explain their reasoning. V1-3 is also the longest of the three regions, and is therefore too long for Illumina MiSeq/HiSeq reads; the shorter V4-5 region should be used on those platforms instead. Nevertheless, the final sentence of the paper states: ‘we also encourage to use multiple regions for reducing PCR biases’, a very vague ending that seems to negate the findings of the paper. So, we are still left wondering whether any of the 18S-rRNA variable regions is better suited to gene sequencing than the others.
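
To make the "too long for Illumina" point concrete, here is a rough Python sketch of the arithmetic behind it: with paired-end sequencing, the forward and reverse reads need to overlap in the middle of the amplicon to be merged. The amplicon lengths, the 2 x 300 bp read length and the 20 bp minimum overlap below are placeholder assumptions, not values from the paper, and the sketch ignores primers and quality trimming.

```python
def paired_end_overlap(amplicon_len, read_len):
    """Bases shared by the forward and reverse reads (negative means a gap)."""
    return 2 * read_len - amplicon_len

def can_merge(amplicon_len, read_len, min_overlap=20):
    """True if the two reads overlap by at least `min_overlap` bases."""
    return paired_end_overlap(amplicon_len, read_len) >= min_overlap

# Hypothetical amplicon lengths (bp), chosen only to illustrate the logic.
for name, length in [("shorter amplicon", 400), ("longer amplicon", 700)]:
    overlap = paired_end_overlap(length, read_len=300)
    print(name, length, "bp -> overlap", overlap, "mergeable:", can_merge(length, 300))
```

An amplicon longer than roughly twice the read length leaves a gap between the two reads, so the full region cannot be reconstructed from a single read pair.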

This study managed to deduce which variable region is most appropriate for Roche 454 Pyrosequencing, which is most commonly used when identifying plankton in environmental samples, and hinted at which region would suit Illumina. However, a quick Google search shows that Roche announced the discontinuation of its 454 Pyrosequencing machines in 2013 (3 years before this paper was published), indicating that, as one of the first HTS methods to follow Sanger sequencing, the method is becoming outdated. Therefore, a more comprehensive study of which variable region suits a wider range of High-Throughput Sequencing techniques is clearly needed.

Reference paper:

Tanabe, A., Nagai, S., Hida, K., Yasuike, M., Fujiwara, A., Nakamura, Y. and Katakura, S. (2016). Comparative study of the validity of three regions of the 18S-rRNA gene for massively parallel sequencing-based monitoring of the planktonic eukaryotic community. Molecular Ecology Resources (16) 402-414. http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12459/epdf

5 comments:

Unknown, 13 October 2017 at 09:38
Hi Megan,

This is a very interesting review. I didn't realise that there were so many regions in the gene (I just knew of the V9 region).

You mentioned that the data from the paper was collected from online sequence databases, but do you know what eukaryotic organisms' sequences they used?

Just curious.

Thanks,
Ankitha

Megan, 16 October 2017 at 09:49
Hi Ankitha,

I thought it was really interesting too, although it still doesn't clear up how different variable regions are chosen for each study; I assume it depends on the particular organism used.

It doesn't specify which organisms' sequences were used, simply that they are eukaryotic organisms. They do include a small amount of information about the database used: it contained 10,134,209 nucleotide sequences, and they excluded tetrapods and terrestrial plants.

Sorry I couldn't be of more help!
Megan

Alessandro Cavallo, 17 October 2017 at 00:17
Hello Megan,

Very interesting read, thank you for posting this. I agree with your conclusion: more comprehensive studies are needed in order to dissect the effects of primer choice and sequencing platform in metabarcoding studies. This has been done for the 16S rDNA barcode: interestingly, primer choice seems to have a more important effect than sequencing platform (Tremblay et al., 2015).
It’s clear that, ideally, primers should be chosen on the basis of their discriminatory power and their potential to amplify an accurate pool of barcodes. In the study you summarised, the V1-V3 18S rDNA-amplifying primers appeared to be the most suitable; however, their target region was the least represented in the databases. Do you think this might mean that future studies could opt for a pragmatic approach, sacrificing the choice of an ideal primer set for the promise of a larger sequence database to compare their results to?

Thanks,
Alessandro

Tremblay, J., Singh, K., Fern, A., Kirton, E., He, S., Woyke, T., Lee, J., Chen, F., Dangl, J. & Tringe, S. (2015) 'Primer and platform effects on 16S rRNA tag sequencing'. Frontiers in Microbiology, 6 (771).