When investigating eukaryotic gene sequences, the 18S-rRNA gene is
often used as a metabarcoding marker. However, there are many different
variable regions that can be used to sequence the gene (V1-V9, excluding V6).
Tanabe et al. (2016) investigated whether any of these regions is more
suitable than the others for Massively Parallel Sequencing (MPS), also known as
High-Throughput Sequencing (HTS).
The authors grouped the variable regions into three spans commonly used in
microbiology research: V1-3, V4-5 and V7-9. Using data from online sequence
databases, they compared the number of sequences deposited in the
International Nucleotide Sequence Databases (INSDs), the amplification success
rate and the amplicon sequence variability. Sea surface water samples were then
collected from the Sea of Okhotsk to test four further measures in the three
regions: the difference in taxonomic composition between amplified sequences
and template sequences, the OTU accumulation curve, the OTU taxonomic
composition and the taxonomic identification power.
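To illustrate one of these measures, here is a minimal sketch of an OTU accumulation curve: the average number of distinct OTUs seen as more and more reads are sampled. It is not the authors' pipeline. The function name, the toy OTU labels and the choice of 100 random orderings are all assumptions made for illustration.

```python
import random


def otu_accumulation_curve(read_otus, n_permutations=100, seed=0):
    """Mean number of distinct OTUs observed after 1..N reads.

    read_otus: list with one OTU label per read (hypothetical input format).
    The curve is averaged over random orderings of the reads.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_reads = len(read_otus)
    totals = [0] * n_reads
    for _ in range(n_permutations):
        order = read_otus[:]
        rng.shuffle(order)
        seen = set()
        for i, otu in enumerate(order):
            seen.add(otu)
            totals[i] += len(seen)  # distinct OTUs after i + 1 reads
    return [total / n_permutations for total in totals]


# Hypothetical toy sample: 10 reads assigned to 3 OTUs.
reads = ["OTU_A"] * 6 + ["OTU_B"] * 3 + ["OTU_C"]
curve = otu_accumulation_curve(reads)
for depth, mean_otus in enumerate(curve, start=1):
    print(depth, round(mean_otus, 2))
```

A curve that flattens out suggests that sequencing more reads would uncover few new OTUs, whereas a curve that is still climbing suggests the sample has not been sequenced deeply enough.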
The results showed
that the V1-3 region has the highest variability, making it the most suitable
region for environmental monitoring (based on the use of Roche 454 Pyrosequencing).
This region also has significantly higher identification power than
V4-5, although there was no significant difference between V1-3 and V7-9. The authors
theorise that the V1-3 region will become more useful once more sequences are
deposited into the databases, though they do not explain their reasoning.
V1-3 is also the longest of the three regions, and is therefore too
long for Illumina MiSeq/HiSeq reads; the shorter V4-5 region should be used on
those platforms instead. Nevertheless, the final sentence of the paper states:
‘we also encourage to use multiple regions for reducing PCR biases’, a
vague ending that seems to undercut the findings of the paper. So, we are still
left wondering whether any of the 18S-rRNA variable regions is better suited to
sequencing than the others.
This study deduced which variable
region is most appropriate for Roche 454 Pyrosequencing, which is most
commonly used when identifying plankton in environmental samples, and hinted at
which region would suit Illumina. A quick Google search shows that Roche
announced the discontinuation of its 454 Pyrosequencing machines in 2013 (3 years
before this paper was published), indicating that the method, one of the first
HTS methods to follow Sanger sequencing, is becoming outdated. Therefore, a
more comprehensive study of which variable region would suit a wider range of
High-Throughput Sequencing techniques is needed.
Reference paper:
Tanabe, A., Nagai, S., Hida, K., Yasuike, M., Fujiwara, A.,
Nakamura, Y. and Katakura, S. (2016). Comparative study of the validity of
three regions of the 18S-rRNA gene for massively parallel sequencing-based
monitoring of the planktonic eukaryotic community. Molecular Ecology Resources, 16, 402-414. http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12459/epdf
Hi Megan,
This is a very interesting review. I didn't realise that there were so many regions in the gene (I just knew of the V9 region).
You mentioned that the data from the paper was collected from online sequence databases, but do you know what eukaryotic organisms' sequences they used?
Just curious.
Thanks,
Ankitha
Hi Ankitha,
I thought it was really interesting too, although it still doesn't clear up how different variable regions are chosen for each study. I assume it depends on the particular organism used.
The paper doesn't specify which organisms' sequences were used, simply that they are eukaryotic organisms. It does include a small amount of information about the database used: it contained 10,134,209 nucleotide sequences, and tetrapods and terrestrial plants were excluded.
Sorry I couldn't be of more help!
Megan
Hello Megan,
Very interesting read, thank you for posting this. I agree with your conclusion: more comprehensive studies are needed in order to dissect the effects of primer choice and sequencing platform in metabarcoding studies. This has been done for the 16S rDNA barcode: interestingly, primer choice seems to have a more important effect than sequencing platform (Tremblay et al., 2015).
It’s clear that, ideally, primers should be chosen on the basis of their discriminatory power and their potential to amplify an accurate pool of barcodes. In the study you summarised, the V1-V3 18S rDNA-amplifying primers appeared to be the most suitable; however, their target region was the least represented in the databases. Do you think this might mean that future studies could opt for a pragmatic approach, sacrificing the choice of an ideal primer set for the promise of a larger sequence database to compare their results to?
Thanks,
Alessandro
Tremblay, J., Singh, K., Fern, A., Kirton, E., He, S., Woyke, T., Lee, J., Chen, F., Dangl, J. & Tringe, S. (2015) 'Primer and platform effects on 16S rRNA tag sequencing'. Frontiers in Microbiology, 6 (771).
Hi Alessandro,
I think there is definitely a benefit to using regions that are more represented in the database, to allow for better comparison. This is probably very appealing to scientists when carrying out their studies!
However, if more studies looked at the V1-V3 region, the number of sequences in the databases would increase. This would allow more accurate comparison of sequences amplified with a more suitable primer.
Hopefully in the future the different variable regions will be more evenly represented in the sequence databases!
Megan
Hello Megan,
That is a good point: let's hope future studies take the results of this paper as an invitation to expand the databases with the less represented barcodes.
Best,
Alessandro