MOLECULAR IDENTIFICATION OF LOCAL VERTEBRATE SPECIES USING CYTOCHROME OXIDASE SUBUNIT I (COI) GENE

The aim of this study was to evaluate the mtDNA COI gene as a molecular tool for identifying local vertebrate species. Polymerase chain reaction (PCR) using universal primers complementary to a conserved region of the mitochondrial DNA (mtDNA) cytochrome oxidase subunit I (COI) gene was performed on DNA from blood samples of 30 local animals in Malaysia. Host DNA was amplified by PCR and the products were visualized by gel electrophoresis. Twenty-two sequences (73.3%) were obtained and compared with sequences registered in the GenBank and BOLD Systems databases. The BLAST results for fifteen of these sequences (68%) were congruent with the morphological identification at 92% to 100% similarity, while seven sequences had no significant similarity. These results suggest that COI-based PCR is a reliable identification tool for vertebrates and can be applied in epidemiological studies involving blood meal analysis of arthropods in Malaysia.


INTRODUCTION
Mammals rank amongst the most studied animal groups, and their taxonomy and species diversity are consequently well documented in the literature (Wilson & Reeder 2005). However, field identification of many small mammal species remains difficult, in large part because of morphological variation during their development (Francis 2008, Francis et al. 2010). In vector-borne diseases, rodents are important vertebrate hosts for many zoonoses that threaten public health (Luis et al. 2013, Vazirianzadeh & Rahdar 2013). For example, ticks (Acari: Ixodidae) play an important role in the epidemiology of disease transmission to humans and animals (Gariepy et al. 2012). Ticks also rely heavily on animals, especially rodents, as hosts to complete their life cycles (Ahantarig et al. 2008). Sound knowledge of vector host preference is important for understanding the natural disease transmission cycle (Ngo & Kramer 2003) and for minimizing the risk of human infection (Christine 2011). It is therefore important to identify natural host species as quickly and accurately as possible for effective outbreak management and control.
In the past, identification of host species was commonly based on morphological taxonomic keys and served a wide variety of purposes (Kon et al. 2007, Teletchea et al. 2008). This technique is difficult because it requires animal trapping and maintenance of the animals in the laboratory (Wodecka et al. 2014) if further morphological examination or diagnosis is required. Moreover, few scientists are now trained or interested in taxonomy, leading to a decreasing number of taxonomists (Tahseen 2014). A technique is therefore needed that provides rapid and accurate identification for use by non-experts (Hebert et al. 2003). A standard molecular identification technique is necessary as a complement to morphological methods in order to reduce uncertainty in the identification of vertebrates. The availability of DNA sequence data for various vertebrates has opened the door to molecular species identification approaches such as polymerase chain reaction (PCR) and DNA sequencing (Alcaide et al. 2009). PCR-based identification is more convenient and easier to perform than identification with taxonomic keys (Tiwary et al. 2012, Ernieenor et al. 2013). The approach can also provide genetic references to validate field identifications made by researchers with limited taxonomic background (Borisenko et al. 2008, Lu et al. 2012).
Many molecular markers have been developed for species identification of hosts, including the mitochondrial gene regions cytochrome b, 16S rRNA and 12S rDNA (Pun et al. 2009, Tillmar et al. 2013). In recent years, one standardized molecular identification approach, termed DNA barcoding, has been used extensively with cytochrome oxidase subunit I (COI) as the target gene. Features such as rapid accumulation of mutations and a negligible recombination rate (Panday et al. 2014) make this gene particularly valuable for differentiating between species. The existence of large databases and the continued development of a comprehensive DNA barcode sequence library also allow the specific identification of a larger number of animal species (Alfonsi et al. 2013).
The objective of this study was to evaluate the mtDNA COI gene as a molecular tool for identification of local vertebrate species. The gene sequences obtained were compared with available sequences in GenBank and BOLD Systems.

Collection of Blood
Blood was collected from a total of 30 local animals (Table 1) comprising 18 known species, including captive animals reared at the National Zoo Kuala Lumpur, laboratory animals from the Laboratory Animal Research Unit of the Institute for Medical Research Kuala Lumpur, and small wild animals caught in Pantai Kelanang, Selangor. Two to three ml of blood was taken from each animal by veterinarians using appropriate humane procedures. The blood was then transferred into EDTA anticoagulant tubes and sent to the Institute for Medical Research (IMR) for further analysis. The samples were kept frozen at -20 °C until processed for DNA amplification.

Dilution of Blood
Ten µl of freeze-thawed blood from each animal was mixed with 90 µl of sterile double-distilled water to produce a 1:10 dilution, which was used as the template in PCR amplification (Ernieenor et al. 2012).

PCR Amplification of COI Gene and Gel Electrophoresis
Host DNA was amplified by PCR with universal primers complementary to a conserved region of the mtDNA COI gene. The primers LCO1490 (5'-GGT CAA CAA ATC ATA AAG ATA TTG G-3') and HCO2198 (5'-TAA ACT TCA GGG TGA CCA AAA AAT CA-3') amplified 658 bp of the COI gene (Folmer et al. 1994).
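As a quick sanity check on a primer such as the forward primer quoted above, its length, GC content and reverse complement can be computed directly from the sequence. The sketch below is illustrative only and is not part of the study's protocol; the function names are hypothetical.

```python
# Illustrative helpers for inspecting a primer sequence.
# Function names are hypothetical and not taken from the study.

def clean(seq: str) -> str:
    """Remove the spaces used to group bases in triplets and upper-case."""
    return seq.replace(" ", "").upper()


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    # Forward primer as written in the text (5' to 3').
    forward = clean("GGT CAA CAA ATC ATA AAG ATA TTG G")
    print("length:", len(forward))
    print("GC content (%):", gc_content(forward))
    print("reverse complement:", reverse_complement(forward))
```

The same functions can be applied to the reverse primer or to any other oligonucleotide written with the four unambiguous bases.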

DNA Purification and DNA Sequencing
Each DNA fragment was excised from the gel using a sterile, sharp gel cutter and purified using the 5 Prime PCR Agarose Gel Extract Mini Kit (Hamburg, Germany) according to the manufacturer's protocol. DNA sequencing in both directions was performed with the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Foster City, California, USA), following the manufacturer's instructions.

Gene Sequence Analysis
The obtained sequences were compared with available sequences in the GenBank database using the Basic Local Alignment Search Tool (BLAST) program and with BOLD Systems using the BOLD-IDS tool.
November 15, 2015

RESULTS
A total of 30 blood samples collected from captive, laboratory and small wild animals were subjected to PCR amplification. A 658 bp fragment of the mitochondrial COI gene was successfully amplified from 22 blood samples and visualized by 1.2% agarose gel electrophoresis (Figure 1). No PCR product was obtained from eight specimens despite repeated amplification and gel electrophoresis. The negative control (double-distilled water) yielded no PCR product, implying that only host DNA was detected in the amplified specimens. All 22 (73.3%) positive samples were sequenced and compared against the GenBank and BOLD search engines. With GenBank BLASTn, fifteen sequences (68%) provided species identifications that were correct and in congruence with the morphological identification, while seven sequences gave no significant similarity. With BOLD, only 12 sequences (54.5%) yielded a correct identification at species level, and three sequences could not be identified. Identification of the mtDNA COI sequences through GenBank showed similarities ranging from 92% to 99%. Twelve of the 15 sequences matched the corresponding GenBank species sequences with more than 95% similarity, while three sequences (CW1, CW2, and LC2) gave less than 95% similarity. In the BOLD search, 12 sequences were successfully identified to species level with a maximum identity value greater than 98%. The results of the searches against the corresponding database sequences are shown in Table 2. Eight new gene sequences with 99% similarity were submitted for registration at GenBank/BOLD and are awaiting accession numbers.

DISCUSSION
A 658 bp fragment of the COI gene was amplified for animal species identification because this gene has the widest taxonomic representation in nucleotide databases after the Cytb gene (Malakar et al. 2012).
Like other protein-coding genes, COI shows a high incidence of base substitutions at the third codon position, leading to a rate of molecular evolution (Dawnay et al. 2007) about three times greater than that of ribosomal DNA. Despite this high rate of nucleotide substitution, the COI marker can discriminate species boundaries, even among cryptic species, in a manner congruent with morphology-based taxonomy (Cywinska et al. 2006).
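The concentration of substitutions at the third codon position can be illustrated by tallying differences between two aligned coding sequences by codon position. The sketch below is a minimal illustration under stated assumptions: the two sequences are gap-free, of equal length and in the same reading frame, and the toy sequences are invented for demonstration and are not data from this study. The function name and the `frame_offset` parameter are hypothetical.

```python
# Count nucleotide differences by codon position (1, 2 or 3) between two
# aligned, gap-free coding sequences. Toy data only; not from the study.

def substitutions_by_codon_position(seq_a: str, seq_b: str,
                                    frame_offset: int = 0) -> dict:
    """Return {1: n1, 2: n2, 3: n3}, the number of differing sites per codon position.

    frame_offset is the index of the first base of the first complete codon
    (an assumption the caller must supply; it is not given in the text).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {1: 0, 2: 0, 3: 0}
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if i < frame_offset:
            continue  # bases before the first complete codon are skipped
        if a in "ACGT" and b in "ACGT" and a != b:
            counts[(i - frame_offset) % 3 + 1] += 1
    return counts


if __name__ == "__main__":
    # Two invented 12-base sequences differing only at third positions.
    toy_a = "ATGGCCTTAGGA"
    toy_b = "ATGGCTTTGGGA"
    print(substitutions_by_codon_position(toy_a, toy_b))
```

Applied to real COI barcodes, the reading frame would first have to be established, for example by translating with the vertebrate mitochondrial genetic code and checking for the absence of internal stop codons.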
Pairwise comparison of the amplified mtDNA COI sequences with available database sequences revealed nucleotide similarities ranging from 92% to 100%. Fifteen sequences had an exact species match at the highest percent similarity, and seven sequences (ML 2, MY, RB 1, SH, DL, RB 2, and KGT 23) did not show any significant similarity. The lack of a significant match was probably due to excessive noise in sequencing, which resulted in low-complexity or short query sequences (Altschul et al. 1990, Ernieenor et al. 2013) when searched against the international databases, and to a low concentration of purified PCR product (Agostino 2012).
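To make the notion of percent similarity concrete, the sketch below computes the percent identity of two sequences that have already been aligned. It is a simplified illustration and not the BLAST or BOLD algorithm: positions containing a gap or an ambiguous base (N) are simply skipped, which is an assumption of this sketch, and the example sequences are invented.

```python
# Percent identity between two pre-aligned sequences of equal length.
# Simplified sketch: gap ('-') and ambiguous ('N') positions are ignored,
# which differs from how BLAST reports identity over an alignment.

def percent_identity(query: str, subject: str) -> float:
    """Return the percentage of compared positions at which the bases match."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q in "-N" or s in "-N":
            continue  # skip gaps and ambiguous calls
        compared += 1
        if q == s:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Invented 10-base example with a single mismatch at the last position.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Noisy reads with many ambiguous calls leave few comparable positions, which is one way a short or low-quality query can fail to produce a meaningful match.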
In this study, three sequences (LC 1, LC 2 and CW 1) could not be matched with any record in the BOLD database but gave an exact match with high similarity to available sequences in GenBank. This might be due to a lack of reference sequences, or the absence of any COI sequence, for these local animal species (Christine 2011), which are not yet available in the DNA barcoding system. Parson et al. (2000) also noted that if a species is not found in the databases, the biological significance of the matching result depends strongly on the availability of closely related species; sometimes only a distantly related taxon is available, which results in lower sequence similarity values.
A small (2%) DNA sequence variation was observed between the wild and captive leopard cats (LC 1 and LC 2). Both sequences gave an exact species match to the Prionailurus bengalensis entry in GenBank. The small difference in similarity is probably caused by single nucleotide substitutions between the two sequences, which can be attributed to individual variation (Garamszegi et al. 2009). The COI sequences of the laboratory guinea pigs (GP 1 and GP 2) displayed maximum homology (99%) with Cavia porcellus in the GenBank and BOLD databases. This finding agrees with the fact that abundant and adequate reference sequences for guinea pigs and other laboratory animals in the international databases can result in high homology (Altschul et al. 1990).
Samples from three Malayan tigers in this study displayed high similarity with the COI sequences of Panthera tigris and Panthera pardus (97% and 92%, respectively) in GenBank. This may be because the two species are phylogenetically closely related. A lack of differentiation between closely related species has been reported in other studies for the COI gene (Hajibabaei et al. 2007) and also for the Cytb gene (Branicki et al. 2003).
The gene sequences of two Pig Tailed Macaque samples (PTM 1 and PTM 2) had high similarity (99%) with the Macaca nemestrina sequence in both databases, while the PTM 3 sample showed 97-98% similarity. The nucleotide differences between the samples in this study and those in the databases can be attributed to individual variation, especially within species with a large range of distribution (Taberlet et al. 1992). Furthermore, Avise and Walker (1999), in their study of some vertebrates, concluded that sequence divergence of less than 1% or 2% was typical of phylogeographical units within a species.
A very high similarity (99% and 99.8%) with the subject sequences in GenBank and BOLD, respectively, was obtained for the single elephant (Elephas maximus) sample. This is not unexpected, as there are only two genera of elephant left in the world and only five COI sequences of Elephas maximus deposited in GenBank, which contributes to small individual variation.
The COI sequences of the two White Handed Gibbon samples, WHG (K) and WHG (P), are highly similar (99%) to the corresponding Hylobates lar accession in GenBank. The successful molecular identification of this species is valuable because its taxonomic identification may be complicated by sexual dichromatism and variation in coat color (Mootnick 2006). Gibbons are sparsely distributed worldwide and their numbers are declining. The evolutionary relationships of gibbon taxa have long been a focus of study because of their high taxonomic diversity and conservation importance (Mootnick 2006, Meyer et al. 2012). Moreover, nearly all gibbon taxa have been classified as endangered at either the species or subspecies level (Chan et al. 2010).
This study demonstrates that the COI gene enables accurate animal species identification where adequate reference sequence data exist. If DNA sequences for candidate animals are not represented in the database, sequence-based identification may give a misidentification. DNA sequence analysis also relies heavily on the robustness of the available sequences for the target species in the gene library. Comparison of the two reference databases revealed that GenBank could identify more query sequences than BOLD. This may be because GenBank is a more comprehensive database than BOLD in terms of recent and specific records (Ratnasingham & Hebert 2007). The two databases also use different algorithms to calculate the similarity between query and reference sequences, which can generate discrepancies in identification.

CONCLUSION
This study highlighted the expanding use of the COI gene as a genetic marker for species identification of vertebrates, especially mammals that serve as tick hosts. It can be used as a tool to support morphology-based taxonomy where experts are absent or scarce. The addition of local vertebrate sequences to the gene library is particularly important for research on arthropod-borne diseases that aims to use COI genetic barcodes in blood meal studies. Therefore, a larger number and variety of local vertebrate species should be collected over the long term to obtain more sequence information for deposition in reference databases.