Key words DNA barcoding - dietary supplements - herbal medicine - mini-barcodes
Introduction
DNA barcodes, a term introduced by Hebert et al. [1 ], describe short genomic regions from the nuclear and/or organelle genome used to
distinguish animal, plant, fungal, and bacterial species. The use of these short genomic
regions for biological species discrimination is called DNA barcoding. Applications
include tracking the illegal trade of endangered species of both animals and plants
(i.e., biopiracy) [2 ], [3 ], [4 ], [5 ], forensic analysis [6 ], [7 ], identifying invasive species [8 ], [9 ], plant identification at any stage of the life cycle (juvenile or mature) [10 ], identifying complex food webs by studying species diversity in the gut contents
of animals [11 ], analyzing the components of herbivore diets [12 ], checking for adulteration and substitution in food products [13 ], and authentication of herbal medicine and identification of botanical ingredient
adulterants [14 ], [15 ], [16 ], [17 ].
The herbal products industry is a multibillion-dollar industry and an important part
of the worldʼs economy. In a study funded by the Natural Products Foundation, estimates
suggest that the total economic contribution of the dietary supplement industry to
the U.S. economy is more than three times the annual consumer sales, or $61 billion
per year [18 ]. However, as the popularity of herbal dietary supplements has increased, so have
reports of adulteration; this admixture, or substitution, of herbal products/supplements
with materials of substandard quality is a growing concern since it may lead to decreased
efficacy and the occurrence of serious adverse events. Price pressure, increased demand,
limited availability of medicinal herbs, and greed of unscrupulous suppliers are some
of the reasons for the intentional substitution of botanical ingredients. Adulteration
is often carried out using a closely related species having similar active ingredients
(although in some cases, the use of closely related species is acceptable in pharmacopeial
monographs) to fool the traditionally used authentication methods, but can occur with
completely different lower-cost substitutes as well. Adulteration can also be accidental,
for example, due to the complex nomenclature of medicinal herbs or the unintentional
misidentification of plant species collected in the wild. Worldwide, a variety of
different names to indicate a specific plant species is currently in use: pharmaceutical
names, scientific binomials, older scientific names called synonyms, and common names
(see review by Barnes et al. [19 ]). Confusion often arises when different plants have the same common names, which
may lead to situations where the local gatherer collects the wrong material. For example,
the name fang ji is used for two Chinese herbs – han fang ji (Stephania tetrandra) and guang fang ji (Aristolochia fangchi). Both materials are used in traditional Chinese medicine (TCM) for the treatment
of similar ailments, but A. fangchi roots contain aristolochic acids that are nephrotoxic and can cause urothelial carcinoma
[20 ]. Notably, reports from Belgium in the 1990s detailed 128 cases of aristolochic acid
nephropathy due to the ingestion of an herbal weight loss product where the roots
of S. tetrandra were replaced by A. fangchi roots [21 ], [22 ]. Another example is the confusion of Eleutherococcus gracilistylus (the bark of which is known among herb traders as wu jia pi) with Periploca sepium (known as bei wu jia pi) [23 ]. Hence, the correct identification and authentication of botanicals is important
for the safety and efficacy of herbal products. In the USA, it is mandatory for any
dietary supplement manufacturing company to conduct at least one appropriate test
to verify the components of the dietary material under the current Good Manufacturing
Practice (cGMP) regulations of the US Food and Drug Administration.
Botanical taxonomy, macroscopic, organoleptic, microscopic, and chemical identification
methods are typically employed for the authentication of herbal materials. Like all
methodologies, they each have their advantages and disadvantages. Ideally, a plant
is identified by an experienced botanist in its whole form in the environment where
it grows. However, in the global herbal trade, most commodities are sold as cut, powdered,
or extracted materials that may be difficult to differentiate. Macroscopic identification
based on morphological characters requires an intact plant or plant parts for identification,
and that the morphological characteristics available are unique for the plant species
of interest. The technique requires an experienced taxonomist and can often be challenging
because these morphology-based procedures are usually time consuming and may not always
provide resolution to the species level. Macroscopic identification is not suitable
for powdered and extracted herbal materials. Botanical microscopy is helpful for cut
and powdered ingredients, and allows for the detection of inorganic materials such
as sand and salts (which are sometimes added to increase the weight of a material).
However, as with macroscopic identification, it may be difficult to determine the
identity to the species level. Chemical identification methods are suitable for cut,
powdered, and processed material, but they also require expertise as well as complex
and often expensive instrumentation, and the phytochemical composition can be affected
by geographic location, seasonal variations, storage conditions, and processing method
[24 ]. Recently, DNA barcoding has been proposed as another possible method for identification
and authentication of medicinal plants in herbal products [25 ], [26 ].
The concept of the barcode is an analogy to the combination and spacing of black and
white lines that can be found on the package of almost any commercial item. This “barcode”
is a representation of the unique item. It can be scanned and compared to a database
to identify the item and consequently its price. Barcodes of plants do not consist
of black and white lines, but, instead, consist of a unique combination of nucleotides
of roughly 300–1000 base pairs (bp) in length that are ideally specific to a plant
of interest. These DNA barcodes are agreed-upon DNA sequences from either the nuclear
or organelle (chloroplast, mitochondria) genome. In the strictest sense, “DNA barcoding”
refers to the technique where millions of copies of these diagnostic sequences are
made using universal primer sets in a polymerase chain reaction (PCR), and which are
subsequently identified by a sequencing method. The universal primer sets used for
PCR were designed to bind to conserved flanking regions of the diagnostic/species
identifying sequence found in most plants. The unique nucleotide sequence (DNA barcode)
between the flanking regions can be determined and used to compare to voucher sequences
stored in either in-house or openly accessible databases, such as GenBank, using the
BLAST (Basic Local Alignment Search Tool) algorithm [27 ]. The open databases are collections of sequences submitted by individual researchers,
labs, and institutions but are not curated, and the accuracy has not been verified
or confirmed. Frequently, several different sequences are reported from different
sources with multiple possible barcode regions of varying lengths. For example, Table 1 S , Supporting Information, depicts publicly available barcode sequences and their sources
[28 ]. The data used to standardize the targeted barcodes comes from a range of different
sources with different levels of certainty. Some of the barcodes come from wild species
from unknown locations, without vouchers or an identified collector or date of collection;
the part of the plant may not be known, and the number of nucleotides
sequenced in the barcode region may differ between samples of the same species
(Table 1 S , Supporting Information). Given that a 99–100 % match is used to confirm the species,
whereas a match of 98 % or less gives confidence only to the genus level [29 ], the integrity and reliability of the reference materials are crucial to the accuracy
of the technique. The best gene region for barcoding of land plants is a matter of
debate, and molecular biologists have used a number of different single loci, or combinations
of loci, to authenticate plant materials [30 ]. A recent development is the generation of “mini-DNA barcodes” ([Fig. 1 ]). They represent only a fraction of the agreed upon barcodes and are usually 100–200
bp in length. Amplification of mini-barcodes is done by primer sets that are less
universal and more specific to either the genus of a plant or, if possible, unique
to only one plant species of a genus.
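The matching thresholds quoted above (99–100 % for species, 98 % or less for genus) can be illustrated with a minimal Python sketch. It assumes the query and the voucher sequence have already been aligned to equal length, which a BLAST search would normally handle; the function names and the "ambiguous" label for values between 98 % and 99 % are illustrative assumptions, since the text does not specify that range.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if len(query) != len(reference) or not query:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def confidence_level(identity: float) -> str:
    """Map percent identity to the taxonomic confidence described in the text."""
    if identity >= 99.0:
        return "species"
    if identity <= 98.0:
        return "genus"
    # The text gives no rule for values between 98 % and 99 %.
    return "ambiguous"


# Hypothetical 100-bp voucher and a query differing at two positions
voucher = "ACGT" * 25
query = "TT" + voucher[2:]
identity = percent_identity(query, voucher)
level = confidence_level(identity)
```

Because the result depends entirely on the voucher sequence, a mislabeled or unvouchered reference would yield a confident but wrong assignment, which is the concern raised above.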
Fig. 1 Advantages of using mini-barcodes for amplification of DNA in processed plant materials.
To amplify a full-length barcode (300–1000 bp in length), high-quality DNA as a template
is required. When damaged/fragmented DNA serves as a template for PCR, a full-length
barcode may not be amplified. Hence, the lack of a PCR product may lead to the wrong
conclusion that the DNA template is absent. The use of primer combinations that result
in a very small PCR product (100–200 bp), a mini-barcode, is a better approach to
detect DNA that may be damaged/fragmented, e.g., found in processed plant materials.
(Color figure available online only.)
Though DNA barcoding is independent of morphological characteristics and physical
and seasonal variations, it has limitations as a stand-alone method for authentication
of processed botanicals. Hence, the current review highlights the potentials and pitfalls
of DNA barcoding with an emphasis on DNA barcoding methodology used for the identification
of herbs and future directions for authentication of herbal ingredients using different
techniques.
DNA Barcoding for Identification of Herbal Materials
DNA extraction methods
DNA barcoding can be performed on herbal material only when a minimum quantity and
quality of DNA is present. A number of extraction methods and commercial kits are
available to extract high-quality DNA from plants [31 ], [32 ], [33 ], [34 ], [35 ], [36 ], [37 ], [38 ], [39 ]. However, there is no universal method that can be applied to the isolation of DNA
in herbal materials. DNA extraction from herbal materials should be performed shortly
after plant collection to avoid DNA degradation due to DNA damaging storage conditions
(UV, light, temperature, bacteria, fungi) and under good laboratory practice to avoid
cross-contamination with other samples. DNA-based authentication methods work best
with freshly collected whole plant material. However, the herbal materials used for
dietary supplements are generally collected, dried, and stored for various periods
of time before they are used for the preparation of the herbal product. Plant part,
storage time, storage conditions, and processing methods affect the quality and quantity
of DNA. In addition, plant metabolites such as polysaccharides, flavonoids, polyphenols,
and terpene lactones may hinder DNA isolation. The polysaccharides and certain secondary
metabolites reportedly coprecipitate with the DNA and prevent the complete dissolution
of DNA, thus requiring modified protocols for the isolation of DNA from materials
containing polysaccharides and problematic secondary metabolites [40 ], [41 ], [42 ], [43 ]. The most widely used approaches to extract genomic DNA are the cetyl trimethyl
ammonium bromide (CTAB) method [33] and commercial DNA extraction kits [37 ]. However, CTAB/kit methods are not successful in isolating DNA from plants or plant
parts that contain high amounts of secondary metabolites. Roots, rhizomes, and tubers
may contain particularly high levels of polysaccharides and polyphenols that must
be removed using protocols with added high concentrations of CTAB, polyvinylpyrrolidone
(PVP), and β -mercaptoethanol (β -Me) during the early stages of DNA extraction [41 ], [44 ], [45 ], [46 ]. High-quality DNA may be obtained from leaves and flowers because of the low levels
of interfering metabolites and fibers. In particular, fresh and
young leaves and flowers can be used to prepare a crude extract, and this solution
can be used in direct PCR to amplify the target DNA barcode [47 ]. However, some plant leaves like tomato, cotton, and tea contain high concentrations
of polyphenols and tannins, which also hinder DNA isolation and PCR amplification,
thus rendering them unsuitable for direct PCR. Therefore, modified DNA extraction
methods were developed for the isolation of DNA from tissues or plants containing
high amounts of phenolic compounds and tannins [45 ], [48 ]. Similarly, dried stems, roots, and fruits may not be suitable for direct PCR as
they contain secondary metabolites that inhibit PCR amplification. While fresh tissues
obtained right after the collection of medicinal herbs are preferred, most of the
herbal materials are available for analysis either in dried or powdered form, or,
after further processing, as capsules, tablets, or liquids. In modern phytomedicines,
the plant DNA is often either removed or degraded during the manufacturing processes
of herbal products. Hence, the DNA extracted from capsules, tablets, and liquid extracts
often appears as a smear on an analytical agarose gel due to fragmentation, or no band
appears due to complete removal of DNA ([Fig. 2 ]). Wallace et al. [49 ] tested 95 natural health products using a standard DNA barcoding technique with
multiple markers and primer sets. The authors were unable to retrieve DNA barcodes
from 25 % of the tested plant products of which 2 were plant roots and 14 were capsules,
tablets, or caplets. Their results demonstrated that DNA amplification could be more easily
accomplished in botanical materials (33 of a total of 35 samples of teas and roots)
than in pharmaceutical formulations (19 out of 33 samples of tablets, capsules
and caplets). The difference in the results could be attributed to the fact that herbal
product formulations often contain excipients (such as fillers, diluents, binders,
glidants, lubricants, pigments, and stabilizers) that may affect the DNA extraction,
or that the primer sets failed to amplify the targeted region. Costa et al. [50 ] evaluated the possible effect of four different pharmacological excipients on DNA
isolation. All the tested excipients, talc, silica, iron oxide, and titanium dioxide,
exhibited adsorbent properties that affected the extraction of DNA from natural products
[50 ]. However, with slight modifications in the DNA extraction protocol, it is sometimes
possible to obtain useful DNA as a template for PCR from a powder, capsules, or tablets
[51 ]. In contrast, the processes by which tinctures and extracts are made
can include extensive heat treatment, filtration, extractive distillation, or supercritical
fluid extraction [52 ]. These steps degrade or remove the plant DNA completely and often make these materials
unsuitable for DNA barcoding analysis [53 ].
Fig. 2 Comparison of high-quality DNA (large size and usually obtained from fresh plant
material) and genomic DNA from processed plant material, e.g., from dietary supplements,
which is often damaged (appears as a smear on the gel; Lanes 6, 11, 16), present in low
quantities, or absent altogether, and often cannot be detected on an agarose gel.
(Color figure available online only.)
Loci selected as DNA barcodes for herbal materials
The selection of a universal barcode region for identification of all land plants
has proven to be quite challenging. Though the technique has successfully been used
in the identification of animal species using cytochrome oxidase I (COI ) from the mitochondrial genome as a universal barcode [1 ], barcoding plants is more difficult for many reasons. The slow evolutionary rate
of the plant mitochondrial genome means that the mitochondrial gene regions, including
the COI region, do not sufficiently distinguish plant species. Therefore, relatively fast-evolving
plastid and nuclear genomes were proposed as alternative barcodes for plants [54 ], [55 ], [56 ], [57 ]. The most common regions are matK, rbcL , ITS, ITS2, psbA-trnH, atpF-atpH, ycf5, psbK-I, psbM, trnD, coxI, nad1, trnL-F, rpoB, rpoC1, and rps16
[54 ], [55 ], [56 ], [57 ]. These regions have a relatively fast evolutionary rate compared to mitochondrial
genes, can distinguish the species based on differences in the genetic code, and have
conserved regions flanking the ends of the DNA sequence for the binding of universal
primers. None of the individual plant DNA barcodes described to date have both differentiating
regions and universal primer regions. Hence, a multilocus plant barcode with combinations
of two or three loci was recommended [55 ], [56 ]. The Consortium for the Barcode of Life (CBOL) Plant Working Group [56 ] suggested matK + rbcL as the preferred plant barcode combination. The matK locus is difficult to amplify in some plant genera since the universal primer-binding
site is not perfectly conserved and, therefore, may not be useful in a two-locus combination
in some plants. Another suggestion was the rbcL + trnH-psbA combination, but due to the high variability of the trnH-psbA sequence, it was difficult to align, and thus the two-locus barcode approach was
found to be problematic for some of the plants. To overcome this problem, a tiered
approach was suggested by Newmaster et al. [54 ]. The method utilizes the easily amplifiable and alignable rbcL region as a scaffold on which data from highly variable non-coding regions such as
ITS2 or the trnH-psbA region are employed for identification of plant species. Using this tiered approach,
approximately 75–80 % of the tested plant species can reportedly be barcoded [54 ], [58 ].
Numerous studies have been published to identify medicinal plants using various suggested
barcode loci. Techen et al. [30 ] reviewed the various barcode loci and methods used for the identification of medicinal
plants. The most extensive study of DNA barcoding of medicinal plants was accomplished
by Chen et al. [20 ]. Seven DNA regions, psbA-trnH, matK, rbcL, rpoC1, ycf5 , ITS (consisting of both ITS1 and ITS2), and ITS2, were evaluated for identification
of more than 6600 samples of medicinal plants (fresh leaves) and their closely related
species [20 ]. Their data suggested that the nuclear ITS2 locus could identify 92.7 % of the tested
species (8557 medicinal plants and closely related samples belonging to 5905 species
from 1010 diverse genera of 219 families in 7 phyla: angiosperms, gymnosperms, ferns,
mosses, liverworts, algae, and fungi) and proposed ITS2 as the core barcode for medicinal
plants [20 ]. Subsequently, ITS2 has been tested across a broad range of plant taxa with a large
sample size and confirmed as an effective barcode for plants by the China Plant Barcode
of Life (BOL) Group [59 ]. Chen and colleagues subsequently built a TCM barcode platform, called the Traditional
Chinese Medicine Database using a two-locus barcode system containing ITS2 and psbA-trnH sequences [47 ]. This database contains barcodes belonging to more than 23 000 medicinal plant species
and known adulterants [47 ]. Additional investigations on different plant groups confirmed the effectiveness
of ITS2 for the identification of medicinal plant species. For example, 24 medicinal
plants from the Fabaceae family and their adulterants were identified using ITS2,
with a success rate of 37–97 % [60 ]. Sun and Chen [61 ] successfully used the ITS2 barcode to distinguish 19 cortex herbs listed in the
Chinese Pharmacopoeia, and Pang et al. [62 ] also reported ITS2 as a suitable barcode for the identification of herbal materials.
Zhu et al. [63 ] recommended the ITS2 locus to discriminate between Glehniae radix and its common adulterants. The major advantage of ITS2 as a barcode for the identification
of herbal supplements is its short length (200–230 bp on average). In most herbal
products and dietary supplements, the DNA is highly degraded into pieces of less than
500 nucleotides in length due to various processing methods and, consequently, universal
long barcodes ranging from 600 to 800 bp, on average, may not be amplified. Hence,
short-length barcodes that can be easily retrieved from dried or powdered material, or sometimes
even from extracts, were recommended [51 ]. Despite this advantage, the ITS2 locus was not found to be suitable for global
identification of plants because of the presence of multiple copies of ITS2 within
one individual in all plant species. The multiple ITS2 copies, which are not always
homogenized by concerted evolution, led to the incorrect identification of species
due to their similarity with the copies of the more closely related species. Furthermore,
the same problem might arise in hybrids due to the biparental inheritance of ITS2.
Another disadvantage of ITS is that technical problems in amplification and sequencing
can arise due to the presence of DNA from other species (e.g., fungi, which coexist
with plants as endophytes and/or mycorrhizal symbionts) [64 ], [65 ]. Recently, Cheng et al. [66 ] reported that the most frequently used primer pairs (ITS1 + ITS4) for ITS were originally
designed for fungi [67 ]. The less plant-specific primer sets led to low PCR and sequencing success rates
in some of the plant groups, for instance, < 50 % PCR success for algae [68 ], [69 ] and ferns [20 ], 57.6 % for gymnosperms [70 ], and 88.0 % for angiosperms [70 ]. Therefore, Cheng et al. [66 ] designed universal plant-specific ITS primers for both plant DNA barcoding and plant
systematics.
Recently, the term “mini-barcodes” was introduced for short-length DNA markers used
for the identification of botanical ingredients from processed herbal supplements
[51 ], [71 ]. Mini-barcodes are short (< 200 bp) sequences of DNA from standardized matK and rbcL barcode regions, which have been used, e.g., to identify and authenticate herbal
dietary supplements made from saw palmetto (Serenoa repens) fruit [71 ], Ginkgo biloba
leaf [51 ], and devilʼs claw (Harpagophytum procumbens and Harpagophytum zeyheri) root and rhizome [72 ] available in the markets of North America. The advantages
of mini-barcodes are the easy retrieval of DNA markers even from processed dietary
materials due to their small amplicon length, and their ability to distinguish closely
related species because of the genus/species specificity.
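The benefit of a short amplicon for degraded DNA can be made concrete with a small Python sketch. It rests on a deliberate simplification: a fragment is counted as a possible PCR template whenever it is at least as long as the amplicon, which is a necessary but not sufficient condition, since both primer sites must also lie intact on the same fragment. The fragment lengths are hypothetical values chosen for illustration, not measurements from any study cited here.

```python
def usable_fraction(fragment_lengths, amplicon_length):
    """Fraction of DNA fragments long enough to contain the whole amplicon."""
    if not fragment_lengths:
        return 0.0
    usable = sum(1 for length in fragment_lengths if length >= amplicon_length)
    return usable / len(fragment_lengths)


# Hypothetical fragment lengths (bp) from a heavily processed sample,
# all shorter than 500 nucleotides as described in the text
fragments = [80, 120, 150, 180, 220, 260, 310, 450, 90, 140]

full_length = usable_fraction(fragments, 600)  # long universal barcode
mini = usable_fraction(fragments, 150)         # mini-barcode
```

In this toy sample, none of the fragments could template a 600-bp barcode, whereas more than half could template a 150-bp mini-barcode.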
PCR amplification
PCR, first devised by Mullis [73 ], is a molecular method by which a single copy or a few copies of a piece of DNA
are amplified and thousands to millions of copies of a particular DNA sequence are
generated. The method employs a heat-stable DNA polymerase, the nucleotides, template
DNA, and DNA oligonucleotides (also called DNA primers). The general barcoding technique
uses universal primers for rapid identification of plant species [57 ], [58 ], [74 ], [75 ]. The universal primer sets recommended for barcoding of plant species are selected
to amplify DNA from four genomic regions, namely ITS/ITS2 from the nuclear genome,
and matK, rbcL, and trnH-psbA from the chloroplast genome. Inherent biases in the analysis can lead to false positives
and false negatives [76 ]. Biases can occur when the sequence at one of the universal priming sites varies
sufficiently to prevent efficient annealing, when different species have a different
number of copies of the chosen region leading to over or underestimation unrelated
to the actual composition of plant material, when the target DNA is degraded or fragmented,
or from the manner of extraction and preparation of the DNA for sequencing [65 ], [76 ]. Soares et al. [77 ] and Costa et al. [78 ] demonstrated that the reliability of DNA barcode methods in complex plant samples
varied depending on which of the commercial DNA extraction kits was used for the initial
sample preparations. The efficiency of the PCR reaction itself can impose a bias on
the barcoding results. Differences in the melting temperatures of the primers could
lead to a reduced amplification rate. The affinity of universal primers to template
DNA of all known and unknown organisms and a balanced melting temperature of the
primer pairs are two important criteria to produce robust amplification [79 ]. Further, the presence of inhibitory secondary metabolites and inactive ingredients
in tablets and capsules can also reduce the efficiency of PCR amplification or lead
to false negative results. The use of excipient materials made from wheat, rice, or
soy is common in manufacturing processes of the herbal dietary supplement and pharmaceutical
industries [50 ]. Small amounts of starch are often required in order to optimize the formulation
and manufacture of pills, capsules, and tablets. Multiple nucleotide sequences may
be obtained upon the sequencing of herbal supplements due to the presence of excipients
or if the herbal product contains more than one plant species. Little [51 ] used digital PCR [80 ] for the verification of the presence of ginkgo DNA in herbal supplements that, due
to the overpowering amount of excipient, only produced sequences from excipient materials
or that produced sequencing chromatograms with multiple/overlaying signals indicating
a mixed DNA sample.
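The point about balanced primer melting temperatures can be illustrated with the Wallace rule, a widely used rough estimate for short oligonucleotides: Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). The Python sketch below is only an approximation (nearest-neighbor models are more accurate), and the two primer sequences are arbitrary placeholders, not published barcoding primers.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_difference(forward: str, reverse: str) -> int:
    """Absolute difference between the estimated Tm of two primers."""
    return abs(wallace_tm(forward) - wallace_tm(reverse))


# Placeholder primer sequences (hypothetical, for illustration only)
forward_primer = "ACGTACGTACGTACGTAC"
reverse_primer = "AATTAATTGGCCAATTAA"
gap = tm_difference(forward_primer, reverse_primer)
```

A large gap between the two estimates, as in this placeholder pair, would suggest that one primer anneals poorly at a temperature suited to the other.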
In digital PCR, the samples are diluted (1 : 5–1 : 50 000) in a suitable buffer to
an extent that there is approximately one DNA template molecule per µL. The goal is
to dilute the DNA to the extent that a few samples contain only molecules of low abundant
DNA. The term “low abundant DNA” refers to DNA that originated either from the medicinal
plant material present within a large amount of DNA from filler material (e.g., rice
flour) or DNA derived from small amounts of adulterating plant material. By preparing
several PCRs using one molecule/µL DNA solution as a template, chances are that DNA
molecules of low abundance get amplified and, consequently, detected. The number of
PCRs required depends on the expected frequency of the DNA to be detected. For a sample
containing 10 % of the low abundant DNA, 2 out of 20 PCRs may result in the amplification
of low abundant DNA. The analysis showed 9 (24.3 %) of the 37 herbal supplements required
digital PCR to separate excipient DNA from possible ginkgo DNA. In the study, digital
PCR produced amplicons of ginkgo, rice, and an unidentifiable species [51 ].
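The arithmetic behind the "2 out of 20 PCRs" example can be sketched in Python. The sketch assumes an idealized case in which every reaction receives exactly one template molecule and always amplifies it, so that each reaction independently detects the low abundant DNA with a probability equal to its fraction in the sample. In practice, the number of molecules per reaction varies, so these figures are only a guide.

```python
def expected_positives(n_reactions: int, fraction: float) -> float:
    """Expected number of reactions that amplify the low abundant DNA."""
    return n_reactions * fraction


def prob_at_least_one(n_reactions: int, fraction: float) -> float:
    """Probability that at least one reaction amplifies the low abundant DNA."""
    return 1.0 - (1.0 - fraction) ** n_reactions


# Example from the text: 10 % low abundant DNA, 20 single-molecule PCRs
expected = expected_positives(20, 0.10)
detection = prob_at_least_one(20, 0.10)
```

Under these assumptions, the expected count matches the two reactions mentioned above, yet there remains a chance of roughly one in eight that none of the 20 reactions picks up the low abundant DNA, which is why the number of PCRs has to be chosen according to the expected frequency.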
Sequencing methods
The conventional method used for generating DNA sequence data to obtain a barcode
from PCR amplicons is Sangerʼs di-deoxy method of sequencing [81 ]. Sangerʼs sequencing technology is capable of generating sequencing reads of up
to 1000 bases and has been the approach used for DNA sequencing in most of the DNA
barcode analyses published to date. The inherent limitations of Sanger-based DNA sequencing
are low throughput and the requirement for relatively large amounts of DNA (100–500 ng)
to avoid biases and errors [82 ]. Moreover, the method provides two sequencing signal patterns, or electropherograms,
for each sequence generated [83 ]. Hence, the Sanger sequencing method is suitable for herbal materials that contain
only a single medicinal plant. If the herbal preparation (e.g., a dietary supplement)
contains multiple plant species or excipients, co-amplification of barcode sequences
from other material than the intended one can occur due to the nature of the universal
primers during the PCR amplification step. This leads to the production of multiple/overlaying
sequencing peaks and, consequently, a failure of sequencing because the correct DNA
sequence of the barcode cannot be determined ([Fig. 3 ]). Moreover, multiple sequences may also create confusion in the identification of
the “true” barcode and other sequences. Additionally, many plants have symbiotic associations
with bacteria, algae, or fungi [64 ], [65 ]. The occurrence of multiple copies of fungal ITS barcodes creates difficulties in
direct Sanger sequencing. Most of these situations can lead to ambiguity or false
information when the Sanger sequencing method is employed in generating DNA barcodes,
and may result in repeated or failed sequencing attempts.
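How overlaying peaks signal a mixed template can be sketched in Python. The sketch assumes that the peak heights for the four bases at each position have already been extracted from the electropherogram, and it flags a position when the second-highest peak reaches a chosen fraction of the highest. The data layout, the function name, and the 0.3 cutoff are illustrative assumptions, not values from the cited studies or from any base-calling software.

```python
def mixed_positions(trace, ratio=0.3):
    """Return indices where the second-highest peak is >= ratio * highest peak.

    trace: list of dicts mapping base ("A", "C", "G", "T") to peak height.
    """
    flagged = []
    for index, peaks in enumerate(trace):
        heights = sorted(peaks.values(), reverse=True)
        if len(heights) >= 2 and heights[0] > 0 and heights[1] >= ratio * heights[0]:
            flagged.append(index)
    return flagged


# Hypothetical peak heights for three positions of a read
trace = [
    {"A": 1000, "C": 20, "G": 15, "T": 10},  # clean single peak
    {"A": 500, "C": 450, "G": 10, "T": 5},   # two overlaying peaks
    {"A": 10, "C": 12, "G": 900, "T": 300},  # weaker secondary peak
]
suspect = mixed_positions(trace)
```

A read with many flagged positions would be a candidate for digital PCR, cloning, or NGS rather than repeated Sanger sequencing.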
Fig. 3 Electropherograms showing sequencing signals obtained with Sanger sequencing. The
overlaying peaks (A ) make it difficult to determine the real sequence. It is an indication that the sequenced
template consists of mixed DNA. Additional steps (digital PCR or cloning) are recommended
to identify the DNA source(s). Single peaks (B ) with a low background are desired to determine a sampleʼs DNA sequence. (Color figure
available online only.)
The poor read quality can be improved by processes upstream of sequencing, for example
cloning in a suitable bacterial or microbial host. The DNA fragments are ligated with
a vector and cloned in bacteria. Several bacterial clones are then sequenced to identify
the different DNA sequences. However, cloning introduces biases against extreme base
composition (e.g., stretches with high guanine and cytosine contents), inverted repeats,
and genes not accepted by the bacterial cloning host [84 ]. To overcome the limitations of Sanger-based sequencing for DNA barcoding of processed
or mixed samples, a high-throughput sequencing method called next-generation sequencing
(NGS) has been used [85 ]. The NGS technology allows parallel sequencing of multiple DNA fragments from various
DNA templates in a single reaction [85 ]. It can generate up to one million DNA sequences that are up to 700 bases in length
in a single sequencing run, though the base length is highly variable depending on
the NGS platform/technology being used. The NGS platforms were originally developed
to generate DNA sequence information from whole genomes or large environmental samples.
For example, the whole chloroplast sequence of Ceratophyllum demersum was obtained by Moore et al. [86 ] using the 454 Life Sciences sequencing platform and complete plastomes of 37 Pinus species were assembled by Parks et al. [87 ] on a multiplex Illumina sequencing platform. The NGS method was also useful to verify
the contents of multiple ingredient herbal products by parallel sequencing their barcodes
[88 ]. This method prevents overlaying sequence peaks as found in Sanger sequencing and
therefore facilitates the sorting of DNA barcodes, and, consequently, the identification
of mixed plant material [65 ], [89 ]. NGS proved to be superior to Sanger sequencing in a comparative analysis that included
15 commercial dietary supplements made from crude powdered material (7), extracts
(7), or a mixture of both (1). Reproducible Sanger sequencing using the rbcL and ITS2 gene regions was achieved in four dietary supplements containing crude powdered
material. None of the extracts provided sequences of the labeled ingredient using
Sanger sequencing, but excipient DNA was detected in two supplements instead. The
NGS method using the ITS2 locus yielded results in eight supplements, including three
extracts. The use of the ITS2 gene region for the three valerian (Valeriana officinalis ) root samples was unsuccessful, possibly due to the intraspecific variation of the
plant, which is known to have variable gene size and ploidy levels depending on the
population [65 ]. The NGS method is generally less expensive than Sanger sequencing
in terms of per-base sequencing cost; however, the cost may increase if only a few
samples are analyzed in a single run. Furthermore, there is an additional cost for
bioinformatics due to the large amount of data obtained from NGS.
The NGS “meta-barcoding” method combines DNA barcoding and high-throughput DNA sequencing
to mass analyze DNA barcodes from sediments or environmental, ancient/historical,
or processed samples [90 ], [91 ]. Coghlan et al. [92 ] used a meta-barcoding technique for the detection of plant and animal DNA from highly
processed TCM. A high-throughput NGS screen was used for 15 complex TCM samples. The
NGS generated over 49 000 sequence reads and, according to the BLAST results, the
analysis showed that the reads belonged to 68 plant families, including two genera
containing possibly toxic species. Some of the TCM samples also contained traces of
CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora)-listed
animal and plant genera such as the Asiatic black bear (Ursus thibetanus ), the Saiga antelope (Saiga tatarica ), and Asian ginseng (P. ginseng ) [92 ].
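The read-assignment step of a meta-barcoding analysis can be illustrated with a minimal sketch. It assumes BLAST results in the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score) and a user-supplied lookup from subject ID to family. The function names, the toy hits, and the lookup table are hypothetical and are not taken from any of the cited studies; a real pipeline would typically also filter on identity and coverage before assigning a read.

```python
from collections import Counter


def best_hits(blast_lines):
    """Keep the highest bit-score hit for each read (query) ID."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        bitscore = float(fields[11])  # 12th column of the tabular layout
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (subject_id, bitscore)
    return {read_id: hit[0] for read_id, hit in best.items()}


def tally_families(blast_lines, subject_to_family):
    """Count reads per family; subjects missing from the lookup are 'unassigned'."""
    counts = Counter()
    for subject_id in best_hits(blast_lines).values():
        counts[subject_to_family.get(subject_id, "unassigned")] += 1
    return counts


# Hypothetical toy input (not real data)
example_hits = [
    "read1\trefA\t99.0\t150\t1\t0\t1\t150\t1\t150\t1e-70\t270",
    "read1\trefB\t95.0\t150\t7\t0\t1\t150\t1\t150\t1e-60\t240",
    "read2\trefB\t98.0\t140\t3\t0\t1\t140\t1\t140\t1e-65\t250",
    "read3\trefC\t97.0\t120\t4\t0\t1\t120\t1\t120\t1e-50\t200",
]
example_families = {"refA": "Araliaceae", "refB": "Fabaceae"}

family_counts = tally_families(example_hits, example_families)
print(family_counts)
```

Taking only the single best hit per read is the simplest possible assignment rule; it is used here for brevity and may over-assign reads when several reference species score almost equally.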
Accuracy of DNA Barcoding Techniques for Authentication of Botanicals in Herbal Products
Most of the botanical ingredient DNA barcoding studies published to date focused on
the identification of a universal marker or suitable barcode locus/loci for herbal
raw material authentication. Only a small number of published papers discussed the
use of DNA barcoding for the identification and authentication of botanicals from
finished herbal products and dietary supplements. Srirama et al. [17 ] reported an investigation in which 25 Phyllanthus samples used as raw herbal drugs in the Indian market were assessed for their authenticity.
Their analysis revealed that six different species of Phyllanthus were available on the market based on morphological studies. Seventy-six percent
of the market samples contained Phyllanthus amarus as the predominant species (> 95 %) and the remaining 24 % included five different
species, namely P. debilis, P. fraternus, P. urinaria, P. maderaspatensis , and P. kozhikodianus . Species-specific DNA barcode signatures were developed for the tested Phyllanthus species using the chloroplast DNA region psbA-trnH. The trade sample identities were validated and confirmed by these species-specific
DNA barcodes [17 ]. Ginseng samples (Panax spp.) were tested for their authenticity by both Zuo et al. [93 ] and Wallace et al. [49 ]. Zuo et al. [93 ] analyzed DNA from fresh leaves of 95 ginseng samples, representing all of the species
in the genus Panax . The analysis showed that the combination of psbA-trnH and ITS was able to identify all of the species and clusters in the genus. Wallace
et al. [49 ] tested 41 commercial ginseng samples (raw materials and finished products) in the
North American market and found that the core barcodes matK and rbcL required additional data from ITS for successful species identification. Stoeckle
et al. [94 ] analyzed commercial tea samples (Camellia sinensis ) with a 90 % identification success rate using the rbcL and matK barcode loci and reported adulteration in 33 % of herbal teas. Black cohosh (Actaea racemosa ) samples were analyzed by Baker et al. [95 ], who identified 75 % of the tested samples with a mini-barcode approach using the matK locus. They found that 25 % of the
tested black cohosh samples were adulterated. Little and Jeanson [71 ] used mini-barcodes from the rbcL and matK regions to authenticate saw palmetto (S. repens ) herbal dietary supplements. The analysis of these tested supplements demonstrated
that 85 % contained saw palmetto and that 6 % of the supplements contained related species
(Acoelorrhaphe wrightii or an unidentified species) that cannot be legally sold as herbal dietary supplements
in the United States [70 ]. Similarly, a mini-barcode assay of ginkgo (G. biloba ) dietary supplements by Little [51 ] revealed that of the 40 supplements tested, 83.8 % contained identifiable G. biloba DNA, and six supplements (16.2 %) contained fillers without any detectable G. biloba DNA. Substitution of raw herbal materials in local Indian markets was shown for Sida cordifolia
[96 ] and in 50 % of Cassia fistula and Senna spp. [97 ]. Palhares et al. [98 ] analyzed 257 dried or powdered samples from 8 medicinal plant species approved by
the World Health Organization (WHO) for the production of herbal drugs sold in Brazilian
markets. These included witch hazel (Hamamelis virginiana ) leaves, chamomile (Matricaria recutita ) flowers, espinheira santa (Maytenus ilicifolia ) leaves, guaco (Mikania glomerata ) leaves, Asian ginseng (P. ginseng ) roots, passion flower (Passiflora incarnata ) leaves, boldo (Peumus boldus ) leaves, and valerian (V. officinalis ) roots. The DNA barcoding analysis using matK, rbcL , and ITS2 regions confirmed species belonging to the correct genus in 42 % of the
samples. For the remainder, results suggested that the level of substitutions might
be as high as 71 % [98 ], although some of this is due to the misapplication of the common name, e.g., 100 %
of the samples presented as P. ginseng were actually from the genus Pfaffia , also called “Brazilian ginseng”. Recently, Han et al. [47 ] investigated 1436 samples representing 295 medicinal species from 7 primary TCM
markets in China. Their results indicated that of the 1260 samples, approximately
4.2 % were identified as being adulterated and they suggested a regulatory platform
based on DNA barcoding should be established for TCM market supervision. However,
verification of the accuracy of the results using established compendial methods,
e.g., those listed in the European Pharmacopoeia or the United States Pharmacopeia
(USP), was not performed in any of these studies.
The first investigation into a larger set of diverse dietary supplements using DNA
barcoding was published by Newmaster et al. in 2013 [99 ]. In the study, the authenticity of 44 (41 capsules, 2 powders, and 1 tablet) single
ingredient herbal products from 12 companies was evaluated. The validity of the approach
was evaluated by including a second set of samples grown from commercially available
seeds in horticultural greenhouses, including all those plants listed on the herbal
product labels and some closely related species. The authors of the study were able
to recover DNA from 40 out of 44 samples. According to the results, 14 samples out
of 40 were correctly labeled, 14 contained the correct species but included additional
DNA from either another species or from an excipient, and 12 contained only DNA from
other species or excipients. A number of concerns about the study were raised by DNA
experts and members of the American Botanical Council [100 ], e.g., the results were not confirmed by orthogonal methods, such as chemical analysis,
and the DNA barcoding method was not validated to the standard required of the industry.
According to the authors, products contained solely crude powdered raw material; however,
attempts to identify the contents using botanical microscopy were unsuccessful due
to the lack of recognizable plant fragments (S. Newmaster personal communication,
July 8, 2014). Also, although the products were manufactured and sold, the authors
did not take into account the normal manufacturing process for commercial dietary
supplement products. Another issue is the acceptable presence of other species. Monographs
for herbal raw materials, such as those in the European Pharmacopoeia and the United
States Pharmacopeia, allow a certain amount (e.g., 2 % in USP) of foreign organic
matter. DNA from foreign organic matter such as other plant species could be accidentally
introduced at any stage of processing, for example, at the time of collection of herbal
plant material, during storage, drying, grinding, at various stages of the product
manufacturing process, or during the analysis in the quality control laboratory ([Fig. 4 ]). Therefore, it is normal to detect DNA from additional species in crude herbal
raw materials. The results and methodology published by Newmaster et al. [99 ] prompted the New York State Attorney General (NYAG) to launch his own investigation
into the quality of herbal dietary supplements using DNA barcoding (details about
the method have not been made publicly available) [101 ], [102 ]. The results from this investigation suggested that out of 24 commercial products
[labeled to contain Echinacea (Echinacea spp.), garlic (Allium sativum ), ginkgo (G. biloba ), ginseng (Panax spp.), saw palmetto (S. repens ), St. Johnʼs wort (Hypericum perforatum ), or valerian (V. officinalis )] analyzed, only 5 contained DNA of the labeled species. The results led the NYAG
to demand that the four retailers selling the supplements, GNC, Target, Walgreens,
and Walmart, remove the products from their shelves [101 ]. The accuracy of the results was immediately questioned, mostly because the majority
of the products were made from herbal extracts, where DNA was probably fragmented
or degraded, and DNA barcoding (i.e., the amplification of the several hundred nucleotide
long complete barcodes) has not been shown to provide useful results for these cases
[103 ]. In addition, the investigation found DNA from Oryza, Allium , and Dracaena species in 19, 9, and 7 of the samples analyzed, respectively, strongly suggesting
cross-contamination. Another puzzling result was the occurrence of saw palmetto DNA
in one of the valerian samples, again raising the question about cross-contamination.
While the usefulness of full-length DNA barcoding in providing reliable information
on which genera or species might be present in a sample containing crude or
raw botanical materials is generally recognized, the full-length barcoding approach
suffers from the misconception that DNA is always uniformly preserved and available
[72 ], [104 ]. As mentioned before, DNA quality is affected by heat, freeze/thaw cycles, fungal
or bacterial contamination, irradiation, and a number of chemicals. In addition, most
of the DNA is typically removed during metabolite extraction, and any DNA that does
remain in an herbal extract will usually be fragmented. Additional purification steps
used in commercial extraction, such as column chromatography, may eliminate any remaining
DNA altogether. Therefore, the relatively long genomic regions required for universal
DNA barcoding are no longer present in most botanical extracts. According to Little
[72 ], universal DNA barcoding rarely provides accurate species identification, since
distinction among closely related species is difficult, often impossible, and
the diagnostic features examined may not be distinctive enough for a given species.
Fig. 4 Various steps in the processing of plant material during which exogenous DNA may
be introduced into a sample. (Color figure available online only.)
Method Validation
A critical step in ensuring reliable results from any analytical method is the method
validation. Guidelines for the validation of methods to identify botanical ingredients
have been published, e.g., by AOAC International [105 ]. The guidelines were written with chemical methods in mind, so transferring these
to DNA barcoding may not be straightforward. The method of choice has to be able to
distinguish the species of interest from its adulterants, which may include different
plant parts from the same species. The validation also needs to establish the lowest level
of contamination at which the method can detect an adulterant. This is done by preparing mixtures
of the target species with known amounts of adulterant. These requirements can usually
be fulfilled with raw herbal material, except that DNA barcoding will not be able
to determine the plant part from which the article is derived. However, method
validation becomes complicated when finished products are evaluated. DNA barcoding
methods are most often “validated” by using fresh or dried raw plant material in order
to determine if the chosen DNA barcode region can be amplified [72 ], [98 ], [99 ] and distinguished from related species. The usefulness of such validation in finished
products is questionable since processing methods may alter or eliminate the DNA,
excipients and secondary metabolites may interfere with DNA extraction and amplification,
and the presence of DNA from excipients and fillers may lead to erroneous results.
Therefore, the development of guidelines to address the method validation requirements
for DNA barcoding, in particular with regard to finished products, is much needed
and one of the larger issues to be resolved.
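The spiking experiment described above (mixtures of the target species with known amounts of adulterant) can be summarized with a short sketch. It is an illustration only: the acceptance criterion (every replicate at a given level, and at all higher tested levels, must detect the adulterant), the function name, and the example data are assumptions and are not drawn from the AOAC guidelines or from any cited study.

```python
def lowest_detectable_level(results, min_rate=1.0):
    """Return the lowest adulterant fraction that is reliably detected.

    results maps an adulterant fraction (e.g. 0.05 for 5 %) to a non-empty
    list of booleans, one per replicate (True = adulterant detected).
    Levels are checked from highest to lowest; the search stops at the first
    level whose detection rate falls below min_rate. Returns None if even the
    highest tested level fails.
    """
    lowest = None
    for level in sorted(results, reverse=True):
        replicates = results[level]
        rate = sum(replicates) / len(replicates)
        if rate < min_rate:
            break
        lowest = level
    return lowest


# Hypothetical spiking results (placeholder values, not measured data)
spiking_results = {
    0.50: [True, True, True],
    0.10: [True, True, True],
    0.05: [True, True, False],
    0.01: [False, False, False],
}

detection_limit = lowest_detectable_level(spiking_results)
print(detection_limit)
```

For finished products the same experiment would have to be run on material that has passed through the relevant processing steps, since the text notes that processing may alter or eliminate the DNA.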
Limitations and Future Challenges of DNA Barcoding for Authentication of Herbal Products
The major limitations to DNA barcoding of herbal products are related to the quality
of DNA, primer affinity, PCR amplification, and sequencing of amplicons. Plant DNA
is a relatively stable molecule and can be easily extracted from fresh or dried plant
material using simple DNA extraction methods. However, the manufacturing process of
herbal products that involves extensive heat treatment, irradiation, distillation,
filtration, UV light exposure, and/or supercritical fluid extraction leads to either
complete removal of DNA or degradation of DNA into smaller fragments [52 ]. Hence, DNA barcoding is not feasible for processed herbal products such as extracts
and tinctures in which the DNA is either absent or highly degraded. Therefore,
the stage at which DNA barcoding analysis should be performed is very important. For
example, in the case of processed herbal materials, the molecular analysis is more
successful if it is carried out at the initial stage during the collection of raw
herbal materials from which the botanical preparation is manufactured. DNA barcoding
is more feasible from dried or powdered raw material from which the extract or tincture
is manufactured. One of the major advantages of DNA-based analysis is that DNA is
present in all parts of the plant, and no effect of seasonal variation on
the quality and quantity of DNA has been observed. However, in traditional systems
of medicine, specific plant parts/tissues collected in a particular season when secondary
metabolite production is highest have been prescribed to be used for therapeutic purposes
[106 ]. DNA barcoding cannot differentiate the different tissues of the plant within the
same species. For example, in the case of Asian ginseng (P. ginseng ), the roots are often mixed or substituted with undeclared P. ginseng leaves. Roots and leaves both contain high levels of ginsenosides, but exhibit a
different chemical profile. In such cases of substitution with other parts of the same species,
DNA barcoding will not be able to detect the adulteration. Similarly, the approach
will not detect if exhausted material is used, i.e., material where the putative active
components have been removed previously via extraction. Likewise, if the prescribed
plant is collected in the wrong season, the efficacy of the herbal material decreases
and DNA barcoding will fail to detect the substitution with such low quality material.
To overcome these limitations of DNA-based analysis, chemical profiling or other analytical
chemistry methods should be adopted for authentication of herbal products. Thus, DNA
barcoding should go hand-in-hand with chemical analysis and macroscopic and/or microscopic
evaluation to tackle the adulteration problems prevailing in the herbal industry.
Another limitation of DNA barcoding is the affinity of universal primer sets for excipient
DNA or the DNA of an adulterating or substituted species. Generally, the excipients
are added after the processing of the herbal material is completed and, hence, the
fillerʼs DNA remains intact. Consequently, DNA barcode primers are likely to preferentially
amplify DNA from excipients, possibly yielding a false negative result for the herbal
species that was intended to be detected. PCR bias can partly be overcome by NGS,
digital PCR [51 ], [71 ], or by cloning specific PCR products into vectors and sequencing the amplicon. Moreover,
it is preferable to design species/genus-specific primers for short barcode sequences
(< 200 bp), so-called mini-barcodes, for successful amplification of species-specific
barcodes, rather than using the longer universal barcodes. The shortcomings of Sanger-based
sequencing methods can be overcome by NGS techniques for sequencing of herbal products
containing multiple plant species or admixtures. DNA barcoding, like any other analytical
method, has its own limitations, and hence, we view it as an additional identification
tool for the authentication of herbal products in combination with other established
methods.
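Whether a candidate primer pair would yield a mini-barcode (< 200 bp) can be checked in silico before any laboratory work. The sketch below is a simplified illustration that assumes exact primer matches on a single strand and ignores mismatches, degenerate bases, and melting temperature. The function names, primers, and template are invented toy sequences, not published primers or barcode regions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template, forward, reverse):
    """Return the region a primer pair would amplify (primers included).

    Only exact matches are considered. Returns None if either primer does
    not bind or if the reverse primer site is not downstream of the forward
    primer site.
    """
    template = template.upper()
    forward = forward.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def is_mini_barcode(amplicon, max_len=200):
    """True if an amplicon was found and is shorter than max_len bases."""
    return amplicon is not None and len(amplicon) < max_len


# Hypothetical toy sequences (not real primers or barcode regions)
toy_template = "TTTT" + "ACGTACGTAC" + "GGGCCCAAATTT" + "GATTACAGAT" + "CCCC"
toy_forward = "ACGTACGTAC"
toy_reverse = "ATCTGTAATC"  # reverse complement of GATTACAGAT

toy_amplicon = in_silico_amplicon(toy_template, toy_forward, toy_reverse)
print(toy_amplicon, len(toy_amplicon), is_mini_barcode(toy_amplicon))
```

Running the same check against reference sequences of the target species and of its known adulterants gives a first indication of whether the short region is both amplifiable and distinct between them.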
One of the major challenges of DNA barcoding for authentication of herbal products
is the lack of reference libraries and voucher specimens linked to reference DNA sequences
in the GenBank database. The creation of a mini-barcode reference library or Herb-BOL
(also suggested by Mishra et al. [107 ]), containing all of the authentic reference barcode sequences linked to the respective
taxonomically validated herbarium vouchers, would be a useful tool to ensure access
to reliable DNA barcodes. The use of a barcode reference library could provide a basis
for using DNA technologies as a cGMP-compliant approach for the authentication of
herbal products and dietary supplements in the future.
Conclusions
The use of DNA sequencing technologies to identify medicinal plant species in herbal products
and dietary supplements is a highly reliable and promising tool under specific conditions,
such as analysis at a stage when the DNA can still be detected, sufficient primer affinity
for successful PCR amplification, and the absence of contaminating DNA. The detection
of adulteration of botanical ingredients could be improved if DNA barcoding is routinely
and appropriately used for authentication of herbal materials. It is important to
apply the most appropriate method to efficiently detect and identify the analyzed
raw or processed material. However, the inherent limitations of the DNA barcoding
methods make them unsuitable as stand-alone tools for identifying and authenticating
herbal plant species. Therefore, we advocate the addition of DNA barcoding to
the other existing analytical methods for authentication of botanical ingredients
in herbal medicines and dietary supplements.
Supporting information
Reference materials for the DNA barcoding of H. perforatum L. (St. Johnʼs wort) are available as Supporting Information.
Acknowledgements
This research was funded, in part, by the Food and Drug Administration grant no. 1U01FD004246–05.
We thank Jon Parcher for his revision of the manuscript and suggestions.