We present here a protocol to construct and validate models for nondestructive prediction of total sugar, total organic acid, and total anthocyanin content in individual blueberries by near-infrared spectroscopy.
Nondestructive prediction of the ingredient contents of farm products makes it possible to ship and sell the products with guaranteed quality. Here, near-infrared spectroscopy is used to nondestructively predict the total sugar, total organic acid, and total anthocyanin content of each blueberry. The technique is expected to enable the selection of only delicious blueberries from all harvested ones. The near-infrared absorption spectra of blueberries are measured in the diffuse reflectance mode at positions away from the calyx. The ingredient contents of each blueberry, determined by high-performance liquid chromatography, are used to construct models that predict the ingredient contents from the observed spectra. Partial least squares regression is used to construct the models. The pretreatments applied to the observed spectra and the wavelength regions used for the analyses must be selected properly. The constructed models must then be validated to confirm that the ingredient contents are predicted with practical accuracy. Here we present a protocol to construct and validate models for the nondestructive prediction of ingredient contents in blueberries by near-infrared spectroscopy.
Near-infrared (NIR) spectroscopy is widely applied as a nondestructive technique to analyze the contents of various kinds of fruits and vegetables.1,2 Nondestructive analyses by NIR spectroscopy enable the shipping of only delicious fruits and vegetables with guaranteed quality. NIR spectroscopy has already been applied to orange, apple, melon, cherry, kiwi fruit, mango, papaya, peach, and other fruits to determine properties such as Brix (which corresponds to the total sugar content), acidity, and TSC (total solids content). Recently, we reported the application of NIR spectroscopy to the quality evaluation of blueberries.3 We measured not only the total sugar content and the total organic acid content, which corresponds to acidity, but also the total anthocyanin content. Anthocyanins are bioactive components that are believed to improve human health. Consumers would benefit from being able to buy delicious blueberries with an assurance of their sugar content, acidity, and anthocyanin content.
In the NIR absorption spectra of fruits and vegetables, only broad absorption bands are observed. These are mainly bands due to fiber and moisture. Although many weak bands due to various ingredients of the intact sample are observed simultaneously, in most cases the observed bands cannot be assigned to specific vibrational modes of specific components. Therefore, the traditional approach of determining the content of a specific component with the Lambert-Beer law is not effective for NIR spectra. Instead, calibration models that predict the contents of the target components from the observed spectra are constructed using chemometrics, by examining the correlation between the observed spectra and the corresponding ingredient contents.4,5 Here, a protocol is presented to construct and validate models for predicting the total sugar content, the total organic acid content (corresponding to acidity), and the total anthocyanin content of blueberries from NIR spectra.
Figure 1 shows the general flow chart for constructing reliable and robust calibration models. A sufficient number of samples is collected. Some of them are used for the construction of models, while the others are used for the validation of the constructed models. For each collected sample, an NIR spectrum is measured, and then the target components are analyzed quantitatively with traditional destructive chemical analysis methods. Here, high-performance liquid chromatography (HPLC) is used for the chemical analyses of sugars, organic acids, and anthocyanins. Partial least squares (PLS) regression is used to construct the calibration models by examining the correlation between the observed spectra and the ingredient contents determined by chemical analyses. In order to construct robust models with the best prediction ability, the pretreatments of the observed spectra and the wavelength regions used for the prediction are also examined. Finally, the constructed models are validated to confirm that their prediction ability is sufficient. In the validation, the contents predicted from the observed spectrum by the constructed model (predicted values) are compared with the contents determined by the chemical analyses (observed values). If a sufficient correlation is not found between the predicted and observed values, the calibration model should be reconstructed until a sufficient correlation is obtained. Although it is preferable to use different groups of samples for the construction and validation of a model, as shown in this figure (external validation), samples in the same group are used for both construction and validation (cross validation) when the number of samples is not large enough.
Figure 1. Flow chart for the construction and validation of the calibration model. The procedures surrounded by blue and green lines correspond, respectively, to the construction of a calibration model and its validation.
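The two validation schemes in the flow chart can be sketched in code. The following is only an illustration under stated assumptions, not the software used in this study: `fit_line` and `predict_line` are hypothetical placeholders standing in for any calibration method (PLS regression in this protocol), and taking the first samples as the calibration group is a simplification.

```python
def fit_line(x, y):
    """Placeholder calibration: least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x_new):
    """Apply the placeholder model to one new measurement."""
    slope, intercept = model
    return slope * x_new + intercept


def external_validation(x, y, n_calibration, fit=fit_line, predict=predict_line):
    """Build the model on one group of samples and predict the other group."""
    x, y = list(x), list(y)
    model = fit(x[:n_calibration], y[:n_calibration])
    return [predict(model, xi) for xi in x[n_calibration:]]


def leave_one_out(x, y, fit=fit_line, predict=predict_line):
    """Cross validation: each sample is predicted by a model built without it."""
    x, y = list(x), list(y)
    predictions = []
    for i in range(len(x)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(model, x[i]))
    return predictions
```

In both functions the returned predicted values are what would be compared with the HPLC-determined (observed) values in the validation step.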
1. Collection of Samples
2. Measurements of Spectra
3. Pretreatment for HPLC Measurements of Sugars and Organic Acids8
Note: Extract sugars and organic acids of each blueberry, which are soluble in water, with ultrapure water as follows. The whole of each blueberry is used for analyses.
4. HPLC Measurements of Sugars
Note: In this study, the sum of the sucrose, glucose, and fructose contents of each blueberry is taken as the total sugar content. Therefore, a working curve is first obtained for each of the three sugars, and then the summed sugar content of each blueberry is obtained. The standard contents are reported as 0.3-0.4 wt% (sucrose), 3.8-4.8 wt% (glucose), and 4.2-5.3 wt% (fructose).9
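As an illustration of the working-curve step, the sketch below fits a straight line of peak area against concentration for each sugar standard and then sums the three sugar contents of one sample. All numbers, the function names, the extract volume, the berry mass, and the assumption of a linear detector response are hypothetical; they are not values from this study.

```python
def fit_working_curve(concentrations, peak_areas):
    """Least-squares line: peak_area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_a = sum(peak_areas) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (a - mean_a)
              for c, a in zip(concentrations, peak_areas))
    slope = sxy / sxx
    return slope, mean_a - slope * mean_c


def concentration_from_area(curve, peak_area):
    """Invert the working curve to get a concentration from a peak area."""
    slope, intercept = curve
    return (peak_area - intercept) / slope


def weight_percent(conc_mg_per_ml, extract_volume_ml, berry_mass_mg):
    """Convert an extract concentration to wt% of the whole berry."""
    return conc_mg_per_ml * extract_volume_ml / berry_mass_mg * 100.0


# Hypothetical standards: (concentrations in mg/mL, peak areas in arbitrary units)
standards = {
    "sucrose": ([0.05, 0.1, 0.2, 0.4], [25.0, 50.0, 100.0, 200.0]),
    "glucose": ([0.5, 1.0, 2.0, 4.0], [250.0, 500.0, 1000.0, 2000.0]),
    "fructose": ([0.5, 1.0, 2.0, 4.0], [250.0, 500.0, 1000.0, 2000.0]),
}
curves = {name: fit_working_curve(c, a) for name, (c, a) in standards.items()}

# Hypothetical peak areas for one berry (2000 mg) extracted into 50 mL
sample_areas = {"sucrose": 70.0, "glucose": 860.0, "fructose": 940.0}
contents = {
    name: weight_percent(concentration_from_area(curves[name], area), 50.0, 2000.0)
    for name, area in sample_areas.items()
}
total_sugar = sum(contents.values())  # total sugar content in wt%
```

The same pattern would apply to the four organic acids, with one working curve per acid.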
5. HPLC Measurements of Organic Acids
Note: In this study, the sum of the citric acid, quinic acid, malic acid, and succinic acid contents is taken as the total organic acid content. Therefore, a working curve is first obtained for each of the four organic acids, and then the organic acid content of each blueberry is measured. The standard contents are reported as 0.42-0.62 wt% (citric acid), 0-0.15 wt% (quinic acid), 0.08-0.23 wt% (malic acid), and 0.06-0.25 wt% (succinic acid).9
6. Pretreatment for HPLC Measurements of Anthocyanins
7. HPLC Measurements of Anthocyanins
Note: Blueberries contain about 13 kinds of anthocyanins. Since it is difficult to obtain working curves for all of them, a working curve is obtained only for cyanidin-3-O-glucoside chloride, one of the most common anthocyanins in blueberries. This working curve is then applied to the approximate quantification of the other anthocyanins.
8. Construction of Calibration Models for Prediction of Ingredient Contents
Note: PLS regression,4,5 a multiple regression technique that uses latent variables, is used to construct a calibration model for each ingredient from the observed spectra and the ingredient contents determined by chemical analyses. PLS regression can be performed with either commercial or home-made programs. See references5,10 for the detailed processes of model construction.
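For readers who write their own program, the following is a minimal sketch of PLS regression for a single response (PLS1) using the NIPALS-type deflation scheme. It is an illustrative assumption, not the program used in this study; the function names are hypothetical, and in practice the number of latent variables would be chosen by validation rather than fixed in advance.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1. X: list of spectra (rows), y: list of reference contents."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered spectra
    f = [yi - y_mean for yi in y]                              # centered contents
    components = []
    for _ in range(n_components):
        # Weight vector: direction in spectral space most covariant with y
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:  # no covariance left to model
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(ti * ti for ti in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove the part explained by this latent variable
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the content for one new spectrum x."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat
```

A call such as `pls1_predict(pls1_fit(spectra, contents, 5), new_spectrum)` would return the predicted content for one berry, where `spectra`, `contents`, and the choice of five latent variables are placeholders.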
9. Validation of the Constructed Calibration Models
Note: See references5,10 for the detailed processes of the validation of constructed models.
Figure 2 shows, as an example, a set of NIR absorption spectra in which the spectra of 70 blueberries are overlaid. Since no bands definitely assignable to sugars, organic acids, or anthocyanins are observed in the NIR spectra, the traditional Lambert-Beer law cannot be used to quantify the ingredient contents. Therefore, models for the prediction of ingredient contents must be constructed.
Figure 3 shows typical chromatograms for the quantitative analysis of sugars in blueberries. The top three panels are, respectively, the chromatograms of standard solutions of sucrose, glucose, and fructose. The bottom panel shows a chromatogram of a sample solution, i.e., the extract of a blueberry. The kinds and concentrations of sugars in the sample solution are determined from the retention times and the areas of the observed peaks. The total sugar content is obtained as the sum of the sucrose, glucose, and fructose contents.
Figure 4 shows an example chromatogram for the analysis of organic acids in a blueberry. The kinds and concentrations of organic acids in the sample solution are determined by referring to the chromatograms of the standard solutions (not shown here). As indicated by the peak assignments in the figure legend, two peaks are observed for quinic acid in the chromatograms of both the standard and sample solutions. They might be assignable to isomers of quinic acid. The total organic acid content is obtained as the sum of the citric acid, quinic acid, malic acid, and succinic acid contents.
Figure 5 shows an example chromatogram for the analysis of anthocyanins in a blueberry. Many peaks corresponding to different kinds of anthocyanins are observed. Since the order of elution of these anthocyanins has been reported,14,15 as shown in Table 1, the observed peaks can be assigned to individual anthocyanins. The total anthocyanin content is obtained as the sum of the contents of the 13 kinds of anthocyanins.
Calibration models are constructed from the observed spectra and the chemically determined ingredient contents. Table 2 shows an example of the examination of pretreatments. Six types of pretreatment, including "none" (without pretreatment), were examined for the construction of the calibration model for total sugar content using spectra in a fixed wavenumber region of 12,500-3,600 cm-1. Different pretreatments result in different prediction performance. The performance of the models was evaluated with R2 and RPD. The pretreatment that gives the best prediction performance is chosen. In Table 2, "Second derivative + MSC," which means MSC (multiplicative scatter correction) applied after the second derivative calculation, gives the best results. The wavenumber regions used for the model construction are then examined by varying the regions while keeping the pretreatment fixed.
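The pretreatments compared in Table 2 can be sketched as follows. This is an illustration under assumptions, not the routine used in this study: the derivatives are plain central finite differences on equally spaced points (spectroscopy software commonly combines differentiation with smoothing), and MSC (multiplicative scatter correction) is written in its usual form, in which each spectrum is regressed on the mean spectrum and the fitted offset and slope are removed. The function names are hypothetical.

```python
def first_derivative(spectrum):
    """Central finite difference; the two end points are dropped."""
    return [(spectrum[k + 1] - spectrum[k - 1]) / 2.0
            for k in range(1, len(spectrum) - 1)]


def second_derivative(spectrum):
    """Second-order central finite difference; the two end points are dropped."""
    return [spectrum[k - 1] - 2.0 * spectrum[k] + spectrum[k + 1]
            for k in range(1, len(spectrum) - 1)]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n, m = len(spectra), len(spectra[0])
    reference = [sum(s[k] for s in spectra) / n for k in range(m)]
    ref_mean = sum(reference) / m
    srr = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / m
        # Fit s = offset + slope * reference by least squares
        slope = sum((r - ref_mean) * (v - s_mean)
                    for r, v in zip(reference, s)) / srr
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected


def second_derivative_then_msc(spectra):
    """'Second derivative + MSC': MSC applied after the second derivative."""
    return msc([second_derivative(s) for s in spectra])
```

Each pretreatment would be applied to all spectra before model construction, and the resulting models compared by their R2 and RPD as in Table 2.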
Figure 6 shows, as an example, a result of cross validation of the calibration model for total sugar content, where the correlation between the values predicted by NIR spectroscopy and those determined by HPLC is shown. The model was constructed with "the second derivative + MSC" as the pretreatments and using the 8,539-7,775 cm-1 region of the spectra. The prediction performance of the model is R2 = 0.85 and RPD = 2.6, just above the criteria for practical use. In this example, only 30 samples were used for the model construction, which is too few to construct high-performance models.
Figure 7 shows as an example a result of the cross validation of the calibration model for total organic acid content, where the correlation between the values predicted by NIR spectroscopy and those determined by HPLC is shown. The model was constructed with "the first derivative + MSC" as the pretreatments and using the 7,505-5,446 and 4,605-4,242 cm-1 regions of spectra. The prediction performance of the model is R2 = 0.92 and RPD = 3.6, which are sufficient for the practical application.
Figure 8 shows, as an example, a result of the external validation of the calibration model for total anthocyanin content, with the correlation between the values predicted by NIR spectroscopy and those determined by HPLC. The model was constructed with "the first derivative" as the pretreatment and using the 12,489-6,094 and 4,605-4,242 cm-1 regions of the spectra. The prediction performance of the model is R2 = 0.95 and RPD = 4.4, which shows fairly good performance of the constructed model. Since anthocyanins exist mainly in the peel of blueberries, they are easily observed with diffuse reflectance measurements although their content in a blueberry is not high. The good performance shown in Figure 8 is probably also due to the large number of samples (70) used for model construction.
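Restricting a model to the wavenumber regions quoted above amounts to keeping only the spectral points that fall inside the listed intervals. The sketch below shows one way to do this; the function name and the short example spectrum are hypothetical, while the two regions are those quoted for the total organic acid model.

```python
def select_regions(wavenumbers, spectrum, regions):
    """Keep the points whose wavenumber (cm-1) lies inside any listed region."""
    kept = [
        (w, a)
        for w, a in zip(wavenumbers, spectrum)
        if any(min(r) <= w <= max(r) for r in regions)
    ]
    return [w for w, _ in kept], [a for _, a in kept]


# Regions quoted for the total organic acid model, in cm-1
ORGANIC_ACID_REGIONS = [(7505, 5446), (4605, 4242)]

# Hypothetical five-point spectrum, for illustration only
wavenumbers = [8000, 7000, 5000, 4500, 4000]
absorbances = [0.1, 0.2, 0.3, 0.4, 0.5]
kept_wavenumbers, kept_absorbances = select_regions(
    wavenumbers, absorbances, ORGANIC_ACID_REGIONS
)
```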
Figure 2. NIR absorption spectra of blueberries. Spectra of 70 blueberries are shown simultaneously.
Figure 3. Chromatograms for the quantitative analysis of sugars in blueberries. Chromatograms of standard solutions of (A) sucrose, (B) glucose, (C) fructose, and (D) a chromatogram of a sample solution.
Figure 4. Chromatogram for the quantitative analysis of organic acids in a blueberry. Observed peaks correspond to citric acid (1), malic acid (2), quinic acid (0 and 3), and succinic acid (4).
Figure 5. Chromatogram for the quantitative analysis of anthocyanins in a blueberry. Observed peaks are assigned to individual anthocyanins listed in Table 1 where the standard retention time for each anthocyanin is shown.
Figure 6. A result of cross validation of the model for total sugar content. The values predicted by NIR spectroscopy are plotted against those determined by HPLC. R2 = 0.85 and RPD = 2.6 are obtained.
Figure 7. A result of cross validation of the model for total organic acid content. The values predicted by NIR spectroscopy are plotted against those determined by HPLC. R2 = 0.92 and RPD = 3.6 are obtained.
Figure 8. A result of external validation of the model for total anthocyanin content. The values predicted by NIR spectroscopy are plotted against those determined by HPLC. R2 = 0.95 and RPD = 4.4 are obtained.
Formula | Anthocyanin | Representative Retention time (min) |
C21H21O12 | Delphinidin-3-O-galactoside | 17.3 |
C21H21O12 | Delphinidin-3-O-glucoside | 19.7 |
C21H21O11 | Cyanidin-3-O-galactoside | 22.8 |
C20H19O11 | Delphinidin-3-O-arabinoside | 23.6 |
C21H21O11 | Cyanidin-3-O-glucoside | 24.5 |
C22H23O12 | Petunidin-3-O-galactoside | 28.7 |
C20H19O10 | Cyanidin-3-O-arabinoside | 31.3 |
C22H23O12 | Petunidin-3-O-glucoside | 36.0 |
C22H23O11 | Peonidin-3-O-galactoside | 37.0 |
C21H21O11 | Petunidin-3-O-arabinoside | 40.8 |
C22H23O11 | Peonidin-3-O-glucoside | 43.7 |
C23H25O12 | Malvidin-3-O-galactoside | 45.0 |
C23H25O12 | Malvidin-3-O-glucoside | 49.6 |
Table 1. Major anthocyanins contained in blueberries. The representative retention times in the HPLC analysis under the present experimental conditions are also listed.
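The use of Table 1 for peak assignment, together with the single working curve for cyanidin-3-O-glucoside, can be sketched as below. The retention times are those of Table 1; the matching tolerance of 0.3 min, the function names, and the working-curve parameters are hypothetical assumptions. The result is an approximate total expressed as cyanidin-3-O-glucoside equivalents.

```python
# Representative retention times (min) from Table 1
TABLE_1 = [
    ("Delphinidin-3-O-galactoside", 17.3),
    ("Delphinidin-3-O-glucoside", 19.7),
    ("Cyanidin-3-O-galactoside", 22.8),
    ("Delphinidin-3-O-arabinoside", 23.6),
    ("Cyanidin-3-O-glucoside", 24.5),
    ("Petunidin-3-O-galactoside", 28.7),
    ("Cyanidin-3-O-arabinoside", 31.3),
    ("Petunidin-3-O-glucoside", 36.0),
    ("Peonidin-3-O-galactoside", 37.0),
    ("Petunidin-3-O-arabinoside", 40.8),
    ("Peonidin-3-O-glucoside", 43.7),
    ("Malvidin-3-O-galactoside", 45.0),
    ("Malvidin-3-O-glucoside", 49.6),
]


def assign_peak(retention_time, tolerance=0.3):
    """Return the Table 1 anthocyanin nearest in retention time, or None."""
    name, reference = min(TABLE_1, key=lambda row: abs(row[1] - retention_time))
    return name if abs(reference - retention_time) <= tolerance else None


def total_anthocyanin(peaks, slope, intercept=0.0, tolerance=0.3):
    """Sum the contents of all assigned peaks using one working curve.

    peaks: list of (retention_time_min, peak_area) pairs.
    slope, intercept: working curve, peak_area = slope * content + intercept.
    """
    total = 0.0
    for retention_time, area in peaks:
        if assign_peak(retention_time, tolerance) is not None:
            total += (area - intercept) / slope
    return total
```

Peaks that do not match any entry within the tolerance are ignored in this sketch; whether that is appropriate depends on the chromatographic conditions.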
Preprocessing | Wavenumber region used for analysis (cm-1) | RPD | R2 |
None | 12,500-3,600 | 1.7 | 0.69 |
Second derivative | 12,500-3,600 | 2.6 | 0.85 |
First derivative | 12,500-3,600 | 2.5 | 0.84 |
MSC | 12,500-3,600 | 2.3 | 0.81 |
Second derivative + MSC | 12,500-3,600 | 2.8 | 0.88 |
First derivative + MSC | 12,500-3,600 | 2.7 | 0.87 |
Table 2. An examination of the dependence of prediction performance on the pretreatments of the observed spectra. R2 and RPD for the prediction of total sugar content are listed.
Some additional comments on the protocol are given here. Firstly, step 1.1 calls for deciding which cultivars are included in the target. Although it is possible to construct models covering blueberries from many cultivars or without specifying cultivars, the prediction accuracy of such models is sometimes much lower than that of models for a single cultivar or a limited set of cultivars. It should also be noted that calibration models should be constructed separately for each production site to obtain high prediction performance, because blueberries harvested at different production sites have different characteristics that affect prediction performance.1
Secondly, step 2.3 calls for selecting the diffuse reflectance mode for the measurement of spectra. The spectrophotometer also offers a transmission mode. Although spectra measured in the transmission mode can also be used to construct calibration models, in most cases more accurate and more robust models can be constructed with spectra measured in the diffuse reflectance mode. The total organic acid content cannot be predicted from spectra measured in the transmission mode.3
Thirdly, for the measurements of spectra of blueberries, it is not recommended to measure spectra at the calyx since the surface condition and the contents around the calyx are different from those at other positions. Nevertheless, it is possible to construct calibration models using an ample number of spectra measured at both the calyx and other positions. However, the accuracies of the models are in most cases lower than those of the models constructed with the spectra measured only at positions other than the calyx.
Fourthly, an NIR spectrum of a blueberry depends on the temperature. Therefore, for precise prediction it is important either to always measure spectra at the same ambient temperature or to construct calibration models that compensate for temperature variation.1
Fifthly, although only R2 and RPD are used here for choosing the pretreatments and assessing the performance of the constructed models, other values such as SEC (Standard Error of Calibration), SEP (Standard Error of Prediction), SECV (Standard Error of Cross-Validation), RMSEP (Root Mean Square Error of Prediction), and RMSECV (Root Mean Square Error of Cross-Validation) are usually used for more detailed examination. In our previous paper,3 for example, RMSEP and RMSECV were used for choosing pretreatments and assessing the performance of the constructed models.
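The performance figures discussed here can be computed from paired observed (HPLC) and predicted (NIR) values. The exact formulas used in this study are not stated, so the definitions below are assumptions based on common conventions: R2 as the coefficient of determination, the root mean square error over the validated samples (RMSEP for external validation, RMSECV for cross validation), and RPD as the standard deviation of the observed values divided by that error.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def rpd(observed, predicted):
    """Standard deviation of the observed values divided by the RMSE."""
    n = len(observed)
    mean = sum(observed) / n
    sd = math.sqrt(sum((o - mean) ** 2 for o in observed) / (n - 1))
    return sd / rmse(observed, predicted)
```

Under these definitions RPD is approximately 1/sqrt(1 - R2) for a reasonably large number of samples, which is roughly consistent with the pairs reported here (for example, R2 = 0.85 gives about 2.6).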
Nondestructive prediction of total sugar, total organic acid, and total anthocyanin contents in a blueberry was found to be possible if the models for predictions are constructed properly. This technique is applicable for the selection of only delicious blueberries from all harvested ones, which cannot be achieved with other traditional analytical techniques.8,9 Although the procedures of the chemical analyses may seem complicated, they are included in popular analytical techniques and are not accompanied by great difficulties. It is important to get accurate results for the chemical analyses because the results are the basis of the constructed model. In this study, RSD (relative standard deviation) of the HPLC measurements was around 1%. It is also necessary to follow the basic procedure, e.g. as shown in Figure 1, for the construction of practically applicable models.
Simple and quick methods can be applied for the chemical analyses instead of HPLC. Total sugar content and acidity can be measured, respectively, with a refractometer (Brix meter) and a pH meter. The pH differential method16,17 is applicable to the measurement of total anthocyanin content. Applying these simple methods makes the construction of models much easier, although the accuracy of the values predicted by the models might be lower than that of values predicted by models constructed on the basis of the HPLC measurements shown here. Nevertheless, models constructed on the basis of simple chemical analyses may be accurate enough for practical use at production sites and in distribution processes, because high accuracy is not always needed there. The methods for the chemical analyses should therefore be selected according to the accuracy needed for the models to be constructed.
Although some fruits such as apples and oranges are generally sold with guaranteed sugar contents and acidities, blueberries have not been sold with guaranteed quality. As a result, the quality of commercial blueberries does not seem stable, at least in Japan; low-quality blueberries are sometimes sold in markets. The nondestructive analytical method by NIR spectroscopy shown here is expected to enable, in principle, the shipment and sale of blueberries with guaranteed quality.
Finally, this method has limitations. Firstly, as mentioned above, the construction of prediction models is rather laborious. Moreover, the prediction model should be constructed for each site and each year of cultivation because the difference in the amounts of coexisting components (which depend on the site and the year of cultivation) affects the precision of the prediction. Therefore, some effort is needed for the maintenance of prediction models. Secondly, although we have shown that near-infrared spectroscopy is, in principle, applicable to the quality check of blueberries, the equipment and techniques shown here are used only in the laboratory and are not applicable at production sites, because they do not allow a quick check of the large amounts of berries handled there. Practical development of suitable equipment and of robust calibration models suitable for use at production sites and in distribution processes are future directions.
The authors have nothing to disclose.
This work was partially supported by the project “A Scheme to Revitalize Agriculture and Fisheries in Disaster Area through Deploying Highly Advanced Technology” of Ministry of Agriculture, Forestry and Fisheries, Japan.
FT-NIR spectrophotometer | Bruker Optics GmbH | MPA | |
High-Performance Liquid Chromatography | Shimadzu Corporation | 228-45041-91, 228-45000-31, 228-45018-31, 223-04500-31, 228-45010-31, 228-45095-31 | For sugar analysis; Refractive Index Detector |
High-Performance Liquid Chromatography | Shimadzu Corporation | 228-45041-91, 228-45003-31, 228-45000-31, 228-45018-31, 228-45010-31, 223-04500-31 | For organic acid analysis; Ultraviolet-Visible Detector |
High-Performance Liquid Chromatography | Shimadzu Corporation | 228-45041-91, 228-45018-31, 228-45000-31, 228-45012-31, 228-45119-31, 228-45005-31, 228-45009-31 | For anthocyanin analysis; Photodiode Array Detector |
pH meter | Mettler-Toledo | 30019028 | S220, Automatic temperature compensation |
Ultra-pure water treatment equipment | ORGANO Corporation | ORG-ULXXXM1; PRA-0015-0V0 | PURELAB ultra; PURELITE |
Biomedical Freezers | SANYO | 2-6780-01 | MDF-U338 |
Ultra-Low Temperature Freezer | Panasonic healthcare Co.,Ltd. | KM-DU73Y1 | -80°C |
Vacuum lyophilizer | IWAKI GLASS Co.,Ltd | 119770 | DRC-3L;FRD-82M |
Homogenizer | Microtec Co., Ltd. | Physcotron | |
Ultracentrifuge | Hitachi Koki Co.,Ltd | S204567 | CF15RXII |
Mini-centrifuge | LMS CO.,LTD. | KN3136572 | MCF-2360 |
Centrifuge | Kokusan Co.,Ltd | 2-5534-01 | H-103N |
Filter Paper | Advantec | 1521070 | 5B, Equivalent to Whatman 40 |
Sep-Pak C18 column | Waters Corporation Milford | WAT020515 | |
Sep-Pak CM column | Waters Corporation Milford | WAT020550 | |
Sep-Pak QMA column | Waters Corporation Milford | WAT020545 | |
Centrifugal Filter Unit | Merck Millipore Corporation | R2SA18503 | PVDF, 0.45 μm |
Microtube | As One Corporation | 1-1600-02 | PP, 2 mL |
Syringe Filter | GE Healthcare CO.,LTD. | 6788-1304 | PP, 0.45 μm |
Sucrose | Wako Pure Chemical Industries,Ltd | 194-00011 | Reagent-grade |
Glucose | Wako Pure Chemical Industries,Ltd | 049-31165 | Reagent-grade |
Fructose | Wako Pure Chemical Industries,Ltd | 123-02762 | Reagent-grade |
Citric acid | Wako Pure Chemical Industries,Ltd | 036-05522 | Reagent-grade |
Malic acid | Wako Pure Chemical Industries,Ltd | 355-17971 | Reagent-grade |
Succinic acid | Wako Pure Chemical Industries,Ltd | 190-04332 | Reagent-grade |
Quinic acid | Alfa Aesar, A Johnson Matthey Company | 10176328 | Reagent-grade |
Phosphoric acid | Wako Pure Chemical Industries,Ltd | 162-20492 | HPLC-grade |
Trifluoroacetic acid | Wako Pure Chemical Industries,Ltd | 208-02746 | Reagent-grade |
Methanol | Wako Pure Chemical Industries,Ltd | 131-01826 | Reagent-grade |
Acetonitrile | Wako Pure Chemical Industries,Ltd | 015-08633 | HPLC-grade |
Cyanidin-3-O-glucoside chloride | Wako Pure Chemical Industries,Ltd | 306-37661 | HPLC-grade |
Software for analyses | Bruker Optics GmbH | OPUS ver. 6.5 | |
Software for preprocessing | Microsoft | Excel powered by Visual Basic for Applications | |
Software for construction of models | Freemat 4.0 | http://freemat.sourceforge.net/ |