The article is based on the creation of an adapted protocol to scan, detect, sort, and identify digitized objects corresponding to benthic river macroinvertebrates using a semi-automatic imaging procedure. This procedure allows the acquisition of the individual size distributions and size metrics of a macroinvertebrate community in about 1 h.
Body size is an important functional trait that can be used as a bioindicator to assess the impacts of perturbations in natural communities. Community size structure responds to biotic and abiotic gradients, including anthropogenic perturbations across taxa and ecosystems. However, the manual measurement of small-bodied organisms such as benthic macroinvertebrates (e.g., >500 µm to a few centimeters long) is time-consuming. To expedite the estimation of community size structure, here, we developed a protocol to semi-automatically measure the individual body size of preserved river macroinvertebrates, which are one of the most commonly used bioindicators for assessing the ecological status of freshwater ecosystems. This protocol is adapted from an existing methodology developed to scan marine mesozooplankton with a scanning system designed for water samples. The protocol consists of three main steps: (1) scanning subsamples (fine and coarse sample size fractions) of river macroinvertebrates and processing the digitized images to individualize each detected object in each image; (2) creating, evaluating, and validating a learning set through artificial intelligence to semi-automatically separate the individual images of macroinvertebrates from detritus and artifacts in the scanned samples; and (3) depicting the size structure of the macroinvertebrate communities. In addition to the protocol, this work includes the calibration results and enumerates several challenges and recommendations to adapt the procedure to macroinvertebrate samples and to consider for further improvements. Overall, the results support the use of the presented scanning system for the automatic body size measurement of river macroinvertebrates and suggest that the depiction of their size spectrum is a valuable tool for the rapid bioassessment of freshwater ecosystems.
Benthic macroinvertebrates are broadly used as bioindicators to determine the ecological status of water bodies1. Most indices to describe macroinvertebrate communities focus on taxonomic metrics. However, new bioassessment tools that integrate body size are encouraged to provide an alternative or complementary perspective to taxonomic approaches2,3.
Body size is considered a metatrait that is related to other vital traits such as metabolism, growth, respiration, and movement4. Furthermore, body size can determine trophic position and interactions5. The relationship between individual body size and the normalized biomass (or abundance) by size class in a community is defined as the size spectrum6 and follows the general pattern of a linear decrease in normalized biomass as individual size increases on a logarithmic scale7. The slope of this linear relationship has been extensively studied theoretically, and empirical studies across ecosystems have used it as an ecological indicator of the community size structure4. Another synthetic indicator of community size structure that has been successfully used in biodiversity-ecosystem functioning studies is community size diversity, which is represented as the Shannon index of the size classes of the size spectrum or its analog, which is calculated based on the individual size distributions8.
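As a minimal sketch of the class-based size diversity mentioned above, the following Python snippet computes a Shannon index over the size classes of a size spectrum. The function name `shannon_size_diversity`, the use of the natural logarithm, and the example counts are illustrative assumptions; they are not taken from the authors' Matlab script, and the analog based on individual size distributions is not covered here.

```python
import math

def shannon_size_diversity(class_counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over the non-empty size classes."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("no individuals in any size class")
    h = 0.0
    for n in class_counts:
        if n > 0:
            p = n / total          # relative abundance of this size class
            h -= p * math.log(p)   # natural log is an assumption of this sketch
    return h

# Hypothetical abundances per size class (not data from the study).
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
print(shannon_size_diversity(even))
print(shannon_size_diversity(skewed))
```

A community spread evenly over the size classes gives a higher value than one dominated by a single class.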
In freshwater ecosystems, the size structure of different faunal groups is used as an ataxic indicator to assess the response of biotic communities to environmental gradients9,10,11 and to anthropogenic perturbations12,13,14,15,16. Macroinvertebrates are not an exception, and their size structure also responds to environmental changes17,18 and anthropogenic perturbations, such as mining19, land use20, or nitrogen (N) and phosphorus (P) enrichment20,21,22. However, measuring hundreds of individuals to describe the community size structure is a tedious and time-consuming task that is often avoided as a routine measurement in laboratories due to a lack of time. Thus, several semi-automatic or automatic imaging methods to classify and measure specimens have been developed23,24,25,26. However, most of these methods are focused on taxonomic classification more than on the individual size of the organisms and are not ready to use for all kinds of macroinvertebrates. In marine plankton ecology, a scanning image analysis system has been extensively used to determine the size and taxonomic composition of zooplankton communities27,28,29,30,31. This instrument can be found in several marine institutes worldwide, and it is used to scan preserved zooplankton samples to obtain high-resolution digital images of the entire sample. The present protocol adapts the use of this instrument to estimate the macroinvertebrate community size spectrum in rivers in a rapid automatic manner without investing in creating a new device.
The protocol consists of scanning a sample and processing the whole image to automatically obtain single images (i.e., vignettes) of the objects in the sample. Several measures of shape, size, and grey-level features characterize each object and allow for the automatic classification of the objects into categories, which are then validated by an expert. The individual size of each organism is calculated using the ellipsoidal biovolume (mm3), which is derived from the area of the organism measured in pixels. This allows for obtaining the size spectrum of the sample in a rapid manner. To the best of our knowledge, this scanning imaging system has only been used to process mesozooplankton samples, but the device may potentially allow for working with freshwater benthic macroinvertebrates.
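The size calculation can be illustrated with a short Python sketch. It assumes that the biovolume is that of a prolate ellipsoid whose axes are the major and minor axes of the ellipse fitted to each object, and it assumes a pixel size corresponding to a 2,400 dpi scan. Both the formula variant and the resolution are assumptions of this example and should be checked against the actual image processing settings.

```python
import math

# Assumed pixel size for a 2,400 dpi scan; adjust to the real scan resolution.
PIXEL_SIZE_MM = 25.4 / 2400

def ellipsoidal_biovolume_mm3(major_px, minor_px, pixel_size_mm=PIXEL_SIZE_MM):
    """Volume (mm3) of a prolate ellipsoid defined by the major and minor
    axes (in pixels) of the ellipse fitted to an object."""
    a = major_px * pixel_size_mm / 2.0  # semi-major axis in mm
    b = minor_px * pixel_size_mm / 2.0  # semi-minor axis in mm
    return 4.0 / 3.0 * math.pi * a * b * b

# Hypothetical object with a fitted ellipse of 400 x 100 pixels.
print(ellipsoidal_biovolume_mm3(400, 100))
```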
The overall goal of this study is, therefore, to introduce a method to rapidly obtain the individual size of preserved river macroinvertebrates by adapting an existing protocol previously used with marine mesozooplankton27,32,33. The procedure uses a semi-automatic approach that operates with a scanning device designed to scan water samples and three open-source software applications to process the scanned images. An adapted protocol to scan, detect, and identify digitized river macroinvertebrates to automatically acquire the community size structure and related size metrics is herein presented. The assessment of the procedure and guidelines to enhance its efficiency are also presented based on 42 scanned images of riverine macroinvertebrate samples collected from three basins in the North-Eastern (NE) Iberian Peninsula (Ter, Segre-Ebre, and Besòs).
The samples were collected at 100 m river stretches following the protocol for field sampling and laboratory analysis of benthic river macroinvertebrates in fordable rivers from the Spanish Government34. The samples were collected with a surber sampler (frame: 0.3 m x 0.3 m, mesh: 250 µm) following a multi-habitat survey. In the laboratory, the samples were cleaned and sieved through a 5 mm and a 500 µm mesh to obtain two subsamples: a coarse subsample (5 mm mesh) and a fine subsample (500 µm mesh), which were stored in separate vials and preserved in 70% ethanol. Separating the sample into two size fractions allows for a better estimation of the community size structure, since large organisms are rarer and fewer than the small organisms. Otherwise, the scanned sample has a biased representation of the large size fraction.
NOTE: The protocol described here is based on the system developed by Gorsky et al.27 for marine mesozooplankton. A specific description of the scanner (ZooSCAN), scanning software (VueScan 9×64 [9.5.09]), image processing software (Zooprocess, ImageJ), and automatic identification software (Plankton Identifier) steps can be found in previous references32,33. To best adjust the sizes of the benthic macroinvertebrates with respect to the mesozooplankton, once the project is created following the original protocol32,33, change the parameter of minimum size (minsizeesd_mm) to 0.3 mm and the parameter of maximum size (maxsizeesd_mm) to 100 mm in the configuration file. To help follow the protocol, this is summarized in a work chart (Figure 1). The created project is stored in the computer's C folder and is organized in the following folders: PID_process, Zooscan_back, Zooscan_check, Zooscan_config, Zooscan_meta, Zooscan_results, and Zooscan_scan. Each folder is composed of several subfolders that the different software applications use in the following steps of the protocol.
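To make the effect of the two size parameters concrete, the sketch below filters objects by their equivalent spherical diameter (ESD). It assumes, as the parameter names suggest, that `minsizeesd_mm` and `maxsizeesd_mm` bound the ESD computed from the object area. The helper names `esd_mm` and `within_size_limits` and the example areas are hypothetical; this is not the image processing software's own code.

```python
import math

MIN_ESD_MM = 0.3    # minsizeesd_mm value given in the protocol
MAX_ESD_MM = 100.0  # maxsizeesd_mm value given in the protocol

def esd_mm(area_mm2):
    """Diameter (mm) of the circle that has the same area as the object."""
    return 2.0 * math.sqrt(area_mm2 / math.pi)

def within_size_limits(area_mm2, min_esd=MIN_ESD_MM, max_esd=MAX_ESD_MM):
    """True if the object's ESD falls inside the configured size window."""
    return min_esd <= esd_mm(area_mm2) <= max_esd

# Hypothetical object areas in mm2; the smallest one falls below 0.3 mm ESD.
areas = [0.01, 0.5, 12.0]
kept = [a for a in areas if within_size_limits(a)]
print(kept)
```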
1. Acquisition of digital images for macroinvertebrate samples
2. Automatic recognition of the objects
NOTE: Create a learning set to automatically predict the identity of the detected objects, thus separating the organisms from the debris in the sample.
Figure 1: Work chart representing section 1 and section 2 of the protocol. The times are illustrative and could change depending on the computer, the abundance of vignettes to process, and the number of categories of the learning set. This case corresponds to the validation of a learning set of three categories on a set of 42 subsamples (in total, 47,473 vignettes).
3. Calculating the individual size distribution, size spectra, and size metrics
NOTE: The calculations mentioned in this section were performed using Matlab (see script as Supplementary File 1).
Size class limits (mm3) | Size class mid-point (mm3) |
0.1236 | 0.1855 |
0.2473 | 0.3709 |
0.4946 | 0.7418 |
0.9891 | 1.4837 |
1.9783 | 2.9674 |
3.9560 | 5.9348 |
7.9131 | 11.8696 |
15.8261 | 23.7392 |
31.6522 | 47.4783 |
63.3044 | 94.9567 |
126.6089 | 189.9133 |
253.2178 | 379.8267 |
506.4300 | 759.6500 |
1012.9000 | 1519.3000 |
2025.7000 |
Table 1: Size classes of the normalized biovolume size spectrum (NBSS). The table shows the 15 size class limits and the corresponding 14 size class mid-points of the organisms.
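The size classes in Table 1 double in width from one class to the next, so they can be regenerated from the first limit. The Python sketch below rebuilds the limits and mid-points and then computes a normalized biovolume per class as the summed biovolume divided by the class width. The normalization by class width, the function name `normalized_biovolume`, and the example volumes are assumptions of this sketch rather than a transcription of Supplementary File 1. Values agree with Table 1 only up to rounding.

```python
FIRST_LIMIT_MM3 = 0.1236  # lowest size class limit in Table 1
N_LIMITS = 15             # number of class limits in Table 1

# Each limit is twice the previous one (octave-scaled size classes).
limits = [FIRST_LIMIT_MM3 * 2 ** i for i in range(N_LIMITS)]
# Mid-point of each class, taken as the arithmetic mean of its two limits.
midpoints = [(lo + hi) / 2 for lo, hi in zip(limits[:-1], limits[1:])]

def normalized_biovolume(volumes_mm3, limits):
    """Summed biovolume in each size class divided by the class width."""
    totals = [0.0] * (len(limits) - 1)
    for v in volumes_mm3:
        for i in range(len(totals)):
            if limits[i] <= v < limits[i + 1]:
                totals[i] += v
                break  # volumes outside all classes are ignored
    return [t / (hi - lo) for t, lo, hi in zip(totals, limits[:-1], limits[1:])]

# Hypothetical individual biovolumes in mm3 (not data from the study).
volumes = [0.15, 0.2, 0.3, 0.6, 1.2]
print(normalized_biovolume(volumes, limits)[:4])
```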
Acquisition of digital images of macroinvertebrate samples
Scanning nuances: Ethanol deposition in the scan tray
While testing the system for macroinvertebrates, several scans were of poor quality. A dark saturated area in the background prevented normal processing of the image and the measurement of the individual sizes of the macroinvertebrates (Figure 2). Several reasons have been given for the appearance of saturated areas in the background or highly pixelated images: (1) the presence of too many organisms on the scan tray; (2) the presence of dirtiness in the samples; (3) an insufficient delay between the preview of the sample and its scan; or (4) using in the image processing a background image of poor quality because of condensation, dirtiness, or poor water quality33. In macroinvertebrate community samples, the use of ethanol instead of water causes precipitation on the tray, which forms a dark shadow if it is not properly rinsed with water in between scans. This is vital to obtain sharp images and to minimize any related corrosion of the scan tray glass.
Scanning nuances: Debris concentration
From the analysis of a subset of 47,473 vignettes, a high percentage (86.1%) corresponded to debris, including detritus, fibers, or body parts (such as legs or gills), or scanning artifacts (Figure 3A–E). Invertebrate organisms corresponded to the remaining 13.9% of the detected objects (Figure 3F–L). Thus, despite the previous meticulous separation of organisms from organic matter in the laboratory, plenty of small debris still remained in the vial.
Scanning nuances: Touching objects
The significant presence of debris enhances the touching between organisms, and therefore, the creation of vignettes with aggregates that include multiple touching organisms and organisms attached to particles or fibers (Figure 4A–C). These vignettes are a source of bias in determining the shape of the individual size structure. In a set of five samples (11 subsamples), out of all the vignettes with any macroinvertebrates, 10% corresponded to groups with touching organisms or organisms touching particles or fibers. Those vignettes were edited with the image processing program in order to separate the touching organisms and the organisms with particles attached. Reprocessing the samples with the separation mask involved the creation of new vignettes with the newly separated objects, which were validated to ensure their proper classification.
Automatic recognition of the objects
Learning set results
A learning set is a set of vignettes of objects classified into different categories by an expert and used in a supervised learning model, and this can also be called a training set27. It is possible to work with an existing learning set, update the existing learning set with new vignettes and/or categories, or create a new learning set for a specific project.
To determine the best learning set to rapidly obtain the macroinvertebrate size structure, several learning sets were created and tested through cross-validation with the Random Forest algorithm. The resulting confusion matrix shows the true classification (rows) versus the automatic classification (columns). The recall is the percentage of objects truly belonging to a category that were correctly assigned to it by the algorithm, whereas the 1-precision is the percentage of objects assigned to a category by the algorithm that do not actually belong to it (contamination in a category)33. As a rule of thumb, the recall should be above 70%, and the contamination (1-precision) should be lower than 20% to keep a category in the learning set. The learning set with the greatest recall and precision for macroinvertebrates is then further validated with a subset of samples to determine its real accuracy in macroinvertebrate identification.
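The two metrics and the rule of thumb above can be sketched in a few lines of Python. The confusion matrix below is hypothetical (rows are the true categories, columns the automatic ones), and the function names are illustrative; the actual cross-validation is performed by the automatic identification software, not by this code.

```python
def recall_and_contamination(matrix, labels):
    """Return {label: (recall, 1-precision)} for a square confusion matrix
    where matrix[i][j] counts objects of true class i predicted as class j."""
    result = {}
    for k, label in enumerate(labels):
        true_total = sum(matrix[k])                      # row sum
        predicted_total = sum(row[k] for row in matrix)  # column sum
        correct = matrix[k][k]
        recall = correct / true_total if true_total else 0.0
        contamination = 1 - correct / predicted_total if predicted_total else 0.0
        result[label] = (recall, contamination)
    return result

def keep_category(recall, contamination, min_recall=0.70, max_contamination=0.20):
    """Rule of thumb: recall above 70% and contamination below 20%."""
    return recall > min_recall and contamination < max_contamination

labels = ["macroinvertebrates", "other invertebrates", "debris"]
matrix = [            # hypothetical counts, not results from the study
    [90, 2, 8],
    [5, 80, 15],
    [5, 28, 267],
]
metrics = recall_and_contamination(matrix, labels)
for label, (rec, cont) in metrics.items():
    print(label, round(rec, 2), round(cont, 2), keep_category(rec, cont))
```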
Three types of ataxic learning sets (raw, intermediate, and fine) with categories based on the morphological features of the objects were tested. The raw learning set included three categories: macroinvertebrates, other invertebrates (microcrustaceans), and debris (fibers, particles, and artifacts like glass stains). The intermediate learning set included 16 categories: 5 for macroinvertebrates, 3 for other invertebrates, and 8 for debris. The fine learning set included 4 more categories of macroinvertebrates, with a total of 20 categories (Table 2).
In addition to defining the categories, the effect of the number of vignettes per category was also tested. Each learning set was tested separately using 50 vignettes, 100 vignettes, and 300 vignettes in each category (and 500 vignettes for the raw learning set with three categories). All the categories were balanced in number except for "Ostracoda", "long-round macroinvertebrates", and "round shell macroinvertebrates", which included fewer individuals in the 100 vignette and 300 vignette learning sets because not enough organisms of these categories were detected in the scanned images.
The recall and precision for macroinvertebrates (all the macroinvertebrate categories together) and organisms (the macroinvertebrate and other invertebrate categories together) were considered to select the best learning set by cross-validation (see the tables in Supplementary File 2). The best learning set was the raw learning set with three categories (macroinvertebrates, other invertebrates, and debris), with 300 objects in each category (Table 2). The raw learning set was subsequently used to validate the automatic classification of the objects in the subset of scanned samples.
Learning set | Number of categories | Images per category | Recall organisms | Recall macroinvertebrates | 1-precision organisms | 1-precision macroinvertebrates |
Raw | 3 | 50 | 0.97 | 0.84 | 0.12 | 0.24 |
Raw | 3 | 100 | 0.96 | 0.87 | 0.06 | 0.17 |
Raw | 3 | 300 | 0.95 | 0.91 | 0.09 | 0.15 |
Raw | 3 | 500 | 0.93 | 0.88 | 0.13 | 0.20 |
Medium | 16 | 50 | 0.83 | 0.77 | 0.17 | 0.24 |
Medium | 16 | 100 | 0.84 | 0.79 | 0.15 | 0.21 |
Medium | 16 | 300 | 0.87 | 0.84 | 0.14 | 0.18 |
Fine | 20 | 50 | 0.89 | 0.86 | 0.14 | 0.18 |
Fine | 20 | 100 | 0.90 | 0.87 | 0.11 | 0.14 |
Fine | 20 | 300 | 0.90 | 0.86 | 0.13 | 0.14 |
Table 2: Created and tested learning sets (raw, intermediate [medium], and fine) with the categories within each one and the number of objects per category. Recall and 1-precision of the created learning sets. Categories of the Raw learning set: Macroinvertebrates (1), Other invertebrates (2), Debris (3). Categories of the Medium learning set: Long macroinvertebrates (1), Long smooth macroinvertebrates (2), Long spiky macroinvertebrates (3), Round macroinvertebrates (4), Round shell macroinvertebrates (5), Cladocera (6), Copepoda (7), Ostracoda (8), Aggregates (9), Fibres (10), Heads (11), Legs (12), Stains (13), Dark stains (14), Light grey stains (15), Round stains (16). Categories of the Fine learning set: Long macroinvertebrates (1), Long smooth macroinvertebrates (2), Long smooth dark macroinvertebrates (3), Long-round macroinvertebrates (4), Long spiky macroinvertebrates (5), Round macroinvertebrates (6), Round shell macroinvertebrates (7), Round dark macroinvertebrates (8), Round shell macroinvertebrates (9), Cladocera (10), Copepoda (11), Ostracoda (12), Aggregates (13), Fibres (14), Heads (15), Legs (16), Stains (17), Dark stains (18), Light grey stains (19), Round stains (20).
Validation of automatic recognition with the best learning set
The objects in a subset of 42 fine and coarse subsamples were automatically classified by the selected learning set with the Random Forest algorithm. After manual validation, the recall for all the categories was high (on average, 0.94 for macroinvertebrates, 0.95 for other invertebrates, and 0.92 for debris), while the contamination (1-precision) was rather low, except for other invertebrates (0.25 for macroinvertebrates, 0.84 for other invertebrates, and 0.01 for debris) (Figure 5). Other invertebrates (microcrustaceans) were rare in the samples (present in 17 out of 42 subsamples); thus, the comparison was not robust. Moreover, this category was highly affected by contamination because of its similarity in shape and grey levels to other objects.
The comparison of automatic versus validated macroinvertebrate abundance showed that these were highly correlated (Pearson's r = 0.92, p-value < 0.0001, n = 24 for coarse subsamples; Pearson's r = 0.98, p-value < 0.0001, n = 18 for fine subsamples), with a slight overestimation by the automatic performance due to contamination from debris (slopes < 1) (Figure 6). Regarding the comparison of the mean ellipsoidal volume, the correlation was also high (Pearson's r = 0.96, p-value < 0.0001, n = 24 for coarse samples; Pearson's r = 0.99, p-value < 0.0001, n = 18 for fine samples), and the size spectrum slope was close to −1 (Figure 6). The difference in the slopes between the fine and coarse fractions reflects the greater effect of misclassification in the large size fractions, which is related to their low organism counts.
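A comparison of this kind can be sketched with plain Python as follows. The example places the automatic estimates on the x-axis and the validated estimates on the y-axis, which is an assumption about the axis layout, and the counts are hypothetical values chosen only to show how a slope below 1 arises when the automatic mode overestimates abundance.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical abundances per subsample (not data from the study).
automatic = [120, 300, 560, 800]
validated = [110, 270, 500, 730]
print(pearson_r(automatic, validated), ols_slope(automatic, validated))
```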
The probability density functions of the individual size distributions of the automatic prediction strongly concurred with the validated predictions for the fine subsamples, as well as for the coarse subsamples. However, there were some exceptions for the coarse subsamples related to the number of organisms and, thus, greater effect of misclassification in those cases, as highlighted before (Figure 7).
Effect of touching organisms on the individual size distributions, size spectra, and size metrics
A comparison of the size distributions obtained before and after the separation of the touching organisms and before the validation in a subset of five selected samples was performed to assess the effect of touching objects. To compare the size distributions, the coarse and fine subsamples were combined, according to their fractionation, to reconstruct a sample representing the macroinvertebrate community. In three samples, the abundance after validation increased (>500 individuals) (Figure 8A). Despite this increase, the mean ellipsoidal volume fit very closely to the one calculated in the validated samples (Figure 8B).
The size distributions of the corrected samples (after the separation of touching organisms) differed slightly from the validated ones. Thus, the presence of multiple objects had a small influence on the size distributions in those samples (Figure 9A–E). Accordingly, the size diversity calculated based on the corrected samples correlated strongly with the size diversity of the validated ones (Pearson's r = 0.94, p-value = 0.017, n = 5) (Figure 9F).
Theoretically, the normalized biovolume size spectrum (NBSS) of a community with several trophic levels has a size spectrum slope in the log2 scale approaching −1 in steady state conditions4. The NBSS in natural communities often has a bump rather than a linear distribution, and this is mostly attributed to the sampling bias of the smallest size classes36. In the present study, the third size class was the most common in the NBSS.
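As an illustration of how such a slope can be estimated, the Python sketch below fits a straight line to log2(normalized biovolume) against log2(size class mid-point), using only the classes from the modal class upward so that the undersampled small classes are excluded. Treating the modal class as the one with the highest normalized biovolume, skipping empty classes, and the function name `nbss_slope` are assumptions of this sketch; the input values are hypothetical.

```python
import math

def nbss_slope(midpoints, nbss):
    """OLS slope of log2(NBSS) on log2(mid-point), fitted from the modal
    size class upward. Empty classes (value 0) are skipped."""
    pairs = [(math.log2(m), math.log2(v)) for m, v in zip(midpoints, nbss) if v > 0]
    if not pairs:
        raise ValueError("no non-empty size classes")
    # Modal class: the non-empty class with the highest normalized biovolume.
    modal = max(range(len(pairs)), key=lambda i: pairs[i][1])
    x = [p[0] for p in pairs[modal:]]
    y = [p[1] for p in pairs[modal:]]
    if len(x) < 2:
        raise ValueError("need at least two size classes to fit a slope")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical spectrum with a bump at the third size class.
mids = [1, 2, 4, 8, 16, 32]
values = [1, 4, 16, 8, 4, 2]
print(nbss_slope(mids, values))
```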
The NBSSs were quite similar between the steps of the protocol (Figure 10A–C), except for a few size classes in a couple of spectra (Figure 10D–E). Accordingly, the size spectrum slope calculated based on the corrected samples correlated strongly with the slope based on the validated ones (Pearson's r = 0.99, p-value ≤ 0.0001, n = 5) (Figure 10F).
Figure 2: Examples of scanned images with different qualities before and after being processed. (A,B) Raw image (left) and processed image (right) of a fine subsample with good scan quality; (C,D) raw image (left) and processed image (right) of a fine subsample with bad scan quality (dark background and cut image on the left edge); (E,F) raw image (left) and processed image (right) of a fine subsample with bad scan quality (very pixelated dark background).
Figure 3: Contour vignettes representing different objects present in the samples. (A–E) Debris (fiber, round stain, macroinvertebrate leg, stains, and organic debris); (F–I) macroinvertebrates (Coleoptera, Diptera, Plecoptera, and Trichoptera); and (J–L) other invertebrates (Cladocera, Copepoda, and Ostracoda). Scale bars indicate 1 mm (gamma = 1.1).
Figure 4: Examples of vignettes containing multiple objects. (A) A macroinvertebrate (Hydracarina) attached to a fiber; (B) multiple organisms (Caenidae) aggregated by a fiber; and (C) two touching macroinvertebrates (Chironomidae and Caenidae). Scale bars indicate 1 mm (gamma = 1.1).
Figure 5: Boxplots of recall and contamination (1-precision). The boxplots for the three categories of macroinvertebrates, other invertebrates, and debris (300 vignettes per category) of the selected learning set validated on a subset of samples (n = 42).
Figure 6: Comparison between the abundance and mean ellipsoidal volume estimates in automatic versus validated classification. (A) Abundance estimates in the subsamples (n = 42) and (B) mean ellipsoidal volume estimates in the subsamples (n = 42). The dark dots correspond to the coarse subsamples (>0.5 cm mesh); the grey dots correspond to the fine subsamples (>500 µm mesh). The dashed line represents the 1:1 relationship.
Figure 7: Probability density functions representing the relative contribution (y-axis) of the individual size in the log-scale (x-axis) for comparison between automatic estimates and between validated estimates. (A,B) Automatic and validated estimates for coarse subsamples (n = 18), (C,D) Automatic and validated estimates for fine subsamples (n = 24). (A,C) Comparison between automatic estimates and (B,D) comparison between validated estimates. Colors represent each subsample to help discern the spectra.
Figure 8: Comparison between the abundance and mean ellipsoidal volume estimates in validated subsamples versus subsamples validated after the separation of touching objects from selected natural samples (fine and coarse subsamples together). (A) Abundance estimates by sampling frame (n = 5) and (B) mean ellipsoidal volume estimates (n = 5). The dashed line represents the 1:1 relationship.
Figure 9: Probability density functions representing the relative contribution (y-axis) of the individual size in the log2-scale (x-axis) for the automatic prediction, validated prediction, and corrected prediction with their respective size diversity values (Sd). (A–E) Probability density functions for selected natural samples (fine and coarse subsamples together) (n = 5); the red line corresponds to the automatic prediction, the blue line corresponds to the validated prediction, and the green line corresponds to the corrected samples (validated after the separation of touching objects). (F) Comparison of validated versus corrected size diversity estimates; the dashed line corresponds to the 1:1 relationship.
Figure 10: Normalized biovolume size spectra (NBSS) and comparison of NBSS slopes (µ) between treatments. (A–E) NBSS representing the relationship between the mid-point value of each size class in the log-scale (x-axis) versus the normalized biovolume per scanning frame (y-axis) of the five selected samples for the automatic (red crosses), validated (blue triangles), and corrected (green circles) predictions with their respective size spectrum slopes (µ) calculated in the size classes from the modal size class and upward (the third size class is indicated by the vertical dashed line). (F) Comparison of the slopes calculated on the validated samples versus the corrected ones (after the separation of touching objects). The dashed line corresponds to the 1:1 relationship, r2.
Supplementary File 1: Matlab script to perform the calculations.
Supplementary File 2: Cross-validation, recall, and 1-precision of the created learning sets. (A) Raw learning set with 3 categories and 50 vignettes per category; (B) raw learning set with 3 categories and 100 vignettes per category; (C) raw learning set with 3 categories and 300 vignettes per category; (D) raw learning set with 3 categories and 500 vignettes per category; (E) raw learning set with 5 categories and 50 vignettes per category; (F) raw learning set with 5 categories and 100 vignettes per category; (G) raw learning set with 5 categories and 300 vignettes per category; (H) intermediate learning set with 16 categories and 50 vignettes per category; (I) intermediate learning set with 16 categories and 100 vignettes per category; (J) intermediate learning set with 16 categories and 300 vignettes per category; (K) fine learning set with 20 categories and 50 vignettes per category; (L) fine learning set with 20 categories and 100 vignettes per category; and (M) fine learning set with 20 categories and 300 vignettes per category.
The adaptation of the methodology described by Gorsky et al. 2010 for riverine macroinvertebrates allows for high classification accuracy in estimating the community size structure in freshwater macroinvertebrates. The results suggest that the protocol can reduce the time for estimating the individual size structure in a sample to about 1 hour. Thus, the proposed protocol is intended to promote the routine use of macroinvertebrate size spectra as a fast and integrative bioindicator to assess the impact of perturbations in freshwater ecosystems. The macroinvertebrate size spectrum has already been used as a successful index to evaluate the ecological status of coastal lagoons22. With the development of the protocol, intensive surveys on invertebrates can be carried out to enable field monitoring campaigns that cover large spatial and temporal scales.
As the aim of this protocol is to obtain the individual size distribution of the sampled community quickly, disregarding taxonomy, it is recommended to create a simple learning set like the one proposed here. Tests of finer learning sets, with a higher number of categories, give lower recall and precision for macroinvertebrates as a whole (Table 2), and the validation step is more time-consuming.
The automatic prediction strongly concurred with the validated prediction of 42 natural subsamples from different sampling sites, suggesting that the method in automatic mode is suitable for counting and measuring the macroinvertebrates in natural samples (Figure 6). Moreover, the similarity in the NBSSs between the automatic and validated predictions and the high fit to the linear theoretical model suggest that the automatic mode is a promising method for pursuing theoretical ecological studies (Figure 10).
During the adaptation of this protocol, several issues were encountered, and they were solved or minimized in different ways. An issue to take into consideration when scanning macroinvertebrate samples is the appearance of dark saturated areas. Thus, it is important to check the processed, scanned images as soon as possible to detect this problem and to repeat the scan if necessary. This problem has also been found when scanning plankton33, but it is increased by the use of ethanol instead of tap water. It is not recommended to use tap water, as the organisms preserved in 70% ethanol will drift on the surface. Even though the device is designed to resist diluted ethanol (5%), the invertebrate samples are preserved with 70% ethanol. Operating with lower concentrations of ethanol is not recommended either, as the organisms could be damaged through rehydration and dehydration processes37. The proposed solution, which is highly recommended, is to rinse the scan tray with fresh water several times after every scan performed with ethanol. This avoids the accumulation of precipitates that may alter the image background and protects the glass of the scan tray from corrosion.
Another detected issue is the presence of vignettes with multiple organisms, which can alter the size spectrum because of the underestimation of individuals of certain sizes. When the number of vignettes with multiple objects is low (<10%), as in this study, the presence of multiple objects has a small influence on the size distributions and NBSSs in those samples (Figure 9 and Figure 10). This indicates that, to obtain a representative size structure of the macroinvertebrate community, it is not necessary to invest time in step 1.5 of the protocol (the separation of touching organisms), for which the image reprocessing lasts about 1.5 h. Instead, it is highly recommended to take time in step 2.5 of the protocol (separating touching organisms or aggregates using a wooden needle), which is much less time-consuming (maximum 30 min) and ensures a proper estimate of the size distributions in automatic mode30. An option to reduce the number of touching organisms is to work with fewer organisms per scan, but the time commitment invested in scanning one sample in a high number of fractions and the possibility of aggregation of organisms should be taken into consideration. Another solution would be to preserve only a subsample that would allow for calculating a representative size spectrum when sorting the organisms in the laboratory instead of preserving all the sampled organisms, as done in this work. The reduction in the number of organisms per sample would reduce the probability of touching organisms. Moreover, when fewer individuals are stored, the sample contains less debris, which facilitates the separation, especially if fibers can be avoided.
The observed limitation of the automatic classification method is related to the low presence of microcrustaceans (category: other invertebrates) in the samples used. The lack of representation of microcrustaceans can affect their correct classification and limit the precision of the automatic prediction for this category. Nevertheless, the other categories, debris and macroinvertebrates, which are the main objective of this work, present high recall and precision. Alternatives to using this scanner device would be to adapt a common scanner to hold water frames, promote open-source codes for sample processing and machine learning like the one provided here, and write codes for measuring organisms under the microscope with a camera or through flux with a set of cameras. This has been done on several occasions23,24,25,26,38,39,40, but the method that we propose regulates the scanning parameterization in order to obtain comparable size estimates, which is difficult to control for with the other systems. Furthermore, the proposed protocol and scanning device are ready to use, open-source, and already established in the marine mesozooplankton community. Overall, the adaptation of this protocol demonstrates a promising avenue for using this automatic imaging method to obtain the size structure of freshwater macroinvertebrates efficiently and to test the potential of size metrics for freshwater bioassessment.
The authors have nothing to disclose.
This work was supported by the Spanish Ministry of Science, Innovation and Universities (grant number RTI2018-095363-B-I00). We thank the CERM-UVic-UCC members Èlia Bretxa, Anna Costarrosa, Laia Jiménez, María Isabel González, Marta Jutglar, Francesc Llach, and Núria Sellarès for their work in macroinvertebrate field sampling and laboratory sorting and David Albesa for collaborating in the sample scanning. We finally thank Josep Maria Gili and the Institut de Ciències del Mar (ICM-CSIC) for the use of the laboratory facilities and scanner device.
Beaker | Labbox | Other containers could be used | |
Deionized water | Icopresa | 8420239600123 | To dilute the ethanol |
Funnel | Vitlab | 41094 | |
Glass vials 8 ml | Labbox | SVSN-C10-195 | 1 vial/subsample |
ImageJ Software | Free access | Version 4.41o/ Image processing software | |
Large frame | Hydroptic | Provided by ZooScan | 24.5 cm x 15.8 cm |
Monalcol 96 (Ethanol 96) | Montplet | 1050JE001 | |
Plankton Identifier Software | Free access | Version 1.2.6/ Automatic identification software | |
Sieve | Cisa | 26852.2 | Nominal aperture 500 µm and nominal aperture 0.5 cm |
Tweezers | Bondline | B5SA | Stainless, anti-magnetic, anti-acid |
VueScan 9 x 64 (9.5.09) Software | Hydroptic | Version 9.0.51/ Scan software | |
Wooden needle | Any plastic or wood needle can be used | ||
Zooprocess Software | Free access | Version 7.14/Image processing software | |
ZooScan | Hydroptic | 54 | Version III/ Scanner |