Quantitative wear measurement is of increasing importance for monitoring tooth wear progression. Here, we describe a protocol for the acquisition and superimposition of repeated in vivo scans of dentitions in patients with moderate to severe wear, and we report the precision of the protocol and its intra- and inter-rater precision for both height and volume measurements.
Quantitative wear measurement is of increasing interest for measuring tooth wear progression. However, most research on quantitative wear measurement has focused on simulated wear or scanned gypsum casts. A 3D Wear Analysis (3DWA) protocol has been developed that analyzes tooth wear in vivo through intra-oral scanners available to dental clinicians. This study investigated the precision of the 3DWA protocol for measuring wear through maximum height loss (mm) and volume change (mm3). Observational prospective wear data from 55 patients were analyzed after 0-1-, 0-3-, and 0-5-year intervals to determine rates of wear, and convenience samples were chosen to test the protocol's precision on dentitions scanned twice in one sitting and its intra- and inter-rater precision on scans with 0-3- and 0-5-year intervals. Scans were made using intra-oral scanners (IOS) and superimposed using 3D measurement software. T-tests were performed to determine the structural and random error, and trimmed ranges were calculated to interpret the error. For protocol precision, the mean difference was 0.015 mm (-0.002; 0.032, p = 0.076) for height and -0.111 mm3 (-0.250; 0.023, p = 0.101) for volume. The duplicate measurement error was 0.062 mm for height and 0.268 mm3 for volume. The height measurements were precise enough to measure wear after intervals of 0-3 or 0-5 years; however, volume measurements were susceptible to procedural error and operator sensitivity. The 3DWA protocol is precise enough to adequately measure tooth height loss after intervals of a minimum of 3 years or in patients with severe wear progression, but it is not suited to measuring volumetric changes.
Tooth wear, though not life-threatening, can negatively impact patients' quality of life, both physiologically and psychologically1. It can affect both masticatory function and esthetics. The severity of the impact depends on the etiology, progression, and presentation of the wear and can differ greatly between patients2. The impact of tooth wear is expected to increase in the future due to increased human life expectancy, lifestyle changes, and people retaining their natural teeth for longer3. Therefore, diagnosing tooth wear and quantifying its progression are of increasing importance in providing patient care.
Despite the importance of measuring tooth wear, in vivo quantitative data on the absolute amount of tooth wear is scarce. Findings on tooth wear progression are often contradictory due to great variation in the methodology used. Several studies have shown relatively low progression rates in patients with physiological wear, with reported height loss between 11 and 29 µm per year and volume loss around 0.04 µm3 per year4,5,6. In cases of advanced tooth wear or existing parafunctional habits, much higher progression rates were found, between 68 and 140 µm per year7,8,9. These measurements were based on gypsum casts and gypsum cast dies and performed with either varying scanning and 3D subtraction software or microscopes. Since these methods are not available or practical in dental practice, they are not yet suitable for use in clinical care. However, intra-oral 3D scanning is rapidly becoming available in general dental practice, with advantages for both patient and operator with respect to speed and comfort of use, coupled with easy storage and data sharing10. 3D data can also be used for quantitative wear measurement, in which scans of teeth or jaws are superimposed and the difference between the scans is measured. This provides a quantitative option for measuring the progression of loss of tooth material in height or volume11,12.
Findings on the precision (closeness of agreement between replicated measurements) and accuracy (the difference between a measured quantity and its true value) have been variable when using scanners to detect and measure wear. Quantitative wear measurement has been reported as being a time-consuming method with often unknown or inadequate precision and accuracy, especially when dealing with minimal wear13,14. Others have reported intra-oral scanners to be precise enough to detect and monitor tooth wear, with superimposition reference areas (best fit) and software settings significantly affecting the outcome15,16.
Various methods have been used to find the best fit: 1) landmark alignment based on landmarks such as soft tissue, adjacent intact teeth, and alveolar processes, 2) standard best-fit alignment with the software minimizing the mesh distance error between data clouds, or 3) reference best-fit alignment with the best fit performed on a selection of areas chosen by the operator. It has been found that the reference best-fit alignment has the highest precision and accuracy15,17. Research shows that the precision and accuracy of a quantitative wear measurement increases when smaller structures, such as single teeth, are compared, instead of a full arch18,19. Two automated systems using 3D scans and quantitative wear measurement to monitor wear have been introduced; one has been tested in an in vitro setting on shortened arches or single teeth, while the other has indicated some promise for in vivo use for volumetric measurements compared to lab-scanned casts20,21,22. Most of these studies on accuracy and precision are based on scanned casts or in vitro simulated wear and are, therefore, not easily translated to clinical outcomes. Finding a clinically feasible protocol to perform quantitative wear measurements after intra-oral scanning in vivo would, therefore, be a vital next step in monitoring tooth wear15.
At the Radboud University Medical Center in Nijmegen, the Netherlands, a 3D Wear Analysis (3DWA) protocol using 3D measurement software has been developed to measure tooth wear in vivo using an intra-oral scanner on patients with moderate to severe tooth wear. Since it is nearly impossible to measure the accuracy in vivo, this article focuses on determining the precision of the 3DWA protocol. Particularly, this study aims to 1) describe the precision of the scanner and scanning process (acquisition) and subsequent superimposition by superimposing two scans of the same dentition acquired in the same session (protocol precision). Additionally, the 3DWA protocol was tested for 2) intra- and 3) inter-rater precision when measuring wear progression in both height (mm) and volume (mm3), on scans made at 0-3- or 0-5-year intervals. The scans were made intra-orally in patients with moderate to severe wear and the quantitative wear measurement was performed using the 3DWA protocol.
To test the agreement between raters with differing types of training, three raters were selected and trained. Rater 1 was a PhD student, who received extensive training on the execution of the 3DWA protocol and had 1 year of experience working on analyzing scans before independently executing the selected duplicate measurements. Rater 2 was a final year dental MSc student, who was given the protocol and an explanation of the software program and, thereafter, executed the protocol independently. Rater 3 was a dental MSc student, who received the protocol, an explanation of the software program, and two 3-h training sessions, after which she independently executed the protocol for the duplicate measurements. The raters did not have clinical information about the subjects other than the scans prior to the analysis. The scans were anonymized and coded before analysis by researchers other than the raters. When analyzing and measuring tooth wear, the old annotations from previous raters were hidden prior to analysis in the software. The measurements from the different raters were initially saved in different files.
A group of 55 patients was included from a larger prospective observational study on the progression of tooth wear of the Radboud Tooth Wear Project at the Department of Dentistry, Radboud University Medical Center, in Nijmegen (The Netherlands). These patients were scanned at intake, 1-year recall, 3-year recall, and 5-year recall. Descriptive statistics of the available scans from the group of 55 patients were calculated regarding tooth wear after 0-1-, 0-3-, and 0-5-year intervals for height (mm) and volume (mm3) to compare and interpret the results of the analysis of the precision of tooth wear in terms of clinical relevance.
To calculate the protocol precision, two patients were randomly chosen from the above-mentioned sample of 55 and asked for permission to have their dentition scanned twice, with a 15 min break between scans, instead of once at a recall appointment. The 3DWA protocol was then executed by rater 1. Given the high number of measurements on two dentitions (up to 65 for height and 16 for volume per dentition), this sample was deemed sufficient to reliably estimate the precision. To calculate the precision within one rater (intra-rater: rater 1), one patient with moderate wear was selected, and the measurements were repeated 1 month later. To calculate the precision between raters (inter-rater: raters 1, 2, and 3), a convenience sample of four patients was selected, two with moderate and two with severe wear progression. Intervals between the selected scans were either 3 or 5 years. The inter-rater results were calculated by comparing rater 1 with rater 2 and rater 1 with rater 3.
Institutional ethical approval for the protocol was obtained (ABR code: NL31401.091.10).
NOTE: The following steps describe the 3DWA protocol.
Figure 1: Visual representation of the steps for superimposition and quantitative wear measurement. This figure has been modified from K. Ning et al.23.
1. Acquisition
NOTE: The following procedure was used to scan dentitions.
2. Superimposition
NOTE: The following procedure was used for superimposition and quantitative wear measurement.
3. Quantitative wear measurement: Height
4. Quantitative wear measurement: Volume
5. Statistical analysis
During data analysis, the maximum height difference between occlusal surfaces was measured. For molars, three or four cusps were measured, and, for premolars, two cusps were measured. For maxillary anterior teeth, the incisal edge and palatal surface were measured, and, for mandibular anterior teeth, the incisal edge was measured. This resulted in a maximum of 65 measured locations per dentition. The difference in the volume of the occlusal surface was measured on posterior teeth only, resulting in a maximum of 16 observations per dentition.
Teeth with restorations on more than 75% of the measured surface were excluded, as well as third molars. On surfaces with partial restorations, height was measured on the tooth material. Height differences clearly caused by artifacts such as pooling of saliva were either excluded as a surface, or the measurement was done elsewhere on the surface. Other reasons for exclusion of surfaces or teeth were teeth being absent, the best fit being insufficient, or the data being incomplete (large gaps in the scan). Negative outcomes (inverse wear or "growth", which is clinically impossible) on included teeth and surfaces were not used for further statistical analysis except when calculating the protocol precision, for which the differences, both negative and positive, were noted.
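The inclusion rules above can be summarized in a short sketch. This is only an illustration under stated assumptions: the record layout (`SurfaceMeasurement`) and the function name are hypothetical and are not part of the 3DWA protocol or the measurement software. Artifacts such as pooled saliva are judged visually by the rater and are therefore not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurfaceMeasurement:
    """Hypothetical record for one measured surface (not a software export format)."""
    tooth: str                       # tooth label, e.g. FDI notation "16"
    is_third_molar: bool
    restored_fraction: float         # share of the measured surface that is restored (0-1)
    height_loss_mm: Optional[float]  # None if tooth absent, best fit insufficient, or data incomplete


def include_for_analysis(m: SurfaceMeasurement, keep_negative: bool = False) -> bool:
    """Apply the exclusion rules described in the text.

    keep_negative=True mirrors the protocol-precision analysis, in which both
    negative and positive differences were kept.
    """
    if m.is_third_molar:
        return False
    if m.restored_fraction > 0.75:   # restorations on more than 75% of the surface
        return False
    if m.height_loss_mm is None:     # absent tooth, insufficient fit, or large gaps
        return False
    if m.height_loss_mm < 0 and not keep_negative:  # "growth" is clinically impossible
        return False
    return True


# Hypothetical example values, for illustration only
examples = [
    SurfaceMeasurement("16", False, 0.10, 0.132),
    SurfaceMeasurement("18", True, 0.00, 0.200),    # third molar
    SurfaceMeasurement("26", False, 0.80, 0.150),   # mostly restored
    SurfaceMeasurement("36", False, 0.00, -0.020),  # measured "growth"
    SurfaceMeasurement("46", False, 0.00, None),    # incomplete data
]
included = [m.tooth for m in examples if include_for_analysis(m)]
included_protocol = [m.tooth for m in examples if include_for_analysis(m, keep_negative=True)]
```

With these made-up records, only tooth 16 is kept for the wear analysis, while tooth 36 is additionally kept when negative values are allowed, as in the protocol precision analysis.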
Table 1: Results of the analysis of the precision of tooth wear measurements for height and volume.
Precision: structural differences
The data for protocol precision were visualized in violin plots (Figure 2 and Table 1). The data for intra- and inter-rater precision were visualized in Bland-Altman plots (Figure 3 and Table 1). For height, a statistically significant difference was found between rater 1 and rater 3; this difference is not clinically relevant, as the entire confidence interval (CI) is close to 0. For volume, it is important to note that, for intra-rater precision, 50% of the teeth measured had to be excluded from the analysis due to negative measurements (i.e., "growth"), indicating that the volume measurement was not usable for these teeth.
Figure 2: Violin plots for (A) height (mm) and (B) volume (mm3) for protocol precision.
Figure 3: Bland-Altman plots for (A,D) intra-rater and (B,C,E,F) inter-rater precision for height (A–C) and volume (D–F). The solid line indicates the mean difference, and the dotted lines indicate the limits of agreement.
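For readers unfamiliar with Bland-Altman plots, the sketch below shows how the mean difference and the limits of agreement are commonly obtained from paired measurements. It assumes the conventional definition (mean difference ± 1.96 standard deviations of the differences); the function name and the example values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (mean difference, lower limit, upper limit) for paired measurements.

    Assumes the conventional 95% limits of agreement: bias +/- 1.96 * SD of the
    paired differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical height-loss values (mm) for the same four locations, two raters
rater_a = [0.10, 0.20, 0.15, 0.30]
rater_b = [0.12, 0.18, 0.15, 0.26]

bias, lower, upper = bland_altman(rater_a, rater_b)
print(f"mean difference {bias:.3f} mm, limits of agreement {lower:.3f} to {upper:.3f} mm")
```

A mean difference near zero points to a small structural (systematic) difference between raters, whereas wide limits of agreement point to a large random component.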
Precision: random error
Regarding the duplicate measurement error (DME) for height, the DMEs for protocol precision and inter-rater precision were similar, and the DME for intra-rater precision was much lower. The correlation was high and similar for inter-rater precision, very high for intra-rater precision, and could not be calculated for protocol precision. Training seemed to have little effect when looking at DME and correlation for height. Regarding volume, there were large differences between protocol precision, inter-rater precision, and intra-rater precision results.
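As an illustration of how a duplicate measurement error can be derived from paired measurements, the sketch below uses one common definition: the standard deviation of the paired differences divided by √2. This definition is an assumption here, since the statistical analysis steps are not reproduced in this section, and the example values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def duplicate_measurement_error(first, second):
    """DME under the assumed definition: SD of paired differences divided by sqrt(2).

    The division by sqrt(2) reflects that each difference contains the random
    error of two measurements.
    """
    diffs = [a - b for a, b in zip(first, second)]
    return stdev(diffs) / sqrt(2)


# Hypothetical duplicate height measurements (mm) at four locations
first_pass = [0.10, 0.20, 0.15, 0.30]
second_pass = [0.12, 0.18, 0.15, 0.26]

dme = duplicate_measurement_error(first_pass, second_pass)
print(f"DME = {dme:.4f} mm")
```

Under this definition, the DME estimates the random error of a single measurement, which is why it can be compared directly with the amount of wear expected over a given interval.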
In order to interpret the structural and random differences described in Table 1, it is important to know the range of height and volume measurements to be expected after multiple years in patients with moderate to severe wear, which are described in Table 2.
Table 2: Trimmed ranges derived from the larger group of wear patients at 0-1-, 0-3- and 0-5-year intervals and the mean difference and DME expressed in percentages of the trimmed range.
Interpretation of results:
Comparing the results for height to the trimmed range of wear seen in a group of 55 patients with moderate to severe tooth wear gave small structural differences (mean difference) for all intervals and all tests. For the DME, there were large differences between the 0-1-year interval and the 0-3- or 0-5-year intervals for all tests, indicating that, for short intervals (limited wear progression), the protocol is not precise enough, but, for longer intervals (or higher wear progression rates), the precision is adequate.
For volume, the structural differences were small for all intervals, except for the comparison between rater 1 and rater 3. For the DME, there were large differences between the 0-1-year interval and the 0-3- or 0-5-year intervals for all tests. Despite good results for protocol precision, there were large differences between operators, a high number of outliers, and many teeth excluded due to measured "growth", indicating poor performance of the protocol regarding volume, even for longer intervals.
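The idea of expressing an error as a percentage of the trimmed range can be sketched as follows. The trimming rule used here (discarding the lowest and highest 5% of values) and the wear values are assumptions for illustration only; the actual trimmed ranges are those reported in Table 2.

```python
def trimmed_range(values, trim_fraction=0.05):
    """Range of the data after dropping the lowest and highest trim_fraction.

    The 5% default is a hypothetical choice, not the rule used in the study.
    """
    ordered = sorted(values)
    k = int(round(len(ordered) * trim_fraction))
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return kept[-1] - kept[0]


def percent_of_range(error, value_range):
    """Express an error (e.g., the DME) as a percentage of a range."""
    return 100.0 * error / value_range


# Hypothetical height-loss values (mm) for one interval: 0.01, 0.02, ..., 0.20
wear_mm = [0.01 * i for i in range(1, 21)]

wear_range = trimmed_range(wear_mm)
dme_percent = percent_of_range(0.062, wear_range)  # 0.062 mm is the DME for height
print(f"trimmed range {wear_range:.2f} mm, DME = {dme_percent:.0f}% of range")
```

The same DME takes up a smaller share of the range as the interval lengthens and the observed wear increases, which is the reasoning behind preferring longer intervals.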
The difference between protocol precision and intra-rater precision is due to differences in method; to calculate protocol precision, the teeth were scanned in the same session. No wear took place between the scans, resulting in an excellent best fit. Therefore, the precision of height was determined mainly by droplets of saliva and scanning powder creating tiny spikes, causing a large height difference when measuring the highest point on the surface (Figure 4). To calculate intra-rater precision, scans were used with a 5-year interval between them, resulting in the presence of wear that increases the difficulty of performing the best fit. However, only wear was measured, and suspected saliva/powder residuals or areas with possible restorations or flaring (distortion at scanned edges of the tooth; Figure 5) were avoided, thereby increasing precision.
Since volume is calculated for the whole occlusal area and not by localized measurements, it is much less affected by occasional droplets of saliva than height when measuring protocol precision. Intra-rater precision would be expected to be lower than protocol precision for volume, since it is affected by the best fit procedure, which, in turn, is made more difficult by wear taking place between scans. This affects the whole occlusal area of a tooth, and, additionally, areas with saliva, powder, restorations, and flaring cannot be deselected or ignored, in contrast to when height is measured. However, the results for intra-rater precision and protocol precision for volume were similar due to a single outlier decreasing protocol precision.
When analyzing the height data on wear progression comparing rater 1 to rater 2, it became clear that a group of outliers could be attributed to two factors: 1) measurements on teeth with severe wear were made with the 2D Compare method (Step 3.2) instead of 3D Compare (Step 3.1), and 2) a set of measurements was wrongly made on pooled saliva, which was mistaken for wear by rater 2 (Figure 6). The data were, therefore, split into three groups and analyzed separately: "saliva", "normal", and "2D Compare" (Figure 6A). Rater 3 (trained) made no measurements on pooled saliva, suggesting that training was successful in that regard (Figure 6B).
When comparing the heights from annotations ("normal") and manual 2D measurements ("2D Compare") for rater 1, the "normal" measurements had a mean height difference of 0.132 mm (N = 223, standard deviation 0.112, range: -0.001; 0.847), and the "2D Compare" measurements had a mean height difference of 0.557 mm (N = 5, standard deviation 0.160, range: 0.351; 0.743), indicating that the "2D Compare" measurements were in a higher range with a higher standard deviation than the "normal" measurements.
Figure 4: Example of saliva spikes on teeth without wear (incisal yellow areas) and wear caused by artifacts (lingual blue area indicating either flaring or removed calculus).
Figure 5: Example of pooled saliva in fissures (blue) and saliva spike (red-orange) on mesio-palatal and buccal cusp.
Figure 6: Scatter plots for measurement of changes in height with colored dots indicating groups of measurements ("saliva", "normal", and "2D Compare").
Critical steps in the protocol
The 3DWA protocol has been shown to provide precise height measurements with excellent inter- and intra-rater agreement. For volume measurements, however, the protocol is not suitable. The major factors determining the precision of both acquisition and superimposition were the isolation during scanning and finding the best fit while superimposing. Superimposition is straightforward if the teeth have not changed but becomes increasingly difficult when wear progresses, especially if the wear is not localized but involves large sections of the surface.
In a clinical situation, negative wear (growth) may simply be ignored, as was done in this study, as it is an impossible outcome. Scanning errors, such as saliva droplets, the thickness of powder coating, or flaring are problematic even in unchanged teeth and may not always be readily detectable, contributing to measurement error.
Modifications and troubleshooting of the method
Performing the best fit procedure
When performing a best fit procedure on teeth with wear, the algorithm minimizes the root mean square (RMS) distance, making the average distance between the points of the two meshes as close to zero as possible. In teeth with wear progression, this may result in a decrease in the distance in the areas with wear and an increase in the areas without or with less wear, leading to an underestimation of the wear on worn surfaces. Since this is a population with moderate to severe wear, performing a standard best fit alignment followed by deselecting occlusal areas with clear facets of wear and repeating the best fit alignment almost always resulted in a better fit than the standard best fit alignment alone, which is also supported by previous literature15,17. This approach is hereafter called the "modified reference based fit technique"22. It is important that only occlusal surfaces with wear are deselected so that as many coronal surfaces as possible remain available for the best fit procedure. The difficulties with obtaining the best fit in this population explain the difference in precision between height and volume measurements. If the best fit procedure results in an imperfect alignment, this will affect the volume difference of the tooth relatively more than height measurements. Additionally, locations with artifacts such as saliva can be avoided in height measurements but not in volume measurements.
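The underestimation described above can be illustrated with a deliberately simplified toy model. It is not the algorithm of the measurement software: alignment is reduced to a single vertical shift of a one-dimensional height profile, for which the least-squares (minimum RMS) solution is the mean difference over the points used. All values are hypothetical.

```python
from statistics import mean


def best_fit_shift(reference, follow_up, use=None):
    """Vertical shift of follow_up that minimizes the RMS distance to reference.

    For a pure vertical shift, the least-squares solution is the mean difference
    over the points used for the fit. `use` optionally restricts the fit to a
    subset of point indices (the reference areas).
    """
    idx = range(len(reference)) if use is None else use
    return mean(reference[i] - follow_up[i] for i in idx)


# Ten surface points; the last four lost 0.2 mm of height between scans
reference = [0.0] * 10
follow_up = [0.0] * 6 + [-0.2] * 4

# Standard best fit: all points, including the worn area, drive the alignment
shift_all = best_fit_shift(reference, follow_up)
wear_all = reference[9] - (follow_up[9] + shift_all)   # measured wear on a worn point
growth_all = (follow_up[0] + shift_all) - reference[0]  # apparent "growth" on an unworn point

# Reference best fit: worn points deselected before fitting
shift_ref = best_fit_shift(reference, follow_up, use=range(6))
wear_ref = reference[9] - (follow_up[9] + shift_ref)

print(f"standard fit: wear {wear_all:.2f} mm, apparent growth {growth_all:.2f} mm")
print(f"reference fit: wear {wear_ref:.2f} mm")
```

In this toy case, the standard fit reports 0.12 mm of wear instead of 0.20 mm and creates 0.08 mm of apparent "growth" on unworn points, whereas fitting on the unworn points alone recovers the full 0.20 mm. A real superimposition is a rigid three-dimensional alignment, so the effect is more complex, but the direction of the bias is the same.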
Selecting the point of the highest wear
Some outliers remained despite training, caused by a variety of factors such as disagreements due to unclear anatomy, wear, or restorations, and were not preventable by adjusting the protocol. A point of improvement was achieved by editing the color spectrum depicting wear, which is shown as a blue area. By changing the spectrum, dark blue areas of wear could be decreased to a darkest blue point, pinpointing the location with the highest amount of wear, which decreased operator sensitivity in choosing the location of highest wear.
Measuring volume versus measuring height
The precision of volume measurements was insufficient for clinical tooth wear measurement. This is due, firstly, to the aforementioned issue regarding the best fit. A slight deviation in the fit can result in a large difference between the superimposed teeth. Secondly, saliva, restorations, powder, and other possible artifacts are measured by the software as changes in volume, although they are not actual wear. Thirdly, the selection of the surface for volumetric changes might be influenced by tooth size, shape, and scanned surfaces. Fourthly, the software algorithm might be too imprecise when filling holes or calculating volume to precisely detect volume changes. Since calculating the volume change is done automatically after performing the best fit, the imprecision of volume measurements did not lead to modifications of the protocol other than improving the best fit. Theoretically, volumetric change measurement would be preferable, since volumetric changes are not affected by outliers in single data points or large sections of the area being unchanged by wear, like height measurements12,17. However, volumetric changes are dependent on tooth size, which should be considered when reporting volumetric change15. Additionally, height measurement may be useful for obtaining a good impression of the wear processes on the surface. It is vital for future research to focus on methods to accurately measure both height and volumetric changes to determine the progression of tooth wear.
Strengths and limitations of the protocol
This protocol is based on a reproducible chairside method; therefore, the findings translate to what clinicians could expect when looking for a method to monitor wear using intra-oral scanners. The 3DWA protocol has been shown to be precise for height measurements, and, additionally, the levels of wear found for patients with higher and lower progression of wear (Table 2) were similar to those reported in the literature, suggesting good accuracy as well4,5,7,8,9.
The limitations are also the limitations a clinician would face: patient-related factors such as limited mouth opening, the presence of saliva or scanning powder (depending on the type of scanner), and possible scanning artifacts or software errors resulting in random error (62 µm at the surface level), which is quite substantial when compared to the amount of wear one might expect after a year in patients with severe tooth wear (between 68 and 140 µm per year) or in patients with physiological wear of about 30 µm per year4,5,6,7,8,9. Several considerations mitigate this. Firstly, the duplicate measurement error becomes far less important when the range of wear increases, either due to a longer interval or more severe and quickly progressing tooth wear. Secondly, for research purposes, measurements can be repeated to reduce the DME. Thirdly, scanners and scanning systems are constantly revised and updated, and precision is expected to increase in the future, which creates more possibilities for precise height and volume measurements.
Although the 3DWA protocol provides useful and reliable information on the progression of tooth wear in research, it is probably still too time-consuming and costly for application in standard clinical care. The software necessary for quantitative wear measurement is not readily available for researchers, let alone dental clinicians10. Comparing complete dentitions may take between 3 and 6 h, depending on the experience of the rater and the severity of the wear. Therefore, the authors feel that a vital next step in improving patient care is the automation of this validated 3DWA protocol, which would make it more time- and cost-efficient. Different approaches, such as the use of index teeth instead of measuring all teeth and cusps to determine the progression of wear, can also be used13.
The significance of the method with respect to existing/alternative methods
This protocol provides more quantifiable, objective, and precise data on the progression of tooth wear compared to more commonly used quantitative methods such as the Tooth Wear Index (TWI), Tooth Wear Evaluation System (TWES), or Basic Erosive Wear Examination (BEWE)24,25,26. This is the first study done on all direct in vivo scans without using lab scanners or scanned impressions to assess the wear. In this study, adequate protocol precision and excellent intra-rater precision were found when measuring height. There was only a slight difference between raters, which would not result in patients being incorrectly diagnosed as having stable or progressive moderate to severe wear. This method was not able to provide precise volumetric change measurements, which previous research has argued to be more reliable, and, in that area, more research needs to be done15.
The identified protocol precision of 0.062 mm (random error, or DME) for height is not the only factor to consider when trying to determine the precision of the protocol for a given measurement. The systematic errors are small enough to dismiss; however, the random error of 0.062 mm varies in size and direction from one measurement to the next. This rules out a simple threshold for minimum precision. In a research setting with many repeated measurements, the effect of the random error is minimal. In an individual patient, however, the random error comes into effect. The importance of a random error of 0.062 mm depends on which true value of height loss signifies pathological tooth wear. The chosen threshold combined with the DME determines the chance of a measurement indicating pathological tooth wear where there is none and vice versa. For example, for an individual patient, if a threshold of 0.070 mm of tooth wear per year is considered pathological, and 0.030 mm of tooth wear per year is considered physiological, a DME of 0.062 mm gives a 26% probability that the measured value is higher than 0.070 mm when the true value is 0.030 mm, thereby falsely classifying a patient as having pathological wear. However, after 3 years, the threshold for pathological wear would be 0.210 mm. Then, with a true value of 0.090 mm (per 3 years), there is only a 2.6% chance that the measured value is higher than the threshold value. Therefore, the recommendation is to measure tooth wear after multiple years in patients with moderate wear, or at a shorter interval in patients with higher suspected progression, in order to precisely determine individual wear.
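The probabilities in this example can be reproduced with a short calculation, assuming that the measurement error is normally distributed around the true value with a standard deviation equal to the DME. The function name and the `n_repeats` parameter are hypothetical; the latter additionally assumes independent errors, so that averaging n repeated measurements reduces the standard deviation by a factor of √n.

```python
from math import erfc, sqrt


def prob_exceeds_threshold(threshold_mm, true_value_mm, dme_mm, n_repeats=1):
    """Probability that a measured value exceeds the threshold.

    Assumes normally distributed measurement error with SD = DME (or
    DME / sqrt(n_repeats) for the mean of independent repeated measurements).
    """
    sd = dme_mm / sqrt(n_repeats)
    z = (threshold_mm - true_value_mm) / sd
    return 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal distribution


p_1_year = prob_exceeds_threshold(0.070, 0.030, 0.062)   # about 0.26
p_3_years = prob_exceeds_threshold(0.210, 0.090, 0.062)  # about 0.026
p_1_year_4x = prob_exceeds_threshold(0.070, 0.030, 0.062, n_repeats=4)

print(f"1-year interval: {p_1_year:.1%}")
print(f"3-year interval: {p_3_years:.1%}")
print(f"1-year interval, mean of 4 measurements: {p_1_year_4x:.1%}")
```

Under these assumptions, the calculation returns approximately 26% for the 1-year interval and 2.6% for the 3-year interval, matching the values in the text. It also shows why repeated measurements help in a research setting: averaging four measurements lowers the 1-year misclassification probability to roughly 10%.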
Additionally, it is very difficult to compare the precision found with previously reported values. Although many studies have been performed on the precision and accuracy of scanners, the specific technique used in this study, scanning a full arch (which lowers precision) but comparing single teeth (which heightens the precision), makes it impossible to compare given values on full arches and single teeth18. In research conducted on the progression of tooth wear, the precision reported was based on in vitro findings on simulated wear or done with lasers or lab scanners, and, as such, is difficult to compare with findings of this study and less relevant in a clinical setting6,14,17,20,21.
Importance of application
Overall, these findings indicate that quantitative wear measurement of intra-oral scans is an attainable and precise method to quantify the progression of wear in height. The result seems to be independent of the experience of the operator and limited training in the protocol. This has great advantages in research, such as being able to quantify and monitor wear and store information digitally in subject records. This protocol would be useful in clinical practice in the management of tooth wear to determine treatment options, create awareness, and improve patient-centered care. Although currently too time-consuming to perform, modified versions of the protocol for clinical practice, such as measuring index teeth instead of full dentitions, can alleviate this problem, as well as automation of the protocol. It will be an important step towards a future where patients will be scanned regularly as part of standard care, with software diagnosing areas with wear progression.
The authors have nothing to disclose.
This study has been partly funded by the Dutch Journal of Dentistry (Stichting Bevordering Tandheelkundige Kennis, NTVT BV).
Name | Company | Catalog Number | Comments |
Dry Tips | Microbrush International | 273-DTL | Dry pad to cover buccal mucosa |
GeoMagic Qualify | 3D Systems | | Measurement, comparison and reporting software tool for first-article and automated inspection processes |
High Resolution Scanning Spray Powder | 3M ESPE | 42295100 | Powder to cover to-be scanned surfaces |
High Resolution Sprayer | 3M | 42295100 | Sprayer for scanning powder |
Lava Chairside Oral Scanner | 3M ESPE | 68901 | Intra Oral Scanner |
Mobile True Definition Scanner | 3M | M06-6060 | Mobile Intra Oral Scanner |
OptraGate | Ivoclar Vivadent | 49294 | Flexible lip and cheek retractor |
Saliva Ejector HYGOFORMIC | Pulpdent | SV-6075 | Intra Oral Scanning Aids in tongue retraction and suction for mandibular scanning |
True Definition Scanner | 3M | M06-6000 | Intra Oral Scanner |