A standardized evaluation method was developed for Wearable Mobility Monitoring Systems (WMMS) that includes continuous activities in a realistic daily living environment. Testing with a series of daily living activities can decrease activity recognition sensitivity; therefore, realistic testing circuits are encouraged for valid evaluation of WMMS performance.
An evaluation method that includes continuous activities in a daily-living environment was developed for Wearable Mobility Monitoring Systems (WMMS) that attempt to recognize user activities. Participants performed a pre-determined set of daily living actions within a continuous test circuit that included mobility activities (walking, standing, sitting, lying, ascending/descending stairs), daily living tasks (combing hair, brushing teeth, preparing food, eating, washing dishes), and subtle environment changes (opening doors, using an elevator, walking on inclines, traversing staircase landings, walking outdoors).
To evaluate WMMS performance on this circuit, fifteen able-bodied participants completed the tasks while wearing a smartphone at their right front pelvis. The WMMS application used smartphone accelerometer and gyroscope signals to classify activity states. A gold standard comparison data set was created by video-recording each trial and manually logging activity onset times. Gold standard and WMMS data were analyzed offline. Three classification sets were calculated for each circuit: (i) mobility or immobility, (ii) sit, stand, lie, or walking, and (iii) sit, stand, lie, walking, climbing stairs, or small standing movement. Sensitivities, specificities, and F-Scores for activity categorization and changes-of-state were calculated.
The mobile versus immobile classification set had a sensitivity of 86.30% ± 7.2% and specificity of 98.96% ± 0.6%, while the second classification set had a sensitivity of 88.35% ± 7.80% and specificity of 98.51% ± 0.62%. For the third classification set, sensitivity was 84.92% ± 6.38% and specificity was 98.17% ± 0.62%. F1 scores for the first, second, and third classification sets were 86.17% ± 6.3%, 80.19% ± 6.36%, and 78.42% ± 5.96%, respectively. This demonstrates that WMMS performance depends on the evaluation protocol in addition to the algorithms. The demonstrated protocol can be used and tailored for evaluating human activity recognition systems in rehabilitation medicine where mobility monitoring may be beneficial in clinical decision-making.
Ubiquitous sensing has become an engaging research area due to increasingly powerful, small, low cost computing and sensing equipment 1. Mobility monitoring using wearable sensors has generated a great deal of interest since consumer-level microelectronics are capable of detecting motion characteristics with high accuracy 1. Human activity recognition (HAR) using wearable sensors is a recent area of research, with preliminary studies performed in the 1980s and 1990s 2–4.
Modern smartphones contain the necessary sensors and real-time computation capability for mobility activity recognition. Real-time analysis on the device permits activity classification and data upload without user or investigator intervention. A smartphone with mobility analysis software could provide fitness tracking, health monitoring, fall detection, home or work automation, and self-managing exercise programs 5. Smartphones can be considered inertial measurement platforms for detecting mobile activities and mobile patterns in humans, using generated mathematical signal features calculated with onboard sensor outputs 6. Common feature generation methods include heuristic, time-domain, frequency-domain, and wavelet analysis-based approaches 7.
Modern smartphone HAR systems have shown high prediction accuracies when detecting specified activities 1,5,6,7. These studies vary in evaluation methodology as well as accuracy since most studies have their own training set, environmental setup, and data collection protocol. Sensitivity, specificity, accuracy, recall, precision, and F-Score are commonly used to describe prediction quality. However, little to no information is available on methods for "concurrent activity" recognition and evaluation of the ability to detect activity changes in real-time 1, for HAR systems that attempt to categorize several activities. Assessment methods for HAR system accuracy vary substantially between studies. Regardless of the classification algorithm or applied features, descriptions of gold standard evaluation methods are vague for most HAR research.
Activity recognition in a daily living environment has not been extensively researched. Most smartphone-based activity recognition systems are evaluated in a controlled manner, leading to an evaluation protocol that may be advantageous to the algorithm rather than realistic to a real-world environment. Within their evaluation scheme, participants often perform only the actions intended for prediction, rather than applying a large range of realistic activities for the participant to perform consecutively, mimicking real-life events.
Some smartphone HAR studies 8,9 group similar activities together, such as stairs and walking, but exclude other activities from the data set. Prediction accuracy is then determined by how well the algorithm identified the target activities. Dernbach et al. 9 had participants write the activity they were about to execute before moving, interrupting continuous change-of-state transitions. HAR system evaluations should assess the algorithm while the participant performs natural actions in a daily living setting. This would permit a real-life evaluation that replicates daily use of the application. A realistic circuit includes many changes-of-state as well as a mix of actions not predictable by the system. An investigator can then assess the algorithm's response to these additional movements, thus evaluating the algorithm's robustness to anomalous movements.
This paper presents a Wearable Mobility Monitoring System (WMMS) evaluation protocol that uses a controlled course that reflects real-life daily living environments. WMMS evaluation can then be made under controlled but realistic conditions. In this protocol, we use a third-generation WMMS that was developed at the University of Ottawa and Ottawa Hospital Research Institute 11-15. The WMMS was designed for smartphones with a tri-axis accelerometer and gyroscope. The mobility algorithm accounts for user variability, reduces the number of false positives for changes-of-state identification, and increases sensitivity in activity categorization. Minimizing false positives is important since the WMMS triggers short video clip recording when activity changes of state are detected, for context-sensitive activity evaluation that further improves WMMS classification. Unnecessary video recording creates inefficiencies in storage and battery use. The WMMS algorithm is structured as a low-computational learning model and evaluated using different prediction levels, where an increase in prediction level signifies an increase in the number of recognizable actions.
This protocol was approved by the Ottawa Health Science Network Research Ethics Board.
1. Preparation
2. Activity Circuit
3. Trial Completion
4. Post-processing
The study protocol was conducted with a convenience sample of fifteen able-bodied participants whose average weight was 68.9 (± 11.1) kg, height was 173.9 (± 11.4) cm, and age was 26 (± 9) years, recruited from The Ottawa Hospital and University of Ottawa staff and students. A smartphone captured sensor data at a variable 40-50 Hz rate. Sample rate variations are typical for smartphone sensor sampling. A second smartphone was used to record digital video at 1280×720 (720p) resolution.
The holster was fastened to the participant's right-front belt or pants without further standardization of the location. This demonstrated a natural method for placing the device in the holster on the hip. With the device placed in the holster and the data logger application running, each person traversed the circuit once, at a self-selected pace. The circuit was not described in advance to the participant, and upcoming activities were announced sequentially by the investigator during the trial.
The WMMS consisted of a decision tree with upper and lower boundary conditions, similar to work by Wu et al. 13. The revised classifier used a 1 s window size and features from the linear acceleration signal (sum of range, simple moving average, sum of standard deviation) and gravity signal (difference to Y, variance sum average difference) 15. Three classification sets were calculated for evaluation: (i) mobility or immobility, (ii) sit, stand, lie, or walking, and (iii) sit, stand, lie, walking, climbing stairs, or small standing movement. Activities of daily living were labeled as small movements. Representative results are shown in Table 1.
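As an illustration of windowed feature generation, the following minimal Python sketch groups timestamped samples into 1 s windows and computes three linear-acceleration features. The feature definitions here are assumptions inferred from the feature names only (range and standard deviation summed over the three axes, and a window average of the per-sample axis sum); the exact WMMS definitions are given in the cited work, not here. The function names are hypothetical, and the gravity-signal features are omitted.

```python
from statistics import mean, pstdev


def split_windows(timestamps, samples, window_s=1.0):
    """Group samples into consecutive, non-overlapping windows by timestamp (s).

    Windowing on timestamps rather than sample counts tolerates a variable
    sampling rate such as 40-50 Hz.
    """
    windows, current, start = [], [], None
    for t, s in zip(timestamps, samples):
        if start is None:
            start = t
        if t - start >= window_s:
            windows.append(current)
            current, start = [], t
        current.append(s)
    if current:
        windows.append(current)
    return windows


def window_features(window):
    """Compute illustrative features for one window of (x, y, z) samples.

    ASSUMPTION: these formulas are one plausible reading of the feature
    names, not the published WMMS definitions.
    """
    axes = list(zip(*window))  # three tuples: all x, all y, all z
    return {
        "sum_of_range": sum(max(a) - min(a) for a in axes),
        "sum_of_std": sum(pstdev(a) for a in axes),
        "simple_moving_average": mean(sum(s) for s in window),
    }


if __name__ == "__main__":
    ts = [0.0, 0.5, 1.0, 1.5]
    acc = [(0, 0, 0), (1, 2, 3), (0, 0, 0), (0, 0, 0)]
    for w in split_windows(ts, acc):
        print(window_features(w))
```

In a decision-tree classifier of the kind described, each feature would then be compared against upper and lower boundary values to select an activity state.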
Classification | TP | FN | TN | FP | Sensitivity (%) | Specificity (%) | F1-Score (%)
Classification Set 1 | 350 | 55 | 8701 | 91 | 86.30 ± 7.2 | 98.96 ± 0.6 | 86.17 ± 6.3
Classification Set 2 | 359 | 47 | 8660 | 131 | 88.35 ± 7.80 | 98.51 ± 0.62 | 80.19 ± 6.36
Classification Set 3 | 423 | 75 | 8540 | 159 | 84.92 ± 6.38 | 98.17 ± 0.62 | 78.42 ± 5.96

Classification Set 1 | TP | FN | TN | FP
Immobile to Mobile | 177 | 19 | - | -
Mobile to Immobile | 171 | 36 | - | -
During Mobile | - | - | 3990 | 73
During Immobile | - | - | 4711 | 18

Classification Set 2 | TP | FN | TN | FP
Stand to Walk | 134 | 17 | - | -
Walk to Stand | 137 | 26 | - | -
Walk to Sit | 29 | 0 | - | -
Sit to Walk | 30 | 0 | - | -
Walk to Lie | 11 | 4 | - | -
Lie to Walk | 15 | 0 | - | -
During Stand | - | - | 2872 | 73
During Sit | - | - | 644 | 9
During Lie | - | - | 447 | 9
During Walk | - | - | 4697 | 40

Classification Set 3 | TP | FN | TN | FP
Stand to Walk | 70 | 7 | - | -
Walk to Stand | 74 | 14 | - | -
Walk to Sit | 29 | 0 | - | -
Sit to Walk | 30 | 0 | - | -
Walk to Lie | 15 | 0 | - | -
Lie to Walk | 15 | 0 | - | -
Walk to Small Move | 68 | 7 | - | -
Small Move to Walk | 61 | 13 | - | -
Walk to Stairs | 13 | 2 | - | -
Stairs to Walk | 13 | 2 | - | -
Small Move to Small Move | 35 | 30 | - | -
During Stand | - | - | 1584 | 25
During Sit | - | - | 643 | 10
During Lie | - | - | 447 | 15
During Walk | - | - | 4398 | 56
During Stairs | - | - | 246 | 0
During Brush Teeth | - | - | 190 | 12
During Comb Hair | - | - | 158 | 2
During Wash Hands | - | - | 152 | 6
During Dry Hands | - | - | 119 | 4
During Move Dishes | - | - | 93 | 5
During Fill Kettle | - | - | 190 | 5
During Toast Bread | - | - | 70 | 1
During Wash Dishes | - | - | 250 | 18
Table 1. Results for change-of-state determination, including true positives (TP), false negatives (FN), true negatives (TN), false positives (FP), total changes-of-state, sensitivity, specificity, and F1-Score. "During" refers to TN and FP for changes-of-state during the specified action.
From Table 1, the mobile versus immobile classification set had a sensitivity of 86.30% ± 7.2% and specificity of 98.96% ± 0.6%, while the second classification set had a sensitivity of 88.35% ± 7.80% and specificity of 98.51% ± 0.62%. For the third classification set, sensitivity was 84.92% ± 6.38% and specificity was 98.17% ± 0.62%. F1 scores for the first, second, and third classification sets were 86.17% ± 6.3%, 80.19% ± 6.36%, and 78.42% ± 5.96%, respectively.
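For readers who wish to recompute these outcome measures, the short Python sketch below applies the standard definitions of sensitivity, specificity, and F1 score to the pooled TP, FN, TN, and FP counts from Table 1. The function names are illustrative. Note that the reported values carry a ± term, which indicates averaging (presumably across participants), so ratios computed from pooled counts need not reproduce the reported means exactly.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Pooled counts from Table 1, in the order (TP, FN, TN, FP).
TABLE_1 = {
    "Classification Set 1": (350, 55, 8701, 91),
    "Classification Set 2": (359, 47, 8660, 131),
    "Classification Set 3": (423, 75, 8540, 159),
}

if __name__ == "__main__":
    for name, (tp, fn, tn, fp) in TABLE_1.items():
        print(
            f"{name}: "
            f"sensitivity={100 * sensitivity(tp, fn):.2f}%, "
            f"specificity={100 * specificity(tn, fp):.2f}%, "
            f"F1={100 * f1_score(tp, fp, fn):.2f}%"
        )
```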
Figure 1. Changes-of-state sensitivity, specificity, and F1-Score for three classification sets.
Human activity recognition with a wearable mobility monitoring system has received more attention in recent years due to technical advances in wearable computing and smartphones and the systematic need for quantitative outcome measures that help with clinical decision-making and health intervention evaluation. The methodology described in this paper was effective for evaluating WMMS development since it exposed activity classification errors that would have gone undetected had a broad range of activities of daily living and walking scenarios not been included in the evaluation.
The WMMS evaluation protocol consists of two main parts: data acquisition under realistic but controlled conditions with an accompanying gold standard data set and data post-processing. Digital video was a viable solution for providing gold standard data when testing WMMS algorithm predictions across the protocol activities. Critical steps in the protocol are (i) to ensure that the gold standard video captures the smartphone shake since this permits synchronization of the gold standard video with data acquired from the participant-worn phone and (ii) to ensure that the gold-standard video records all the transitions performed by the trial participant (i.e., the person recording the gold-standard video must be in the correct position when following the trial participant).
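The synchronization step can be sketched as follows. This minimal Python example assumes that the shake appears in the accelerometer signal as a short run of samples whose magnitude clearly exceeds gravity, and that the investigator has noted the time at which the shake appears in the video. The threshold, the run length, and the function names are hypothetical choices for illustration, not values from the protocol.

```python
import math


def detect_shake_time(timestamps, accel, threshold=15.0, min_hits=3):
    """Return the timestamp (s) at which the first shake-like run begins.

    A run is min_hits consecutive samples whose acceleration magnitude
    exceeds threshold (m/s^2). Both parameters are ASSUMED values.
    Returns None if no such run is found.
    """
    run_start, hits = None, 0
    for t, (x, y, z) in zip(timestamps, accel):
        if math.sqrt(x * x + y * y + z * z) > threshold:
            if hits == 0:
                run_start = t
            hits += 1
            if hits >= min_hits:
                return run_start
        else:
            hits = 0
    return None


def video_to_sensor_time(video_t, shake_video_t, shake_sensor_t):
    """Map a video time onto the sensor clock using the shared shake event."""
    return video_t - shake_video_t + shake_sensor_t


if __name__ == "__main__":
    ts = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10]
    rest, shake = (0.0, 9.8, 0.0), (20.0, 0.0, 0.0)
    acc = [rest, rest, shake, shake, shake, rest]
    shake_sensor_t = detect_shake_time(ts, acc)
    # Suppose the shake is seen 2.5 s into the video (hypothetical value).
    print(video_to_sensor_time(12.5, 2.5, shake_sensor_t))
```

Once the offset is known, every manually logged activity onset time in the video can be expressed on the sensor clock before comparison with WMMS output.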
The evaluation protocol incorporates walking activities, a daily living environment, and various terrains and transitions. All actions are done consecutively while a participant-worn smartphone continuously records data from accelerometer, gyroscope, magnetometer, and GPS sensors, and a second smartphone is used to video-record all activities performed by the trial participant. The protocol may be modified by adapting the order of activities based on the test location, as long as a range of continuously performed activities of daily living is incorporated. Ten to fifteen minutes was required to complete the circuit, depending on the participant. During pilot tests, some participants with disabilities could only complete one cycle; therefore, single-trial testing should be considered with some populations to ensure a complete data set.
Limitations of the proposed WMMS evaluation method are that timing resolution is limited to the frame rate of the camera used to record the gold-standard comparator video and that distinct change-of-state timing is difficult to identify from video for activities of daily living. Variation by several frames when identifying a change-of-state leads to differences between the gold-standard and WMMS results that could be due to interpretation of activity start rather than WMMS error. A tolerance at each change-of-state, where no comparisons are made, can be implemented to help account for these discrepancies.
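One way such a tolerance could be implemented is sketched below in Python. Gold-standard onset times are expanded into a per-sample label sequence, and any sample within a tolerance of a gold-standard change-of-state is skipped when comparing against WMMS predictions. The 0.5 s tolerance, the function names, and the use of simple per-sample agreement as the outcome are assumptions for illustration; the protocol does not prescribe specific values.

```python
from bisect import bisect_right


def labels_from_onsets(onsets, sample_times):
    """Expand sorted (onset_time_s, activity) pairs into per-sample labels.

    Samples before the first onset are labeled None.
    """
    onset_times = [t for t, _ in onsets]
    labels = []
    for t in sample_times:
        i = bisect_right(onset_times, t) - 1
        labels.append(onsets[i][1] if i >= 0 else None)
    return labels


def agreement_with_tolerance(onsets, sample_times, predicted, tolerance_s=0.5):
    """Fraction of compared samples where prediction equals gold standard.

    Samples within tolerance_s (an ASSUMED value) of any gold-standard
    change-of-state are excluded from the comparison. Returns None if no
    samples remain to compare.
    """
    gold = labels_from_onsets(onsets, sample_times)
    change_times = [t for t, _ in onsets[1:]]  # first onset is not a change
    compared = matched = 0
    for t, g, p in zip(sample_times, gold, predicted):
        if g is None or any(abs(t - c) <= tolerance_s for c in change_times):
            continue
        compared += 1
        matched += int(g == p)
    return matched / compared if compared else None


if __name__ == "__main__":
    onsets = [(0.0, "stand"), (2.0, "walk")]
    times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    # The prediction switches to "walk" slightly early and flickers once.
    pred = ["stand", "stand", "stand", "walk", "stand", "walk", "walk"]
    print(agreement_with_tolerance(onsets, times, pred, tolerance_s=0.5))
    print(agreement_with_tolerance(onsets, times, pred, tolerance_s=0.0))
```

In this toy example, disagreements that fall inside the tolerance window around the stand-to-walk transition no longer count against the classifier, whereas with a zero tolerance the early transition is scored as an error.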
Generally, increasing the number of activities being classified and the categorization difficulty (i.e., stairs, small movements) reduced the average sensitivity, specificity, and F1 score. This may be anticipated since increasing the number of activities increases the chance for false positives and false negatives. Evaluation protocols that only use activities that are advantageous to the algorithm will produce misleading results that are unlikely to be reproduced under real-world conditions. Hence, relative to existing methods, the protocol will yield more conservative results for WMMS than previous reports in the literature, but these results will better reflect outcomes in practice. The proposed method of WMMS evaluation can be used to assess a range of wearable technologies that measure or assist human movement.
The authors have nothing to disclose.
The authors acknowledge Evan Beisheim, Nicole Capela, Andrew Herbert-Copley for technical and data collection assistance. Project funding was received from the Natural Sciences and Engineering Research Council of Canada (NSERC) and BlackBerry Ltd., including smartphones used in the study.
Smartphone or wearable measurement device | BlackBerry | Z10
Smartphone for video recording | BlackBerry | Z10 or 9800
Phone holster | Any
Data logger application for the smartphone | BlackBerry World – TOHRC Data Logger for BlackBerry 10 | http://appworld.blackberry.com/webstore/content/32013891/?countrycode=CA
Wearable mobility measurement | Custom BlackBerry 10 and MATLAB software for mobility monitoring | http://www.irrd.ca/cag/smartphone/
Video editing or analysis software | Motion Analysis Tools | http://www.irrd.ca/cag/mat/