A protocol for capturing and statistically analyzing emotional response of a population to beverages and liquefied foods in a sensory evaluation laboratory using automated facial expression analysis software is described.
We demonstrate a method for capturing emotional response to beverages and liquefied foods in a sensory evaluation laboratory using automated facial expression analysis (AFEA) software. Additionally, we demonstrate a method for extracting relevant emotional data output and plotting the emotional response of a population over a specified time frame. By time pairing each participant’s treatment response to a control stimulus (baseline), the overall emotional response over time and across multiple participants can be quantified. AFEA is a prospective analytical tool for assessing unbiased response to food and beverages. At present, most research has focused on beverages. Methodologies and analyses have not yet been standardized for the application of AFEA to beverages and foods, and a consistent standard methodology is needed. Optimizing video capture procedures and the resulting video quality aids in the successful collection of emotional response to foods. Furthermore, the data analysis methodology is novel for extracting the data pertinent to the emotional response. The combination of video capture optimization and data analysis will aid in standardizing the protocol for automated facial expression analysis and interpretation of emotional response data.
Automated facial expression analysis (AFEA) is a prospective analytical tool for characterizing emotional responses to beverages and foods. Emotional analysis can add an extra dimension to existing sensory science methodologies, food evaluation practices, and hedonic scale ratings typically used both in research and industry settings. Emotional analysis could provide an additional metric that reveals a more accurate response to foods and beverages. Hedonic scoring may include participant bias due to failure to record reactions1.
AFEA has been used in many research applications including computer gaming, user behavior, education/pedagogy, and psychology studies on empathy and deceit. Most food-associated research has focused on characterizing emotional response to food quality and human behavior with food. With the recent trend in gaining insights into food behaviors, a growing body of literature reports use of AFEA for characterizing the human emotional response associated with foods, beverages, and odorants1-12.
AFEA is derived from the Facial Action Coding System (FACS). FACS discriminates facial movements characterized by action units (AUs) on a 5-point intensity scale13. The FACS approach requires trained review experts, manual coding, and significant evaluation time, and it provides limited data analysis options. AFEA was developed as a rapid evaluation method to determine emotions. AFEA software relies on facial muscular movement, facial databases, and algorithms to characterize the emotional response14-18. The AFEA software used in this study reached a "FACS index of agreement of 0.67 on average on both the Warsaw Set of Emotional Facial Expression Pictures (WSEFEP) and Amsterdam Dynamic Facial Expression Set (ADFES), which is close to a standard agreement of 0.70 for manual coding"19. Universal emotions included in the analysis are happy (positive), sad (negative), disgusted (negative), surprised (positive or negative), angry (negative), scared (negative), and neutral, each on a separate scale of 0 to 1 (0=not expressed; 1=fully expressed)20. In addition, psychology literature includes happy, surprised, and angry as "approach" emotions (toward stimuli) and sad, scared, and disgusted as "withdrawal" emotions (away from aversive stimuli)21.
One limitation of the current AFEA software for characterizing emotions associated with foods is interference from facial movements associated with chewing and swallowing as well as other gross motor motions, such as extreme head movements. The software targets smaller facial muscular motions, relating position and degree of movement, based on over 500 muscle points on the face16,17. Chewing motions interfere with classification of expressions. This limitation may be addressed using liquefied foods. However, other methodology challenges can also decrease video sensitivity and AFEA analysis including data collection environment, technology, researcher instructions, participant behavior, and participant attributes.
A standard methodology has not been developed and verified for optimal video capture and data analysis using AFEA for emotional response to foods and beverages in a sensory evaluation laboratory setting. Many aspects can affect the video capture environment, including lighting, shadowing due to lighting, participant directions, participant behavior, and participant height, as well as camera height, camera angle, and equipment settings. Moreover, data analysis methodologies are inconsistent, and no standard methodology exists for assessing emotional response. Here, we will demonstrate our standard operating procedure for capturing emotional data and processing data into meaningful results using beverages (flavored milk, unflavored milk and unflavored water) for evaluation. To our knowledge only one peer-reviewed publication, from our lab group, has utilized time series for data interpretation for emotions analysis8; however, that method has been updated for the method presented here. Our aim is to develop an improved and consistent methodology to help with reproducibility in a sensory evaluation laboratory setting. For demonstration, the objective of the study model is to evaluate whether AFEA could supplement traditional hedonic acceptability assessment of flavored milk, unflavored milk and unflavored water. The intention of this video protocol is to help establish AFEA methodology, standardize video capture criteria in a sensory evaluation laboratory (sensory booth setting), and illustrate a method for temporal emotional data analysis of a population.
Ethics Statement: This study was pre-approved by Virginia Tech Institutional Review Board (IRB) (IRB 14-229) prior to starting the project.
Caution: Human subject research requires informed consent prior to participation. In addition to IRB approval, consent for use of still or video images is also required prior to releasing any images for print, video, or graphic imaging. Additionally, food allergens are disclosed prior to testing. Participants are asked prior to panel start if they have any intolerance, allergies or other concerns.
Note: Exclusion Criteria: Automated facial expression analysis is sensitive to thick framed glasses, heavily bearded faces and skin tone. Participants who meet these criteria are incompatible with software analysis because of an increased risk of failed videos, which is attributed to the software's inability to find the face.
1. Sample Preparation and Participant Recruitment
2. Preparation of Panel Room for Video Capture
Note: This protocol is for data capture in a sensory evaluation laboratory. This protocol is to make AFEA data capture useful for a sensory booth setting.
3. Participant Adjustment and Verbal Directions
4. Individual Participant Process for Video Capture
5. Evaluating Automated Facial Expression Analysis Options
Note: Many facial expression analysis software programs exist. Software commands and functions may vary. It is important to follow the manufacturer's user guidelines and reference manual20.
6. Timestamp Participant Videos for Data Analysis
7. Time Series Emotional Analysis
Note: Consider the "baseline" to be the control (i.e., unflavored water in this example). The researcher has the ability to create a different "baseline treatment stimulus" or a "baseline time without stimulus" for paired comparison dependent on the interests of the investigation. The method proposed accounts for a "default" state by using a paired statistical test. In other words, the procedure uses statistical blocking (i.e., a paired test) to adjust for the default appearance of each participant and therefore reduces the variability across participants.
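The paired comparison described in this note can be sketched as follows. This is a minimal illustration, not the authors' implementation (the materials list names R and JMP for statistics): a pure-Python, exact two-sided Wilcoxon signed-rank test applied to one emotion at one time point. The function name `wilcoxon_signed_rank` and the example intensities are hypothetical. Because the test ranks within-participant differences, each participant's default expression level cancels out before ranking.

```python
def wilcoxon_signed_rank(treatment, control):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    treatment[i] and control[i] must come from the same participant.
    Returns (W_plus, p_value). Zero differences are dropped.
    """
    diffs = [t - c for t, c in zip(treatment, control) if t != c]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no participant differed from baseline

    # Rank the absolute differences; tied values share the average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Null distribution: every rank is + or - with probability 1/2.
    # Ranks are doubled so that averaged (x.5) ranks stay integers.
    doubled = [int(round(2 * r)) for r in ranks]
    total = sum(doubled)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    w2 = int(round(2 * w_plus))
    lower = sum(counts[: w2 + 1])   # outcomes with W+ <= observed
    upper = sum(counts[w2:])        # outcomes with W+ >= observed
    p_value = min(1.0, 2 * min(lower, upper) / 2 ** n)
    return w_plus, p_value


# Hypothetical "surprised" intensities (0 to 1) for eight participants
# at a single time point: treatment vs. unflavored water baseline.
treatment = [0.42, 0.55, 0.31, 0.60, 0.48, 0.39, 0.52, 0.45]
baseline = [0.10, 0.20, 0.05, 0.22, 0.12, 0.08, 0.18, 0.14]
w, p = wilcoxon_signed_rank(treatment, baseline)
print(f"W+ = {w}, p = {p:.4f}, significant at 0.025: {p < 0.025}")
```

In the full analysis the same test would be repeated at every time point in the selected window, once per emotion.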
The method proposes a standard protocol for AFEA data collection. If suggested protocol steps are followed, unusable emotional data output (Figure 1) resulting from poor data collection (Figure 2: A; Left Picture) may be limited. Time series analysis cannot be utilized if log files (.txt) predominantly contain "FIT_FAILED" and "FIND_FAILED" as this is bad data (Figure 1). Furthermore, the method includes a protocol for direct statistical comparison between two treatments of emotional data output over a time frame to establish an emotional profile. Time series analysis can provide emotional trends over time and can provide a value-added dimension to hedonic acceptability results. Additionally, time series analysis can show changes in emotional levels over time, which is valuable during the eating experience.
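A quick screen for unusable log files might look like the sketch below. The exact layout of the exported .txt log is not given here, so the code assumes only that each non-empty data row represents one video frame and that failed frames carry the literal text "FIT_FAILED" or "FIND_FAILED". The function names and the 0.5 cutoff used for "predominantly" are illustrative assumptions, not values from the protocol.

```python
FAILURE_TOKENS = ("FIT_FAILED", "FIND_FAILED")


def failed_fraction(rows):
    """Share of non-empty frame rows that contain a failure token."""
    frames = [row for row in rows if row.strip()]
    if not frames:
        return 0.0
    failed = sum(
        1 for row in frames if any(token in row for token in FAILURE_TOKENS)
    )
    return failed / len(frames)


def usable_for_time_series(rows, max_failed=0.5):
    """True when failures do not dominate the file.

    max_failed is an assumed cutoff; choose it to suit the study.
    """
    return failed_fraction(rows) <= max_failed


# Hypothetical rows (header lines already removed by the caller).
example_rows = [
    "00:00:01.000\t0.81\t0.02\t0.01",
    "00:00:01.033\tFIT_FAILED",
    "00:00:01.066\tFIND_FAILED",
    "00:00:01.100\tFIT_FAILED",
]
print(failed_fraction(example_rows), usable_for_time_series(example_rows))
```

Running the screen on the post-consumption window only, rather than the whole video, would match the way failures are described in Figure 1.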
Unflavored milk, unflavored water and vanilla extract flavor in milk were not different (p>0.05) in mean acceptability scores and were rated as "liked slightly" (Figure 9). Hedonic results imply that there were no acceptability differences between unflavored milk, unflavored water and vanilla extract flavor in milk. However, AFEA time series analysis indicated unflavored milk generated less disgusted (p<0.025; 0 sec), less surprised (p<0.025; 0-2.0 sec), less sad (p<0.025; 2.0-2.5 sec) and less neutral (p<0.025; ~3.0-3.5 sec) responses than did unflavored water (Figure 10). Additionally, vanilla extract flavor in milk introduced more happy expressions just before 5.0 seconds (p<0.025) and less sad (p<0.025; 2.0-3.0 and 5.0 sec) than unflavored water (Figure 11). Vanilla, as an odor, has been associated with the terms "relaxed", "serene", "reassured", "happiness", "well-being", "pleasantly surprised"23 and "pleasant"24. Salty flavor in milk had lower (p<0.05) mean hedonic acceptability scores (disliked moderately) (Figure 9) and generated more disgust (p<0.025) later (3.0-5.0 sec) than unflavored water (Figure 12). Intense salty has been associated with disgust and surprise25, 26. However, some studies have stated that salty flavor does not elicit facial response7, 27-29.
Figure 1. Example of sub-optimal data capture due to participant incompatibility with AFEA software resulting in loss of raw emotional data response points in the exported output files [FIT_FAILED; FIND_FAILED]. Video failures occur when serious facial occlusions or the inability to map the face persists during the specified post-consumption window.
Figure 2. Example of sub-optimal data capture due to participant software modeling. The figure presents sub-optimal data capture due to participant software modeling incompatibility and failure of face mapping to determine emotional response (A). Example of successful fit modeling and ability to capture participant's emotional response (B).
Figure 3. Example of extracted participant data compiled in a new data spreadsheet. Participant data (participant number, treatment, original video time, and emotion response) is identified per emotion (happy, neutral, sad, angry, surprised, scared, and disgusted) for the select time frame (seconds). This spreadsheet is utilized for subsequent analyses.
Figure 4. Example of extracted participant data compiled for subsequent analysis. The extracted participant data (A1 and B1) is compiled (A2 and B2), graphed (A3 and B3) and aligned (A4 and B4) as a visual for direct comparison. The respective time zero for control (A4: Surprised Unflavored Water) and treatment (B4: Surprised Unflavored Milk) are displayed for comparing the surprised emotional results. This example represents and identifies the corresponding time zero from the timestamp file for each participant-treatment pair.
Figure 5. Example of extracted participant data with adjusted time frame. The extracted participant data is presented with adjusted time frame with a true "time zero" (A1 and B1). The time adjustment allows for direct comparison between a control (A: Surprised Unflavored Water) and a treatment (B2: Surprised Unflavored Milk) (A2 and B2). This example represents and identifies the corresponding true "time zero" (adjusted) from the timestamp file for each participant-treatment pair.
Figure 6. Example of the process for compiling all participants' data. The participant, adjusted time, and paired treatment (e.g., unflavored water and unflavored milk) at each time point is compiled to prepare for statistical analysis.
Figure 7. Data spreadsheet example comparing a control (Unflavored Water) and a treatment (Unflavored Milk) using Wilcoxon tests across participants at a specific time point. The figure represents direct comparison between the emotional results of a respective sample and the control (unflavored water) using sequential paired nonparametric Wilcoxon tests across the participants.
Figure 8. Example of the data spreadsheet to graph the results if (p<0.025) on the associated treatment graph (i.e., unflavored milk compared to unflavored water). Results of sequential paired nonparametric Wilcoxon tests across the participants are graphed for the times where the null hypothesis is rejected.
Figure 9. Mean acceptability (hedonic) scores of unflavored water, unflavored milk, vanilla extract flavor in milk and salty flavor in milk beverage solutions. Acceptability was based on a 9-point hedonic scale (1=dislike extremely, 5=neither like nor dislike, 9=like extremely; mean +/- SD)1. Treatment means with different superscripts significantly differ in liking (p<0.05). Unflavored milk, unflavored water and vanilla extract flavor in milk were not different (p>0.05) in mean acceptability scores and were rated as "liked slightly". Salty flavor in milk had lower (p<0.05) mean acceptability scores (disliked moderately).
Figure 10. Time series graphs of classified emotions based on automated facial expression analysis data over 5.0 seconds comparing unflavored milk and unflavored water. Based on sequential paired nonparametric Wilcoxon tests between unflavored milk and unflavored water (baseline), results are plotted on the respective treatment graph if the treatment median is higher and of greater significance (p<0.025) for each emotion. Presence of a line indicates a significant difference (p<0.025) at the specific time point where the median is higher, while absence of a line indicates no difference at a specific time point (p>0.025). Absence of lines in unflavored milk (A) reveals no emotional categorization compared to unflavored water (p<0.025) over 5.0 seconds. In the unflavored water (B), emotional results compared to unflavored milk reveal disgusted (crimson line) at 0 sec, surprised (orange line) occurs between 0 – 1.5 sec, sad (green line) occurs around 2.5 sec, and neutral (red line) occurs around 3 – 3.5 sec (p<0.025).
Figure 11. Time series graphs of classified emotions based on automated facial expression analysis data over 5.0 seconds comparing vanilla extract flavor in milk and unflavored water (baseline). Based on sequential paired nonparametric Wilcoxon tests between vanilla extract flavor in milk and unflavored water, results are plotted on the respective treatment graph if treatment median is higher and of greater significance (p<0.025) for each emotion. Presence of a line indicates a significant difference (p<0.025) at the specific time point where the median is higher, while absence of a line indicates no difference at a specific time point (p>0.025). Vanilla extract flavor in milk (A) shows happy just before 5 sec (blue line) while unflavored water (B) displays more sad around 2 – 2.5 and 5 sec (green line) (p<0.025).
Figure 12. Time series graphs of classified emotions based on automated facial expression analysis data over 5.0 seconds comparing salty flavor in milk and unflavored water. Based on sequential paired nonparametric Wilcoxon tests between salty flavor in milk and unflavored water (baseline), results are plotted on the respective treatment graph if treatment median is higher and of greater significance (p<0.025) for each emotion. Presence of a line indicates a significant difference (p<0.025) at the specific time point where the median is higher, while absence of a line indicates no difference at a specific time point (p>0.025). Salty flavor in milk (A) has significant disgust from 3 – 5 seconds (crimson line) while unflavored water (B) has disgust at the beginning (crimson line) and more neutral from 2 – 5 seconds (red line) (p<0.025).
AFEA application in literature related to food and beverage is very limited1-11. The application to food is new, creating an opportunity for establishing methodology and data interpretation. Arnade (2013)7 found high variability among individual emotional responses to chocolate milk and white milk using area under the curve analysis and analysis of variance. However, even with participant variability, participants generated a happy response longer while sad and disgusted had shorter time response7. In a separate study using high and low concentrations of basic tastes, Arnade (2013)7 found that the differences in emotional response among basic tastes, as well as between two levels of basic taste intensity (high and low), were not as significant as expected, thereby questioning the accuracy of current AFEA methodology and data analysis. Sensory evaluation of foods and beverages is a complex and dynamic response process30. Temporal changes can occur throughout oral processing and swallowing, thus potentially influencing the acceptability of the stimuli over time30. For this reason, it may be beneficial to measure evaluator response throughout the entire eating experience. Specific oral processing times have been suggested (initial contact with tongue, mastication, swallowing, etc.)31, but none are standardized and times are largely dependent on the project and the researcher's discretion30.
The proposed emotional time series analysis was able to detect emotional changes and statistical differences between the control (unflavored water) and respective treatments. Moreover, emotional profiles associated with acceptability may aid in anticipating behavior related to foods and beverages. Results show that distinguishable time series trends exist with AFEA related to flavors in milk (Figures 10, 11, and 12). The time series analysis assists in differentiating food acceptability across a population by integrating characterized emotions (Figures 10, 11, and 12) as well as supporting hedonic acceptability trends (Figure 9). Leitch et al.8 observed differences between sweeteners and the water baseline using time series analysis (5 sec), and also found that time series graphs provided for better interpretation of data and results. Moreover, emotional changes can be observed over time and emotional response treatment differences may be determined at different time points or intervals. For example, Leitch et al.8 observed the approach emotions (angry, happy and surprised) in the artificial sweetener-water comparisons, but at different times over the 5 sec observation window. However, Leitch et al.8 did not establish directionality of expression, making it difficult to understand the emotional difference between the control (water) and the treatment (unsweetened tea) using their graphical interpretation and presentation. The modified and improved time series analysis methodology presented in our study allows for statistical difference directionality. The directionality and results plotting allow researchers to visualize where statistically relevant emotional changes occur over the selected time frame.
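The directionality rule used in Figures 10 to 12 (draw a line on whichever graph has the higher median, and only where p<0.025) can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the function names `plot_side` and `significant_segments` are hypothetical, the p-values are taken as already computed by the paired Wilcoxon tests, and "median is higher" is interpreted as comparing the group medians of treatment and control at each time point.

```python
from statistics import median

ALPHA = 0.025  # significance threshold reported in the figure captions


def plot_side(treatment, control, p_value, alpha=ALPHA):
    """Decide which graph receives a line at one time point.

    Returns "treatment", "control", or None (nothing is drawn).
    """
    if p_value >= alpha:
        return None  # no significant difference at this time point
    m_treatment, m_control = median(treatment), median(control)
    if m_treatment > m_control:
        return "treatment"
    if m_control > m_treatment:
        return "control"
    return None  # equal medians: direction cannot be assigned


def significant_segments(times, treatment_by_time, control_by_time, p_values):
    """Collect, per graph, the time points at which a line is drawn."""
    sides = {"treatment": [], "control": []}
    for t, trt, ctl, p in zip(times, treatment_by_time, control_by_time, p_values):
        side = plot_side(trt, ctl, p)
        if side is not None:
            sides[side].append(t)
    return sides


# Hypothetical data: three time points, three participants each.
times = [0.0, 0.5, 1.0]
treatment_by_time = [[0.1, 0.2, 0.1], [0.6, 0.7, 0.8], [0.3, 0.3, 0.3]]
control_by_time = [[0.5, 0.6, 0.7], [0.1, 0.2, 0.1], [0.3, 0.4, 0.2]]
p_values = [0.01, 0.02, 0.50]
print(significant_segments(times, treatment_by_time, control_by_time, p_values))
```

Keeping the direction decision separate from the test itself makes it easy to swap in a different rule, for example the sign of the median paired difference.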
Reducing video analysis failures is essential for attaining valid data and effectively using time and personnel resources. Critical steps and troubleshooting steps in the protocol include optimizing the participant sensory environment (lighting, video camera angle, chair height, thorough participant guidance instructions, etc.). Also, participants should be screened and excluded if they fall into a software incompatibility category (i.e., thick framed glasses, heavily bearded faces and skin tone) (Figure 2). These factors will influence AFEA fit modeling, emotional categorization, and data output. If a significant portion of a participant's data output consists of "FIT_FAILED" and "FIND_FAILED", the data should be reevaluated for inclusion in the time series analysis; if the log files predominantly contain these entries, the data are unusable for time series analysis (Figure 1). Shadowing on the face due to lighting settings may severely inhibit video capture quality, resulting in poor video collection. To avoid intense shadowing, diffuse frontal lighting is ideal, while the light intensity or color is not as relevant20. Intense overhead lighting should be reduced as it can promote shadows on the face20. A dark background behind the participant is recommended20. The AFEA software manufacturer suggests placing the setup in front of a window to obtain diffuse daylight lighting20. Also, if using a computer monitor, two lights may be placed on either side of the user's face for illumination and shadow reduction20. Additionally, professional photo lights may be used to counteract undesirable environment lighting20. Ultimately, it is up to the discretion of the researcher, individual protocol/methodology, and environment to control lighting for capture. It is recommended to discuss the data capture environment and the tools with the software provider before purchase and installation.
Furthermore, chair height and camera angle are important to adjust individually for each participant. The participant should be comfortable but at a height where the camera is straight on the face. An attempt to reduce the camera angle on the face is encouraged for optimizing the AFEA video capture. Lastly, it is imperative to give verbal instructions to the participants prior to sampling. Participant behavior during video capture may limit data collection due to facial occlusion, movements, and camera avoidance.
For participant sample size needed for a study, the authors recommend a range of 10 to 50 participants. Although a small number will provide almost no statistical power, at least 2 participants are needed in general for time series analysis. Participant variability is high, and in the early stages of this research there is no guidance to offer with sample size. Sample size will vary depending on flavors, flavor intensity, and expected treatment acceptability. Samples with smaller flavor differences will require more participants. The 30 second controlled sampling period encompasses a time span adequate for the entire sampling evaluation period (i.e., showing the index card, opening a sample (removing the lid), consumption, and emotional capture). The entire 30 seconds is not used in data analysis. The benefit of this designated 30 second capture time is that the researcher can decide the pertinent evaluation time to be used in data analysis. The 30 second time window can assist in selecting a time frame of interest during a video sample while coding or timestamping videos. Ultimately, the time window is up to the discretion of the researcher. In our example, we used the 5 sec sampling window post-consumption. Furthermore, the present methodology defines time zero when the sample cup no longer occludes the face (cup at the chin). It is critically important to lessen the time between consumption and sample cup facial occlusion due to brief and changing emotions. Due to sample cup facial occlusion the initial time where the sample makes contact with the tongue is unreliable data (see Figure 1). Therefore, the point where the cup no longer occludes the face is the optimal recommendation. Timestamps need to be consistent for all participants. The color card is a convenient way for researchers to identify treatments in the video and mark the appropriate time frame (time zero) for sample evaluation. 
The color cards are especially helpful if treatments are in random order and serve as an extra validation of sample identification in the continuous video.
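The time-zero adjustment and 5 sec post-consumption window described above can be sketched as follows. This is a minimal example, assuming a 30 frames-per-second recording (consistent with the one-thirtieth-of-a-second comparisons used in the analysis) and working in frame indices to avoid floating point drift. The function name `align_to_time_zero` and the sample values are hypothetical.

```python
FPS = 30          # assumed frame rate: one observation every 1/30 s
WINDOW_S = 5      # post-consumption window used in this example (seconds)


def align_to_time_zero(frames, zero_frame, fps=FPS, window_s=WINDOW_S):
    """Re-express one participant-treatment video relative to time zero.

    frames: list of (frame_index, emotion_value) pairs for one emotion.
    zero_frame: timestamped frame where the cup no longer occludes the face.
    Returns (adjusted_time_s, emotion_value) pairs for 0 <= time <= window_s.
    """
    last_offset = window_s * fps
    aligned = []
    for frame_index, value in frames:
        offset = frame_index - zero_frame
        if 0 <= offset <= last_offset:
            aligned.append((offset / fps, value))
    return aligned


# Hypothetical 30 s capture (900 frames) with time zero at frame 400.
capture = [(i, i / 1000) for i in range(900)]
window = align_to_time_zero(capture, zero_frame=400)
print(len(window), window[0], window[-1])
```

Applying the same function to the control and treatment videos of each participant puts both series on a shared time axis, which is what allows the paired comparison at each time point.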
Limitations of this technique exist as participants may not follow directions or unavoidable shadowing on the participant's face may cause face fit model failures (Figure 2). However, the suggested critical steps offer ways to mitigate and reduce these interferences. Additionally, time series analysis cannot use exported log files that predominantly contain "FIT_FAILED" and "FIND_FAILED" (Figure 1). These files cannot be salvaged and cannot be included in time series analysis. Also, the consumption of food and beverages still may alter the facial structure in such a way as to distort the emotional categorization. Hard or chewy foods require extensive jaw motion. Use of a drinking straw, with the associated sucking, also causes facial occlusion (straw) and distorts the face (sucking). This observation is based on preliminary data from our laboratory research. The software facial model cannot discern the differences between chewing (or sucking) and motor expressions associated with emotional categorization. With food and beverage samples, the opportunity for facial occlusion is higher than that of viewing videos and pictures. Participants must bring the sample to the face and remove the container from the face, thus interrupting the software model and potentially reducing valuable emotional information (see Figure 1). As mentioned previously, emotions happen quickly and for a short duration. It is important to reduce the facial occlusion in an effort to capture emotions. The proposed methodology makes treatment comparisons at one thirtieth of a second to find changes in emotional patterns and changes in emotional duration across time. With the proposed methodology, patterns of emotional longevity are important. Unfortunately, emotional categorization problems can occur. Most notably there is a problem categorizing happy and disgust6, 9, 32, 33, 34.
Oftentimes, this is due to participants masking their distaste or surprised feeling by smiling6, 32, 33, 34, which could be due to a "social display rule"32. Furthermore, the AFEA software is limited to seven emotional categories (neutral, happy, sad, scared, surprised, angry and disgusted). Emotional response to foods and beverages may be more complex than the current AFEA classification of universal emotions, and categorization may be different in response to a food or beverage stimulus. Manual coding using FACS has been applied to gustofacial and olfactofacial responses of basic tastes and an assortment of odors and appeared to be sensitive enough to detect treatment differences with regard to AUs32. FACS is tedious and very time consuming; however, the temporal application of absence or presence of AUs may be useful to assist with complex responses that AFEA might not classify correctly or if emotional results are unexpected. While time series data allows for facial classifications to occur simultaneously and with significant expression, caution should be used with translating results into a single emotion due to emotional complexity.
The proposed methodology and data analysis technique may be applied to other beverages and soft foods. AFEA software was able to identify emotions in response to flavored and unflavored samples. The proposed methodology and temporal analysis may aid with characterizing implicit responses, thereby providing new advances in understanding the emotional responses and behaviors of a population relating to food. Future applications of this technique may expand into other beverage categories or soft foods. We have demonstrated a methodology for video capture of emotional response and for data analysis. We aim to create a standard approach for both emotional AFEA capture and emotional time series analysis. The approach has shown success in our research. We hope to expand and apply this approach for evaluating emotional response to foods and beverages and the relationship to choice and behaviors.
The authors have nothing to disclose.
This project was funded, in part, by ConAgra Foods (Omaha, NE, USA), the Virginia Agricultural Experiment Station, the Hatch Program of the National Institute of Food and Agriculture, U.S. Department of Agriculture, and the Virginia Tech Water INTERface Interdisciplinary Graduate Education Program.
Name | Company | Catalog Number | Comments |
2% Reduced Fat Milk | Kroger Brand, Cincinnati, OH or DZA Brands, LLC, Salisbury, NC | na | for solutions |
Drinking Water | Kroger Brand, Cincinnati, OH | na | for solutions |
Imitation Clear Vanilla Flavor | Kroger Brand, Cincinnati, OH | na | for solutions |
Iodized Salt | Kroger Brand, Cincinnati, OH | na | for solutions |
FaceReader 6 | Noldus Information Technology, Wageningen, The Netherlands | na | For Facial Analysis |
Sensory Information Management System (SIMS) 2000 | Sensory Computer Systems, Berkeley Heights, NJ | Version 6 | For Sensory Data Capture |
Rhapsody | Acuity Brands Lighting, Inc., Conyers, GA | na | For Environment Illumination |
R Version | R Core Team 2015 | 3.1.1 | For Statistical Analysis |
Microsoft Office | Microsoft | na | For Statistical Analysis |
JMP | Statistical Analysis Software (SAS) Version 9.2, SAS Institute, Cary, NC | na | For Statistical Analysis |
Media Recorder 2.5 | Noldus Information Technology, Wageningen, The Netherlands | na | For capturing participants sensory evaluation |
Axis M1054 Camera | Axis Communications, Lund, Sweden | na | |
Beverage | na | na | Beverage or soft food for evaluation |