Protocol for Data Collection and Analysis Applied to Automated Facial Expression Analysis Technology and Temporal Analysis for Sensory Evaluation

Courtney A. Crist; Susan E. Duncan; Daniel L. Gallagher

doi:10.3791/54046

JoVE Journal > Behavior

Published: August 26, 2016

¹Food Science and Technology, Virginia Tech; ²Civil and Environmental Engineering, Virginia Tech

Summary

A protocol for capturing and statistically analyzing emotional response of a population to beverages and liquefied foods in a sensory evaluation laboratory using automated facial expression analysis software is described.

Abstract

We demonstrate a method for capturing emotional response to beverages and liquefied foods in a sensory evaluation laboratory using automated facial expression analysis (AFEA) software. Additionally, we demonstrate a method for extracting relevant emotional data output and plotting the emotional response of a population over a specified time frame. By time pairing each participant’s treatment response to a control stimulus (baseline), the overall emotional response over time and across multiple participants can be quantified. AFEA is a prospective analytical tool for assessing unbiased response to food and beverages. At present, most research has focused on beverages. Methodologies and analyses have not yet been standardized for the application of AFEA to beverages and foods; however, a consistent standard methodology is needed. Optimizing video capture procedures and the resulting video quality aids successful collection of emotional response to foods. Furthermore, the methodology of data analysis is novel for extracting the pertinent data relevant to the emotional response. The combination of video capture optimization and data analysis will aid in standardizing the protocol for automated facial expression analysis and interpretation of emotional response data.

Introduction

Automated facial expression analysis (AFEA) is a prospective analytical tool for characterizing emotional responses to beverages and foods. Emotional analysis can add an extra dimension to existing sensory science methodologies, food evaluation practices, and hedonic scale ratings typically used both in research and industry settings. Emotional analysis could provide an additional metric that reveals a more accurate response to foods and beverages. Hedonic scoring may include participant bias due to failure to record reactions¹.

AFEA has been used in many research applications, including computer gaming, user behavior, education/pedagogy, and psychology studies on empathy and deceit. Most food-associated research has focused on characterizing emotional response to food quality and human behavior with food. With the recent trend in gaining insights into food behaviors, a growing body of literature reports use of AFEA for characterizing the human emotional response associated with foods, beverages, and odorants¹⁻¹².

AFEA is derived from the Facial Action Coding System (FACS), which discriminates facial movements characterized by action units (AUs) on a 5-point intensity scale¹³. The FACS approach requires trained review experts, manual coding, and significant evaluation time, and it provides limited data analysis options. AFEA was developed as a rapid evaluation method to determine emotions. AFEA software relies on facial muscular movement, facial databases, and algorithms to characterize the emotional response¹⁴⁻¹⁸. The AFEA software used in this study reached a "FACS index of agreement of 0.67 on average on both the Warsaw Set of Emotional Facial Expression Pictures (WSEFEP) and Amsterdam Dynamic Facial Expression Set (ADFES), which is close to a standard agreement of 0.70 for manual coding"¹⁹. Universal emotions included in the analysis are happy (positive), sad (negative), disgusted (negative), surprised (positive or negative), angry (negative), scared (negative) and neutral, each on a separate scale of 0 to 1 (0=not expressed; 1=fully expressed)²⁰. In addition, psychology literature includes happy, surprised, and angry as "approach" emotions (toward stimuli) and sad, scared, and disgusted as "withdrawal" emotions (away from aversive stimuli)²¹.

One limitation of the current AFEA software for characterizing emotions associated with foods is interference from facial movements associated with chewing and swallowing as well as other gross motor motions, such as extreme head movements. The software targets smaller facial muscular motions, relating position and degree of movement, based on over 500 muscle points on the face¹⁶,¹⁷. Chewing motions interfere with classification of expressions. This limitation may be addressed using liquefied foods. However, other methodological challenges can also degrade video capture and AFEA analysis, including the data collection environment, technology, researcher instructions, participant behavior, and participant attributes.

A standard methodology has not been developed and verified for optimal video capture and data analysis using AFEA for emotional response to foods and beverages in a sensory evaluation laboratory setting. Many aspects can affect the video capture environment, including lighting, shadowing due to lighting, participant directions, participant behavior, and participant height, as well as camera height, camera angling, and equipment settings. Moreover, data analysis methodologies are inconsistent, and no standard methodology exists for assessing emotional response. Here, we demonstrate our standard operating procedure for capturing emotional data and processing the data into meaningful results using beverages (flavored milk, unflavored milk and unflavored water) for evaluation. To our knowledge, only one peer-reviewed publication, from our lab group, has utilized time series for data interpretation in emotions analysis⁸; however, that method has been updated for the method presented here. Our aim is to develop an improved and consistent methodology to help with reproducibility in a sensory evaluation laboratory setting. For demonstration, the objective of the study model is to evaluate whether AFEA could supplement traditional hedonic acceptability assessment of flavored milk, unflavored milk and unflavored water. The intention of this video protocol is to help establish AFEA methodology, standardize video capture criteria in a sensory evaluation laboratory (sensory booth setting), and illustrate a method for temporal emotional data analysis of a population.

Protocol

Ethics Statement: This study was pre-approved by Virginia Tech Institutional Review Board (IRB) (IRB 14-229) prior to starting the project.

Caution: Human subject research requires informed consent prior to participation. In addition to IRB approval, consent for use of still or video images is also required prior to releasing any images for print, video, or graphic imaging. Additionally, food allergens are disclosed prior to testing. Participants are asked prior to panel start if they have any intolerance, allergies or other concerns.

Note: Exclusion Criteria: Automated facial expression analysis is sensitive to thick framed glasses, heavily bearded faces and skin tone. Participants who meet these criteria are incompatible with software analysis due to an increased risk of failed videos. This is attributed to the software's inability to find the face.

1. Sample Preparation and Participant Recruitment

Prepare beverage or soft food samples.
1. Prepare intensified dairy solutions using 2% milk and suggested flavors from Costello and Clark (2009)²² as well as other flavors. Prepare the following solutions: (1) unflavored milk (2% reduced fat milk); (2) unflavored water (drinking water); (3) vanilla extract flavor in milk (0.02 g/ml) (imitation clear vanilla flavor); and (4) salty flavor in milk (0.004 g/ml iodized salt).
  Note: These solutions are used for demonstration purposes only.
2. Pour half ounce aliquots (~15 g) of each solution into 2 oz. transparent plastic sample cups and cap with color coded lids.
  Note: It is recommended to use transparent cups; however, it is up to the researcher's discretion.
Recruit participants from the campus or the local community to participate in the study.
Note: Participant sample size needed for a study is up to the discretion of the researcher. We recommend a range of 10 to 50 participants.
Obtain human subject consent prior to participation in the study.

2. Preparation of Panel Room for Video Capture

Note: This protocol is for data capture in a sensory evaluation laboratory. This protocol is to make AFEA data capture useful for a sensory booth setting.

Use individual booths with a touchscreen monitor in front of the participant (face level) to keep the participant's focus forward and to prevent looking down.
Use adjustable height chairs with back support.
Note: These are essential for allowing participants to be vertically adjusted and placed in a suitable range for video capture. Use stationary chairs (no rolling feature) with adjustable back height support so the participant's movements are reduced.
Set overhead lighting at "100% daylight" for optimal facial emotional video capture (Illuminant 6504K; R=206; G=242; B=255).
Note: To avoid intense shadowing, diffuse frontal lighting is ideal while the light intensity or color is not as relevant²⁰. Ultimately, it is up to the discretion of the researcher, individual protocol/methodology, and environment to control lighting for capture.
Affix an adjustable camera above the touchscreen monitor for recording.
1. Use a camera with a resolution of at least 640 x 480 pixels (or higher)²⁰. Discuss the required camera capabilities with the software provider before purchase and installation²⁰. Note: The aspect ratio is not important²⁰.
2. Set camera capture speed to 30 frames per second (or other standard speed) for consistency.
3. Connect media recording software to the camera and ensure that it is set up to record and save participant videos.

3. Participant Adjustment and Verbal Directions

Have only one participant at a time evaluate the samples in the sensory booth.
Note: Testing more than one participant at the same time may interfere with the testing environment and disrupt the concentration of the participant or create bias.
Upon arrival, give participants verbal instructions about the process and standard operating procedures.
1. Have the participants sit straight up and against the back of the chair.
2. Adjust chair height, position of the chair (distance from the camera), and camera angle so that the participant's face is captured in the center of the video recording, with no shadows on chin or around eyes.
  Note: In the sensory booth, the participant's head is roughly 20 – 24 inches away from the camera and the monitor with the face centered in the camera video feed.
3. Instruct participants to remain seated as positioned and focused facing towards the monitor display. Additionally, instruct participants to refrain from any sudden movements post-sample consumption during the 30 sec evaluation period per sample.
4. Instruct the participant to consume the entire beverage or liquefied food sample and swallow.
5. Instruct the participant to quickly move the sample cup below the chin and down to the table immediately after the sample is in the mouth. This is to eliminate facial occlusion. Remind them to keep looking toward the monitor.
  Note: The sample carrier to deliver the sample is up to the discretion of the researcher. A straw or cup may be used. Regardless, initial facial occlusion is unavoidable because the face will be occluded or distorted due to consumption.
Instruct the participant to follow the instructions as they appear on the touchscreen monitor. Note: Instructions are automatically sequenced as programmed into the automated sensory software.

4. Individual Participant Process for Video Capture

Confirm video camera is optimally capturing participant’s face while the participant is seated comfortably in the booth (before sample presentation) by viewing the computer monitor on which the video capture is displayed. Begin recording by clicking the record button on the computer monitor.
Instruct participants to sip water to cleanse their palate.
Provide treatments one at a time, starting with a baseline or control treatment (unflavored water). Identify each sample by a unique colored index card placed on top of each sample relating to the sample color code for sample treatment identification within the video.
Note: Programmed guidance on the touchscreen monitor instructs participants. The instructions direct the participant through a series of standardized steps for each treatment sample.
Via the touchscreen monitor, direct the participant to:
1. Hold up the associated color index card pre-consumption for sample identification in the video.
  Note: The color card is a way researchers can identify treatments in the video and mark the appropriate time frame (time zero) for sample evaluation.
2. After holding the card briefly, place the card back on the tray.
3. Fully consume the sample and wait approximately 30 seconds, enforced through the programmed guidance on the monitor, while facing towards the camera.
  Note: The 30 sec controlled sampling period encompasses a time span adequate for the entire sampling evaluation period (i.e., showing the index card, opening a sample (removing the lid), consumption, and emotional capture).
4. Enter their hedonic acceptability score on the touchscreen monitor (1=dislike extremely, 2=dislike very much, 3=dislike moderately, 4=dislike slightly, 5=neither like nor dislike, 6=like slightly, 7=like moderately, 8=like very much, 9=like extremely).
5. Rinse mouth with drinking water before the next sample process.

5. Evaluating Automated Facial Expression Analysis Options

Note: Many facial expression analysis software programs exist. Software commands and functions may vary. It is important to follow the manufacturer's user guidelines and reference manual²⁰.

Save recordings in a media format and transfer to the automated facial expression analysis software.
Analyze participant videos using automated facial analysis software.
1. Double click on the software icon on the computer desktop.
2. Once the program is open, click "File", select "New…", and select "Project…"
3. In the pop up window, name the project and save the project.
4. Add participants to the project by clicking the "Add participants" icon (Person with a (+) sign). More participants can be added by repeating this step.
5. Add participant's video to the respective participant for analysis.
  1. On the left side of the screen click the icon of the film reel with a plus (+) sign to add a video to analyze.
  2. Click the "magnifying glass" under the participant of interest to browse the video to add.
Analyze videos frame-by-frame under continuous calibration analysis settings in the software.
1. Click the pencil icon to adjust settings at the bottom of the window, under the "settings" tab for each participant video.
  1. Set "Face Model" to General. Set "Smoothen classifications" to Yes. Set "Sample Rate" to Every frame.
  2. Set "Image rotation" to No. Set "Continuous calibration" to Yes. Set "Selected calibration" to None.
2. Save project settings.
3. Press the batch analysis icon (the same red and black target-like symbol) to analyze the project videos.
4. Save the results once analysis is completed.
  Note: Other video settings exist in the software if researcher preference warrants another analysis method.
5. Consider a video a failure if serious facial occlusion or the inability to map the face persists during the specified post-consumption window (Figure 1). Additionally, if the model fails, the exported output files will contain "FIT_FAILED" or "FIND_FAILED" (Figure 2). This represents lost data since the software cannot classify or analyze the participant's emotions.
  Note: AFEA translates facial muscle motion to neutral, happy, disgusted, sad, angry, surprised and scared on a scale from 0 (not expressed) to 1 (fully expressed) for each emotion.
Export the AFEA data output as log files (.txt) for further analysis.
1. Once analyses are complete, export the whole project.
  1. Click "File", "Export", "Export Project Results".
  2. When a window opens, choose the location of where the exports should be saved and save the log files (.txt) to a folder.
  3. Convert each participant log file to a data spreadsheet (.csv or .xlsx) to extract relevant data.
    1. Open data spreadsheet software and select the "Data" tab.
    2. On the "Data" tab, in the "Get External Data" group, click "From Text".
    3. In the "Address bar", locate, double-click the participant text file to import, and follow the on screen wizard instructions.
    4. Continue the export process for all relevant participant files.
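The failure check and log conversion described above can also be scripted. The following is a minimal sketch, not the software vendor's export format: it assumes a tab-delimited log with a header row, a `Video Time` column, and one column per emotion, with "FIT_FAILED" or "FIND_FAILED" written in place of scores for frames that could not be modeled. The column names and the helper names `parse_log` and `failure_fraction` are hypothetical and should be adapted to the actual exported files.

```python
import csv
import io

# Failure markers named in the protocol; the column layout below is an assumption.
FAILURE_CODES = {"FIT_FAILED", "FIND_FAILED"}
EMOTIONS = ["Neutral", "Happy", "Sad", "Angry", "Surprised", "Scared", "Disgusted"]


def parse_log(text):
    """Split log rows into usable frames and a count of failed frames."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    frames, failed = [], 0
    for row in reader:
        values = [(row.get(e) or "").strip() for e in EMOTIONS]
        if any(v in FAILURE_CODES for v in values):
            failed += 1  # lost data: the face could not be found or fitted
            continue
        frame = {"Video Time": row["Video Time"]}
        frame.update({e: float(v) for e, v in zip(EMOTIONS, values)})
        frames.append(frame)
    return frames, failed


def failure_fraction(frames, failed):
    """Share of frames in the log that the software could not classify."""
    total = len(frames) + failed
    return failed / total if total else 0.0


if __name__ == "__main__":
    sample = "\n".join([
        "\t".join(["Video Time"] + EMOTIONS),
        "\t".join(["00:00:00.000", "0.90", "0.05", "0.01", "0.01", "0.01", "0.01", "0.01"]),
        "\t".join(["00:00:00.033"] + ["FIT_FAILED"] * 7),
    ])
    frames, failed = parse_log(sample)
    print(len(frames), failed, failure_fraction(frames, failed))
```

A high failure fraction within the post-consumption window is one possible signal that a video should be treated as a failure; the threshold is left to the researcher.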

6. Timestamp Participant Videos for Data Analysis

Using the AFEA software, manually review each participant’s video and identify post-consumption time zero for each sample. Record the timestamp in a data spreadsheet. Post-consumption is defined as the point when the sample cup is below the participant’s chin and no longer occludes the face.
Note: The placement of the timestamp is critical for evaluation. The point where the cup no longer occludes the face is the optimal recommendation and timestamps need to be consistent for all participants.
Save the timestamp data spreadsheet (.csv) as a reference for extracting relevant data from videos.
Note: Participant videos may also be coded internally in the software as "Event Marking".

7. Time Series Emotional Analysis

Note: Consider the "baseline" to be the control (i.e., unflavored water in this example). The researcher has the ability to create a different "baseline treatment stimulus" or a "baseline time without stimulus" for paired comparison dependent on the interests of the investigation. The method proposed accounts for a "default" state by using a paired statistical test. In other words, the procedure uses statistical blocking (i.e., a paired test) to adjust for the default appearance of each participant and therefore reduces the variability across participants.

Extract relevant data from the exported files (.csv or .xlsx).
1. Identify a time frame relevant to the study evaluation (seconds).
2. Manually extract respective data (time frame) from the exported participant files consulting the participant timestamp (time zero).
3. Compile each participant's treatment data (participant number, treatment, original video time, and emotion response) per emotion (happy, neutral, sad, angry, surprised, scared, and disgusted) for the select time frame (seconds) in a new data spreadsheet for future analysis (Figure 3).
4. Continue this process for all participants.
Identify the corresponding time zero from the timestamp file for each participant-treatment pair and adjust video time to a true time "0" for direct comparison (Figure 4, Figure 5).
Note: Participant data is collected in a continuous video; therefore, each treatment "time zero" is different (e.g., in Figure 4, unflavored water video time zero is 02:13.5 and unflavored milk video time zero is 03:15.4). Due to the different treatment "time zeroes", the video times need to be readjusted and realigned to start at "0:00.0" or another standard start time to allow direct time comparison of treatment emotional response data.
For each participant, emotion, and adjusted time point, extract the paired treatment (e.g., unflavored milk) and control treatment (e.g., unflavored water) quantitative emotional score. In other words, align a participant's treatment and control time series of responses for each emotion (Figure 5).
Compile all participants' information (participant, adjusted time, and paired treatment, e.g., unflavored water and unflavored milk) at each time point (Figure 6).
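The time adjustment and pairing steps above can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not the authors' script: timestamps are assumed to be "mm:ss.s" strings like those in the note above, frames are assumed to be evenly spaced at 30 frames per second, and the function names `to_seconds`, `align`, and `pair` are hypothetical.

```python
def to_seconds(timestamp):
    """Convert a "mm:ss.s" (or "hh:mm:ss.s") video time to seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def align(series, time_zero, fps=30, window_s=5.0):
    """Re-index (video_time_s, score) pairs to frames counted from time zero.

    Only frames inside the post-consumption window are kept.
    """
    aligned = {}
    for video_time, score in series:
        frame = round((video_time - time_zero) * fps)
        if 0 <= frame < int(window_s * fps):
            aligned[frame] = score
    return aligned


def pair(control, treatment):
    """Keep only adjusted frames present in both series (unpaired frames are dropped)."""
    shared = sorted(control.keys() & treatment.keys())
    return {frame: (control[frame], treatment[frame]) for frame in shared}


if __name__ == "__main__":
    water_zero = to_seconds("02:13.5")  # control time zero from the example
    milk_zero = to_seconds("03:15.4")   # treatment time zero from the example
    water = align([(water_zero, 0.10), (water_zero + 1 / 30, 0.20)], water_zero)
    milk = align([(milk_zero, 0.30), (milk_zero + 2 / 30, 0.50)], milk_zero)
    print(pair(water, milk))
```

Indexing by frame number rather than by floating-point seconds avoids mismatches when the same adjusted time is computed from two different video offsets.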
Note: The steps below demonstrate a paired Wilcoxon test by hand. Most data analysis software programs will do this automatically. It is recommended to discuss the statistical analysis process with a statistician.
Once the samples are reset and aligned with new adjusted video times, directly compare between the emotional results of a respective sample and the control (unflavored water) using sequential paired nonparametric Wilcoxon tests across the participants (Figure 7).
Note: The new time alignment of the samples will allow for direct comparison within the 5 seconds post-consumption time frame. If a paired observation is not present in a treatment, drop the participant from that time point comparison.
1. Calculate the difference between the control and the respective sample for each paired comparison using data spreadsheet management software.
  Note: The comparison will be dependent on the frame rate selected for emotional analysis in the software. The protocol demonstrates 30 individual comparisons per second for 5 seconds (selected time frame).
  Note: Use Figure 7 as a reference for columns and steps.
  1. Subtract the value of milk (e.g., unflavored milk) from the value of the control (e.g., unflavored water) to determine the difference. In the data spreadsheet management software in a new column titled "Treatment Difference", enter "=(C2)-(D2)", where "C2" is the control emotional values and "D2" is the selected treatment emotional values. Continue this process for all time points.
  2. Calculate the absolute value of the treatment difference. In the data spreadsheet management software in a new column, enter "=ABS(E2)", where "E2" is the Treatment Difference. Continue this process for all time points.
  3. Determine the rank order of the treatment difference. In the data spreadsheet management software in a new column, enter "=RANK(G2, $G$2:$G$25, 1)" where "G2" is the Absolute Difference and "1" is "ascending". Continue this process for all time points.
  4. Determine the signed rank of the rank order on the spreadsheet. Change the sign to negative if the treatment difference was negative (Column I).
  5. Calculate the positive sum (=SUMIF(I2:I25, ">0", I2:I25)) and the negative sum (=SUMIF(I2:I25, "<0", I2:I25)) of the rank values.
  6. Determine the test statistic. The test statistic is the smaller of the absolute values of the positive and negative sums.
  7. Consult statistical tables for the Wilcoxon Signed-Rank Test statistic using the number of observations included at the specific time and a selected alpha value to determine the critical value.
  8. If the test statistic is less than the critical value, reject the null hypothesis. If it is greater, fail to reject the null hypothesis.
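For readers who prefer a script to a spreadsheet, the hand calculation above can be sketched as follows for one emotion at one adjusted time point. This is a sketch under stated assumptions rather than the authors' code, and it departs from the spreadsheet in three labeled ways: zero differences are dropped before ranking, tied absolute differences receive average ranks, and an exact two-sided p-value is computed from the sign-assignment distribution instead of looking up a critical value in a table. All function names are hypothetical.

```python
def signed_ranks(control, treatment):
    """Signed ranks of control - treatment, as in the spreadsheet steps."""
    # Assumption: round to 6 decimals to suppress floating-point noise.
    diffs = [round(c - t, 6) for c, t in zip(control, treatment)]
    diffs = [d for d in diffs if d != 0]  # assumption: drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # average rank for a run of ties
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]


def exact_two_sided_p(ranks, statistic):
    """P(min(W+, W-) <= statistic) when every sign pattern is equally likely."""
    doubled = [int(round(2 * r)) for r in ranks]  # half-ranks become integers
    total = sum(doubled)
    counts = [0] * (total + 1)  # counts[s] = sign patterns with 2 * W+ == s
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    limit = int(round(2 * statistic))
    hits = sum(c for s, c in enumerate(counts) if s <= limit or s >= total - limit)
    return hits / 2 ** len(doubled)


def wilcoxon_signed_rank(control, treatment):
    """Rank sums, test statistic, and exact p-value for paired scores."""
    signed = signed_ranks(control, treatment)
    positive_sum = sum(r for r in signed if r > 0)
    negative_sum = -sum(r for r in signed if r < 0)
    statistic = min(positive_sum, negative_sum)
    p_value = exact_two_sided_p([abs(r) for r in signed], statistic)
    return {"n": len(signed), "positive_sum": positive_sum,
            "negative_sum": negative_sum, "statistic": statistic,
            "p_value": p_value}


if __name__ == "__main__":
    water = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]  # made-up control scores
    milk = [0.11, 0.22, 0.33, 0.44, 0.55, 0.66]   # made-up treatment scores
    print(wilcoxon_signed_rank(water, milk))
```

Because the difference is control minus treatment, a larger positive rank sum indicates a higher control response and a larger negative rank sum indicates a higher treatment response. A statistics package should be preferred for real analyses; this sketch is only meant to make the individual steps visible.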
Graph the results on the associated treatment graph (i.e., unflavored milk compared to unflavored water) for the times when the null hypothesis is rejected. Use the sign of the difference to determine which treatment has the greater emotion (Figure 8).
1. In the data spreadsheet management software, create a graph using the values of presence or absence of significance.
  1. Click "Insert" tab.
  2. Select "Line"
  3. Right click on the graph box.
  4. Click "select data" and follow the screen prompts to select and graph relevant data (Figure 8).
    Note: The graphs will portray emotional results where the sample or control is higher and significant. Depending on the graph, a plotted line means the emotion is higher for that treatment at that specific time, which makes it possible to discern how participants' emotions evolve over the 5 second time period between two samples.
    Note: Statistical support with a statistician is highly recommended to extract relevant data. Development of statistical coding is required to analyze emotional results.
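The presence or absence of significance that is graphed in this section can be derived from the per-time-point test results. The sketch below assumes that each time point already has a p-value and a median paired difference computed as control minus treatment; the input layout and the function name `significance_series` are hypothetical, and the default alpha of 0.025 is taken from the figure legends.

```python
def significance_series(results, alpha=0.025):
    """Build 0/1 series for the control graph and the treatment graph.

    results: list of (time_s, p_value, median_difference) tuples, where
    median_difference = control - treatment (an assumption of this sketch).
    """
    control_marks, treatment_marks = [], []
    for time_s, p_value, median_difference in results:
        significant = p_value < alpha
        # A positive difference means the control response was higher.
        control_marks.append((time_s, int(significant and median_difference > 0)))
        # A negative difference means the treatment response was higher.
        treatment_marks.append((time_s, int(significant and median_difference < 0)))
    return control_marks, treatment_marks


if __name__ == "__main__":
    example = [(0.0, 0.010, 0.12), (0.5, 0.300, 0.02), (1.0, 0.020, -0.08)]
    control_marks, treatment_marks = significance_series(example)
    print(control_marks)
    print(treatment_marks)
```

Each series can then be plotted as a line against adjusted time, with a value of 1 indicating a significantly higher response for that graph's treatment.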

Representative Results

The method proposes a standard protocol for AFEA data collection. If the suggested protocol steps are followed, unusable emotional data output (Figure 1) resulting from poor data collection (Figure 2: A; Left Picture) may be limited. Time series analysis cannot be utilized if log files (.txt) predominantly contain "FIT_FAILED" and "FIND_FAILED", because these entries represent lost data (Figure 1). Furthermore, the method includes a protocol for direct statistical comparison between two treatments of emotional data output over a time frame to establish an emotional profile. Time series analysis can provide emotional trends over time and can add a further dimension to hedonic acceptability results. Additionally, time series analysis can show changes in emotional levels over time, which is valuable during the eating experience.

Unflavored milk, unflavored water and vanilla extract flavor in milk were not different (p>0.05) in mean acceptability scores and were rated as "liked slightly" (Figure 9). Hedonic results suggest that there were no acceptability differences between unflavored milk, unflavored water and vanilla extract flavor in milk. However, AFEA time series analysis indicated unflavored milk generated less disgusted (p<0.025; 0 sec), less surprised (p<0.025; 0-2.0 sec), less sad (p<0.025; 2.0-2.5 sec) and less neutral (p<0.025; ~3.0-3.5 sec) responses than did unflavored water (Figure 10). Additionally, vanilla extract flavor in milk introduced more happy expressions just before 5.0 seconds (p<0.025) and less sad (p<0.025; 2.0-3.0 and 5.0 sec) than unflavored water (Figure 11). Vanilla, as an odor, has been associated with the terms "relaxed", "serene", "reassured", "happiness", "well-being", "pleasantly surprised"²³ and "pleasant"²⁴. Salty flavor in milk had lower (p<0.05) mean hedonic acceptability scores (disliked moderately) (Figure 9) and generated more disgust (p<0.025) later (3.0-5.0 sec) than unflavored water (Figure 12). Intense saltiness has been associated with disgust and surprise²⁵,²⁶. However, some studies have stated that salty flavor does not elicit facial response⁷,²⁷⁻²⁹.

Figure 1. Example of sub-optimal data capture due to participant incompatibility with AFEA software resulting in loss of raw emotional data response points in the exported output files [FIT_FAILED; FIND_FAILED]. Video failures occur when serious facial occlusions or the inability to map the face persists during the specified post-consumption window.

Figure 2. Example of sub-optimal data capture due to participant software modeling. The figure presents sub-optimal data capture due to participant software modeling incompatibility and failure of face mapping to determine emotional response (A). Example of successful fit modeling and ability to capture participant's emotional response (B).

Figure 3. Example of extracted participant data compiled in a new data spreadsheet. Participant data (participant number, treatment, original video time, and emotion response) is identified per emotion (happy, neutral, sad, angry, surprised, scared, and disgusted) for the select time frame (seconds). This spreadsheet is utilized for subsequent analyses.

Figure 4. Example of extracted participant data compiled for subsequent analysis. The extracted participant data (A1 and B1) is compiled (A2 and B2), graphed (A3 and B3) and aligned (A4 and B4) as a visual for direct comparison. The respective time zero for control (A4: Surprised Unflavored Water) and treatment (B4: Surprised Unflavored Milk) are displayed for comparing the surprised emotional results. This example represents and identifies the corresponding time zero from the timestamp file for each participant-treatment pair.

Figure 5. Example of extracted participant data with adjusted time frame. The extracted participant data is presented with adjusted time frame with a true "time zero" (A1 and B1). The time adjustment allows for direct comparison between a control (A: Surprised Unflavored Water) and a treatment (B2: Surprised Unflavored Milk) (A2 and B2). This example represents and identifies the corresponding true "time zero" (adjusted) from the timestamp file for each participant-treatment pair.

Figure 6. Example of the process for compiling all participants' data. The participant, adjusted time, and paired treatment (e.g., unflavored water and unflavored milk) at each time point is compiled to prepare for statistical analysis.

Figure 7. Data spreadsheet example comparing a control (Unflavored Water) and a treatment (Unflavored Milk) using Wilcoxon tests across participants at a specific time point. The figure represents direct comparison between the emotional results of a respective sample and the control (unflavored water) using sequential paired nonparametric Wilcoxon tests across the participants.

Figure 8. Example of the data spreadsheet to graph the results if (p<0.025) on the associated treatment graph (i.e., unflavored milk compared to unflavored water). Results of sequential paired nonparametric Wilcoxon tests across the participants are graphed for the times where the null hypothesis is rejected.

Figure 9. Mean acceptability (hedonic) scores of unflavored water, unflavored milk, vanilla extract flavor in milk and salty flavor in milk beverage solutions. Acceptability was based on a 9-point hedonic scale (1=dislike extremely, 5=neither like nor dislike, 9=like extremely; mean +/- SD)¹. Treatment means with different superscripts significantly differ in liking (p<0.05). Unflavored milk, unflavored water and vanilla extract flavor in milk were not different (p>0.05) in mean acceptability scores and were rated as "liked slightly". Salty flavor in milk had a lower (p<0.05) mean acceptability score (disliked moderately).

Figure 10. Time series graphs of classified emotions based on automated facial expression analysis data over 5.0 seconds comparing unflavored milk and unflavored water. Based on sequential paired nonparametric Wilcoxon tests between unflavored milk and unflavored water (baseline), results are plotted on the respective treatment graph if the treatment median is higher and of greater significance (p<0.025) for each emotion. Presence of a line indicates a significant difference (p<0.025) at the specific time point where the median is higher, while absence of a line indicates no difference at a specific time point (p>0.025). Absence of lines in unflavored milk (A) reveals no emotional categorization compared to unflavored water (p<0.025) over 5.0 seconds. In the unflavored water (B), emotional results compared to unflavored milk reveal disgusted (crimson line) at 0 sec, surprised (orange line) occurs between 0 – 1.5 sec, sad (green line) occurs around 2.5 sec, and neutral (red line) occurs around 3 – 3.5 sec (p<0.025).

Figure 11. Time series graphs of classified emotions based on automated facial expression analysis data over 5.0 seconds comparing vanilla extract flavor in milk and unflavored water (baseline). Based on sequential paired nonparametric Wilcoxon tests between vanilla extract flavor in milk and unflavored water, results are plotted on the respective treatment graph if treatment median is higher and of greater significance (p<0.025) for each emotion. Presence of a line indicates a significant difference (p<0.025) at the specific time point where the median is higher, while absence of a line indicates no difference at a specific time point (p>0.025). Vanilla extract flavor in milk (A) shows happy just before 5 sec (blue line) while unflavored water (B) displays more sad around 2 – 2.5 and 5 sec (green line) (p<0.025).

Figure 12. Time series graphs of classified emotions based on automated facial expression analysis data over 5.0 seconds comparing salty flavor in milk and unflavored water. Based on sequential paired nonparametric Wilcoxon tests between salty flavor in milk and unflavored water (baseline), results are plotted on the respective treatment graph if that treatment's median is higher and the difference is significant (p<0.025) for each emotion. Presence of a line indicates a significant difference (p<0.025) at the specific time point where the median is higher, while absence of a line indicates no difference at a specific time point (p>0.025). Salty flavor in milk (A) shows significant disgust from 3 – 5 seconds (crimson line), while unflavored water (B) shows disgust at the beginning (crimson line) and more neutral from 2 – 5 seconds (red line) (p<0.025).

Discussion

AFEA application in the food and beverage literature is very limited¹⁻¹¹. Because the application to food is new, there is an opportunity to establish methodology and data interpretation. Arnade (2013)⁷ found high variability among individuals in emotional response to chocolate milk and white milk using area under the curve analysis and analysis of variance. However, even with this variability, participants sustained a happy response longer, while sad and disgusted responses were shorter⁷. In a separate study using high and low concentrations of basic tastes, Arnade (2013)⁷ found that the differences in emotional response among basic tastes, as well as between the two intensity levels (high and low), were not as significant as expected, calling into question the accuracy of current AFEA methodology and data analysis. Sensory evaluation of foods and beverages is a complex and dynamic response process³⁰. Temporal changes can occur throughout oral processing and swallowing, potentially influencing the acceptability of the stimuli over time³⁰. For this reason, it may be beneficial to measure evaluator response throughout the entire eating experience. Specific oral processing times have been suggested (initial contact with tongue, mastication, swallowing, etc.)³¹, but none are standardized, and times depend largely on the project and the researcher's discretion³⁰.

The proposed emotional time series analysis detected emotional changes and statistical differences between the control (unflavored water) and the respective treatments. Moreover, emotional profiles associated with acceptability may aid in anticipating behavior related to foods and beverages. Results show that distinguishable time series trends exist with AFEA related to flavors in milk (Figures 10, 11, and 12). The time series analysis assists in differentiating food acceptability across a population by integrating characterized emotions (Figures 10, 11, and 12) as well as supporting hedonic acceptability trends (Figure 9). Leitch et al.⁸ observed differences between sweeteners and the water baseline using time series analysis (5 sec) and found that time series graphs allowed better interpretation of data and results. Moreover, emotional changes can be observed over time, and treatment differences in emotional response may be determined at different time points or intervals. For example, Leitch et al.⁸ observed the approach emotions (angry, happy, and surprised) in the artificial sweetener-water comparisons, but at different times over the 5 sec observation window. However, Leitch et al.⁸ did not establish directionality of expression, making it difficult to understand the emotional difference between the control (water) and the treatment (unsweetened tea) from their graphical interpretation and presentation. The modified time series analysis methodology presented in our study adds directionality to the statistical differences. Plotting the results with directionality allows researchers to visualize where statistically relevant emotional changes occur over the selected time frame.
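
As an illustration of the directional comparison described above, the following Python sketch runs a paired Wilcoxon signed-rank test at each time point and records which side (treatment or control) has the higher median with p<0.025. It is not the authors' analysis script (the Materials list R and JMP for statistical analysis). The function names `signed_rank_greater` and `directional_series`, the list-of-lists data layout, the normal approximation, and the dropping of zero differences are all assumptions made for illustration.

```python
from statistics import NormalDist, median


def signed_rank_greater(x, y):
    """One-sided paired Wilcoxon signed-rank p-value for H1: x tends to exceed y.

    Assumption: normal approximation with tie correction; pairs with a zero
    difference are dropped. For small samples an exact test is preferable.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 1.0  # no non-zero differences, so no evidence either way
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / var ** 0.5
    return 1.0 - NormalDist().cdf(z)


def directional_series(treatment, control, alpha=0.025):
    """Label each time point 'treatment', 'control', or None for one emotion.

    treatment[t][p] and control[t][p] hold the intensity for participant p
    at time point t (hypothetical layout; participants must be in the same
    order in both inputs so that values are paired).
    """
    labels = []
    for t_vals, c_vals in zip(treatment, control):
        if median(t_vals) > median(c_vals) and signed_rank_greater(t_vals, c_vals) < alpha:
            labels.append("treatment")
        elif median(c_vals) > median(t_vals) and signed_rank_greater(c_vals, t_vals) < alpha:
            labels.append("control")
        else:
            labels.append(None)
    return labels
```

Under these assumptions, a line would be drawn on the treatment graph wherever the label is `"treatment"` and on the control graph wherever it is `"control"`. Time points labeled `None` would be left blank.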

Reducing video analysis failures is essential for attaining valid data and for using time and personnel resources effectively. Critical steps and troubleshooting steps in the protocol include optimizing the participant sensory environment (lighting, video camera angle, chair height, thorough participant guidance instructions, etc.). Also, participants should be screened and excluded if they fall into a software incompatibility category (e.g., thick-framed glasses, heavily bearded faces, and skin tone) (Figure 2). These factors influence AFEA fit modeling, emotional categorization, and data output. If a significant portion of a participant's data output consists of "FIT_FAILED" and "FIND_FAILED", the data should be reevaluated for inclusion in the time series analysis (Figure 1). Time series analysis cannot be used if the output log files predominantly contain "FIT_FAILED" and "FIND_FAILED", because these entries carry no usable emotional data (Figure 1). Shadowing on the face due to lighting settings may severely degrade video capture quality. To avoid intense shadowing, diffuse frontal lighting is ideal, while light intensity and color are less relevant²⁰. Intense overhead lighting should be reduced because it can promote shadows on the face²⁰. A dark background behind the participant is recommended²⁰. The AFEA software manufacturer suggests placing the setup in front of a window to obtain diffuse daylight²⁰. Also, if using a computer monitor, two lights may be placed on either side of the user's face for illumination and shadow reduction²⁰. Additionally, professional photo lights may be used to counteract undesirable environmental lighting²⁰. Ultimately, controlling the lighting for capture is up to the discretion of the researcher and depends on the individual protocol/methodology and environment. It is recommended to discuss the data capture environment and the tools with the software provider before purchase and installation. 
Furthermore, chair height and camera angle should be adjusted individually for each participant. The participant should be comfortable but seated at a height where the camera faces them straight on. Minimizing the camera angle relative to the face is encouraged to optimize AFEA video capture. Lastly, it is imperative to give verbal instructions to the participants prior to sampling. Participant behavior during video capture may limit data collection through facial occlusion, movement, and camera avoidance.
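
The screening of log files for failed frames can be sketched as follows in Python. This is an illustration only: it assumes the per-frame status values have already been read from the exported log into a list of strings, and the names `failed_fraction` and `is_usable` and the 0.5 cutoff are hypothetical. The text above says only that files "predominantly" containing failures cannot be used, so the threshold remains the researcher's decision.

```python
FAILED_STATES = {"FIT_FAILED", "FIND_FAILED"}


def failed_fraction(frame_states):
    """Return the fraction of frames whose status is a fit or find failure.

    frame_states: one string per video frame (hypothetical layout), either a
    failure marker or any other value for a successfully classified frame.
    """
    if not frame_states:
        return 1.0  # an empty log contains no usable frames
    failed = sum(1 for state in frame_states if state.strip() in FAILED_STATES)
    return failed / len(frame_states)


def is_usable(frame_states, max_failed=0.5):
    """Flag a participant's log as usable when failures do not dominate.

    max_failed is an assumed cutoff for illustration, not a published value.
    """
    return failed_fraction(frame_states) <= max_failed
```

A log that is flagged as not usable would be set aside for reevaluation rather than passed to the time series analysis.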

For sample size, the authors recommend 10 to 50 participants. Time series analysis requires at least 2 participants, although such a small number provides almost no statistical power. Participant variability is high, and at this early stage of the research no firmer guidance on sample size can be offered. Sample size will vary depending on flavors, flavor intensity, and expected treatment acceptability. Samples with smaller flavor differences will require more participants. The 30 second controlled sampling period is long enough to cover the entire sampling evaluation (i.e., showing the index card, opening a sample (removing the lid), consumption, and emotional capture). The entire 30 seconds is not used in data analysis. The benefit of this designated 30 second capture time is that the researcher can decide which evaluation time is pertinent for data analysis and can select that time frame while coding or timestamping the videos. Ultimately, the time window is up to the discretion of the researcher. In our example, we used a 5 sec sampling window post-consumption. Furthermore, the present methodology defines time zero as the moment the sample cup no longer occludes the face (cup at the chin). Because emotions are brief and change quickly, it is critically important to minimize the time between consumption and removal of the sample cup from in front of the face. Because the sample cup occludes the face, data from the moment the sample first contacts the tongue are unreliable (see Figure 1). Therefore, the point where the cup no longer occludes the face is the recommended time zero. Timestamps need to be consistent for all participants. The color card is a convenient way for researchers to identify treatments in the video and mark the appropriate time frame (time zero) for sample evaluation. 
The color cards are especially helpful if treatments are in random order, and they serve as an extra validation of sample identification in the continuous video.
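
To make the time-zero alignment concrete, the Python sketch below cuts a fixed window out of one participant's frame series, starting at the researcher's time-zero timestamp, and re-bases the times so that all participants share the same axis. The function name `extract_window`, the `(timestamp, value)` layout, and the small tolerance are assumptions for illustration. The 5 sec default follows the example in this study.

```python
def extract_window(frames, time_zero, duration=5.0):
    """Return frames from time_zero to time_zero + duration, re-based to 0.

    frames: list of (timestamp_sec, value) pairs for one participant and one
            treatment (hypothetical layout).
    time_zero: timestamp at which the cup no longer occludes the face.
    """
    eps = 1e-9  # tolerance for floating-point timestamps at the window edges
    window = []
    for timestamp, value in frames:
        offset = timestamp - time_zero
        if -eps <= offset <= duration + eps:
            window.append((round(offset, 4), value))
    return window
```

Applying the same function with each participant's own time zero yields series that can be paired frame by frame across treatments.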

This technique has limitations: participants may not follow directions, and unavoidable shadowing on the participant's face may cause face fit model failures (Figure 2). However, the suggested critical steps offer ways to mitigate these interferences. Additionally, the time series analysis cannot read exported log files that predominantly contain "FIT_FAILED" and "FIND_FAILED" (Figure 1). These files cannot be salvaged and cannot be included in the time series analysis. Also, the consumption of food and beverages may still alter the facial structure in a way that distorts the emotional categorization. Hard or chewy foods require extensive jaw motion. Use of a drinking straw, with its associated sucking, also causes facial occlusion (straw) and distorts the face (sucking). This observation is based on preliminary data from our laboratory research. The software facial model cannot discern the differences between chewing (or sucking) and motor expressions associated with emotional categorization. With food and beverage samples, the opportunity for facial occlusion is higher than when viewing videos and pictures. Participants must bring the sample to the face and remove the container from the face, thus interrupting the software model and potentially losing valuable emotional information (see Figure 1). As mentioned previously, emotions happen quickly and last for a short duration. It is important to reduce facial occlusion in an effort to capture emotions. The proposed methodology makes treatment comparisons every one thirtieth of a second to find changes in emotional patterns and in emotional duration across time. With the proposed methodology, patterns of emotional longevity are important. Unfortunately, emotional categorization problems can occur. Most notably, happy and disgust are difficult to categorize⁶,⁹,³²⁻³⁴. 
Often this occurs because participants mask distaste or surprise by smiling⁶,³²⁻³⁴, which may reflect a "social display rule"³². Furthermore, the AFEA software is limited to seven emotional categories (neutral, happy, sad, scared, surprised, angry, and disgusted). Emotional response to foods and beverages may be more complex than the current AFEA classification of universal emotions, and categorization may be different in response to a food or beverage stimulus. Manual coding using FACS has been applied to gustofacial and olfactofacial responses to basic tastes and an assortment of odors and appeared to be sensitive enough to detect treatment differences with regard to AUs³². FACS is tedious and very time consuming; however, the temporal application of absence or presence of AUs may be useful for complex responses that AFEA might not classify correctly or when emotional results are unexpected. While time series data allow several facial classifications to be significant at the same time, caution should be used when translating results into a single emotion because of emotional complexity.

The proposed methodology and data analysis technique may be applied to other beverages and soft foods. AFEA software was able to identify emotions in response to flavored and unflavored samples. The proposed methodology and temporal analysis may aid in characterizing implicit responses, thereby advancing the understanding of a population's emotional responses and behaviors relating to food. We have demonstrated a methodology for video capture of emotional response and for data analysis, and the approach has shown success in our research. We aim to create a standard approach for both AFEA emotional capture and emotional time series analysis, and we hope to extend this approach to evaluating emotional response to foods and beverages and its relationship to choice and behavior.

Disclosures

The authors have nothing to disclose.

Acknowledgements

This project was funded, in part, by ConAgra Foods (Omaha, NE, USA), the Virginia Agricultural Experiment Station, the Hatch Program of the National Institute of Food and Agriculture, U.S. Department of Agriculture, and the Virginia Tech Water INTERface Interdisciplinary Graduate Education Program.

Materials

Name	Company	Catalog Number	Comments
2% Reduced Fat Milk	Kroger Brand, Cincinnati, OH or DZA Brands, LLC, Salisbury, NC	na	for solutions
Drinking Water	Kroger Brand, Cincinnati, OH	na	for solutions
Imitation Clear Vanilla Flavor	Kroger Brand, Cincinnati, OH	na	for solutions
Iodized Salt	Kroger Brand, Cincinnati, OH	na	for solutions
FaceReader 6	Noldus Information Technology, Wageningen, The Netherlands	na	For Facial Analysis
Sensory Information Management System (SIMS) 2000	Sensory Computer Systems, Berkeley Heights, NJ	Version 6	For Sensory Data Capture
Rhapsody	Acuity Brands Lighting, Inc., Conyers, GA		For Environment Illumination
R Version	R Core Team 2015	3.1.1	For Statistical Analysis
Microsoft Office	Microsoft	na	For Statistical Analysis
JMP	Statistical Analysis Software (SAS) Version 9.2, SAS Institute, Cary, NC	na	For Statistical Analysis
Media Recorder 2.5	Noldus Information Technology, Wageningen, The Netherlands	na	For capturing participants sensory evaluation
Axis M1054 Camera	Axis Communications, Lund, Sweden	na
Beverage		na	Beverage or soft food for evaluation

References

De Wijk, R. A., Kooijman, V., Verhoeven, R. H. G., Holthuysen, N. T. E., De Graaf, C. Autonomic nervous system responses on and facial expressions to the sight, smell, and taste of liked and disliked foods. Food Qual Prefer. 26 (2), 196-203 (2012).
De Wijk, R. A., He, W., Mensink, M. G. J., Verhoeven, R. H. G., De Graaf, C. ANS responses and facial expression differentiate between the taste of commercial breakfast drinks. PLoS ONE. 9 (4), 1-9 (2014).
He, W., Boesveldt, S., De Graaf, C., De Wijk, R. A. Behavioural and physiological responses to two food odours. Appetite. 59 (2), 628 (2012).
He, W., Boesveldt, S., De Graaf, C., De Wijk, R. A. Dynamics of autonomic nervous system responses and facial expressions to odors. Front Psychol. 5 (110), 1-8 (2014).
Danner, L., Sidorkina, L., Joechl, M., Duerrschmid, K. Make a face! Implicit and explicit measurement of facial expressions elicited by orange juices using face reading technology. Food Qual Prefer. 32 (2014), 167-172 (2013).
Danner, L., Haindl, S., Joechl, M., Duerrschmid, K. Facial expression and autonomous nervous system responses elicited by tasting different juices. Food Res Int. 64 (2014), 81-90 (2014).
Arnade, E. A. Measuring consumer emotional response to tastes and foods through facial expression analysis [thesis]. 1-187 (2013).
Leitch, K. A., Duncan, S. E., O’Keefe, S., Rudd, R., Gallagher, D. L. Characterizing consumer emotional response to sweeteners using an emotion terminology questionnaire and facial expression analysis. Food Res Int. 76, 283-292 (2015).
Crist, C. A., et al. Application of emotional facial analysis technology to consumer acceptability using a basic tastes model. (2014).
Garcia-Burgos, D., Zamora, M. C. Facial affective reactions to bitter-tasting foods and body mass index in adults. Appetite. 71 (2013), 178-186 (2013).
Garcia-Burgos, D., Zamora, M. C. Exploring the hedonic and incentive properties in preferences for bitter foods via self-reports, facial expressions and instrumental behaviours. Food Qual Prefer. 39 (2015), 73-81 (2015).
Lewinski, P., Fransen, M. L., Tan, E. S. H. Predicting advertising effectiveness by facial expressions in response to amusing persuasive stimuli. J. Neurosci. Psychol. Econ. 7 (1), 1-14 (2014).
Ekman, P., Friesen, W. V. Facial action coding system: A technique for the measurement of facial movement. (1978).
Viola, P., Jones, M. Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern. 1, 511-518 (2001).
Sung, K. K., Poggio, T. Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20 (1), 39-51 (1998).
FaceReader 5™ Technical Specifications. (2014).
Cootes, T., Taylor, C. Statistical models of appearance for computer vision: Technical report. (2000).
Bishop, C. M. Neural networks for pattern recognition. (1995).
Lewinski, P., den Uyl, T. M., Butler, C. Automated facial coding: validation of basic emotions and FACS AUs in FaceReader. J. Neurosci. Psychol. Econ. 7 (4), 227-236 (2014).
Noldus Information Technology. FaceReader Reference Manual Version 6. (2014).
Alves, N. T., Fukusima, S. S., Aznar-Casanova, J. A. Models of brain asymmetry in emotional processing. Psychol Neurosci. 1 (1), 63-66 (2008).
Costello, M., Clark, S. Preparation of samples for instructing students and staff in dairy products evaluation (Appendix F). The sensory evaluation of dairy foods. Clark, S., Costello, M., Drake, M., Bodyfelt, F. (eds.), 551-560 (2009).
Porcherot, C., et al. How do you feel when you smell this? Optimization of a verbal measurement of odor-elicited emotions. Food Qual Prefer. 21, 938-947 (2010).
Warrenburg, S. Effects of fragrance on emotions: Moods and physiology. Chem. Senses. 30, i248-i249 (2005).
Bredie, W. L. P., Tan, H. S. G., Wendin, K. A comparative study on facially expressed emotions in response to basic tastes. Chem. Percept. 7 (1), 1-9 (2014).
Wendin, K., Allesen-Holm, B. H., Bredie, L. P. Do facial reactions add new dimensions to measuring sensory responses to basic tastes?. Food Qual Prefer. 22, 346-354 (2011).
Rosenstein, D., Oster, H. Differential facial responses to four basic tastes in newborns. Child Dev. 59 (6), 1555-1568 (1988).
Rosenstein, D., Oster, H. Differential facial responses to four basic tastes in newborns. What the face reveals: Basic and applied studies of spontaneous expression using the facial action coding system (FACS). Ekman, P., Rosenberg, E. (eds.), 302-327 (1997).
Rozin, P., Fallon, A. E. A perspective on disgust. Psychol. Rev. 94 (1), 23-41 (1987).
Delarue, J., Blumenthal, D. Temporal aspects of consumer preferences. Curr. Opin. Food Sci. 3, 41-46 (2015).
Sudre, J., Pineau, N., Loret, C., Marin, N. Comparison of methods to monitor liking of food during consumption. Food Qual Prefer. 24 (1), 179-189 (2012).
Weiland, R., Ellgring, H., Macht, M. Gustofacial and olfactofacial responses in human adults. Chem. Senses. 35 (9), 841-853 (2010).
Ekman, P. Universal and cultural differences in facial expressions of emotion. Nebraska symposium on motivation. Cole, J. (ed.), 207-283 (1971).
Griemel, E., Macht, M., Krumhuber, E., Ellgring, H. Facial and affective reactions to tastes and their modulation by sadness and joy. Physiol Behav. 89 (2), 261-269 (2006).

Cite this Article

Crist, C. A., Duncan, S. E., Gallagher, D. L. Protocol for Data Collection and Analysis Applied to Automated Facial Expression Analysis Technology and Temporal Analysis for Sensory Evaluation. J. Vis. Exp. (114), e54046, doi:10.3791/54046 (2016).