IntelliSleepScorer, a Software Package with a Graphic User Interface for Mice Automated Sleep Stage Scoring

Published: November 08, 2024
doi: 10.3791/66950

Summary

We present a software package with a graphic user interface that enables researchers without coding experience to score sleep stages in mice through a simple download and straightforward operation.

Abstract

Sleep stage scoring in rodents is the process of identifying the three stages: non-rapid eye movement sleep (NREM), rapid eye movement sleep (REM), and wake. Sleep stage scoring is crucial for studying sleep stage-specific measures and effects.

Sleep patterns in rodents differ from those in humans, characterized by shorter episodes of NREM and REM interspersed with waking, and traditional manual sleep stage scoring by human experts is time-consuming. To address this issue, previous studies have used machine learning-based approaches to develop algorithms that automatically categorize sleep stages, but high-performing models with good generalizability are often not publicly available or cost-free, nor user-friendly for sleep researchers without computational training.

Therefore, we developed a machine learning model based on the LightGBM algorithm and trained on a large dataset. To make the model available to sleep researchers without coding experience, we developed a software tool named IntelliSleepScorer (v1.2, the newest version) based on the model, featuring an easy-to-use graphic user interface. In this manuscript, we present step-by-step instructions for using the software, demonstrating a convenient and effective automatic sleep stage scoring tool for mouse sleep research.

Introduction

Sleep stage scoring in rodents is the procedure of identifying the three stages: non-rapid eye movement sleep (NREM), rapid eye movement sleep (REM), and wake2. In rodents, NREM is characterized by reduced muscle activity, slow and regular breathing, decreased heart rate, and low-frequency oscillations of the brain waves. REM in rodents, similar to humans, shows muscle atonia, EEG activation, and rapid eye movements, although the occurrence of vivid dreaming is less clear in rodents compared to humans2,3. The "wake" state in rodents is marked by desynchronized brain activity with high-frequency, low-amplitude waves, increased muscle tone, and active behavior, such as grooming and exploration4. These three stages can be identified by inspecting electroencephalogram (EEG) and electromyogram (EMG) signals5.

Automatic sleep stage scoring models for rodents are in great demand. First, manual sleep stage scoring by human experts is labor-intensive and time-consuming. Second, sleep patterns in rodents differ from those in humans and have more fragmented episodes of NREM and REM interspersed with waking, lasting around 10 min, in contrast to 60-120 min in humans6. Identifying these brief periods during manual scoring is therefore challenging. There have been many attempts since the 1960s to develop automatic scoring systems for rodent sleep data7. Although many automated rodent sleep scoring methods exist, their performances vary8,9,10,11,12,13,14,15,16,17,18. Importantly, most high-performing models with high generalizability are not publicly available (some require special requests to developers) or are not cost-free for sleep researchers.

Therefore, to fill the current technology gap, we developed a machine learning-based model with the LightGBM algorithm1 using a large dataset of 5776 h of EEG and EMG signals from 519 recordings across 124 mice. LightGBM uses a gradient-boosting approach to construct decision trees19. In Wang et al., 2023, the LightGBM model (consisting of over 8000 decision trees) achieved an overall accuracy of 95.2% and a Cohen's kappa of 0.91, outperforming two widely used baseline models: the logistic regression model (accuracy = 93.3%) and the random forest model (accuracy = 94.3%, kappa = 0.89). The model also performed comparably to human experts. Most importantly, the model has been shown to generalize and not to overfit the original training data1: 1) It performed well (accuracy > 89%) on two other publicly available independent datasets, from Miladinovic and colleagues13, with different sampling frequencies and epoch lengths; 2) The performance of the model is not impacted by the light/dark cycle of mice; 3) A modified LightGBM model performed well on data containing only one EEG and one EMG electrode, with kappa ≥ 0.89; 4) Both wildtype and mutant mice were used for testing, and the model was accurate in both cases, suggesting that it can score sleep stages for mice with different genetic backgrounds.
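
For readers unfamiliar with gradient-boosted decision trees, the minimal sketch below trains a LightGBM multiclass classifier of the same general type on synthetic, placeholder epoch-level features. It illustrates only the class of algorithm; it is not the training pipeline, feature set, or hyperparameters used for IntelliSleepScorer.

# Minimal sketch of a gradient-boosted decision-tree classifier of the kind
# underlying IntelliSleepScorer. Features and labels are synthetic placeholders.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
n_epochs = 1000
X = rng.normal(size=(n_epochs, 5))        # placeholder epoch-level features
y = rng.integers(1, 4, size=n_epochs)     # placeholder stage codes: 1 = Wake, 2 = NREM, 3 = REM

train_set = lgb.Dataset(X, label=y - 1)   # LightGBM expects class labels 0..num_class-1
params = {
    "objective": "multiclass",
    "num_class": 3,
    "learning_rate": 0.05,
    "num_leaves": 31,
    "verbose": -1,
}
booster = lgb.train(params, train_set, num_boost_round=100)

# Each prediction is a vector of class probabilities; argmax + 1 recovers the stage code.
predicted_stages = booster.predict(X).argmax(axis=1) + 1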

In order to make this model accessible to sleep researchers who may not have coding expertise, we developed IntelliSleepScorer, a user-friendly software tool with a visually intuitive interface. The software fully automates the sleep-scoring procedure in mice. It produces interactive visualizations of the signals, hypnogram, and SHapley Additive exPlanations (SHAP) values from a European Data Format (EDF)/EDF+ file input. The SHAP value approach, based on cooperative game theory, enhances the interpretability of machine learning models20. The model offers both global and epoch-level SHAP values, revealing how different feature values contribute to the scoring decision of the model overall and for each epoch. This program significantly reduces the time and effort required for sleep stage scoring in mice while ensuring that downstream analysis can rely on highly accurate results. In this manuscript, we present step-by-step usage of IntelliSleepScorer (v1.2), which includes several updates over version 1.0: an option to run the SHAP analysis separately from sleep stage prediction, a user-adjustable epoch length for sleep stage scoring, and a sleep stage manual correction feature integrated within the GUI.

Protocol

This study used data collected from in vivo experiments in mice. No human experiments were involved in the study. All experiments with animals were approved by the Institutional Animal Care and Use Committee at the Broad Institute. All experiments were performed in accordance with relevant guidelines and regulations. The ARRIVE guidelines are not applicable to this study because its focus is to develop machine learning models rather than to compare different treatment groups.

1. Data preparation

NOTE: Data compatibility: the recorded data can have any sampling rate higher than 40 Hz. There is no need to bandpass filter the signal, because the software bandpass filters the EEG and EMG signals as its first step. The LightGBM models were developed and tested using data from mice. No evidence regarding the performance of the LightGBM models in other types of laboratory animals is available. The recording electrodes need to be placed over the frontal and parietal cortices, or over either region if only one EEG channel is recorded.

  1. EDF/EDF+ format arrangement and requirement
    NOTE: The software used in this study only reads EDF/EDF+ files using the MNE-Python package. The standard EDF/EDF+ specification needs to be applied to generate the EDF/EDF+ files. In addition to the standard specification, ensure that the EDF/EDF+ annotations are encoded in UTF-8. Otherwise, the software application will crash.
    1. Convert files in other (non-EDF/EDF+) formats into EDF/EDF+ format with free online tools.
      NOTE: There is no requirement for the acquisition hardware filter when obtaining EEG and EMG signals. As long as users sample their EEG and EMG data at a frequency of 40 Hz or higher, the software will function correctly. This is because, in the initial preprocessing step, the signals undergo bandpass filtering between 1 Hz and 40 Hz. This bandpass filtering is integrated into the software's preprocessing pipeline, eliminating the need for users to perform any additional signal processing.
  2. There are two models inside the software for scoring: LightGBM-2EEG and LightGBM-1EEG. The LightGBM-2EEG model is designated for recordings with 2 EEG channels and 1 EMG channel, while LightGBM-1EEG is designated for recordings with only 1 EEG channel (electrode placed over either the parietal or the frontal area) and 1 EMG channel. Perform the following steps depending on the model (a sketch for verifying and reordering the channel order is provided at the end of this section).
    1. Organize the channels in EDF/EDF+ files for LightGBM-2EEG in the following order: 1) EEG channel recorded in the parietal area; 2) EEG channel recorded in the frontal area; 3) EMG channel.
    2. Organize the channels in EDF/EDF+ files for LightGBM-1EEG in the following order: 1) EEG channel; 2) EMG channel.
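
Before scoring, it can help to confirm that each file meets these requirements, namely a sampling rate of at least 40 Hz and channels in the order expected by the chosen model. The sketch below assumes MNE-Python is installed and uses placeholder channel names that must be replaced with the names from the user's acquisition system; writing the reordered data back to EDF additionally requires MNE's optional EDF export backend.

# Sketch: verify the sampling rate and channel order of an EDF file before scoring.
# "recording.edf" and the channel names are placeholders for the user's own files.
import mne

raw = mne.io.read_raw_edf("recording.edf", preload=True)

print("Sampling rate:", raw.info["sfreq"], "Hz")   # must be at least 40 Hz
print("Current channel order:", raw.ch_names)

# Expected order for the LightGBM-2EEG model: parietal EEG, frontal EEG, EMG.
desired_order = ["EEG_parietal", "EEG_frontal", "EMG"]
raw.reorder_channels(desired_order)

# Exporting to EDF uses MNE's optional export backend (the "edfio" package in
# recent MNE releases); install it if this call raises an ImportError.
mne.export.export_raw("recording_reordered.edf", raw, fmt="edf", overwrite=True)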

2. Downloading IntelliSleepScorer for Windows, Mac, and Linux users

  1. For Windows users, a Windows executable built with PyInstaller is available. Find the download link on the Pan group research page https://sites.broadinstitute.org/pan-lab/resources. For MacOS or Linux users, use the source code in the GitHub repository https://github.com/broadinstitute/IntelliSleepScorer to launch the software.
  2. Download the two example recordings, saved as EDF files, from the GitHub repository to test the program.
  3. The source code repository does not include the models folder due to size limits. Instead, download models.zip, unzip it, and copy the models folder into the repository folder so the program can run. Otherwise, the software will crash due to the missing model files.

3. Program launch and operation workflow

  1. Launch IntelliSleepScorer
    1. To launch the software in Windows, double-click IntelliSleepScorer.exe located in the root folder. To launch the software in MacOS or Linux, open a terminal emulator, change the directory to the root folder of the software, and then launch the software using the command: python3 IntelliSleepScorer.py.
  2. Once the software opens, click Select EDF/EDF+ File(s) to select the intended file(s) to score. If files were selected by mistake, click the Clear button to clear the selected file list.
    NOTE: By default, the software encodes the sleep stages as Wake:1, NREM:2, and REM:3 in the output score files. The default epoch length is set at 10 s. The current version (v1.2) of the GUI allows users to change the stage encodings and to set the epoch length to 4 s, 10 s, or 20 s with the dropdown menu.
  3. Select the desired epoch length. Use the provided dropdown menu to select the intended epoch length among options of 4 s, 10 s, and 20 s for sleep stage scoring.
  4. Select the model that is to be used for sleep scoring. LightGBM-2EEG is intended for data files with two EEG channels and one EMG channel, while the LightGBM-1EEG is designed for data with one EEG channel and one EMG channel.
  5. Before running the sleep stage prediction, optionally include the additional SHAP calculation, which helps explain the sleep stage prediction results, by checking the Run/Plot SHAP checkbox. The SHAP calculation requires around 5-10 min to process.
  6. Click Score All Files. The model automatically scores all the EDF/EDF+ files in the list and, if the SHAP option is checked, calculates the global and epoch-level SHAP values used to interpret the scoring decisions.
    NOTE: During the scoring process, the model generates the following files and saves them to the same folder where the EDF/EDF+ files are located. The model uses these files to plot the global SHAP values and epoch SHAP values.

    "EDF/EDF+ file name}_{model_name}_features.csv"; this file stores all the extracted feature values.
    "EDF/EDF+ file name}_{model_name}_scores.csv"; this file stores the predicted sleep stages.
    "EDF/EDF+ file name}_{model_name}_rs_100hz.npy"; this file stores a copy of the resampled/downsampled signals (100hz). To improve the visualization speed, the model uses the downsampled signal instead of the original signal when plotting the signal.
    "​EDF/EDF+ file name}_{model_name}explainer. pickle"; "{EDF/EDF+ file name}{model_name}shap_500samples.pickle"; "{EDF/EDF+ file name}{model_name}_indicies_500samples.npy";
  7. After finishing the sleep scoring process, click on the Visualize the Selected File option to visualize the EEG/EMG signals and a hypnogram time-aligned with the signals.
    1. Score the selected file again before visualization if the epoch length is changed.
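
For downstream analysis outside the GUI, the "{EDF/EDF+ file name}_{model_name}_scores.csv" output can be loaded directly. The sketch below is a minimal example that assumes the predicted stage codes (1 = Wake, 2 = NREM, 3 = REM) are stored in the last column of the file and that the default 10-s epoch length was used; inspect the actual file to confirm its layout before adapting the code.

# Sketch: summarize a "*_scores.csv" output file. The file name and the
# assumption that the last column holds the stage codes are placeholders.
import pandas as pd

scores = pd.read_csv("Example-1_LightGBM-2EEG_scores.csv")

stage_names = {1: "Wake", 2: "NREM", 3: "REM"}   # default stage encoding of the software
stages = scores.iloc[:, -1].map(stage_names)     # assumes the last column holds the codes

epoch_len_s = 10                                 # default epoch length; change if another was used
counts = stages.value_counts()
print("Epochs per stage:\n", counts)
print("Minutes per stage:\n", counts * epoch_len_s / 60)
print("Percent of recording per stage:\n", (counts / len(stages) * 100).round(1))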

4. Navigating the scored results

  1. Click the provided Navigation buttons to move forward and backward to view data from different epochs.
  2. If the SHAP calculation was performed, view both the global and epoch-level SHAP values. Right-click on an epoch to plot the epoch-level SHAP values.
    NOTE: It will take a few seconds to update the epoch-level SHAP plot. Figure 1 shows the GUI page overview after running the prediction for the Example-1 EDF/EDF+ file with the 1_LightGBM-2EEG model.

5. Interpretation of the scored sleep stages hypnogram

NOTE: There are 4 rows in the hypnogram (Figure 2). The top row shows the predicted results. The bottom 3 rows show the raw data from the 2 EEG channels and the 1 EMG channel, respectively. In the top row, orange indicates the Wake stage, blue indicates the NREM stage, and red indicates the REM stage for each epoch.

  1. To change the number of epochs to display, click on the menu box to the right of Select Number of Epochs to Display and choose a desired value. In Figure 2, 100 epochs were chosen; therefore, only 100 epochs are displayed in the sleep stage prediction plot. Select a smaller number in the dropdown menu to zoom into the plot.
  2. The pink transparent bar on the left of Figure 2 indicates the current epoch location. Left-click anywhere on the hypnogram to switch to another epoch, or click on Go to Epoch and enter the number of the specific epoch to be observed. Right-click the selected epoch to generate its epoch SHAP plot if the SHAP function is enabled.

6. Manual correction of the predicted sleep stages on GUI (Optional)

NOTE: If no anomaly is observed or extremely high accuracy is not required for REM stage prediction, manual verification is not needed.

  1. Left-click on an epoch in the sleep stage prediction plot (top plot) to select a specific epoch. The model-predicted stage is shown to the right of the Stage of Selected Epoch text. To manually change the predicted stage of that epoch, click on the widget and select a new stage from the Wake, NREM, and REM options in the dropdown menu.
  2. The user-corrected stages are marked with dashed lines on top of the original plot (Figure 3). Close the GUI, and a new file with corrected prediction results will be automatically generated in the same folder.
    1. To reopen a saved scored file in the GUI, ensure that the epoch length setting and the selected model match those used when the EDF file was initially processed. All the previously modified/scored information will then be quickly loaded.

Representative Results

Three plots (only the top plot if SHAP values were not run) are generated in the GUI after sleep stage scoring: the top plot presents the EEG and EMG channels with a hypnogram of the sleep stage predictions, the middle plot presents epoch SHAP values, and the bottom plot presents global SHAP values (Figure 1).

Four types of data are presented in the sleep stage prediction hypnogram plot (Figure 2). The top row shows the predicted results. The bottom 3 rows show the raw data from the 2 EEG channels and the 1 EMG channel, respectively. In the top row, orange indicates the "Wake" stage, blue indicates the "NREM" stage, and red indicates the "REM" stage for each epoch. The currently selected epoch is 1305 and is scored as "Wake" because the pink location bar overlaps with an orange-colored line.

In Figure 3, a user-corrected stage is marked with dashed lines on top of the original. The red dashed line indicates the sleep stage has been changed from "Wake" to "REM".

In Figure 4, an example result for epoch 1305 of example file 1 is shown. The y-axis of the epoch SHAP plot shows the top 10 features with the highest absolute SHAP values for the selected epoch. The x-axis shows the SHAP values, which indicate the contribution of each feature to the prediction relative to the average prediction. A positive SHAP value indicates a positive contribution to the prediction, and vice versa. The feature "emg_abs_max" has a strongly positive epoch-level SHAP value for Wake, indicating that the "emg_abs_max" value of the selected epoch increases the likelihood of that epoch being scored as "Wake" (Figure 4). This is physiologically reasonable because a large EMG amplitude signifies active movements, thereby indicating the "Wake" stage.

In the global SHAP plot example (Figure 5), each dot in the beeswarm plot represents one sample of data. The y-axis of the plots shows the top 10 features with the highest absolute global SHAP values calculated from 500 randomly sampled epochs. The x-axis shows the SHAP values, which indicate the contribution of each feature to the prediction relative to the average prediction. Different from the epoch SHAP plot, the global SHAP plot has two dimensions: the x-value of each dot and the intensity of the dot color. A positive SHAP value on the x-axis indicates a positive contribution to the prediction, and vice versa. Samples with a darker red color have higher feature values. By visually examining the relationship between the positions and the colors of the 500 dots for each feature, one can interpret how LightGBM makes decisions based on the values of each feature. In the "Wake" global SHAP plot, as the SHAP values for "emg_abs_max" increase from more negative to more positive, the color of the dots becomes darker, indicating that an increased likelihood of being predicted as "Wake" is positively correlated with an increased value of "emg_abs_max". It is also worth noting that features with broader distributions of SHAP values (higher absolute values) contribute more to the model's prediction. For example, the global NREM SHAP plot (Figure 5) shows a wide spread of dots for the "eeg2_gamma_delta_ratio" feature. A highly negative SHAP value for this feature decreases the likelihood of an epoch being scored as the "NREM" stage.
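
For readers who wish to reproduce a similar global SHAP visualization outside the GUI, the sketch below applies the shap package to a generic LightGBM multiclass model trained on synthetic placeholder data. It illustrates the plotting approach only and does not read the pickle files saved by IntelliSleepScorer.

# Sketch: beeswarm-style global SHAP plot for one class of a LightGBM multiclass
# model. The data, model, and feature names are synthetic placeholders.
import numpy as np
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                       # 500 sampled epochs, 5 features
y = rng.integers(0, 3, size=500)                    # 0 = Wake, 1 = NREM, 2 = REM
feature_names = [f"feature_{i}" for i in range(5)]  # placeholder feature names

model = lgb.train({"objective": "multiclass", "num_class": 3, "verbose": -1},
                  lgb.Dataset(X, label=y), num_boost_round=50)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Depending on the shap version, multiclass output is either a list with one array
# per class or a single (n_samples, n_features, n_classes) array.
wake_shap = shap_values[0] if isinstance(shap_values, list) else shap_values[:, :, 0]

# x-position is the SHAP value, color encodes the feature value, mirroring the
# global SHAP plots shown in the GUI.
shap.summary_plot(wake_shap, X, feature_names=feature_names)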

Figure 1
Figure 1: GUI page overview after running prediction for the Example-1 EDF/EDF+ file with the 1_LightGBM-2EEG model. Both sleep stage prediction and SHAP calculation were performed. The top plot shows the EEG and EMG channels with a hypnogram of sleep stage prediction. The middle plot shows epoch SHAP values. The bottom plot shows global SHAP values.

Figure 2
Figure 2: Sleep stage prediction hypnogram from the Example-1 EDF/EDF+ file. Four types of data are presented in the plot. The top row shows the predicted results. The bottom 3 rows show the raw data from the 2 EEG channels and the 1 EMG channel, respectively. In the top row, orange indicates the "Wake" stage, blue indicates the "NREM" stage, and red indicates the "REM" stage for each epoch.

Figure 3
Figure 3: Sleep stage prediction hypnogram from the Example-1 EDF/EDF+ file with user-modified manual correction. Epoch 1305 has been changed from stage "Wake" to stage "REM" as a demonstration of sleep stage manual correction inside the GUI. The dashed red line indicates the user-modified sleep stage "REM".

Figure 4
Figure 4: Wake, NREM, and REM epoch SHAP values for the Example-1 EDF/EDF+ file at epoch 1305. The y-axis of the epoch SHAP plot shows the top 10 features with the highest absolute SHAP values for the selected epoch. The x-axis shows the SHAP values, which indicate the contribution of each feature to the prediction relative to the average prediction.

Figure 5
Figure 5: Wake, NREM, and REM global SHAP values for the Example-1 EDF/EDF+ file. Each dot in the beeswarm plot represents one sample of data. The y-axis of the plots shows the top 10 features with the highest absolute global SHAP values calculated from 500 randomly sampled epochs. The x-axis shows the SHAP values, which indicate the contribution of each feature to the prediction relative to the average prediction. The global SHAP plot has two dimensions: the x-value of each dot and the intensity of the dot color. A positive SHAP value on the x-axis indicates a positive contribution to the prediction, and vice versa. Samples with a darker red color have higher feature values.

Discussion

This paper presents how to use the IntelliSleepScorer (v1.2) graphic user interface to automatically score the sleep stages of mice and how to leverage SHAP values/plots to better understand the sleep stage scores generated by the model.

An important consideration when using the software is data compatibility. The in-house data used in this study were limited to electrodes placed in the frontal and parietal regions. In the independent dataset from Miladinovic and colleagues13, despite differing electrode coordinates for these regions, the software maintained satisfactory performance. While the software may be applicable to other brain regions, we have not conducted tests to confirm this. Therefore, we cannot assert that there are no limitations regarding electrode placement. However, we encourage users to test it if they have recordings from other regions.

The duration of EEG/EMG recording sessions for sleep stage analysis varies depending on the specific goals of the study. Typically, recording sessions last either 12 h, covering either the light or dark phase, or 24 h, encompassing both phases in a single day. Both 12-h and 24-h recordings are commonly used to capture distinct sleep-wake patterns and circadian rhythms. Testing of the model shows that sleep stage scoring is reliable and accurate for both 12-h and 24-h recording sessions. There is no known upper limit on the duration of the recordings.

Two noteworthy steps within the software protocol are data preprocessing and manual verification. To enhance the generalizability of the trained models, we allowed for noise and artifacts in the input data and implemented minimal quality control measures. The quality control we performed aimed to exclude poor recordings due to loss of signal. Loss of signal is usually caused by connection issues, such as electrodes coming loose or falling off, and may cause errors in the sleep scores generated by the software. For example, a flat line in the EMG channel of an awake mouse due to a loose EMG electrode may share the same features as the immobile/sleep phase of the mouse, which may lead to a "NREM" or "REM" prediction by the LightGBM models. For reference, we used the following criteria to automatically exclude recordings that had a significant amount of signal loss: 1) the amplitude of any EEG signal is less than 1 µV for at least 50% of the recording duration, or 2) the amplitude of the EMG signal is less than 1 µV for at least 50% of the recording duration. Users need to implement their own quality control criteria based on their experimental setup and recording system; a minimal sketch of such an automated check is shown below. Line noise caused by alternating current is usually located at 50 Hz or 60 Hz and is removed by the bandpass filtering (1-40 Hz) step implemented in the software. Therefore, there is no need for users to remove line noise before inputting the data into the software. Minor noise or interfering factors, such as body movements, were considered during model development1. This approach ensures that the model can tolerate these minor artifacts, which do not significantly influence the final sleep staging results. The format and quality of the recording and the channel order of the EDF data files are all critical components for optimizing the performance of this pre-trained model.
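
As a reference point, the following minimal sketch implements an automated signal-loss check in the spirit of the criteria above. It assumes the 2 EEG + 1 EMG channel order described in step 1 and flags channels whose amplitude stays below 1 µV for at least half of the samples; both the channel indexing and the thresholds should be adapted to the user's own recording system.

# Sketch: flag recordings with substantial signal loss. File name, channel order,
# and thresholds are illustrative and should be adapted to the user's setup.
import numpy as np
import mne

raw = mne.io.read_raw_edf("recording.edf", preload=True)
data_uv = raw.get_data() * 1e6          # MNE returns volts for EEG/EMG channels; convert to uV

def fraction_below(signal, threshold_uv=1.0):
    """Fraction of samples whose absolute amplitude is below the threshold."""
    return float(np.mean(np.abs(signal) < threshold_uv))

for name, channel in zip(["EEG1", "EEG2", "EMG"], data_uv[:3]):
    frac = fraction_below(channel)
    if frac >= 0.5:
        print(f"{name}: {frac:.0%} of samples below 1 uV; possible signal loss, "
              "consider excluding this recording.")
    else:
        print(f"{name}: passes the signal-loss check ({frac:.0%} of samples below 1 uV).")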

For manual verification, given that the LightGBM model performed poorly (REM F1-score < 0.6) on a few recordings, we recommend that users verify the model-generated sleep stages if accurate REM stage scoring is critical. For studies focusing on wake or NREM stage physiology, the LightGBM model still supports a robust and fully automated analytical pipeline. Interpreting SHAP values can be an excellent tool in conjunction with manual verification to confirm the accuracy of the software-generated outcomes. SHAP is an approach that uses game theory to explain the output of machine learning models. It connects optimal credit allocation with local explanations using classical Shapley values and their related extensions20. For example, in the global NREM SHAP values plot (Figure 5), if a dark red dot suddenly appears on the far-right side of the "eeg2_gamma_delta_ratio" feature, it may indicate an anomaly in that epoch, because the isolated epoch sample deviates from the rest of the dataset. Interpreting SHAP plots can also provide users with a clear and user-friendly explanation of the scoring of the sleep stages. For example, the epoch SHAP plots indicate which features are the top deciding factors for the sleep stage scoring. The prediction hypnogram (Figure 2) also provides significant information about the prediction results. In instances where the scored sleep stages in the hypnogram exhibit frequent shifts between "Wake" and "REM" stages, which is physiologically abnormal, it is recommended to conduct a comprehensive evaluation of the scoring and corresponding SHAP values to ascertain the quality of the outcomes. With SHAP and hypnogram plots, researchers can quickly identify prediction errors during the scoring of sleep stages. However, while SHAP values explain how the model makes its predictions, they do not necessarily mean that the predictions, or the way the model makes them, are correct. The goal of presenting SHAP values is to assist the users' understanding of the sleep stage scoring process and to enable users to quickly identify any error by examining the logic of the LightGBM model.

There are two outstanding features in the current version (v1.2) of the software. First, following the previous paragraph, manual verification/correction could be very tedious and inconvenient if users needed to return to CSV files to make modifications, especially for lengthy recordings. Therefore, we offer a manual correction feature directly integrated into the GUI that lets users change the scored sleep stage of any epoch. After the user clicks on a specific epoch, the scored sleep stage of either "Wake," "REM," or "NREM" will be shown in the dropdown menu on the top GUI bar. If the user wishes to change the stage, they can simply select another stage from the dropdown menu, and a new scored file manually corrected by the user will be generated. In addition, instead of a fixed 10-s epoch length for analysis, we provide an option to adjust the epoch length in the GUI to cater to the specific experimental needs of different sleep researchers. The options are 4-s, 10-s, and 20-s epochs, which are all commonly used among sleep researchers. Even though this model was trained with in-house data scored in 10-s epochs, the performance of the model on independent tests of 4-s epoch data from different laboratories was comparable to that of human experts across all sleep stages1. Users must be careful when implementing 20-s epochs because 1) scoring mouse sleep/wake with 20-s epochs may miss very short events such as transient arousals, and 2) 20-s epochs are more likely to encompass mixed stages within each epoch8.

It is insightful for users to compare this model with other existing automated sleep stage scoring methods. Besides IntelliSleepScorer, there are other models for automatic sleep stage scoring with varying degrees of accuracy, complexity, and efficacy8,9,10,11,12,13,14,15,16,17,18. The model used in this study employs the LightGBM algorithm to achieve high accuracy in sleep stage scoring, comparable to existing models. In our evaluation, IntelliSleepScorer demonstrated an overall accuracy of 95.2%, which is on par with the performance metrics reported for similar models such as MC-SleepNet8 and Sleep-Deep-Learner9. However, the true distinction of IntelliSleepScorer lies not only in its accuracy but also in its accessibility and ease of use for researchers with limited coding experience.

MC-SleepNet8, trained using deep neural networks on a large dataset of 4200 mice, achieves a high scoring accuracy of 96.4% and kappa statistic of 0.94, surpassing most existing methods. However, to our knowledge, there is no software based on MC-SleepNet that is publicly/freely available to date.

Sleep-Deep-Learner9 automates scoring in mice with an F1 score of 0.86 for REM sleep, 0.95 for NREM sleep, and 0.97 for wakefulness. However, the authors noted in the paper that Sleep-Deep-Learner is not suitable for individuals who are not well-versed in sleep-wake scoring, as it requires a subset of manually scored epochs. IntelliSleepScorer, on the other hand, does not require any further manual manipulation during sleep stage scoring and has a very user-friendly GUI for any researcher.

Somnivore10 is a versatile, multi-layered system designed for automated wake-sleep stage scoring, adept at learning from limited training sets with complex polysomnography inputs. It operates with rapid computational efficiency and demonstrates robust generalization across diverse subjects, including humans, rodents (wildtype and transgenic), and pigeons. However, Somnivore is not cost-free.

SlumberNet11 and AccuSleep12 are readily accessible on Zenodo and GitHub, respectively, and have scoring accuracies of 97% and 96.8%. However, both models have relatively small training datasets of 9 or 10 mice and did not include independent test validation. Therefore, their real-world performance remains unknown.

SPINDLE13 is another web-based model, trained with a smaller dataset of 4-6 mice/rats and leveraging convolutional neural networks. Validated across data from three independent sleep labs, SPINDLE achieved average agreement rates of 93%-99% with human expert scoring from different labs, mirroring human capability.

Finally, Somnotate14 demonstrated an accuracy of 0.97 ± 0.01 and a weighted F1 score of 0.97 ± 0.01 when evaluated on in-house datasets that included six 24-h recordings scored based on the consensus of at least three manual annotations. When tested on Somnotate's dataset, IntelliSleepScorer's performance decreased, with an accuracy of 0.75 ± 0.04 and a weighted F1 score of 0.73 ± 0.0514. This reduced performance may be attributed to differences in experimental set-ups or recording conditions across datasets. Given the variety of real-world use cases, we encourage users to evaluate the performance of IntelliSleepScorer on their datasets, especially if their experimental set-ups or recording conditions differ significantly from ours. If the software's performance does not meet expectations, users have the option to fine-tune the pretrained model with their data, as we have made the models and code used for development open source.
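
As a starting point for such fine-tuning, the sketch below continues training a pretrained LightGBM booster on user-labeled epochs using LightGBM's init_model mechanism. The file paths are placeholders, and the feature matrix must be produced with the feature-extraction code from the GitHub repository so that it matches the features expected by the pretrained model.

# Sketch: fine-tune (continue training) a pretrained LightGBM booster on new
# labeled epochs. All file paths are placeholders.
import numpy as np
import lightgbm as lgb

X_new = np.load("my_features.npy")          # epochs x features, extracted as in the repository
y_new = np.load("my_labels.npy") - 1        # stage codes 1/2/3 mapped to 0/1/2

params = {"objective": "multiclass", "num_class": 3, "learning_rate": 0.01, "verbose": -1}
finetuned = lgb.train(
    params,
    lgb.Dataset(X_new, label=y_new),
    num_boost_round=200,
    init_model="pretrained_lightgbm_2eeg.txt",   # placeholder path to the pretrained booster
)
finetuned.save_model("lightgbm_2eeg_finetuned.txt")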

Besides the examples of existing models discussed above, there are some other developed automated sleep staging systems from which investigators could choose depending on specific and different needs for their studies15,16,17,18.

In future work, we aim to develop/train a new model that reduces the automatic sleep stage scoring time. Currently, the GUI processes 12 h of recordings sampled at 1000 Hz in approximately 10 min on an Intel Core i7-8550U CPU @ 1.80 GHz. However, processing time increases by approximately 2.5 times when scoring sleep stages with 4-s epochs compared to the default 10-s epochs. A faster model could cater to the needs of users who desire quicker automatic sleep stage scoring in mice. We also welcome user feedback, and new features can be added upon request.

In summary, we provide a cost-free, publicly available, and user-friendly GUI software package, IntelliSleepScorer, that creates a convenient automated pipeline for mouse sleep stage scoring. In addition, we went a step further by offering SHAP value visualizations that explain the scoring decisions the model makes. Experienced users can also fine-tune our pre-trained model with their data, given that the model files and the scripts for extracting the features used for training/fine-tuning are all publicly available in the GitHub repository. We hope this openly available model can narrow the technology gap and facilitate the progression from data collection to novel findings using mouse models in sleep research while reducing labor-intensive work.

Disclosures

The authors have nothing to disclose.

Acknowledgements

We thank Kerena Yan and Jingwen Hu for manually scoring sleep stages and Eunah and Soonwiik for the recordings.

Materials

Canonical Ubuntu 18.04 Canonical https://releases.ubuntu.com/18.04/ Supporting operating system for the software IntelliSleepScorer: Windows, Mac, or Linux
Intel Core i7-8550U CPU @ 1.80 GHz 1.99 GHz; RAM: 24 GB Intel Corp https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html Hardware requirement for the software: both Intel Core CPUs listed here have been used to process the data. It takes around 10 min to process 12 h of recording sampled at 1000 Hz on either machine. Any similar or superior hardware would yield comparable or better performance.
Intel Core i7-10610U CPU @ 1.80 GHz 2.30 GHz; RAM: 16 GB Intel Corp https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html Hardware requirement for the software: both Intel Core CPUs listed here have been used to process the data. It takes around 10 min to process 12 h of recording sampled at 1000 Hz on either machine. Any similar or superior hardware would yield comparable or better performance.
LightGBM Microsoft https://lightgbm.readthedocs.io/en/latest/index.html Machine learning algorithm used to train the model underlying the software.
MacBook Pro Apple https://www.apple.com/in/macbook-pro/ Supporting operating system for the software IntelliSleepScorer: Windows, Mac, or Linux
Windows Microsoft https://www.microsoft.com/en-in/windows/?r=1 Supporting operating system for the software IntelliSleepScorer: Windows, Mac, or Linux

References

  1. Wang, L. A., Kern, R., Yu, E., Choi, S., Pan, J. Q. IntelliSleepScorer, a software package with a graphic user interface for automated sleep stage scoring in mice based on a light gradient boosting machine algorithm. Sci Rep. 13 (1), 4275 (2023).
  2. Astori, S., Wimmer, R. D., Luthi, A. Manipulating sleep spindles–expanding views on sleep, memory, and disease. Trends Neurosci. 36 (12), 738-748 (2013).
  3. Fraigne, J. J., Torontali, Z. A., Snow, M. B., Peever, J. H. REM sleep at its core–circuits, neurotransmitters, and pathophysiology. Front Neurol. 6, 123 (2015).
  4. Huber, R., Deboer, T., Tobler, I. Effects of sleep deprivation on sleep and sleep EEG in three mouse strains: Empirical data and simulations. Brain Res. 857 (1-2), 8-19 (2000).
  5. Brown, R. E., Basheer, R., McKenna, J. T., Strecker, R. E., McCarley, R. W. Control of sleep and wakefulness. Physiol Rev. 92 (3), 1087-1187 (2012).
  6. Lacroix, M. M., et al. Improved sleep scoring in mice reveals human-like stages. bioRxiv. 489005 (2018).
  7. Rayan, A., et al. Sleep scoring in rodents: Criteria, automatic approaches and outstanding issues. Eur J Neurosci. 59 (4), 526-553 (2024).
  8. Yamabe, M., et al. MC-SleepNet: Large-scale sleep stage scoring in mice by deep neural networks. Sci Rep. 9 (1), 15793 (2019).
  9. Katsuki, F., Spratt, T. J., Brown, R. E., Basheer, R., Uygun, D. S. Sleep-Deep-Learner is taught sleep-wake scoring by the end-user to complete each record in their style. Sleep Adv. 5 (1), zpae022 (2024).
  10. Allocca, G., et al. Validation of 'Somnivore', a machine learning algorithm for automated scoring and analysis of polysomnography data. Front Neurosci. 13, 207 (2019).
  11. Jha, P. K., Valekunja, U. K., Reddy, A. B. SlumberNet: Deep learning classification of sleep stages using residual neural networks. Sci Rep. 14 (1), 4797 (2024).
  12. Barger, Z., Frye, C. G., Liu, D., Dan, Y., Bouchard, K. E. Robust, automated sleep scoring by a compact neural network with distributional shift correction. PLoS One. 14 (12), e0224642 (2019).
  13. Miladinovic, D., et al. SPINDLE: End-to-end learning from EEG/EMG to extrapolate animal sleep scoring across experimental settings, labs and species. PLoS Comput Biol. 15 (4), e1006968 (2019).
  14. Brodersen, P. J. N., et al. Somnotate: A probabilistic sleep stage classifier for studying vigilance state transitions. PLoS Comput Biol. 20 (1), e1011793 (2024).
  15. Akada, K., et al. A deep learning algorithm for sleep stage scoring in mice based on a multimodal network with fine-tuning technique. Neurosci Res. 173, 99-105 (2021).
  16. Rytkonen, K. M., Zitting, J., Porkka-Heiskanen, T. Automated sleep scoring in rats and mice using the naive Bayes classifier. J Neurosci Methods. 202 (1), 60-64 (2011).
  17. Kam, K., Rapoport, D. M., Parekh, A., Ayappa, I., Varga, A. W. WaveSleepNet: An interpretable deep convolutional neural network for the continuous classification of mouse sleep and wake. J Neurosci Methods. 360, 109224 (2021).
  18. Crisler, S., Morrissey, M. J., Anch, A. M., Barnett, D. W. Sleep-stage scoring in the rat using a support vector machine. J Neurosci Methods. 168 (2), 524-534 (2008).
  19. Ke, G., et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems. (2017).
  20. Lundberg, S. M., Lee, S. I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 4768-4777 (2017).

Cite This Article
Zhu, Z., Wang, L. A., Kern, R., Pan, J. Q. IntelliSleepScorer, a Software Package with a Graphic User Interface for Mice Automated Sleep Stage Scoring. J. Vis. Exp. (213), e66950, doi:10.3791/66950 (2024).
