The presented protocol integrates various evaluation methods and demonstrates how to evaluate keyboard designs on smartphones. English character pairs are proposed as the input material, and the transition time between two keys is used as the dependent variable.
Keyboard input plays an essential role in human-computer interaction and has a vast user base, and keyboard design has long been one of the fundamental objects of study on smart devices. With the development of screen technology, smartphones can collect more precise data and indicators for an in-depth evaluation of keyboard design. The enlargement of phone screens has led to an unsatisfactory input experience and finger pain, especially during one-handed input. Input efficiency and comfort have therefore attracted the attention of researchers and designers, and a curved keyboard with size-adjustable buttons, which roughly follows the physiological structure of the thumb, was proposed to optimize one-handed use of large-screen smartphones. However, its actual effects remain unclear. This protocol therefore demonstrates a general, consolidated method for evaluating the effect of a curved QWERTY keyboard design on a 5-inch smartphone through self-developed software, using detailed variables that include objective behavioral data, subjective feedback, and the coordinate data of each touchpoint. Although the literature on evaluating virtual keyboards is substantial, only a few studies have systematically summarized and reflected on the evaluation methods and processes. This protocol fills that gap by presenting a systematic process and method for evaluating keyboard design, together with available code for analysis and visualization. It requires no additional or expensive equipment and is easy to conduct. In addition, the protocol helps identify potential reasons for a design's disadvantages and informs design optimization. In conclusion, this protocol and its open-source resources could serve as an in-class demonstration experiment that inspires novices to start their own studies, and could also help improve the user experience and revenue of input method editor companies.
Keyboard input is the mainstream method of human-smartphone interaction1,2, and as smartphones have spread, keyboard input has gained billions of users. In 2019, the global smartphone penetration rate reached 41.5%3, while that of the United States, the highest, reached 79.1%4. As of the first quarter of 2020, the Sogou mobile keyboard had about 480 million daily active users5, and as of May 6, 2020, Google Gboard had been downloaded more than 1 billion times6.
The keyboard input experience becomes less satisfactory as phone screens grow larger. Although larger screens aim to improve the viewing experience, they have changed the center of gravity, size, and weight of smartphones, forcing users to shift their grip repeatedly to reach remote areas (e.g., the A and Q keys for right-handed users), which makes input inefficient. The resulting muscle stretching may cause musculoskeletal disorders, hand pain, and various diseases (e.g., carpal tunnel syndrome, thumb osteoarthritis, and thumb tenosynovitis7,8,9,10). Users who prefer one-handed use are worse off11,12.
Therefore, the evaluation and optimization of keyboard design have become hot topics in psychological, technical, and ergonomic research. Input method editor (IME) companies and researchers have continually proposed various keyboard designs and concepts to optimize input experience and efficiency, including layout-changed and character-reordered keyboards: Microsoft WordFlow Keyboard13, the Functional Button Area in Glory of Kings14, IJQWERTY15, and Quasi-QWERTY16.
Apart from several widely accepted indicators, existing evaluation methods for keyboard design vary from researcher to researcher, and ever more accurate indicators continue to be proposed. Yet despite this variety of indicators, no consolidated, systematic protocol demonstrates the process of evaluating and analyzing keyboard design. Fitts' law17 and its extension, FFitts law18, which model pointing performance in human-computer interaction, have been widely adopted to evaluate keyboard performance19,20,21,22. Moreover, the functional area of the thumb, a curved region in which the thumb can comfortably complete the input task, was proposed to improve keyboard design23. Building on these theories, previous studies24,25,26,27,28,29 (apart from those using modeling and simulation) have used subsets of widely adopted indicators, including words per minute, word error rate, and subjective feedback (perceived usability, perceived performance, perceived speed, subjective workload, perceived exertion and pain, intent to use, etc.). In addition, the fitted ellipse of the touchpoints on each button and its offset30,31 have been used in recent years to examine input accuracy at the level of individual touch events. Galvanic skin response, heart rate, electromyographic activity, hand gestures, and body movement32,33,34,35 have also been adopted to directly or indirectly evaluate users' muscle fatigue, comfort, and satisfaction. However, these methods lack reflection on the appropriateness of the indicators used, and novice researchers may find it difficult to select appropriate indicators for their research.
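To make the Fitts'-law indicators concrete, the sketch below computes the Shannon-formulation index of difficulty, ID = log2(D/W + 1), and the predicted movement time MT = a + b·ID. It also includes an FFitts-style variant. As we understand ref. 18, that variant replaces W with an effective width, sqrt(2πe(σ² - σa²)), computed from the touch-endpoint spread σ after the finger's absolute imprecision σa is removed. The function names are illustrative. The regression constants a and b, and any σa value, are assumptions that would have to be estimated from data.

```python
import math


def shannon_id(distance: float, width: float) -> float:
    """Fitts' index of difficulty (Shannon formulation), in bits."""
    return math.log2(distance / width + 1)


def effective_width(sigma: float) -> float:
    """Effective width sqrt(2*pi*e) * sigma (about 4.133 * sigma) from the endpoint SD."""
    return math.sqrt(2 * math.pi * math.e) * sigma


def ffitts_id(distance: float, sigma: float, sigma_a: float) -> float:
    """FFitts-style ID: remove the finger's absolute imprecision sigma_a from the
    observed endpoint SD sigma before computing the effective width."""
    if sigma <= sigma_a:
        raise ValueError("observed spread must exceed the absolute finger imprecision")
    return shannon_id(distance, effective_width(math.sqrt(sigma ** 2 - sigma_a ** 2)))


def movement_time(a: float, b: float, index_of_difficulty: float) -> float:
    """Predicted movement time MT = a + b * ID; a and b come from a regression fit."""
    return a + b * index_of_difficulty
```

For example, with hypothetical key centers 30 mm apart and 6.3 mm wide keys, `shannon_id(30, 6.3)` is about 2.5 bits. Distances, widths, and SDs only need to share the same unit.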
Research on keyboard design is also easy to conduct, operate, and analyze. With the rapid development of screen technology, more behavioral data can easily be collected to evaluate keyboard design in depth (e.g., the transition time between two keys and the coordinate data of each touchpoint). From these data, researchers can explore the details of a keyboard design precisely and analyze its advantages and disadvantages. Compared with other human-computer interaction research, keyboard research on portable smartphones also has high application value because of its vast user base, and it requires no expensive equipment, complicated materials, or large laboratory space. The questionnaires, scales, and Python scripts used in this research are open source and easy to access.
The purpose of this research is to summarize previous methods into a systematic, precise, and general protocol for evaluating and analyzing keyboard design on smartphones. The exemplar experiment and results aim to show whether a curved QWERTY keyboard with size-adjustable buttons could improve the one-handed input experience on a 5-inch smartphone compared with the traditional QWERTY keyboard. They also share the visualization methods and Python scripts used for data analysis.
The study was conducted in accordance with ethical principles and was approved by the Ethics Committee of Tsinghua University. Figure 1 shows the process of evaluating the keyboard design of smartphones.
Figure 1: General process of conducting a keyboard experiment and evaluating the keyboard design.
1. Preparation
Figure 2: The measurement of the hand.
2. Procedure
3. Data analysis
The representative study largely follows the protocol described above. It adopts a 2 (keyboard layout: curved QWERTY vs. traditional QWERTY) × 2 (button size: large, 6.3 mm × 9 mm, vs. small, 4.9 mm × 7 mm) within-subject design. Using a character-pair input task in our self-developed software (Figure 3), it evaluates whether the curved QWERTY keyboard improves input efficiency and comfort compared with the traditional QWERTY keyboard at each button size. The study did not use expensive physiological detectors or a motion capture system, and the data analysis did not include modeling or simulation.
Figure 3: Interfaces of the traditional QWERTY and curved QWERTY keyboards in the experimental software.
(A) Traditional QWERTY keyboard with large button size (letter key size: 6.3 mm × 9 mm). (B) Curved QWERTY keyboard with large button size (letter key size: 6.3 mm × 9 mm). (C) Traditional QWERTY keyboard with small button size (letter key size: 4.9 mm × 7 mm). (D) Curved QWERTY keyboard with small button size (letter key size: 4.9 mm × 7 mm). The aspect ratio of each letter key is 7:10, and each functional key (Delete, Space, Enter) is twice as wide as a letter key. Delete and Space are disabled. Participants tap the Enter key to advance to the next trial.
A total of 24 healthy right-handed students from Tsinghua University participated in this study (12 females; M = 22.46 years, SD = 3.04 years). For each participant, the length of the right hand (M = 17.98 cm, SD = 1.20 cm), the length of the right thumb (M = 6.00 cm, SD = 0.68 cm), and the circumference of the right thumb (M = 5.14 cm, SD = 0.52 cm) were measured. The sample size was calculated with G*Power 3.1.9.2 (effect size f = 0.25, α = 0.05, power = 0.80, correlation among repeated measures = 0.5). The experimental smartphone has a 5.0-inch screen (weight 138 g, 294 ppi, resolution 1280 × 720 px, dimensions 143.5 × 69.9 × 7.6 mm).
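The button sizes in Figure 3 are specified in millimeters, while touch coordinates are recorded in pixels, so a conversion step is needed. The sketch below assumes square pixels and applies the reported 294 ppi uniformly across the screen. It also checks that 1280 × 720 px on a 5.0-inch diagonal corresponds to roughly 294 ppi. The helper names are illustrative.

```python
import math

MM_PER_INCH = 25.4


def ppi_from_resolution(width_px: int, height_px: int, diagonal_inch: float) -> float:
    """Pixel density implied by a resolution and a screen diagonal (square pixels assumed)."""
    return math.hypot(width_px, height_px) / diagonal_inch


def mm_to_px(mm: float, ppi: float) -> float:
    """Convert a physical length in millimeters to pixels at the given density."""
    return mm * ppi / MM_PER_INCH


def px_to_mm(px: float, ppi: float) -> float:
    """Convert a length in pixels back to millimeters at the given density."""
    return px * MM_PER_INCH / ppi
```

At 294 ppi, the large key (6.3 mm × 9 mm) is roughly 73 × 104 px and the small key (4.9 mm × 7 mm) roughly 57 × 81 px.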
Input performance (transition time between two keys and word error rate), subjective feedback, and the fitted ellipse of each button were collected and analyzed with repeated-measures ANOVA. This study uses the transition time between two keys instead of words per minute because the input material consists of character pairs, and the transition time captures the touch event between the two keys more precisely. Table 1 summarizes the representative results.
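The following is a minimal sketch of how the two input-performance measures could be derived from trial logs. It assumes a hypothetical per-trial record (`Trial`, with `target`, `typed`, and `touch_times_ms` fields) that is not the software's actual log format. It treats the transition time as the interval between the touch on the first key and the touch on the second key, computed only for correctly entered pairs. It takes the error rate as the proportion of trials whose entered pair differs from the target pair. Both choices are assumptions rather than the authors' exact definitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence, Tuple


@dataclass
class Trial:
    # Hypothetical log record for one character-pair trial.
    target: str                  # intended pair, e.g. "qa"
    typed: str                   # pair actually registered by the keyboard
    touch_times_ms: List[float]  # timestamps of the two letter-key touches


def transition_time(trial: Trial) -> float:
    """Interval between the first and second letter-key touches, in ms."""
    first, second = trial.touch_times_ms[:2]
    return second - first


def summarize(trials: Sequence[Trial]) -> Tuple[float, float]:
    """Return (error rate, mean transition time of correctly entered trials)."""
    if not trials:
        raise ValueError("no trials to summarize")
    correct = [t for t in trials if t.typed == t.target]
    error_rate = 1 - len(correct) / len(trials)
    mean_tt = mean(transition_time(t) for t in correct) if correct else float("nan")
    return error_rate, mean_tt
```

Per-participant, per-condition summaries of this kind are the inputs that a repeated-measures ANOVA such as the one in Table 1 would take.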
| Measure | Keyboard layout: F | Keyboard layout: p | Keyboard layout: ηp² | Button size: F | Button size: p | Button size: ηp² | Layout × Size: F | Layout × Size: p | Layout × Size: ηp² |
|---|---|---|---|---|---|---|---|---|---|
| Word error rate | 48.90 | <.001*** | 0.68 | 30.57 | <.001*** | 0.57 | 2.63 | 0.12 | 0.10 |
| Transition time between two keys | 10.19 | .004** | 0.31 | 43.57 | <.001*** | 0.66 | 12.75 | .002** | 0.36 |
| Perceived exertion and pain | 2.33 | 0.14 | 0.09 | 1.36 | 0.26 | 0.06 | 0.28 | 0.60 | 0.01 |
| Intent to use | 7.41 | .012* | 0.24 | 3.62 | 0.07 | 0.14 | 0.63 | 0.44 | 0.03 |
| Perceived accuracy | 1.32 | 0.26 | 0.05 | 2.94 | 0.10 | 0.11 | 0.69 | 0.42 | 0.03 |
| Perceived speed | 0.56 | 0.47 | 0.02 | 0.98 | 0.33 | 0.04 | 0.25 | 0.62 | 0.01 |
| Perceived usability | 0.63 | 0.44 | 0.03 | 5.48 | .028* | 0.19 | 0.03 | 0.87 | 0.001 |
| Subjective workload: Mental | 19.30 | <.001*** | 0.46 | 8.88 | .007** | 0.28 | 0.01 | 0.91 | 0.001 |
| Subjective workload: Physical | 2.41 | 0.13 | 0.10 | 5.55 | .027* | 0.19 | 0.07 | 0.78 | 0.003 |
| Subjective workload: Time | 0.02 | 0.9 | 0.001 | 10.26 | .004** | 0.31 | 0.37 | 0.55 | 0.02 |
| Subjective workload: Performance | 11.51 | .003** | 0.33 | 12.25 | .002** | 0.35 | 0.02 | 0.90 | 0.001 |
| Subjective workload: Effort | 4.66 | .042* | 0.17 | 16.33 | .001** | 0.42 | 0.13 | 0.72 | 0.006 |
| Subjective workload: Frustration | 9.32 | .006** | 0.29 | 8.87 | .007** | 0.28 | 2.11 | 0.16 | 0.08 |
| Area of fitted ellipse | 90.00 | <.001*** | 0.78 | 1368.78 | <.001*** | 0.98 | 31.99 | <.001*** | 0.56 |
| Offset of fitted ellipse: X-direction | 10.94 | .003** | 0.30 | 1.4 | 0.25 | 0.05 | 6.08 | 0.21 | 0.19 |
| Offset of fitted ellipse: Y-direction | 23.49 | <.001*** | 0.48 | 0.48 | 0.50 | 0.02 | 13.74 | .001** | 0.36 |
Table 1: Statistical analysis of input performance, subjective feedback, and the fitted ellipse of each button. For each effect, the F value, p value, and partial eta squared (ηp²) are reported. * p < 0.05, ** p < 0.01, *** p < 0.001.
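Both factors have two levels, so each effect in Table 1 has one numerator degree of freedom. For such within-subject effects, the repeated-measures F equals the squared one-sample t of a per-participant contrast score. The participant-level ηp² values in Table 1 are consistent with ηp² = F·df_effect / (F·df_effect + df_error) with df_error = 23 (e.g., F = 48.90 gives 0.68). The sketch below uses that equivalence. It is not the authors' SPSS analysis, and its input layout (one dict per participant mapping a hypothetical `(layout, size)` key to that participant's condition mean) is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def rm_anova_2x2(participants):
    """F and partial eta squared for each 1-df effect of a 2 x 2 within-subject design.

    For a 1-df within-subject effect, the repeated-measures F equals the squared
    one-sample t of a per-participant contrast score (scaling does not matter).
    """
    contrasts = {
        "layout": lambda d: (d[("curved", "large")] + d[("curved", "small")]
                             - d[("traditional", "large")] - d[("traditional", "small")]),
        "size": lambda d: (d[("curved", "large")] + d[("traditional", "large")]
                           - d[("curved", "small")] - d[("traditional", "small")]),
        "layout_x_size": lambda d: (d[("curved", "large")] - d[("curved", "small")]
                                    - d[("traditional", "large")] + d[("traditional", "small")]),
    }
    n = len(participants)
    results = {}
    for effect, contrast in contrasts.items():
        scores = [contrast(d) for d in participants]
        t = mean(scores) / (stdev(scores) / sqrt(n))
        f_value = t * t
        results[effect] = (f_value, partial_eta_squared(f_value, 1, n - 1))
    return results
```

For a general-purpose alternative in Python, statsmodels provides an `AnovaRM` class for repeated-measures ANOVA on long-format data.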
For input performance, the interaction between keyboard layout and button size is significant only for the transition time between two keys (Figure 4). In the curved QWERTY layout, the transition time with small buttons was significantly longer than with large buttons (p < 0.001). The main effect of keyboard layout is significant for both word error rate (Figure 5) and transition time, indicating that both measures are significantly lower for the traditional QWERTY than for the curved QWERTY. The main effect of button size is also significant for both measures, indicating that both are significantly lower with large buttons than with small buttons. No other significant results were found.
Figure 4: 3D bar graphs of the transition time between two keys for the four keyboards (the left axis shows the first character and the right axis the second character).
The height of each bar represents the transition time. Gradient colors (blue, green, yellow, and red) show the distribution of the values (see Supplementary Coding File 1).
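Figure 4 plots the mean transition time for each first-character/second-character combination. The aggregation step can be sketched with the standard library alone, assuming hypothetical `(first, second, transition_ms)` tuples as input. This is an independent illustration, not a reproduction of Supplementary Coding File 1. The resulting matrix could then be rendered with, for example, Matplotlib's `bar3d` method on a 3D axes.

```python
from collections import defaultdict
from statistics import mean
from string import ascii_lowercase
from typing import Iterable, List, Optional, Tuple


def transition_matrix(records: Iterable[Tuple[str, str, float]]) -> List[List[Optional[float]]]:
    """26 x 26 matrix of mean transition times: rows = first letter, columns = second letter.

    Cells for letter pairs that never occurred are None.
    """
    buckets = defaultdict(list)
    for first, second, ms in records:
        buckets[(first.lower(), second.lower())].append(ms)

    def cell(row: str, col: str) -> Optional[float]:
        values = buckets.get((row, col))  # .get avoids creating empty entries
        return mean(values) if values else None

    return [[cell(r, c) for c in ascii_lowercase] for r in ascii_lowercase]
```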
Figure 5: The word error rate of each keyboard. The error bars represent 95% CI.
For subjective feedback (Figure 6 and Figure 7), none of the interactions between keyboard layout and button size is significant. The main effect of keyboard layout is significant for intent to use and for four facets of subjective workload (mental, performance, effort, and frustration). Participants perceived less workload on these four facets and were more likely to use the curved QWERTY than the traditional QWERTY. The main effect of button size is significant for perceived usability and all facets of subjective workload, indicating that participants perceived less workload and higher usability with large buttons than with small buttons. No other significant results were found.
Figure 6: Perceived exertion and pain and intent to use (left Y-axis), and perceived accuracy, perceived speed, and perceived usability (right Y-axis) of each keyboard.
A high score for perceived exertion and pain indicates an unsatisfactory experience, whereas high scores on the other indicators indicate the opposite. The error bars represent 95% CI.
Figure 7: The six dimensions of subjective workload.
The error bars represent 95% CI.
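The error bars in Figures 5 to 7 are 95% CIs. Below is a minimal sketch of a t-based 95% CI for one keyboard's participant-level scores. It assumes the bars are ordinary CIs of the mean with n = 24, so the two-tailed critical value is t(23) ≈ 2.069. That constant is hard-coded because the standard library has no t quantile function. A within-subject CI correction, if one was used, would give different bars.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed 95% critical t value for df = 23 (n = 24 participants).
T_CRIT_DF23 = 2.069


def ci95(values, t_crit=T_CRIT_DF23):
    """Return (lower, upper) bounds of the t-based 95% CI of the mean.

    Pass a t_crit matching len(values) - 1 degrees of freedom when n != 24.
    """
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m - half_width, m + half_width
```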
For the area of the fitted ellipse (Figure 8), the interaction between keyboard layout and button size is significant. With both small and large buttons, the ellipse area of the traditional QWERTY is larger than that of the curved QWERTY (p < 0.001). With both keyboard layouts, the ellipse area of small buttons is smaller than that of large buttons (p < 0.001). The main effects of button size and keyboard layout are both significant, indicating that ellipse areas are larger for the traditional QWERTY than for the curved QWERTY and larger for large buttons than for small buttons. No other significant results were found.
Figure 8: The fitted ellipses (95% CI) of four keyboards.
The ellipses are drawn by fitting the pixel positions of the touchpoints on the four keyboards. The center of each ellipse is the mean position of all touchpoints on that button (see Supplementary Coding File 2).
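Supplementary Coding File 2 contains the authors' ellipse-fitting script. The function below is an independent sketch, not a reproduction of that file. It fits a 95% ellipse to one button's touchpoints by treating them as bivariate normal:

- The center is the mean touchpoint.
- The semi-axes come from the eigenvalues of the sample covariance matrix, scaled by the chi-square quantile with 2 df (-2 ln 0.05 ≈ 5.99).
- The area is π times the product of the semi-axes.

Points are (x, y) pixel coordinates. If the original script uses a different scaling, its areas will differ from these by a constant factor.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple


def confidence_ellipse(points: Sequence[Tuple[float, float]], level: float = 0.95) -> Dict:
    """Fit a bivariate-normal coverage ellipse to one button's touchpoints."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three touchpoints")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Chi-square quantile with 2 df has the closed form -2 ln(1 - level).
    chi2 = -2.0 * math.log(1.0 - level)
    # Eigenvalues of the 2 x 2 covariance matrix.
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    root = math.sqrt(max(half_trace ** 2 - det, 0.0))
    major_var, minor_var = half_trace + root, max(half_trace - root, 0.0)
    a = math.sqrt(chi2 * major_var)  # semi-major axis
    b = math.sqrt(chi2 * minor_var)  # semi-minor axis
    # Orientation of the major axis relative to the +x axis.
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return {"center": (mx, my), "semi_axes": (a, b), "angle_deg": angle, "area": math.pi * a * b}
```

Multiplying the area by (25.4/294)² converts it from px² to mm² on the 294-ppi phone. Screen y coordinates grow downward, so the angle runs clockwise on screen.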
For the offset of the fitted ellipse (Figure 9 and Figure 10), the interaction between keyboard layout and button size is significant only for the y-direction offset. In the curved QWERTY, the y-direction offset of small buttons is significantly shorter than that of large buttons (p < 0.001). With both button sizes, the y-direction offset of the curved QWERTY is significantly shorter than that of the traditional QWERTY. The main effect of keyboard layout is significant in both the x- and y-directions, and it indicates that the y-direction offset of the curved QWERTY is significantly shorter than that of the traditional QWERTY. No other significant results were found.
Figure 9: The offset of fitted ellipses in the x-direction.
The length of each arrow represents the magnitude of the offset and is magnified 1.2 times for visibility. Colors indicate how far each button's x-direction offset lies from the mean offset across all buttons, in units of standard deviation (σ): values below -1σ are green, values above +1σ are red, and values between -1σ and +1σ are orange (see Supplementary Coding File 3).
Figure 10: The offset of fitted ellipses in the y-direction.
The length of each arrow represents the magnitude of the offset and is magnified 1.2 times for visibility. Colors indicate how far each button's y-direction offset lies from the mean offset across all buttons, in units of standard deviation (σ): values below -1σ are green, values above +1σ are red, and values between -1σ and +1σ are orange (see Supplementary Coding File 3; the script for the y-direction is similar to that for the x-direction).
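The offsets in Figures 9 and 10 are signed distances between each fitted-ellipse center and the corresponding key center. The arrow colors compare each key's offset with the mean ± 1 SD of all keys' offsets. The sketch below follows that reading of the captions and assumes the sign convention "ellipse center minus key center", so a positive x offset points right. The dictionary inputs (key label mapped to an (x, y) pixel position) are hypothetical. Supplementary Coding File 3 remains the authoritative script.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def button_offsets(centers: Dict[str, Tuple[float, float]],
                   targets: Dict[str, Tuple[float, float]],
                   axis: int) -> Dict[str, float]:
    """Signed offset (ellipse center minus key center) per key along axis 0 (x) or 1 (y), in px."""
    return {key: centers[key][axis] - targets[key][axis] for key in centers}


def sigma_colors(offsets: Dict[str, float]) -> Dict[str, str]:
    """Color each key by where its offset falls relative to the mean +/- 1 SD of all offsets."""
    values = list(offsets.values())
    m, s = mean(values), stdev(values)
    colors = {}
    for key, value in offsets.items():
        if value < m - s:
            colors[key] = "green"
        elif value > m + s:
            colors[key] = "red"
        else:
            colors[key] = "orange"
    return colors
```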
The practice effect was tested with t-tests comparing input performance (word error rate and transition time between two keys) between the first and second halves of the character pairs. For word error rate, there is no significant difference between the two halves for the curved QWERTY with small buttons, t(46) = 2.03, p = 0.05; the curved QWERTY with large buttons, t(46) = -0.47, p = 0.64; the traditional QWERTY with large buttons, t(46) = 0.31, p = 0.76; or the traditional QWERTY with small buttons, t(46) = 0.05, p = 0.97. For transition time between two keys, there is no significant difference between the two halves for the curved QWERTY with large buttons, t(46) = 0.33, p = 0.74; the curved QWERTY with small buttons, t(46) = 0.22, p = 0.83; the traditional QWERTY with large buttons, t(46) = 0.66, p = 0.51; or the traditional QWERTY with small buttons, t(46) = 0.09, p = 0.93. These results indicate no practice or fatigue effect during the main input task: participants had reached and maintained a stable, fully practiced level of performance on each keyboard. The absolute level of this performance may still differ between keyboards, because a plateau only indicates that participants had become fully familiar with the keyboard.
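The reported t(46) values appear consistent with an independent-samples Student t-test on 24 first-half and 24 second-half participant means (24 + 24 - 2 = 46 df). That reading is our assumption rather than something the text states. A minimal stdlib sketch of the pooled-variance t statistic follows. For the p value, `scipy.stats.ttest_ind`, which assumes equal variances by default, returns the same statistic along with p.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```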
This representative study indicates that on the 5-inch smartphone, the curved QWERTY performs worse than the traditional QWERTY, and large buttons perform better than small buttons. The best keyboard in this study is the traditional QWERTY with large buttons, and the worst is the curved QWERTY with small buttons. None of the results was affected by practice or fatigue effects. The word error rate and transition time suggest that the curved QWERTY design increases participants' reaction time between two characters. It may also increase the workload of recognizing characters because of the key positions and mental rotation, leading to unsatisfactory input performance. Reducing the button size (the QWERTY keyboard with small buttons) has the same effect on a 5-inch smartphone. Although most indicators and dimensions of subjective feedback are not significant, the subjective workload shows a higher perceived workload for the QWERTY keyboard with reduced button size and for the curved QWERTY keyboard. However, the fitted-ellipse analysis (Figure 8 and Figure 10) shows that the curved QWERTY has smaller offsets and less dispersed touchpoints, and that its offsets point mainly toward the upper-left corner for right-handed use. These results suggest that the curved QWERTY design could be optimized by adjusting the keyboard's curvature, adding automatic correction, and moderating the button size. In addition, Figure 8 and Figure 10 suggest a possible optimized design: a curved T9 keyboard that replaces the keys "R, T, Y, U, I, O, D, F, G, H, J, K, X, C, V, B, N, and M" of the curved QWERTY keyboard, with each T9 key taking the place of two letter keys of the curved QWERTY.
This representative study therefore only roughly demonstrates the protocol for evaluating keyboard design with open-source Python scripts. Future studies could examine the analysis and optimization methods in more depth according to their own research purposes.
Supplementary Coding File 1: 3D plots of the transition time between two keys.
Supplementary Coding File 2: The fitted ellipse and its area.
Supplementary Coding File 3: The offset of the fitted ellipse.
In this study, building on advances in screen technology, we presented a consolidated, general protocol for evaluating keyboard design systematically and precisely. Existing indicators and methods from previous studies, English character pairs as input material, and the transition time between two keys are integrated and adapted into an effective protocol.
Several critical points in this protocol deserve attention. The selection of variables and indicators is essential because it determines the perspective of the analysis, and the selected indicators can be used to build an evaluation model in the later stages of a keyboard evaluation experiment. Besides objective variables, subjective variables should also be considered carefully and from multiple dimensions in the experimental design, since subjective data play a vital role in improving user experience. Coordinate data can optionally be collected and processed through the self-developed application and Python scripts. Examples include the fitted ellipse (95% CI) of the touchpoints on each button and the offset from the center of the fitted ellipse to the target center of each button. Analyzing and visualizing the fitted ellipses may suggest ways to optimize the keyboard design. In addition, physiological and movement measurements depend on wearable equipment and are also optional, but they can help explore in depth experiences that keyboard users find hard to express.
One crucial step in the procedure is asking participants to wash their hands and cleaning the screen (and any wearable detectors) before the experiment, since hand grease and sweat may affect the sensitivity of the touchscreen sensor and thus the results. Participants' physical data (hand length, finger length, and thumb circumference) should also be measured and reported, because physical differences between participants may affect the results and their reproducibility.
The protocol also has the following limitations. The input materials proposed in this study mainly concern English and do not consider other languages. In addition, the protocol recommends developing dedicated keyboard software to collect the experimental data instead of using traditional manual collection and measurement. Such software can collect and compute more precise indicators that help explain the causes of performance differences, and it can support clear optimization suggestions for the keyboard design rather than only a conclusion about the effect of the current design under experimental conditions. Furthermore, the representative results did not include other expensive devices adopted by previous studies, such as portable wireless physiological detectors or motion capture systems; researchers should choose their experimental devices based on their research questions and hypotheses. Finally, followers of the New Statistics or Bayesian enthusiasts could adopt additional statistical methods to analyze and evaluate keyboard design.
For future applications, this protocol can be adopted to evaluate keyboard designs on other smart devices. Beyond smartphones, more and more smart devices have become popular, for instance, wearable smartwatches and wristbands (e.g., Apple Watch), tablets (e.g., iPad), and virtual reality devices (VR headsets). With slight adjustments to its indicators and processes, this protocol can be used to evaluate various keyboard designs on these devices and help optimize them. In this sense, this study opens new opportunities to re-examine the benefits and importance of keyboard design evaluation on the touchscreens of smart devices. It provides an inexpensive, easy-to-conduct research method with open-source resources for human-computer interaction, computer science, and psychology. It can help novice researchers and students start their own studies or serve as an in-class demonstration experiment.
The authors have nothing to disclose.
This research is supported by the Tsinghua University Initiative Scientific Research Program (Ergonomic design of curved keyboard on smart devices). The authors thank Tianyu Liu for his kind suggestions and coding assistance with the figures.
| Name | Company | Comments |
|---|---|---|
| Changxiang 6S smartphone | Huawei | Smartphone used in the exemplar study |
| Curved QWERTY keyboard software | Tsinghua University | Developed by the authors |
| SPSS software | IBM | Data analysis software |
| G*Power software | Heinrich-Heine-Universität Düsseldorf | Sample size calculation |
| E4 portable wireless wristband | Empatica | Recording galvanic skin response and heart rate |
| Arqus | Qualisys | Motion capture camera platform |
| Passive marker | Qualisys | Appropriate sizes: 2.5 mm, 4 mm, and 6.5 mm |
| Trigno sEMG | Delsys | Recording electromyographic activity |
| Visual Studio Code | Microsoft | Python editor |