Vol.28 No.3, July 1996
User-Centered Design puts users at the center of the design activity by involving them in the process from the very beginning and by iteratively testing and re-designing the product. Human error analysis plays an important role in every testing and evaluation phase. Although it is not possible to design systems in which people do not make errors, much can be done to minimize the incidence of error, to maximize error detection, and to make error recovery easier. However, the qualitative analysis of human error has not received the attention it deserves. In this paper the main features of the user-centered approach are sketched and a set of guidelines for handling human error is presented. An example drawn from our design experience is reported for each guideline.
User-Centered Design (UCD) is mainly aimed at developing systems that are easy to learn and to use, so that they may facilitate users' activities (Norman & Draper, 1986). In order to achieve this goal, the user-centered approach differs from previous methods (e.g., system-centered design) in that it reverses the direction of development. While former methods worked from the engineering of the system to eventually arrive at the end user, the user-centered approach starts from users' activity and comes to system engineering last (Redmond-Pyle & Moore, 1995). UCD involves users from the very beginning in each phase of the design process in order to improve the architecture of the system, trying to remove, as soon as possible, difficulties and drawbacks that can negatively affect human-computer interaction. The most effective strategy known so far for reaching this aim is to evaluate the proposed design as early as possible, to redesign the system, and to iterate this process toward the final product while continuously considering the users' needs, abilities, and knowledge. The design process includes some relevant phases, such as identifying users, analyzing tasks, and setting usability specifications, that are preliminary to the system development phase and to the prototype testing phase. These latter activities develop in iterative cycles until the final product meets the defined usability specifications (Dumas & Redish, 1993; Bertaggia, Montagnini, Novara, & Parlangeli, 1992).
Usability evaluation is thus a fundamental phase of the UCD approach since it is aimed at verifying the attainment of the usability requirements and the adequacy of the system.
In what follows, the role of error analysis in usability evaluation is first introduced; then a set of guidelines aimed at supporting a qualitative analysis of human error is presented.
There are several approaches and techniques for evaluating product usability. They can, however, be roughly grouped into two different methods: usability testing, which involves real end users, and heuristic evaluation, which is based on the support of different experts, including end users.
On the one hand, usability testing aims at evaluating and verifying whether the product satisfies users' expectations and whether its usage is easy and pleasant (Bagnara & Stajano, 1987). Generally this kind of testing is carried out within a laboratory environment by setting up experiments in which users' performance is evaluated by considering a number of quantitative variables (e.g., time needed to learn, time needed to complete an activity, percentage of users who successfully complete the task, number of errors, number of repeated errors, time needed to recover from an error, time spent finding help information, number of calls to the help, number of times the user expresses frustration or satisfaction). In addition, qualitative factors are also considered (e.g., pleasantness of the system, acceptability of the system, suitability of the system).
On the other hand, heuristic evaluation implies that several experts use the system separately, taking into account guidelines or basic principles singled out to achieve a usable interface. To this end, many kinds of guidelines have been identified by experts. For instance, Nielsen & Molich (1990) have summed up their general heuristics in the following nine principles: use simple and natural dialogue, speak the user's language, minimize user memory load, be consistent, provide feedback, provide clearly marked exits, provide shortcuts, provide good error messages, prevent errors.
Experimental studies (Dumas & Redish, 1993) show that usability testing (user-based) finds mostly global design errors and that, on the contrary, heuristic evaluation (expert-based) finds mostly specific ones. By and large, notwithstanding the peculiarities of these two methods, there are some common characteristics that highlight how the user-centered design approach rests on a few main points that are shared even by otherwise unrelated methodologies. Both usability testing and heuristic evaluation: i) adopt an experimental approach, ii) are aimed at providing feedback to designers, and iii) focus upon error. In particular, human error is quoted as a main factor in almost all the usability checklists, and most design processes include error analysis as a main tool for developing usable and reliable systems (e.g., Preece, 1994; Newman & Lamming, 1995; Shneiderman, 1992).
The focus on error is stressed for two main reasons. First, errors have been deeply investigated from different theoretical perspectives and in a number of environmental settings (Reason, 1990). It is thus possible to refer to already collected data and to consolidated theories to interpret and even to classify a given deviation from the correct course of actions. Second, errors are valuable indicators, signalling all those circumstances in which environmental characteristics do not match cognitive abilities. As a consequence, their detection and interpretation may inform the design of more usable systems.
However, with few exceptions (e.g., Norman, 1984; Rasmussen & Vicente, 1987), error analyses address mainly the quantitative issue. They cope with the number or the rate of errors produced in a given session of interaction, but leave the design team with the whole burden of finding a solution for the different types of errors. Moreover, the few approaches devised to address the qualitative issue mainly focus on the occurrence of the error, not on its detection and recovery. In other words, they provide guidelines for reducing the occurrence of errors or for minimizing their impact, but rarely for supporting their detection and recovery.
One of the few sets of guidelines that also copes with the issue of error detection is the one put forward by Norman (1983) and subsequently updated by Lewis and Norman (1986). This set of principles has, however, some drawbacks that make it only partially applicable to all the types of human errors committed in interacting with computers and with other artifacts. Indeed, Lewis and Norman's guidelines are mainly concerned with Slips, i.e., errors characterized by a mismatch between intention and action: the intention is satisfactory, but the actions are not carried out as planned (Norman, 1981). On the contrary, they pay less attention to Mistakes, i.e., errors characterized by wrong intentions, so that the action is properly executed but its result does not fit the activity.
The main reason for this is that Norman (1984) suggested that there are essentially two ways to detect an error: either through "self monitoring", when the action is monitored and compared with expectations (if the action was not as planned, then the error is detected), or through "deleterious outcome" (if the error causes some deleterious result, then it may be detected). Both modes rely upon a feedback mechanism with some monitoring functions that compare what is expected with what has occurred, and upon the ability of the cognitive system to catch a discrepancy between expectations and occurrences (Norman, 1981). However, it is deceptive to assume: i) that the knowledge used as a frame of reference to evaluate the feedback collected along the course of the actions remains stable and constant; and ii) that the results of actions are assessed only in comparison with formerly well-established frames of knowledge. Both assumptions apply quite well in the case of Slips but are not satisfactory in the case of Mistakes. Human knowledge is continuously activated and inhibited, so the frame of reference is dynamic in nature, and there is experimental evidence (see Kahneman & Miller, 1986) that the incoming information can itself call up the knowledge against which it is evaluated. Some of us (Rizzo, Ferrante & Bagnara, 1995) proposed a model of the error detection process based on the idea that a stimulus can be evaluated with respect to the reference system it evokes after the fact, rather than in relation to pre-established expectations. The suggested process includes four main phases: i) mismatch emergence (i.e., a breakdown in the perception-action loop, consisting of a conflict or clash of knowledge in working memory); ii) detection (i.e., the awareness that an error occurred; the undesired result is properly attributed to one's own activity); iii) identification (i.e., the individuation of the source of the breakdown); iv) overcoming of the mismatch (i.e., strategies for reducing the mismatch, getting rid of it, or undoing its cause). The four phases do not necessarily all occur in every error detection episode; indeed, the contrary is often the case.
In the proposed model a critical role is played by the dynamics of knowledge activation that occur in working memory. It is wrong to assume that, if a person shows that he or she possesses a piece of knowledge in one circumstance, this knowledge will be available under all conditions in which it might be useful. Often, the opposite effect is observable: knowledge accessed in one context remains inert in another. "Thus, the fact that people possess relevant knowledge does not guarantee that this knowledge will be activated when needed. The critical factor is not whether the agent possesses domain knowledge but rather the more stringent criterion that relevant knowledge be accessible under the conditions in which the task is performed" (Woods, 1991, p. 253). Knowledge activation and its transformation are the crucial points that support the human error handling process. Indeed, on one side, most errors depend on the mis-activation -- conscious or subconscious -- of knowledge. On the other side, mismatches consist of conflicts or clashes of knowledge active in working memory, and the identification of the breakdown is the possible resolution of these conflicts. However, sometimes it is not, or does not seem, possible to resolve the conflicts. At other times, the identification of the source of the breakdown may even be irrelevant, so that directly overcoming the mismatch, without paying attention to its source, can be the best way to smoothly restart the activity.
Thus, the problem of error handling is, at least at a basic level, a problem of supporting the activation of relevant knowledge by modulating the conditions in which tasks are performed.
Following this approach, a set of seven guidelines has been proposed. They are the expression of a general strategy aimed at reducing cognitive underspecification (Reason, 1990) and at taking focusing effects into account by augmenting the redundancy of objects, procedures and feedback. In what follows we report the seven guidelines, with examples of how we exploited them to solve specific errors made by users while interacting with applications developed, or still under development, at the University of Siena.
The guidelines have been applied on the basis of: i) the specific errors made by users in testing prototypes of the systems or even their mock-ups, ii) the external information involved in the mismatch, iii) the knowledge active during the mismatch, and iv) the attempted recovery path. Some of the reported examples might not appear particularly innovative, but they have been selected with the aim of making the guidelines explicit rather than showing clever solutions. In many cases the solution to a given error requires merging several guidelines; this pushes toward innovative solutions but makes them hard to use as exemplifying cases.
This implies that designers should improve the match between actions and their outcomes. Thus, actions with different goals should also differ in their pattern. The resulting advantages are that a) the possibility of activating the execution of another, likely wrong, action is reduced, and b) a given action provides unambiguous information on the ongoing activity even before the end state is reached.
In our laboratory we are currently developing a multimedia program specifically designed as a tool to support the teaching of history to children with learning disabilities. The adopted interface displays many icons in order to allow the activation of actions such as "go to the next page", "show me the movie", "read the words aloud" and so on. In order to facilitate the understanding of the functions linked to these icons, another icon was designed -- the help icon -- that, when clicked, displays all the icons in the middle of the screen. By clicking on these icons the subject obtains information on their functions.
After a first usability evaluation of this program, it was clear that, though subjects had spontaneously clicked on the help icon, they had not been able to appreciate that through further clicks on the icons -- displayed in the middle of the screen -- they could have obtained information on their functions. The action of clicking on an icon appeared too strongly connected to the triggering of an action, an activity whose end state is quite different from getting information. Clicks on icons are generally considered as producing changes in the state of the program, while asking for information is seen as a neutral activity in this respect. As a consequence, some changes have been implemented in order to highlight the difference between these two activities. In particular, as in many other software products, we have now designed a solution in which clicking on the help icon changes the state of the program. The result is that in the new state it is sufficient to move onto an icon, which now remains in its position, to obtain information on the linked function. Another click on the help icon brings the system back to the former state.
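The redesigned behaviour can be sketched as a simple state change, as in the minimal Python sketch below. The icon names and descriptions are hypothetical; the sketch only illustrates the idea that asking for help is itself a visible mode switch, so that hovering over an icon in help mode explains it instead of triggering it.

```python
# Sketch of a help-mode toggle: clicking the help icon switches the program
# into a distinct state in which moving over an icon explains its function,
# while a second click on the help icon restores normal operation.
# Icon names and descriptions are hypothetical.

class IconInterface:
    def __init__(self):
        self.help_mode = False
        self.descriptions = {
            "next_page": "Go to the next page",
            "play_movie": "Show me the movie",
            "read_aloud": "Read the words aloud",
        }

    def click_help(self):
        # Entering or leaving help mode is itself a visible change of state,
        # so asking for information is no longer a "neutral" click.
        self.help_mode = not self.help_mode

    def hover(self, icon):
        # In help mode it is enough to move onto an icon to get its description.
        if self.help_mode:
            return self.descriptions.get(icon, "No description available")
        return None

    def click(self, icon):
        # In normal mode a click triggers the action linked to the icon.
        if not self.help_mode:
            return f"executing: {icon}"
        return self.descriptions.get(icon, "No description available")
```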
Human beings usually exploit multiple and indirect feedback for evaluating the output of their actions. Feedback can be very different in nature, since an action may have effects producing noise, light, vibration, or proprioceptive stimulation. In addition, our actions may produce more abstract though perceptible results, for instance a variation in the frequency of information updating. However, most skilled human activities can be performed with a reduced flow of information coming from the environment. It is precisely in these cases that we must provide articulated and multimodal feedback in order to allow the early detection of possible mistakes.
The design of acoustic icons can be of special help for improving the level of redundancy of the information provided for a given task. Many interfaces already adopt this solution to facilitate the communication with the computer and to make the interaction less prone to error (e.g., Gaver, Smith & O'Shea, 1991). All the systems developed in our laboratory are designed so that acoustic feedback is provided as a consequence of each action. In particular, in the system for teaching history to children with learning disabilities all the icons are also acoustic, and the sounds have been chosen to be similar, though well distinguishable, for actions that are viewed as similar (for instance, "move to the next page" and "move to the previous page").
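Purely as an illustration (the sounds in the actual system were designed by hand), the "similar but distinguishable" constraint could be encoded as a mapping from actions to sound parameters in which related actions share a timbre and differ only in pitch; the action names and parameter values below are invented.

```python
# Hypothetical mapping from actions to auditory feedback: actions viewed as
# similar share a base timbre and differ only in pitch, so they sound related
# yet remain distinguishable.

ACOUSTIC_ICONS = {
    "next_page":     {"timbre": "page_turn", "pitch": "high"},
    "previous_page": {"timbre": "page_turn", "pitch": "low"},
    "play_movie":    {"timbre": "chime",     "pitch": "high"},
    "stop_movie":    {"timbre": "chime",     "pitch": "low"},
}

def feedback_for(action):
    """Return the sound parameters to play as a consequence of an action."""
    return ACOUSTIC_ICONS.get(action)
```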
A less explored source of feedback for multi-sensory interaction is the modification of the mouse button from a device having just input functions to an input-output device. To provide multi-sensory feedback to the subject, for example, the surface of the button could be modified in such a way as to produce specific tactile sensations as a function of the position of the pointing mark on the graphical interface.
When a forcing function (Norman, 1983) is supported by a message, this should provide information on which objects are involved and which values are improper for each object. Much effort has been put into improving the error messages provided by the system when it cannot interpret the commands given by the user. However, in this type of message usually only the first information item that does not fit the previous ones is considered. If the aim is to help the user to activate alternative knowledge, information should be provided on the possible and impossible states of all the items involved in the un-interpreted command. In designing the system for scheduling the daily production of a Hot Strip Mill in an Italian iron and steel company, we set up a two-level error message. The first level presented the message "Sorry, I am not able to accomplish your request", with a generic indication concerning one invalid argument, and two buttons, "OK" and "Since...". Clicking on the "Since..." button made the second level of the error message visible. On this second level, the constraints put forward by previous actions or by the syntax of the command interface language were made explicit. For example, assume that the user is setting up the features and the values of a given Slab (i.e., WPR46) to be processed by the Hot Strip Mill in a sequence, and that he/she makes errors both in defining the features and in selecting the values. Assume that the first error concerns the syntax of the commands and the second the constraints established by his/her own former activity. In this case the second-level error message will present specific content covering both errors.
This second-level error message has been developed taking into account all the variables in the un-accomplished command and not just the first invalid argument detected.
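The underlying logic can be sketched as follows. The rejected command is checked against both the command syntax and the constraints established by earlier actions, all violations are collected, and the detailed list is shown only when the user asks for it via "Since...". The feature names, ranges and slab values in this Python sketch are invented for illustration; they are not the actual Hot Strip Mill rules.

```python
# Sketch of a two-level error message: every argument of the rejected command
# is checked, all violations are collected, and the detailed explanation is
# shown only on request ("Since..."). Feature names and constraints are
# hypothetical, not the actual scheduling rules.

SYNTAX = {"width", "thickness", "steel_grade"}   # admissible feature names
CONSTRAINTS = {"width": (800, 1500)}             # ranges set by former actions

def check_command(command):
    violations = []
    for feature, value in command.items():
        if feature not in SYNTAX:
            violations.append(f"'{feature}' is not a valid feature "
                              f"(valid features: {sorted(SYNTAX)})")
        elif feature in CONSTRAINTS:
            low, high = CONSTRAINTS[feature]
            if not (low <= value <= high):
                violations.append(f"'{feature}' = {value} violates the range "
                                  f"[{low}, {high}] set by your previous actions")
    return violations

def report(command):
    violations = check_command(command)
    if not violations:
        return "Command accepted", ""
    # First level: generic message with the "Since..." option.
    first_level = 'Sorry, I am not able to accomplish your request.  [OK] [Since...]'
    # Second level, shown only after "Since...": all violations, not just the first.
    second_level = "\n".join(violations)
    return first_level, second_level

print(report({"width": 2000, "colour": "red"})[1])
```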
People heavily depend on external memory aids. Supporting HCI by means of an activity log that records time, action, and response for activities that do not leave a clear and permanent trace of their occurrence is a substantial tool for improving memory for actions and for intentions. There are mainly two ways to provide an activity log. One concerns the recording of time, action, and system modification by means of labels. The other is to sample the configuration of the interaction in relation to given events. In this case information is recorded not only for the objects that change state but also for all the information potentially available during a change of state.
In designing the application for the scheduling of the daily production of the Hot Strip Mill we made use of both. The textual activity log was very basic; it recorded the time, the name of the window on which the command was entered, and the command itself. The second activity log was also simple but very effective. It consisted of a screen dump recorded after every main transaction in the planning activity or in the supervision of the process. The system for the daily scheduling involved six main screen configurations, each one made up of three to five windows; every time the operator shifted from one configuration to another, the system recorded a picture of the whole screen together with the corresponding time.
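A minimal sketch of the two logs is given below; the function names are hypothetical and capture_screen() stands in for whatever screen-dump facility the platform provides.

```python
# Sketch of the two activity logs: a textual log of (time, window, command),
# and a screen dump taken whenever the operator switches between the main
# screen configurations.

import time

text_log = []

def log_command(window_name, command):
    """Textual log: time, window on which the command was entered, and the command."""
    text_log.append((time.strftime("%H:%M:%S"), window_name, command))

def log_configuration_change(new_configuration, capture_screen):
    """Configuration log: a picture of the whole screen, with its time,
    recorded every time the operator shifts to another configuration."""
    image = capture_screen()   # platform-specific screen dump
    return {"time": time.strftime("%H:%M:%S"),
            "configuration": new_configuration,
            "screen": image}
```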
The comparison between outputs is a useful source of information for action evaluation. Often the completion of a task depends upon evaluations of partial results, so outputs that can be related to one another should be presented in a way that allows easy comparisons. This implies that a) differences must be stressed: people can perceive differences among states but may be unable to single out what the crucial differences are or what the causes are; b) classifications must be facilitated: a new output may be mis-classified or an old output may be classified as a new one; c) recovery from mis-classifications must be allowed: when the global result relies on classifications of partial outputs, the comparison of these classifications and their correction must be allowed.
To better clarify this suggestion, we can refer to the work that we are currently doing on a paper-and-pencil questionnaire (Task Workload Index), proposed by NASA, for the assessment of the workload experienced by subjects performing different tasks. This questionnaire considers six different variables -- mental demand, physical demand, temporal demand, performance, effort, frustration -- that must be evaluated on a one-hundred-point scale. These variables must also be evaluated by pairwise comparisons in order to define their relative importance (weights). In our laboratory we are developing a hypertext version of this questionnaire. In this program the six variables are displayed one at a time. When the subject evaluates one variable, the next variable is automatically displayed. Thus, to give the subjects the possibility of carrying out comparisons, after each variable is evaluated on the one-hundred-point scale, the name of the variable and the corresponding evaluation are displayed at the top of the screen. While performing this phase the subject can always drag the pointing mark onto the score of a formerly evaluated variable. This score is in fact shown as an active field, and the subject can input a new value at any time. The subjects move to the next section of the questionnaire only when all six variable names are shown on the screen and the corresponding evaluations are considered the right ones. Similarly, the variable pairs are all displayed together on the screen, and a subject can put a mark on the variable name that he/she considers the most relevant one, within that pair, in determining the workload experienced while performing the task. These judgements can be continuously changed until the subject, after all pairs have been evaluated, decides to go on to the next section.
Feedback is crucial not only to allow good execution of the desired actions but also to decide which action is to be performed or which state of the world is to be pursued. To this aim, it is important that the user can manipulate the display of the results, since it is not possible to know in advance what the users' intentions are or their idiosyncratic ways of evaluating outcomes (so far the tools developed for capturing users' intentions, i.e., plan recognition, have not produced sensible results). We suggest three possible ways in which result presentation can be improved: i) Exploiting layout. Space is a formidable artifact that provides visibility, perceptual cues and organizing structure (see Norman, 1993). ii) Exaggerating the differences. Variations in the optical flow are adequate stimuli for our perceptual system, and variations in the proportions and relationships among objects are adequate stimuli for our categorical processes. Both can be used to produce salience in the system's responses to the user. iii) Stressing the aspects relevant to the just-performed task. As reported above, only a subset of the available information is used for action control and result evaluation, so the information strictly related to the task should be privileged in result presentation.
The best strategy for applying these principles is to provide the users with several possibilities for each of these aspects (layout management, exaggeration of differences, and stress on the aspects relevant to the task) and to allow them to manipulate the resulting presentation according to their needs and wishes.
For example, in a database developed for reporting malfunctions, including human error, in chemical plants, the operator, after describing the event in a free-text field, has to classify the event, or part of it, according to a taxonomy of the conditions of malfunction. The taxonomy includes more than 100 items organized in non-mutually-exclusive classes, plus the list of all the components of the plant. The taxonomy is organized in windows that reflect the classes of the taxonomy; the windows can be freely distributed on the screen, and the classification is made with drag-and-drop utilities. During the audit session, when two or more events are selected, the related conditions of malfunction are reported and the common conditions are highlighted. Vice versa, the operator can highlight a specific condition for an event, or for any subset of either the events or the conditions. A sketch of the audit-session highlighting is given below.
To avoid conflicts with guidelines that call for consistency or modelessness, and to avoid side effects, such as always focusing on the same features of the output even when they are no longer relevant, it should be possible to make clear which selection principles are in force.
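In essence, highlighting the common conditions amounts to a set intersection over the conditions attached to the selected events. The following Python sketch illustrates this; the event names and condition labels are invented.

```python
# Sketch of the audit-session highlighting: conditions common to all selected
# events are the intersection of their condition sets. Event names and
# condition labels are invented for illustration.

events = {
    "event_12": {"valve malfunction", "procedure not followed", "pump P3"},
    "event_17": {"procedure not followed", "sensor drift", "pump P3"},
}

def common_conditions(selected):
    """Return the conditions shared by every selected event."""
    sets = [events[name] for name in selected]
    return set.intersection(*sets) if sets else set()

def events_with_condition(condition):
    """Vice versa: return the events to which a given condition was attached."""
    return {name for name, conds in events.items() if condition in conds}

print(common_conditions(["event_12", "event_17"]))  # {'procedure not followed', 'pump P3'}
print(events_with_condition("sensor drift"))         # {'event_17'}
```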
The best way to support error identification and error diagnosis after the occurrence of a mismatch is to give specific answers to the user. In some software packages (e.g., Mathematica), when the user asks the system to do something that it is not able to understand, the system responds with a forcing function signalled by a beep. The user can then open a constrained dialogue session by calling the "Why the Beep" command. As a consequence, the system presents an error message in which it provides information on the mismatch it encountered in interpreting the previous request of the user. Further help is also available by pushing a Help button, which produces template examples of the correct commands that can be formulated with the key words used in the un-interpreted command.
A similar strategy may be adopted for each output of the system that could be inadequately evaluated by the user. For example, the user could have the possibility of asking "Why the result?" in all those cases in which he/she is unable to understand why a given output has been produced. This could be of particular help when the user is interacting with a hypertext. In this case, moving from one page to another or from one environment to another through the activation of some buttons may often produce a feeling of confusion. The possibility of obtaining explanations of the links between different subsystems should then be provided to the user.
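One possible way to support such a "Why the result?" query, sketched here under the assumption that each operation can be stored together with a short explanation, is to keep the rationale of the last transition available until the user asks for it. The class, link names and explanations below are invented for illustration.

```python
# Sketch of a "Why the result?" facility: each navigation step or system
# response is stored together with a short explanation, which is shown only
# when the user explicitly asks why.

class WhyTheResult:
    def __init__(self):
        self.last_explanation = "No action has been performed yet."

    def perform(self, action, result, explanation):
        # Record why this result was produced, for later questioning.
        self.last_explanation = explanation
        return result

    def why(self):
        return self.last_explanation

session = WhyTheResult()
session.perform("click 'Renaissance' button",
                "page: Florence, 15th century",
                "This page was opened because the 'Renaissance' button is linked "
                "to the Florence subsystem of the hypertext.")
print(session.why())
```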
A more general approach is to provide detailed explanations for all those results obtained through a long sequence of actions. For instance, in our implementation of the Task Workload Index questionnaire (see guideline no. 5) the global result, that is the score expressing the workload experienced by the subject on a task, is obtained by summing up the evaluations provided for each variable multiplied by the weight given to that variable. This score is then divided by 15, the sum of the weights. The computation necessary to obtain the global result is summarized in a matrix in which all the steps are shown. This matrix is displayed at the end of the process and is intended to allow the subject a thorough understanding of the evaluation process.
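As a worked example of this computation, assume the scheme described above: each of the six variables is rated on a one-hundred-point scale, its weight is the number of pairwise comparisons it wins, and the weights therefore sum to 15. The ratings and weights in this sketch are invented; only the formula follows the text.

```python
# Worked example of the global workload score: each rating is multiplied by
# the weight of its variable, the products are summed, and the sum is divided
# by 15 (the sum of the weights). Ratings and weights are invented.

ratings = {"mental demand": 70, "physical demand": 20, "temporal demand": 60,
           "performance": 40, "effort": 65, "frustration": 30}

weights = {"mental demand": 4, "physical demand": 1, "temporal demand": 3,
           "performance": 2, "effort": 4, "frustration": 1}

assert sum(weights.values()) == 15   # the 15 pairwise comparisons

matrix = {v: ratings[v] * weights[v] for v in ratings}   # the per-variable steps
global_score = sum(matrix.values()) / 15

print(matrix)                        # per-variable products shown to the subject
print(round(global_score, 2))        # 56.67
```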
System evaluation and usability testing are fundamental phases of the User-Centered Design approach. In these phases, and in many of the associated methods, error analysis is considered an important activity, even if it is not adequately put into practice. A set of seven guidelines for managing error has been presented. The use of these guidelines in the error analysis, which also included the nature of the mismatch and the recovery path, led us toward solutions that eliminated the errors originally made by the users or drastically reduced their impact. Moreover, the qualitative analysis of human error required a smaller number of users to test the systems, with important benefits for the costs of the evaluation process.
Multimedia Communication Laboratory
University of Siena, Via del Giglio, 14
53100 Siena, Italy
Email:
rizzo@gauss.uniba.it
parlangeli@gauss.uniba.it
marchigiani@gauss.uniba.it
bagnara@gauss.uniba.it