Keith S. Karn, Thomas J. Perry, Marc J. Krolczyk
Usability studies are usually conducted on a compressed time scale (measured in hours) compared with a user's eventual experience with a product (often measured in years). For this reason, typical usability evaluations focus on success during initial interactions with a product (see, for example, Dumas & Redish, 1994, and Nielsen & Mack, 1994). Familiarity with similar products may predict success on initial use of a new product. Are what we call "intuitive" user interfaces really just familiar user interfaces? This familiarity effect can often swamp the usability differences between design alternatives. If usability evaluations continue to emphasize initial success with a product, we may inhibit innovation in user interface design.

There is a tension between initial usability (measured by success at first encounter) and efficiency of skilled performance. Initial learning of a product's user interface often yields quite rapid increases in efficiency of use, whether the product is a computer game or a business software application. The time spent with a product once up on a learning plateau typically greatly exceeds the time spent on the steeper part of the learning curve. Thus, traditional usability evaluation techniques that emphasize initial product use may fail to capture the usability problems that affect users the most. We do not question the importance of testing a product's learnability (see Usability Sciences Corporation, 1994), but we feel that the human-computer interaction community should ensure usability throughout the product life cycle. A narrow focus on initial usability elevates learnability above efficiency once up the learning curve. While this approach may be appropriate for products targeted primarily at casual or occasional users, it fails to capture the usability issues associated with power users (those with significant experience, training, or a professional orientation to their interaction with the product). Initial interactions with a product may affect the purchase decision, but usability over the longer term may determine whether a user becomes a truly satisfied customer.
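One way to make the learning-curve argument concrete is the widely cited power law of practice (not invoked in this article, but a reasonable stand-in for the curve described above): the time to complete a task on the n-th attempt falls roughly as

$$ T_n = T_1 \, n^{-a}, $$

where T_1 is the time on the first attempt and a > 0 is a learning rate. With illustrative values T_1 = 60 seconds and a = 0.4, the 10th attempt takes about 24 seconds and the 100th about 9.5 seconds; almost everything after that sits on the flat tail of the curve, which is exactly where a power user spends most of the product's life.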
The goals of the workshop were to exchange and develop techniques to address usability testing of:
- New features and incremental design improvements to an existing product with an established, highly-experienced, power user population. Example: a new version of a computer aided design program.
- Innovative or novel products replacing older technologies used by an established power user population. Example: replacing a command line interface with a graphical user interface.
- Entirely new technologies that may include a new and unfamiliar user interface and have no established user population. Example: the first Web TV appliance.
Thirteen highly experienced user interaction designers and usability engineers participated in the workshop (see Figure 2). In extensive pre-workshop e-mail communication, we shared our views of the critical issues in testing for power usability, and each participant described one real-world testing example in detail.
From our electronic correspondence we learned that the products we design and test differ greatly, ranging from hand-held data entry devices to fighter aircraft. We come from many countries and work in different settings, from design groups in multinational corporations to consulting firms to academia. Despite these differences, we were united by a desire to improve the state of the art in design and usability testing of products with an experienced user population and mission-critical requirements for error-free performance and productivity.
The workshop began with each participant presenting a real-world example of testing for power usability. These presentations included the nature of the task and work environment, a description of the product (its revolutionary versus evolutionary nature, i.e., the preexistence of a power user population), the techniques used for usability testing, and lessons learned. During the presentations, all participants recorded brief descriptions of causes and potential solutions related to the problems of testing for power usability. Participants wrote these causes and solutions with large markers on color-coded, self-adhesive paper to facilitate later posting for group interaction.
From the presentations we agreed on a problem statement:
Power usability is not adequately addressed in product testing.
We then used an adaptation of the Ishikawa root-cause analysis technique (Ishikawa, 1982; see also Kimbler, 1995) to dissect the underlying causes of this problem and identify potential solutions. This analysis technique (also known as a fishbone diagram) is a problem-solving tool used in various quality management programs. Our adaptation elaborated on the fishbone analysis technique (see Figure 1) and included fish-related themes for the four steps in the analysis method:
- Fish Bone: We posted the brief descriptions of the underlying causes of the stated problem that we had recorded during the presentations. We roughly clustered these causes on bones of a large model fish skeleton.
- Name that Bone: With additional discussions and critiques we refined the arrangement and clustering of the causes and provided a name for each bone. This provided succinct statements of the underlying causes of the problem.
- Filet-O-Fish: We posted solutions on the fishbone diagram, aligning them with the associated problems.
- Name that Fish: We reflected on the fish and identified conclusions and questions for further work.
The activities generated the following categories (bones) of problems and solutions:
- Revolutionary Products (how to test when tasks are uncertain and no users exist yet)
- Barriers to Customer Access (for customer research and testing)
- Resistance to Change (users and developers)
- Resources and Schedule (insufficient time and funding to do the job right)
- Usability Goals (specifying them in a meaningful, testable way)
- Identifying Scenarios & Tasks (what do highly experienced users actually do)
- Simulating Scenarios & Tasks in Tests (how to duplicate the context of real work in a lab setting)
- Identifying the Power User Profile (defining critical attributes)
- Selecting Users for Testing (how to ensure that test subjects match the power user profile)
- Designing Tests of Power Usability (subject training, metrics, data analysis, etc.)
The appendix presents the complete list of problems and solutions generated by the fishbone analysis. Note that the list is probably incomplete and the solutions may not be appropriate to every situation.
The Name that Fish activity produced the following summary thoughts.
During the discussion, workshop participants identified a partial set of power user attributes:
- Experienced in the task domain
- Experienced (and usually trained) with the product
- Have a professional orientation to the product, typically in an occupational context
- Emphasize efficiency and minimizing errors; frequently skilled at problem solving
- Frequently in a state of flow: at one with the machine, able to focus on the task goal without thinking about the specifics of the tool
In addition, we listed some system and environmental factors that often affect power users:
- Context of use can be complex (e.g., complex task scenarios, multiple users, concurrent operation of multiple systems, collaborative work)
- For some systems, power users are the only intended user group; for other systems, they are a minority whose needs may conflict with those of a larger group of basic users.
- Usability is often mission critical for systems designed for power users.
All the issues identified are common to any usability test. However, there are differences in emphasis that come with assessing skilled performance.
As with most areas of user-centered design, power usability must be considered in the initial phases of the design to have maximum impact. Here are some issues to consider in this regard.
- Get the best data possible about the users, their tasks, flow behavior and the surrounding context
- Test the current release or a competitive product to identify problems before design starts
- Set meaningful and measurable goals for power usability
- Involve users in design (task domain experts, user review panels, focus groups, etc.)
- Turn designers into users (train on prior or similar products)
- Test early and often against the goals, making appropriate use of many testing tools
- Validate user profiles, use scenarios and workflows at each opportunity
- Ensure that the tasks and subjects used in testing are representative of real work and real users
Of course, no amount of laboratory testing will reveal all the usability issues with a product. Anticipate this by planning and budgeting for data collection after the product is introduced. Consider building data-capture capabilities into the product itself so that data can be collected in the user's own environment. Focus on identifying long-term use problems to be fixed in the next version of the product. Remember that some problems may be architectural in nature, requiring major changes and several releases to fix.
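As a rough sketch of the in-product data capture suggested above, the example below shows one way a team might log interface events for later analysis. It assumes a Python application; the UsageLogger class, file name, and event names are hypothetical and not drawn from any product discussed in the workshop.

```python
# Hypothetical sketch of in-product usage logging for post-release data
# collection; class, file, and event names are illustrative only.
import json
import time
from pathlib import Path


class UsageLogger:
    """Append time-stamped user-interface events to a local log file."""

    def __init__(self, log_path="usage_log.jsonl"):
        self.log_path = Path(log_path)

    def log_event(self, event, **details):
        # One JSON record per line keeps the log easy to parse later.
        record = {"time": time.time(), "event": event, **details}
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


# Example use: record a command invocation and an error dialog, the kind of
# long-term usage data that could inform the next release.
logger = UsageLogger()
logger.log_event("command_invoked", command="print", source="toolbar")
logger.log_event("error_dialog_shown", code="E42")
```

Even a logger this simple lets a team see which commands experienced users actually rely on and where errors cluster over months of use.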
During discussions, we identified several questions for further inquiry:
- Can the same product satisfy requirements of both power and non-power users?
- Is power usability more "contextual" than normal usability or is the context more complex?
- Is there a power user "personality"?
- When faced with a problem, do power users work collaboratively (ask for help) or try to solve it by themselves?
- Do power users have a more complete, accurate mental model of a product?
- Can "training wheels" facilitate novices becoming power users over time?
- Operators working in a flow state often take sub-optimal paths. Does "good design" optimize efficiency in these situations?
- Is "good design" adequate for skilled performance in unpredictable situations?
- Can power use of a product be simulated when experienced users are not available, either before power users exist or when they cannot participate? The emphasis here is on how to train users prior to their participation in testing.
Participants in the workshop agreed that there should be broad interest in a workshop on designing for power usability at a future CHI meeting.
References
Dumas, J., & Redish, J. (1994). A Practical Guide to Usability Testing. Norwood, NJ: Ablex.
Ishikawa, K. (1982). Guide to Quality Control. Tokyo: Asian Productivity Organization.
Kimbler, D. (1995). Cause and effect diagram. http://deming.eng.clemson.edu/pub/tutorials/qctools/cedm.htm
Nielsen, J., & Mack, R. (1994). Usability Inspection Methods. New York: John Wiley & Sons.
Shneiderman, B. (1990). User interface races: Sporting competition for power users. In B. Laurel (Ed.), The Art of Human-Computer Interface Design. Reading, MA: Addison-Wesley.
Usability Sciences Corporation (1994). Windows 3.1 and Windows 95 Quantification of Learning Time and Productivity. http://www.microsoft.com/windows/product/usability.htm
All workshop participants (Figure 2) contributed to the content of this article. The workshop organizers and editors of this article were:
Keith Karn is a senior user interface designer at Xerox Corporation. His background is in human-machine interaction design and usability evaluation of airplane cockpits and office products. He has an M.S. in industrial engineering and a Ph.D. in experimental psychology.
Tom Perry is a senior user interface designer at Xerox Corporation. He is a Certified Professional Ergonomist with an M.A. in Human Factors Psychology from California State University, Northridge.
Marc Krolczyk is a user interface designer at Xerox Corporation. He has worked in the field of graphic design and received his M.A.H. in visual communication from SUNY at Buffalo.
Keith Karn
Industrial Design / Human Interface Department
Xerox Corporation
Mail Stop 0801-10C
1350 Jefferson Road
Rochester, NY 14623 USA
E-mail: Keith_Karn@mc.xerox.com
Telephone: 716-427-1561
Tom Perry
(same address as above)
E-mail: Thomas_J_Perry@mc.xerox.com
Telephone: 716-422-5524
Marc Krolczyk
(address same as above)
E-mail: Marc_Krolczyk@mc.xerox.com
Telephone: 716-427-1879
Appendix
These are the results of the root-cause analysis: a list of causes underlying the problem of inadequate testing for power usability and related potential solutions. The headings evolved as the bones of the fishbone diagram (see the text for details on the root-cause analysis technique). Problems are marked with a plain bullet; solutions with a bullet and a "!".
Revolutionary Products
- No users to test when product is innovative
- No existing scenarios and tasks
- No existing users to profile
- Test subjects lack motivation to use unfamiliar product
- ! Simulate future environment
- ! Educate users about your vision of the future
- ! Use surrogate users.
- ! Pay users to participate
- ! Find / train users with advanced mind-frame
- ! Scene setting / theater / role-playing / simulations
Barriers to Customer Access
- Customer site confidentiality
- Non-disclosure limitations
- Lack of access to individual power users
- Difficult to get into customer sites (various reasons)
- High cost of customer visits (especially for global products)
- ! Capture customers after purchase and prior to product delivery
- ! Motivate customers by explaining why you need to observe them in a realistic context
- ! Combine customer visits with other trips (vacations, conferences, unrelated business trips)
- ! Train surrogate users
Resistance to Change
- Programmers may not see benefits in collaborating with usability specialists (Cowboy programmers don't need no stinkin' usability)
- Power users are opinionated & defensive about change (may have career built around product)
- Strong preference bias for the familiar inhibits innovation
- Users may have to switch back and forth between old and new products
- ! Involve developers in testing
- ! Explain reasons for changes / describe the new metaphor
- ! "Buddy System" (pairing participant with a test administrator assistant)
- ! Design for transfer of learning
- ! Test early and often
Resources and Schedule
- Product is going to ship no matter what
- Limited resources (people, money, equipment)
- Tight development schedules (not enough time)
- No tool / technique to select critical portions of interface for testing
- Tests lag behind critical design decision points (results are too late to impact current release)
- ! Realize it may be necessary to delay solution till future release (i.e., leap-frogging may be necessary)
- ! Find a champion at a senior level
- ! Advertise successes to increase support
- ! Cosponsor tests with marketing organization
- ! Integrate usability tests into iterative design process
- ! Focus testing on a critical subset of tasks or features
- ! Reserve some resources for longitudinal studies
- ! Hire users to generate something truly needed by the program team during the usability test (e.g., documentation, tutorial examples, sample output, etc.)
- ! Use early and frequent designer / peer evaluations and task walk-throughs as an inexpensive alternative to full tests
Usability Goals
- Lack of appropriate usability goals.
- Usability goals can be discrete but are always interacting and frequently conflicting.
- Conflicts between needs of power and casual users.
- Definition of "usable" varies over time / design process.
- Usability goals don't include power user requirements.
- Designs don't support comfort "in the long haul."
- Usability goals don't include support of customers' business objectives.
- Difficulty understanding links in "food chain" of product use.
- Tests don't provide valid measurement of goals.
- ! Define usability for all variables in the beginning of design process.
- ! Usability goals table includes functions and tasks
- ! Set priorities for discrete goals prior to starting design
- ! Be willing to refine goals
- ! Study what makes power users happy: interviews, observations, questionnaires.
- ! Discuss the usability goals with identified power users, using real examples.
- ! Get data on customer goals (business value to their customers).
- ! Get business goals from customer councils.
- ! Get marketing group to find out customer goals.
- ! Set measurable criteria for usability goals - test against those criteria.
- ! Compare test data to benchmarks of prior versions or competitive products.
Identifying Scenarios & Tasks
- No tools for study of real users, tasks, work context, time frame.
- Customized solutions and patterns of use (products and use differ from site to site.)
- Easy to overlook need for cooperation between related tasks (applications).
- International usage contexts can differ dramatically.
- Some power users focus on tools instead of task.
- Tool expertise vs. process expertise - how can we study the latter?
- ! Visit / interview users.
- ! Have users describe difficult tasks.
- ! Session logging, think aloud and later playback / review.
- ! Other contextual inquiry tools.
- ! Visit representative sample of sites, supplement with phone or mail surveys
- ! Enlist international sales and marketing.
- ! Learn about cultures.
- ! Get international customer and user data, e.g., when they travel (travel to them if you can.)
- ! Study real things (e.g., real jobs, real workflows)
Simulating Scenarios & Tasks in Tests
- Difficult to create real-world context when testing
- Simulation is difficult and tedious
- How do you simulate "real" work and work flows in tests?
- No tools for simulating flow interactions
- Hard to test when tasks are problem solving or fault handling tasks
- How are artifacts represented?
- Do tasks define the job?
- ! Do more contextual inquires
- ! Heuristic evaluation of prototypes (paper or whatever)
- ! User panel / focus group to work with designers
- ! Make users feel like designers (participation in design)
- ! Make the designers "feel" like the users (use the product)
- ! Train designers and developers on existing product
- ! Get software developers to create flow simulations
- ! Have users send in samples of work made with product
- ! Site visits before design - collect jobs; use them in tests
- ! Collaborative prototyping tools (Group Kit from Calgary)
Identifying the Power User Profile
- Customer council and user group members may be decision makers, not real users
- How to learn power users' actual tasks (jobs) so we can test for them
- Power users are not a homogeneous group making it difficult to define selection criteria
- How to create power users when not available
- Power users are a minority of users for many products
- Is power usage defined by type of tasks, efficiency, ability, or scope of knowledge?
- What is the role of training in defining power users?
- Don't have screening process to select users
- ! Have customers nominate power users
- ! Use support and sales to identify power users
- ! Define user profile collection projects explicitly, to identify real user tasks
- ! Watch task flows of power users even on other products
- ! Test with different groups of users
- ! Dismiss users that do not fit profile (at any time)
Selecting Users for Testing
- Available subjects don't match user profile
- How will you learn how surrogates differ from real users?
- Customer contacts tend not to be real users
- Users who want to be involved in the design process may not be representative users
- How to address differences in international users and usage contexts?
- ! Select subjects that match user profile
- ! Use surrogates for real users, e.g., system test operators, system analysts, trade show demonstrators, etc.
- ! Train surrogates to more closely match the user profile
- ! Offer "Free" / early training to lure real user participants
- ! Novices with good professional orientation to the product can find many problems
- ! Novices with expert task domain knowledge find many problems
Designing Tests of Power Usability
- Tests don't measure power usability goals
- How to measure power use with low-fidelity prototypes?
- Difficult to test novel solutions for power users
- How to handle rich amount of data?
- Hard to do field tests in real contexts
- Context of use varies dramatically for a multinational product
- Designers' blind spots
- Users may become more like designers with repeated experience in usability assessment
- When is a sample size large enough?
- Difficulty motivating participants in simulated tasks
- How to avoid over-simplifying the product complexity and get good data?
- How to avoid "over-training" users so that learnability and conceptual model problems are still found in testing?
- ! Design test to include measures of goals criteria
- ! Do phased testing; refine for 2nd (and 3rd) passes
- ! Test release x to inform design of release x+1
- ! Try everything - a mix of methods is very important
- ! Try anything (as opposed to nothing) then improve it
- ! Use tools for the study of real work in context: interviews, artifact capture, data logging, session logging, thinking aloud, audio journals, user diaries
- ! Continuous context and data collection with key customers
- ! Get real customer jobs and use in tests
- ! Include an "own work" phase (test subjects bring in their own work to do with new system)
- ! Highlight "outside the box" practices
- ! "Critical incident" software
- ! Start analysis immediately to get changes into next release (e.g., when study is of extended use)
- ! Products can include data analysis / logging capability
- ! Separate the roles to minimize blind spots: designer, trainer for test, test administrator
- ! User interface races (Shneiderman, 1990) to motivate skilled subjects
- ! Use designers as confederates in multi-user test contexts
- ! Hands-on training prior to test session
- ! Let user explore prior to testing
- ! Wait a week between training and test
- ! Gather anecdotal data during training
- ! Collect learnability data during power user training sessions
Figure 1. Workshop participants clustering related statements of problem causes onto the fishbone diagram.
Figure 2. Workshop participants posing with the completed fishbone cause-and-effect diagram. Left to right: Keith Karn, Xerox Corporation, USA; Tim White, Eastman Kodak Company, USA; Jill Drury, The MITRE Corporation, USA; Marc Krolczyk, Xerox Corporation, USA; Vibeke Jorgensen, Kommunedata, Denmark; Anette Zobbe, Kommunedata, Denmark; Gerard Jorna, Philips Corporate Design, The Netherlands; Tom Perry, Xerox Corporation, USA; Julianne Chatelain, Trellix Corporation, USA; Ronald Baecker, University of Toronto, Canada; Jose Madeira Garcia, Autodesk, USA; Kaisa Vaananen-Vainio-Mattila, Nokia, Finland; Stephanie Rosenbaum, Tec-Ed, Inc., USA. (Not pictured: Judith Rattle, Philips Corporate Design, The Netherlands.)