SIGCHI Bulletin
Vol.30 No.4, October 1998

Competitive Testing

Competitive Testing Issues and Methodology:
A CHI 98 Special Interest Group

Kristyn Greenwood, Suzy Czarkowski

Introduction

The CHI 98 Special Interest Group (SIG) titled `Competitive Testing: Issues and Methodologies' met on Tuesday, April 21, in the Los Angeles Civic Center.

The organizers were Kristyn Greenwood and Kelly Braun of Oracle Corporation, and Suzy Czarkowski of the American Institutes for Research.

The purpose of this SIG was to define comparative and competitive testing and discuss goals, methodologies and techniques for handling unexpected problems. We concluded the SIG by drafting a basic set of guidelines for conducting competitive and comparative tests.

Approximately 35-40 people attended the SIG. The majority of the attendees worked for corporations that conduct competitive usability tests for internal use. The remainder represented independent consulting organizations that conduct competitive tests for other companies, along with individuals who had yet to perform a competitive test but were interested in learning about the methodology.

SIG participants work in the software, hardware, consumer product, and medical products industries. Some of the companies they represented were Kodak, Symix, Hewlett-Packard, Diamond Bullet Design, Microsoft and Media Linx Interactive. About 20% of the SIG attendees work for consulting organizations such as National Physical Laboratory, Behavioristics, and Tec-Ed Inc. Finally, there were a few participants from academia who represented Michigan State University, University of Linz, and Carnegie Mellon University.

SIG participants expressed a variety of reasons for performing competitive usability tests. The majority of the participants said they conduct competitive tests in order to improve product design, to create an internal benchmark of products (test different versions of the same product), or to compile results for marketing and press releases (compare their product against a competitor product). The attendees who had performed competitive tests said they perform approximately 1-3 tests per year.

The report that follows summarizes the topics discussed and presents a preliminary list of competitive testing guidelines suggested by SIG participants. Thanks to the active participation of the attendees, we covered many topics, but not all of the issues we discussed could be resolved in the time available.

Defining Competitive Testing

The first task of this SIG was to craft a comprehensive definition of competitive and comparative testing. We quickly realized that this would not be an easy task.

SIG participants practice widely divergent types of usability testing, for a multitude of reasons. The group discussion focused on clarifying the similarities and differences between (1) competitive and diagnostic tests and (2) competitive and comparative tests.

After much debate, we decided that the major difference between diagnostic usability testing and competitive testing involved the number of products being tested. Both methods require collecting performance measures (such as time on task or number of errors) and subjective measures (such as user-reported satisfaction scores) as users perform a series of tasks. However, a diagnostic usability test collects this data for one product or a component of a product, while competitive and comparative tests gather this data for two or more products with the goal of comparing them.
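To make the distinction concrete, here is a minimal sketch (not from the SIG discussion) of the kind of record each approach collects. The measure names follow the examples above, while the product and task names are hypothetical; the only structural difference is how many products appear.

    # Illustrative sketch only. Measures follow the examples in the text
    # (time on task, number of errors, reported satisfaction); the products
    # and tasks are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class TaskMeasures:
        task: str
        time_on_task_sec: float   # performance measure
        errors: int               # performance measure
        satisfaction_1_to_7: int  # subjective measure

    # Diagnostic usability test: data for one product (or one component).
    diagnostic_data = {
        "Product A": [TaskMeasures("create report", 312.0, 2, 5)],
    }

    # Competitive or comparative test: the same measures for two or more
    # products, gathered so the products can be compared.
    competitive_data = {
        "Product A": [TaskMeasures("create report", 312.0, 2, 5)],
        "Product B": [TaskMeasures("create report", 284.0, 4, 4)],
    }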

We also debated whether `testing' was the correct term for the competitive and comparative activities being discussed. Some activities were not really `tests' as no measures were collected. For example, someone may perform a heuristic analysis of several similar products without collecting any measurable data. SIG participants thought that the term `analysis' was appropriate for comparisons that did not collect measures to differentiate the products.

Finally, we looked for similarities and differences between comparative and competitive tests. We concluded that they are similar in that they both collect data for more than one product. However, they differ in the way that the data are used. Competitive test results are often used to make claims about a product's superiority, whereas comparative test results are generally used for internal purposes.

The type of test or analysis that one conducts depends upon the goals of the test and the resources available. An organization might plan to run a competitive test if it intends to publish the results or distribute them outside of the organization. An organization might instead run a comparative test when the results are for internal use only, or when it needs a quick answer rather than one it must defend in public.

Discussion Topics

We engaged in an active discussion that covered a wide range of topics. The discussion was divided into three sections. In the first section, SIG participants discussed methodological issues. In the second section, they described personal testing techniques that they use for conducting competitive tests. In the third section, they recommended guidelines and advice for anyone who plans to run a competitive test.

Discussion topic: Methodological Issues

The first part of the discussion revolved around issues of experimental methodology for competitive and comparative tests.

Number of Participants

Unfortunately, SIG participants did not come to a consensus on a formula for determining the correct number of test participants. Some suggested that to ensure experimental validity there should be at least 10 participants per group. Others felt that 10 per group was too small and that for "press" validity (popular acceptance) you should test more than 20 participants per group.
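For readers who want a more principled starting point than these rules of thumb, a standard statistical power analysis (an approach not raised at the SIG) can suggest a per-group sample size. The sketch below is illustrative only; it assumes the Python statsmodels package and a hypothetical expected effect size.

    # Sketch only: estimate participants per group for a two-product comparison
    # of mean task times. The effect size is a hypothetical assumption.
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.8,         # assumed large difference between products (Cohen's d)
        alpha=0.05,              # significance level
        power=0.8,               # probability of detecting the difference if it exists
        alternative="two-sided",
    )
    print(f"Participants per group: {n_per_group:.0f}")  # roughly 26 under these assumptions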

Time for Tasks

The SIG participants could not decide upon a definitive method for determining the maximum time to allow for the completion of a task. It was generally agreed that it is not practical to let test participants work until they are done, as that might exceed the time available for the test. However, there was no consensus on a precise formula for the maximum time. One suggestion was to allow three times the time it takes an expert to perform the task. Many viewed this as too short and suggested instead allowing three times the time it takes an average user to complete the task.
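A minimal sketch of the two rules of thumb above; the baseline times are hypothetical and would come from your own pilot sessions.

    # Sketch of the two suggested rules of thumb; the example times are hypothetical.
    def max_task_time(baseline_minutes: float, multiplier: float = 3.0) -> float:
        """Maximum time to allow for a task, as a multiple of a baseline completion time."""
        return multiplier * baseline_minutes

    expert_time = 4.0        # minutes an expert needs (hypothetical pilot measurement)
    average_user_time = 9.0  # minutes an average user needs (hypothetical)

    print(max_task_time(expert_time))        # 12.0 minutes, the stricter suggestion
    print(max_task_time(average_user_time))  # 27.0 minutes, the more generous suggestion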

Missing Features

SIG participants brought up the issue of whether you can test features that are unique to one product. Opinions were divided. Some recommended never including a unique feature in a competitive test because it would lead to claims of bias -- that is, unfairly favoring one product over another. They suggested acknowledging the feature in the summary report but not including it in the test. Other participants disagreed and stated that as long as the task can be accomplished in both products, it is perfectly legitimate to include unique features in a competitive test.

Training

The final methodological topic was training before a test. Many competitive tests require that test participants receive some training before performing the tasks. This could range from half an hour with a product tutorial to many days of classroom training. Training raises some complex issues, such as ensuring equal training across products and determining the level of training required. We were unable to address these issues in the time available.

Discussion topic: Useful Techniques

SIG participants discussed some of the techniques they use while performing competitive and comparative tests. We did not discuss all of these techniques in depth, so some may not be as methodologically sound as others. We report all the techniques that arose during the discussion, but we do not recommend their use for all testing situations.

Retrospective Thinking Out Loud

Thinking aloud may hinder task completion and distort task times; this is primarily a problem with between-subjects methodologies. An alternative to thinking aloud during the task is retrospective thinking out loud. Using this technique, users review a tape of their test session after finishing and explain what they were thinking at different points during the test. Knowing what participants are thinking while performing a task is very important. However, the post hoc explanations provided by the test participants may not be accurate: they may be unaware of or unable to recall the reasons for their actions, or may devise a reason in order to impress the test administrator.

Repeating a Task

Often the tasks that users perform are complex. Consequently, measures taken while they perform a task for the first time may not reflect the usability of the product in daily use. A suggested solution is to have users perform the task twice. The task is not timed the first time but is timed the second. This technique allows testers to gather data on the usability of doing a task rather than on how intuitive it is to perform it the first time.

Classroom Training

To ensure that data is gathered on the usability of performing a task rather than how intuitive it is to do the task the first time, another suggested technique is to have all test participants attend classes on the product before the test session.

Long Tasks

Some tasks, such as monitoring the weather, extend over time and do not fit within a single test session. If you require data on long-term usage of a product, a technique suggested by one SIG participant is to send the product to the users and have them keep a diary as they work with it. Unfortunately, this technique may provide incorrect data, as users may not record all of their experiences or may not understand what actions led to the incident they are reporting. They may also embellish their experiences in order to impress the test administrator.

Prior Experience

Some SIG participants brought up the issue of including test participants who have prior experience with one of the products. These individuals may indicate greater ease of use or a preference for their current product, not because it is more usable but because it is more familiar. A suggested method of obtaining a more accurate usability rating is to ask them which product they would recommend to a friend who is unfamiliar with either product. However, other factors besides usability may affect a test participant's recommendation.

Discussion topic: Useful Guidelines

SIG participants created a set of guidelines for individuals interested in competitive testing.

Plan early and often

Competitive tests are complex and it is important to keep track of all the details. Planning should begin a minimum of three months prior to the first test session. Detailed plans of the test prevent confusion among all involved parties and help the tester to foresee any potential problems.

Choose and design tasks carefully

Make sure that your tasks are representative and realistic. If the tasks you design are not representative of the tasks performed with the products, the claims that can be made from the test results are weakened. The tasks chosen should also reflect the high-level goals of the study; if you wish to know which product is most usable for report generation, you would not use a calculation task. Finally, it is important that all tasks be reviewed: a credible domain expert, who is not involved with the test but knows the products, should review the tasks and confirm that they are representative for all tested products.

Involve stakeholders at an early stage

Stakeholders are the individuals for whom the results of the test will have meaning, such as marketing or product managers. Frequently, they also commission or request the competitive test. It is important that they be involved in the planning so that you can be sure that the test will meet their needs. Early involvement will also help them to realize the limitations of the claims that can be made about the test. One method of involving them is to have them serve as pilot test participants.

Have someone else design and run your test

The possibility of accidentally biasing the results of the test, and the perception of bias, are greatly reduced if an outside person or agency runs the test. Tasks that an outside agency might perform include participant recruitment, task creation, running the test, and analyzing the results. How strictly you control for the perception of bias depends upon the type of testing you perform. If you are performing a purely comparative test or analysis, then you do not need to be as concerned about outside perceptions of bias. If there is a chance that you might publish or distribute the results outside of your company, controlling bias, and the perception that there might have been bias, is of primary importance.

Choose the participants for the test carefully

You want test participants who are representative of product users but will not favor one product over another. Sometimes it helps to have an outside domain expert or independent testing organization create the user profile and perform the recruiting for the test.

Limit the number of products compared within a single test to three

If you need to compare more than three products, divide the test into multiple tests of two or three products each. With more than three products, the statistical analysis becomes more complex, making it difficult to draw conclusions. In addition, test participants may become fatigued when using more than three products.
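One way to see the added complexity (an illustration, not part of the SIG report): with two products a single two-group comparison suffices, but with three or more you typically need an omnibus test followed by pairwise follow-ups, and the number of pairs grows with each added product. A minimal sketch, assuming hypothetical task-time data and the SciPy library:

    # Sketch only: hypothetical task times (seconds) for three products.
    from itertools import combinations
    from scipy.stats import f_oneway, ttest_ind

    times = {
        "Product A": [310, 295, 342, 301, 288],
        "Product B": [280, 275, 290, 305, 270],
        "Product C": [400, 380, 415, 390, 405],
    }

    # Omnibus test: is there any difference among the products at all?
    f_stat, p_value = f_oneway(*times.values())
    print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}")

    # Pairwise follow-ups: 3 pairs for three products, 6 for four, 10 for five.
    # (In practice these would also need a multiple-comparison correction.)
    for a, b in combinations(times, 2):
        t_stat, p = ttest_ind(times[a], times[b])
        print(f"{a} vs {b}: p={p:.4f}")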

Provide product training to all test participants before the test session

This will help ensure that data is gathered on the usability of performing a task rather than how intuitive it is to do the task the first time.

Consider classroom training

To ensure that data is gathered on the usability of performing a task rather than how intuitive it is to do the task the first time, one suggested technique is to have all test participants attend classes on the product before the test session.

Consider alternative think aloud techniques

Thinking aloud may hinder task completion and distort task times; this is primarily a problem with between-subjects methodologies. Knowing what participants are thinking while performing a task is very important. An alternative to thinking aloud during the task is retrospective thinking out loud. Using this technique, users review a tape of their test session after finishing and explain what they were thinking at different points during the test.

Focus on the usability of a task, not the usability of performing a task the first time

Often the tasks that users perform are complex. Consequently, measures taken while they perform a task for the first time may not reflect the usability of the product in daily use. A suggested solution is to have users perform the task twice. The task is not timed the first time but is timed the second. This technique allows testers to gather data on the usability of doing a task rather than on how intuitive it is to perform it the first time.

Consult your legal department

Check with your legal department to determine whether there are any issues pertinent to the software that you are testing, such as license terms, that may affect the design of the competitive test. Software licenses can be difficult to read and understand. Your legal department can also provide guidance on the type of claims that can be drawn from the test results.

Test in stages

Divide the test into phases so that it is easier to postpone or cancel the remainder of the test if there are problems with the test methodology or if the results are not favorable to your product. This prevents you from spending additional money and resources on a competitive test that is no longer worthwhile.

Do not perform a competitive test as the product's first usability test

A competitive test should not be the first usability test of your product; run diagnostic usability tests first so that major usability problems have already been found and addressed.

Conclusion

While this SIG was an excellent start, there are still many issues that need to be addressed within the scope of competitive testing.

A mailing list named `CHI-Competitive' has been set up to allow continued discussion on this topic. To subscribe, send e-mail to listserv@acm.org with the following text on the first line: SUBSCRIBE CHI-competitive Your Name.

About the Authors

Kristyn Greenwood is a Sr. Usability Engineer at Oracle Corporation. She has been in the industry for 5 years.

Suzy Czarkowski is a Sr. Research Scientist at American Institutes for Research. She has been a Usability Consultant for 2+ years.

Authors' Addresses

Kristyn Greenwood
Oracle Corporation
MS 2op10
200 Oracle Parkway
Redwood Shores, CA 94065
USA

email: kgreenwo@oracle.com
Tel: +1-650-506-4760

Suzy Czarkowski
American Institutes for Research
490 Virginia Rd.
Concord, MA 01742, USA

email: sczarkowski@air-ne.org
Tel: +1-978-371-5885
