Issue

Article

Vol.29 No.4, October 1997

Article

Issue

Usability Testing of World Wide Web Sites: A CHI 97 Workshop

Michael D. Levi and Frederick G. Conrad

Introduction

Summary of Position Papers

Brainstorms

What Are the Known Usability Problems in Web Sites?

What Are the Special Characteristics of the Web?

What Organizational Issues Make it Difficult to Perform Usability Testing on Web Sites?

Guiding Themes of Workshop

Testing as Part of a Whole

Taxonomy of Sites

Rapid, Remote, Automated

Organizational Hurdles

Cross Reference of Problem / Method / Stage in Life Cycle

Workshop Participants

About the Authors

Introduction

World Wide Web site usability -- for better or worse -- affects millions of users on a daily basis. Although usability engineering has come to play an increasingly important role in conventional software development, it is rarely part of Web site development. While some good Web style guides have now appeared to aid the site designer, adherence to up-front usability guidelines by itself does not guarantee a usable end product. A distinct evaluation process is required: usability testing. This workshop, Usability Testing of World Wide Web Sites, was held to explore three major topics:

What are the known usability problems in Web sites, and what testing methods successfully uncover these problems?
What are the special characteristics (limitations and opportunities) of the Web as a medium, and how can we exploit these characteristics in our testing?
What are the major organizational hurdles that need to be overcome to permit usability testing to take place, and how can they be addressed?

The organizers hoped that group interaction during the workshop would yield novel methods and adaptations of existing methods that are tailored specifically to testing Web site usability.

Summary of Position Papers

Workshop participants were each asked to write a position paper summarizing (1) the methods they had used for usability testing Web sites, (2) the types of usability problems these methods had uncovered, and (3) their overall assessment of the methods. The papers were intended to serve as a starting point for group discussion.

The participants reported evaluating numerous sites, designed for diverse purposes, intended for an extremely heterogeneous group of user populations. Some of the usability problems that these evaluations exposed resembled usability problems associated with conventional software: unclear labeling and vocabulary, unreasonable memory demands on the user, the design was not informed by an analysis of the tasks it required of users. Other kinds of problems were more characteristic of graphics intensive or hypertext applications, though not necessarily on the Web, e.g. cluttered, overused graphics and inadequate "fan-out" (links to other pages and sites). Still other problems were quite specific to Web site structure and design, for example, users did not understand the site structure as the designers intended, the site was difficult to navigate, the site was not designed for a clear user population. Finally, the assessments raised several issues that affect usability indirectly but are particularly serious in Web sites: users' concerns about privacy and security and insufficient buy-in from stakeholders at the design stage

The authors reported using a variety of evaluation methods. Most of these were developed for testing conventional software usability, though some were particularly appropriate for Web site evaluation. Heuristic evaluation and other inspection methods were widely used, and for the same limited resource issues that they are used with conventional software. Paper prototypes and storyboarding were used in the early stages of design, much as they are used with conventional software. Checklists of important design considerations were used to confirm that these considerations were, in fact, implemented. End user evaluations were also used to some extent, with and without the user thinking aloud, with and without the evaluator participating; these provided the same insight into the users' thinking that they do with conventional software evaluation. Satisfaction judgments and focus groups provided subjective measures of usability, analogous to their use with conventional software.

In contrast, sorting and mapping tasks -- card sorts, site drawing, linking icons to concepts, listing expectations for particular labels -- seemed to have been particularly well suited to uncovering issues of Web site structure. Similarly, analysis of usage logs, search strings, and feedback to site contacts provided information about users' experience and goals that would not have been available with a conventional desktop application.

Several methodological obstacles to evaluating Web sites were reported by various authors. One obstacle is site size: Many sites are too large to test exhaustively with any method and it is likely that, unless the testing is automatic, some paths will not be evaluated before the site is opened to the public. Another set of obstacles involves the inadequacy of built in tools for collecting usability data: (1) browser history lists do not capture order or time; (2) server logs provide too much, mostly irrelevant information and cannot be confidently segmented into "sessions"; server logs also do not capture information about cached pages; (3) search strings and feedback to site contacts are indirect measures of goals and pertain to users who are having trouble not those who are more successful. Other methodological obstacles included: the lack of tools for remote testing; target users and actual users are more likely to differ than they are for traditional software; the developer does not have control over platforms, browsers and speed of connection; organizations may be reluctant to support usability testing for Web sites, even if they support it for traditional software.

A tension ran throughout the position papers between the view of Web sites as software and Web sites as something else. By the first view, Web sites have the same usability problems as conventional software, but in different proportions. For example, Web sites seem to be afflicted with navigation problems far more often than other types of software. The same methods that have been effective at detecting software usability problems should be equally effective in determining where Web sites are hard to use.

The other view maintains that Web sites have unique characteristics that make them different from conventional software and that they need to be evaluated differently than conventional software is evaluated. One version of this view holds that Web sites are more like publications than conventional software -- except in transaction sites where they really are front ends to applications -- so the same considerations that make documents readable and usable are directly relevant. Another version of the view holds that because Web sites are created to provide information to people, an appropriate measure of their usability is the quantity and quality of knowledge people actually acquire from a site. When evaluating desktop applications, we rarely ask how the users' knowledge has changed as a result of her interaction (though two notable exceptions are educational and information retrieval software when user knowledge is frequently assessed as part of usability evaluation).

Another way in which Web sites differ from conventional software is the speed with which they change after they are made available to the public. This means that rapid evaluation methods are essential. It might also mean that seasoned Web site visitors expect less predictability than conventional software users. Finally, Web site usage differs from software usage because the identity of the users can be difficult to establish. Consequently, the conventional software design principle, "know thy user", is hard to apply in Web site design and evaluation.

With these ideas as backdrop, the participants collectively addressed the major topics of the workshop.

Brainstorms

Over the course of the two-day workshop, three brainstorming sessions were held to initiate discussion of the three major topics. These brainstorming sessions generated lists of 30 to 70 items. A representative sample of each list follows.

What Are the Special Characteristics of the Web?

A variety of users
Rapid evolution
Professionals and amateurs share same space
Unknown entry points
Lack of gatekeeping
Importance of first impressions
Page as unit
No taxonomy
Automatic usage logging
A variety of platforms
Boundaries are fuzzy
Decentralization of development and maintenance
Variety of skills required does not match developer skills
Competition for space with other sites
Time to load
Browser as intermediary
Early in development

What Organizational Issues Make it Difficult to Perform Usability Testing on Web Sites?

Multiple owners
Unclear goals
No time or money for testing
Web perceived only as advertising medium
It appears to be too easy to create or change sites
Tight deadlines (because development is so easy)
HCI folks called in too late
HCI folks are understaffed, overtasked
Lack of understanding of Web by management
Lack of standards
Inertia -- resistance to change
Evaluator & designer are the same person
Lack of development infrastructure
Perception that evaluation is fast and easy (and can be done intuitively)
Fear of loss of control
Corporate instead of user view of purpose

[Note: Many of these organizational items are not unique to the Web. See discussion below]

Guiding Themes of Workshop

As the brainstormed lists were further discussed, refined, and categorized, several guiding themes emerged:

Testing as Part of a Whole

While defining the parameters of discussion for the workshop as a whole, the participants agreed that there is a certain artificiality in separating usability testing from usability engineering at large. The group agreed that it is impossible to "test in" usability, that to be effective usability evaluation must be part of a larger effort starting with user-centered analysis and design.

Taxonomy of Sites

No comprehensive taxonomy of Web sites currently exits. But issues related to Web site usability, and to testing Web sites for usability, are related to the type of Web site being analyzed. The two primary types of site that workshop participants had personal experience with were informational sites and transactional sites, which were perceived as being quite different. The prototypical Web site is probably one people visit to obtain information. Increasingly, however, Web sites are being designed for doing business -- buying products, downloading software, etc. Transactional sites actually are interfaces to Web applications -- software running in the background. This introduces many of the software usability issues that might not be evident for retrieval sites.

Very large Web sites often have their own, unique, set of issues. Large sites frequently have many designers and no single repository of design information. Large sites are hard to change because the consequences of change are hard to anticipate and change will disrupt many loyal users.

A third dimension along which one can categorize Web sites has to do with site structure and might be called "webiness," that is the degree to which a site is hierarchical versus totally interconnected. This is relevant because certain user tasks may lead to particular search strategies but certain site structures may interfere with those search techniques; for example, a depth first approach may not fit a highly interconnected site.

Rapid, Remote, Automated

One mantra for Web usability testing could be "Rapid, Remote, Automated."

"Rapid" because organizational pressures often cause extraordinarily tight deadlines, and testing must fit into these parameters or it will not be employed.

"Remote" because so many Web users are geographically dispersed, and it would facilitate empirical testing with real users if tools and techniques could be developed to promote testing at a distance. Some of the ideas that were floated at the workshop included using the Web to recruit subjects (through advertising or registration), using the Web for scheduling, using the Web to transmit test data, including possibly video or voice accompaniment. What all of these discussion points had in common was exploiting the capabilities of the medium even while we are testing that medium.

Another thought was to organize cooperation among geographically distant organizations, and arrange for reciprocal user testing.

"Automated" because Web sites can easily become large enough that manually examining every page and every link would be exhausting and error-prone. Automated tools can never replace testing with users, but could detect easy and routine usability problems (broken links, conformance to a crude style guide) as well as generate metrics on site complexity, page hits, time on page, etc. which would help in determining where to look for potential problems. This would free humans to probe the harder, more cognitive difficulties, and inform their analyses. Automated tools might also speed up testing.

In addition, the wealth of data captured in Web server logs can be used to recreate and study user interactions within a site, both under controlled circumstances and under conditions of real use. The typical server logs do not capture everything one would want, and can be supplemented by additional tools. One participant was working on a Java-based logging tool to do just that.

Table 1: Summary Cross Reference of Problem/Method/Stage in Life Cycle

Management/ Maintenance Technology Constraints Navigation Structure Content Mismatched Goals Page Layout
Expert UI Inspection XO+ XO+ XO+
Paper Prototypes and Storyboarding O X O O
Exploratory End-User Testing X X XO X
Scenario-based End-User Testing X X X
Sorting and Mapping XO
Opportunistic Log Analysis +
Intentional Log Analysis + X+ +
Feedback Analysis + + + +
Search Term Analysis X+ + +
Questionnaires + + X
Focus Groups O O+ XO+
User Interviews XO+ O+
Check Lists X+ XO XO+
Automated Checking Tools X+ X X+

O : Analysis; X : Functional Prototype; + : Post Release

Organizational Hurdles

Many, perhaps most, of the organizational hurdles facing usability testing Web sites are the same as those facing usability engineering in other software domains: insufficient resources allocated, defensive developers, management ignorance, suspicion, fear, and doubt. The HCI community has been dealing with these for a long time, and can expect to continue to do so.

But the Web brings some new wrinkles to the effort. Three aspects of the Web, in particular, exacerbate existing struggles: the perception of ease in development and use, decentralized development, and extremely rapid development.

While traditional software development is generally perceived as the domain of professionals, the Web is currently seen as a very accessible medium in which anyone can create pages or sites. One consequence of this perception is that many Web authors are new to system development, and have no knowledge or understanding of HCI (or any other aspects of software engineering).

Coupled with a new developer community is the fact that many Web sites are developed by multiple groups, with different goals, often in different physical locations, and often without any central gatekeeping function. Web sites have a tendency to expand incrementally and without coordinated planning.

Finally, based largely on the perceived ease of development, Web development efforts tend to be unusually rapid, with insufficient staff, time, or other resources devoted to initial analysis, design, or evaluation efforts. This is not new, but the magnitude has increased.

Workshop participants generally agreed that HCI professionals need to continue (perhaps step up) existing education efforts. Old arguments need to be made in this new context: focus on the bottom line (long term cost savings, increased customer satisfaction, desire to attract and retain discretionary users, etc.)

Cross Reference of Problem / Method / Stage in Life Cycle

As the final exercise of the workshop, the group worked on identifying which techniques are likely to detect what type of problem during what phase of the life cycle.

Three points in the Web site development process were examined: the analysis phase; late in the design process, when a functional prototype has been built; and after a site has been released.

Seven broad categories of usability problems were identified:

Management/Maintenance. Examples include broken links and obsolete data
Technology Constraints. Does the site work on the intended platform(s)? Has the most appropriate technology been used to support the design?
Navigation. Examples include scrolling, back button use, dead ends.
Structure. Does the site organization match the users' mental model?
Content. Does the content of the site support the intended users' goals? Is the language appropriate?
Mismatched goals. Do the users' goals coincide with the site designers' intent?
Page Layout. Does the page design facilitate users accomplishing their tasks? Is the site consistent?

Workshop Participants

Nigel Bevan, National Physical Laboratory, UK
Frederick G. Conrad, US Bureau of Labor Statistics (organizer)
Laura L. Downey, US National Institute of Standards and Technology
Kathy Frederickson-Mele, US Bureau of Labor Statistics
Laurie Kantner, Tec-Ed Inc.
Irvin R. Katz, Educational Testing Service
Carol Kilpatrick, Georgia State University
Sharon J. Laskowski, US National Institute of Standards and Technology
Michael D. Levi, US Bureau of Labor Statistics (organizer)
Gary Marchionini, University of Maryland
Richard C. Omanson, Ameritech
Janice (Ginny) Redish, Redish & Associates, Inc.
Alain Robillard, Computer Research Institute of Montreal
Mary Hunter Utt, Open Market, Inc.
Richard W.F. Whitehand, Nomos Management AB
Gregory A. Wilt, Bell Atlantic

About the Authors

Michael D. Levi is a project manager at the U.S. Bureau of Labor Statistics (BLS), where he led the development of the BLS Web site. A past chair of the BLS User Interface Working Group, Mr. Levi was instrumental in the development of HCI guidelines for BLS interactive systems, as well as the design and implementation of an internal HCI training curriculum for BLS analysts. He can be reached at:

US Bureau of Labor Statistics
Division of Federal/State Systems
Room 5890, 2 Massachusetts Ave., NE
Washington, DC 20212, USA
email: levi_m@bls.gov

Frederick G. Conrad is a cognitive psychologist in the Office of Survey Methods Research at BLS. His current work includes the development and evaluation of methods for collecting and disseminating statistical data. This entails both human-computer interaction and human-human interaction. Dr. Conrad can be reached at:

US Bureau of Labor Statistics
Office of Survey Methods Research
Room 4230, 2 Massachusetts Ave., NE
Washington, DC 20212, USA
email: conrad_f@bls.gov

Issue
Article
Vol.29 No.4, October 1997
Article
Issue

	Management/ Maintenance	Technology Constraints	Navigation	Structure	Content	Mismatched Goals	Page Layout
Expert UI Inspection		XO+		XO+			XO+
Paper Prototypes and Storyboarding			O	X	O		O
Exploratory End-User Testing			X		X	XO	X
Scenario-based End-User Testing			X		X		X
Sorting and Mapping				XO
Opportunistic Log Analysis	+
Intentional Log Analysis	+		X+	+
Feedback Analysis	+	+	+		+
Search Term Analysis				X+	+	+
Questionnaires		+	+		X
Focus Groups		O			O+	XO+
User Interviews					XO+	O+
Check Lists	X+	XO					XO+
Automated Checking Tools	X+	X					X+