Cognitive Architectures and HCI

Susan S. Kirschenbaum, Wayne D. Gray, Richard M. Young

Introduction
Represented Architectures
HCI Tasks
References
About the Authors

The Cognitive Architectures and Human-Computer Interaction Workshop examined computational cognitive modeling approaches to human-computer interaction (HCI) issues. The five major architectures represented, and their variations, were briefly summarized. Participants compared approaches to a set of selected HCI problems and alternative solutions, and compared the strengths and weaknesses of the architectures. A list of additional issues was generated and discussed.

Keywords: computational cognitive modeling, user modeling, simulation, cognitive architectures, Soar, ACT-R, EPIC, 3CAPS, construction-integration.

Introduction

The use of cognitive architectures for modeling is beginning to have an interesting, but little noted, impact on the types of modeling applied to HCI issues. In the past the software used to construct models was either very generic (as in the use of OPS5) or hand-crafted by the individual modeler. Recently this has changed. At first generic connectionist architectures and, more recently, symbol manipulation and hybrid architectures such as Soar, ACT-R, CAPS, construction-integration (CI), and EPIC have emerged as systems that are centrally supported and whose use has (or has the potential to) spread beyond their developers. Most of the models presented at recent CHI conferences have been developed and presented as an instance of a modeling architecture (e.g., Kitajima & Polson, 1992; Peck & John, 1992; Rieman, Lewis, Young & Polson, 1994).

The latest generation of models incorporates cognitive theory and has been shown to account for basic cognitive phenomena, within architecture-defined boundaries. The ability to use a mechanism without needing to demonstrate its validity has freed modelers to focus on how various cognitive mechanisms interact with particular instantiations of tools (such as the software interface) in accomplishing particular tasks. It is this freedom and concomitant power that has fueled the current interest in applying computational cognitive modeling to HCI tasks.

This workshop brought together 14 HCI researchers who have expertise in one or more cognitive architectures. Together, we sought to identify ways in which the choice of cognitive architecture bears on the practice and prospects for modeling in HCI. Represented architectures and approaches included Soar, ACT-R, construction-integration, 3CAPS, EPIC, and rational analysis. A list of participants can be found in Appendix A.

Represented Architectures

A review of the major features of the architectures represented at the workshop is presented below.

ACT-R (Anderson, 1993):

ACT-R is a production system architecture with network-like associations among working memory elements. It assumes that time is continuous and maintains a declarative/procedural memory distinction with productions equated with procedural memory.

New productions are learned by analogy. New declarative memory chunks are created as a result of production firings. Productions and chunks have an activation level that affects speed of matching and that increases with use and decreases with disuse. The important role of the various types of activation is both a strength and a weakness of ACT-R. While it allows many interesting cognitive processes to be modeled and examined, setting these parameters for individual tasks can be difficult.

ACT-R's conflict resolution mechanism is based on the value PG - C computed for each production, where P is the probability of success, G is the value of the goal, and C is the cost associated with the production.
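The PG - C computation can be illustrated with a minimal sketch. It assumes that the matching production with the highest PG - C is the one selected, and it omits any noise or learning of the P and C estimates. The production names and numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Production:
    name: str          # hypothetical label for the rule
    p_success: float   # P: estimated probability that the rule leads to success
    cost: float        # C: estimated cost associated with the rule


def expected_gain(prod: Production, goal_value: float) -> float:
    """Return PG - C for one production, given the goal value G."""
    return prod.p_success * goal_value - prod.cost


def resolve_conflict(matching: Sequence[Production], goal_value: float) -> Production:
    """Assumed selection rule: pick the matching production with the highest PG - C."""
    return max(matching, key=lambda prod: expected_gain(prod, goal_value))


# Two hypothetical productions that both match the current conditions.
candidates = [
    Production("use-menu", p_success=0.9, cost=4.0),
    Production("use-shortcut", p_success=0.6, cost=1.0),
]

# With a valuable goal (G = 20) the reliable but costly rule wins;
# with a cheap goal (G = 5) the low-cost rule wins instead.
winner_high = resolve_conflict(candidates, goal_value=20.0)
winner_low = resolve_conflict(candidates, goal_value=5.0)
```

The example shows why G matters: raising the value of the goal shifts the choice toward productions that are more likely to succeed even when they cost more.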

CAPS/3CAPS (Just & Carpenter, 1992):

CAPS originated as a natural language text (sentence) processing system. It is an activation based production system in which any production that is above threshold fires. There is no conflict resolution.

Byrne (Kirlik & Byrne, 1994; Byrne, 1994) has modified the CAPS architecture to account for HCI phenomena. In 3CAPS, capacity-constrained pools of activation cause slower processing as capacity limits are exceeded. The issue of selecting numbers for setting activation and thresholds was raised in discussion. One solution is to select a uniform value initially.
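As a rough sketch of the two ideas above, the following assumes that every production whose activation is above threshold fires, and that a capacity-constrained pool scales all activation requests back proportionally when their total exceeds the limit. The proportional scaling rule, the function names, and the numbers are assumptions made for illustration, not the 3CAPS implementation.

```python
def fire_above_threshold(activations, threshold):
    """Return every production whose activation is above threshold (no conflict resolution)."""
    return [name for name, act in activations.items() if act > threshold]


def apply_capacity_limit(requested, capacity):
    """Scale activation requests back proportionally when their total exceeds capacity."""
    total = sum(requested.values())
    if total <= capacity:
        return dict(requested)          # under the limit: requests are granted in full
    scale = capacity / total
    return {name: amount * scale for name, amount in requested.items()}


# Hypothetical activation requests for two productions in one cycle.
requested = {"parse-clause": 6.0, "store-referent": 4.0}

# Unconstrained, both productions are above a threshold of 2.5 and both fire.
unconstrained = fire_above_threshold(requested, threshold=2.5)

# With a pool capacity of 5.0 every request is halved, so only one fires this cycle.
limited = apply_capacity_limit(requested, capacity=5.0)
constrained = fire_above_threshold(limited, threshold=2.5)
```

Under these assumptions, exceeding capacity delays the weaker production to a later cycle, which is one way a capacity limit could show up as slower processing.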

Construction-Integration (CI) (Kintsch, 1988; Mannes & Kintsch, 1991):

Construction-Integration theory was developed to solve the combinatorics problem among alternative understandings in natural language text comprehension. It proceeds in cycles of understanding that are approximately one phrase in length (1-2 sec). In each cycle it constructs a network of linked possible meanings. Integration, occurring by spreading activation, provides context-sensitive comprehension.

Kitajima and Polson (1992, 1994, in press) use this architecture to solve problems in action planning with display-based HCI. They use linked propositions of task knowledge to yield a network with constraints on possible actions. These models, with cycle time on the order of 1-2 seconds, make meaningful choices and errors. The system does not yet incorporate learning.
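The integration phase described above can be sketched as a simple relaxation over a network of linked meanings. This is a generic spreading-activation loop under stated assumptions (symmetric link weights, clipping of negative activation, renormalisation by the peak value), not the parameterisation used in the published CI models. The three-node network and its weights are hypothetical.

```python
def integrate(links, activation, max_iters=100, tol=1e-6):
    """Spread activation over a link matrix until the pattern settles."""
    n = len(activation)
    for _ in range(max_iters):
        # Each node receives the weighted sum of the activation of its neighbours.
        spread = [sum(links[i][j] * activation[j] for j in range(n)) for i in range(n)]
        spread = [max(0.0, value) for value in spread]    # clip negative activation
        peak = max(spread)
        if peak == 0.0:
            return spread
        spread = [value / peak for value in spread]       # renormalise to a peak of 1
        if max(abs(a - b) for a, b in zip(spread, activation)) < tol:
            return spread                                 # settled
        activation = spread
    return activation


# Hypothetical network: node 0 is a context word ("money"), node 1 is the
# financial sense of "bank", node 2 is the river sense of "bank".
links = [
    [1.0, 0.5, 0.0],    # context supports the financial sense
    [0.5, 1.0, -0.5],   # the two senses inhibit each other
    [0.0, -0.5, 1.0],
]
settled = integrate(links, [1.0, 1.0, 1.0])
```

Both senses start equally active during construction; after integration the sense supported by context stays active and the competing sense is suppressed, which is the context-sensitive effect the text attributes to spreading activation.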

EPIC (Kieras & Meyer, 1994):

EPIC is a human performance model based on the Model Human Processor (Card, Moran, & Newell, 1983). It accounts for parallel, multiple task performance with well elaborated perceptual-motor processing. There is no explicit goal stack and goals are treated as just another kind of working memory element. There is currently no learning.

Rational analysis (Russell, Stefik, Pirolli, & Card, 1993):

Because ACT-R fires only one production per cycle, a conflict resolution process is needed to select which production fires when several match the current conditions. Conflict resolution is a subsymbolic process that is governed by a form of rational analysis (Anderson, 1991, 1993). Pirolli and Card discussed replacing ACT-R's rational analysis mechanism with an alternative formulation arising from ecological foraging theory (Smith & Winterhalder, 1992; Stephens & Krebs, 1986). They argued that this formulation better fits the process of searching for information in large databases. They pointed out that this alternative formulation uses all of ACT-R's theoretical mechanisms except its algorithms for rational analysis.

Information foraging theory has developed a Cost-of-Knowledge Characteristic Function (COKCF) (Card, Pirolli, & Mackinlay, 1994) of information and information diet that accounts for adaptive information search strategies. These analyses lead to a scatter/gather interaction with an information cluster hierarchy. The COKCF is used to predict when to spend time searching within one information source versus when it is time to switch sources as the quantity of information in the current source decreases.
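The stay-or-switch decision can be illustrated with a patch-leaving rule in the style of foraging theory: leave the current source once its marginal gain falls below the average rate of gain available elsewhere. This rule is an assumption used for illustration; it is not the COKCF itself, and the gain figures are hypothetical.

```python
def step_to_switch(gains_per_step, average_rate):
    """Return the first step at which marginal gain drops below average_rate, else None."""
    for step, gain in enumerate(gains_per_step):
        if gain < average_rate:
            return step
    return None


# Hypothetical count of relevant items found per unit of search time in one
# source; returns diminish as the source is depleted.
gains = [8.0, 5.0, 3.0, 2.0, 1.0]

# If other sources yield 2.5 items per unit time on average (switching cost
# included), the searcher stays for steps 0 to 2 and switches at step 3.
switch_at = step_to_switch(gains, average_rate=2.5)
```

A source whose gains never fall below the average rate returns None, meaning there is no reason to leave it within the observed steps.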

Rational analysis addresses behavior that spans a range from the middle of the cognitive band of activity (seconds), across the rational band (minutes to hours), and perhaps into the social band (days to months) (Newell, 1990). Discussion raised the issue of how the modeler determines the appropriate level of analysis.

Soar (Newell, 1990):

Soar is a purely symbolic architecture with productions written at the single operator level (~50 ms for decision cycles and ~10 ms for elaboration cycles). This is a much lower level of analysis than other architectures, requiring many more productions. Production instantiations are fired in parallel, with most conflicts resolved by subgoaling. Learning builds new chunks, i.e., new production rules, from the results of subgoaling. Of the architectures discussed during the workshop, Soar is the best known, with the greatest presence in the CHI proceedings, and was represented in the workshop by the greatest number of participants.

HCI Tasks

To focus the discussion on the specifics of HCI issues, in contrast to architectural generalities, participants were asked to analyze, outline, or model how their architecture would handle a specific HCI task. Of the tasks suggested by the organizers, most participants chose to model first-time users of a graph-drawing program, Cricket Graph. Data for this scenario were available from Franzke and Rieman (1993) and Franzke (1994). Others chose to model the typical walk-up-and-use automatic teller machine (money machine or ATM). A few chose tasks of their own devising.

After the different approaches were outlined and discussed, the workshop analyzed the strengths and weaknesses of the different architectures for different HCI tasks and problems. Specific topics included:

Which architectures are best for what kinds of analyses and problems
Methodology appropriate for the development and use of cognitive models
Comparison of cognitive modeling and other approaches for solving similar HCI problems
The theoretic basis for candidate cognitive architectures

Table 1 summarizes the selected tasks by architecture.

Table 1: Selected Tasks by Architecture, Comparing Architecture with Approach/Emphasis

Task: ATM

ACT-R: Modeled both task and customer
Soar: Modeled as a learning problem
EPIC: Modeled as a perceptual motor performance problem
CAPS: Modeled the forgotten card problem and redesigned the ATM to solve problem
Soar: Used a backtracing mechanism for learning from instruction to solve problem

Task: Cricket Graph

CI: Used knowledge about possible actions, the device goal, and the task goal
Rational analysis: Used situated approach to explain foraging within known class of operating systems (e.g., Mac). Modeled task in ACT-R & Soar
ACT-R: Used learning by analogy to learn new version
Soar: Used detailed subgoaling

Task: Other -- Data search

Rational analysis/ACT-R: Showed how a scatter/gather interaction technique clusters documents according to queries. LTM of queries and COKCF used by ACT-R model

Perhaps inevitably, direct comparisons between different models of the same device were hindered by differences in interpretation of the task. For example, in the ATM case, one Soar modeler chose to model the learning involved in first-time use of an ATM, while the CAPS modeler chose to model and resolve the "forgotten card" problem. Equally inevitably, these differences highlighted the relevant issues and promoted discussion on whether the different architectures facilitated the modeling of different tasks. A list of comparisons and issues was generated from this discussion.

Comparisons among architectures and models in capabilities and emphasis included questions of grain size; ability and mechanism(s) for handling learning and motor performance; completeness of the models (e.g., disembodied cognition or inability to account for learning); and the questions of how to set parameter values in principled and theoretically valid ways.
Discussion about whether/when to use cognitive models for HCI design problems included examining the motivation for modeling (e.g., useful vs. scientifically interesting); choice of cognitive versus engineering, task-analysis models; applicability of theories; error and goal management; workload control; and the value added above and beyond task analysis.
Practical issues such as support environments, an organized user community (or lack thereof), and usability of the modeling tools are important determinants of how quickly the use of the different architectures spreads beyond their developers.
Contributions that modeling makes to HCI today include better understanding of the nature of information and knowledge used for tasks with high volumes of information, predicting error proneness, good vs. bad graphic layout, choice of metaphor, appropriate pacing, and device intelligibility.
Lastly, there were practical and principled questions that are largely unanswered today. These include issues such as the ability (or lack thereof) to integrate models built in the same architecture but constructed independently (by different developers or at different times), the use and desirability of truly hybrid (connectionist and symbolic) architectures; selecting parameter values; and grain size of analysis and modeling.

Resolution of these questions may depend on the context, inclination of the modeler, and practical considerations. However, it is hoped that these and other issues, along with some success stories, will be addressed in papers for a special issue of the Human-Computer Interaction journal (currently in preparation) on the topic of "Cognitive Architectures and HCI."

References

Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Erlbaum.
Byrne, M. D. (1994a). Integrating, not debating, situated action and computational models: Taking the environment seriously. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, 118-123. Hillsdale, NJ: Lawrence Erlbaum.
Card, S. K., Moran, T. P., & Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, NJ: Erlbaum.
Card, S. K., Pirolli, P., & Mackinlay, J. (1994). The cost-of-knowledge characteristic function: Display evaluation for direct walk information visualizations. In Proceedings of the CHI '94 Conference on Human Factors in Computing Systems (pp. 238-244), Boston, MA.
Franzke, M., & Rieman, J. (1993). Natural training wheels: Learning and transfer between two versions of a computer application. Proceedings of Vienna Conference on HCI, VCHCI'93. Berlin: Springer-Verlag, 317-328.
Franzke, M. (1994). Exploration and Experienced Performance with Display-Based Systems. Ph.D. Dissertation, University of Colorado, ICS Tech. Rpt. 94-07, Institute of Cognitive Science, C.B. 344, University of Colorado, Boulder, CO 80309-0344.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122-149.
Kieras, D. E., & Meyer, D. E. (1994). The EPIC architecture for modeling human information-processing: A brief introduction. (EPIC Tech. Rep. No. 1, TR-94/ONR-EPIC-1). Ann Arbor, University of Michigan, Department of Electrical Engineering and Computer Science.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163-182.
Kirlik, A., & Byrne, M. D. (1994). Identifying Environmental Contributions to Skilled Interaction. Presented at the NASA Ames Cognitive Modeling Workshop, February 23, 1994.
Kitajima, M., & Polson, P. G. (1992). A Computational Model of Skilled Use of a Graphical User Interface. In P. Bauersfeld, J. Bennett, & G. Lynch (Eds.), ACM CHI'92 Conference on Human Factors in Computing Systems (pp. 241-249). New York: ACM Press.
Kitajima, M., & Polson, P. G. (1994). A comprehension-based model of correct performance and errors in skilled, display-based human-computer interaction. Technical Report # 94-02, Institute of Cognitive Science, University of Colorado.
Kitajima, M., & Polson, P. G. (1995). A comprehension-based model of correct performance and errors in skilled, display-based human-computer interaction. International Journal of Human-Computer Studies, 43, 65-99.
Mannes, S., & Kintsch, W. (1991). Routine computing task: Planning as understanding. Cognitive Science, 15, 305-342.
Peck, V. A., & John, B. E. (1992). Browser-Soar: A Computational Model of a Highly Interactive Task. In P. Bauersfeld, J. Bennett, & G. Lynch (Eds.), ACM CHI'92 Conference on Human Factors in Computing Systems (pp. 165-172). New York: ACM Press.
Rieman, J., Lewis, C., Young, R. M., & Polson, P. G. (1994). "Why is a Raven Like a Writing Desk?" Lessons in Interface Consistency and Analogical Reasoning from Two Cognitive Architectures. In B. Adelson, S. Dumais, & J. Olson (Eds.), ACM CHI'94 Conference on Human Factors in Computing Systems (Vol. 1, pp. 438-444). New York: ACM Press.
Smith, E. A., & Winterhalder, B. (1992). Evolutionary Ecology and Human Behavior. New York: de Gruyter.
Stephens, D. W., & Krebs, J. R. (1986). Foraging Theory. Princeton, NJ: Princeton University Press.

About the Authors

Susan S. Kirschenbaum is an Engineering Psychologist at the Naval Undersea Warfare Center, Division Newport. Her principal interests include expertise, decision making, and decision support systems. One current research project will use ACT-R to model the pre-decisional information search of submarine commanders as a way of determining decision support requirements.

Wayne D. Gray is the coordinator of George Mason University's Human Factors and Applied Cognition program. He has worked in both private industry and government. His research interests include the design of reasoning congruent interfaces; usability evaluation methods (UEMs); and understanding how human cognition interacts with the way in which an artifact is designed to facilitate or hinder the accomplishment of tasks. His research tools include computational cognitive modeling (using ACT-R) and cognitive task analysis (using GOMS).

Richard M. Young is a Research Scientist at the UK Medical Research Council's Applied Psychology Unit in Cambridge. His background is in Artificial Intelligence and Experimental Psychology, and his research involves the computational modeling of human cognition. His current projects employ Soar as a cognitive architecture for modeling the user in HCI.

Authors' Addresses

Susan S. Kirschenbaum
Code 2214, Building 1171/1
Naval Undersea Warfare Center Division
Newport, RI 02841, USA
Kirsch@c223.npt.navy.mil

Wayne D. Gray
George Mason University
m/s 3f5
Fairfax, VA 22030-4444, USA
gray@gmu.edu

Richard M. Young
MRC APU
15 Chaucer Road
Cambridge CB2 2EF, UK
richard.young@mrc-apu.cam.ac.uk

Vol.28 No.2, April 1996