SIGCHI Bulletin
Vol.28 No.2, April 1996

Interactive Video

Michael K. Stenzler and Richard R. Eckert
What Does Interactive Mean?
Levels of Interactivity
Examples
Other Uses
Conclusion
Bibliography
About the Authors

A new and powerful method of communicating information is now creeping up on millions of unsuspecting future users. This technology is interactive video. The merging of video with computers has opened up a plethora of possibilities for making video more interactive.

Many people might ask why interactive video is desirable. What advantages would it give? The answer is that interactive video is a very effective tool for conveying information. When most people hear the term multimedia, they think that it just means adding more extensions onto a computer such as sound boards, CD-ROMs, or video overlay boards. However, when one looks at the majority of multimedia applications, one discovers that the driving force behind multimedia is better, more effective ways to convey information. Adding sound enables more effective communication. For example, hearing a tiger roar has much more impact than a written description of a tiger's roar. Furthermore, if a picture is worth a thousand words, then how much more is a motion picture worth (thirty pictures a second)? One five-second video clip could convey information that could not be adequately described in words. For example, the correct way to serve a tennis ball can be expressed in words and shown with photographs, but playing a video that demonstrates an expert player serving is much more effective.

Having more effective communication devices allows one to create better teaching tools. Many studies have demonstrated that multimedia instructional applications are more effective than traditional teaching methods [1-5]. Interactive video will make these educational tools even more effective. One of the programs described in this paper gives an excellent example of this. That program allows a person to take video clips of herself speaking Chinese phrases and then compares those clips with clips of native Chinese speakers saying the same phrases. Because the program shows the user a video clip of the native speaker, the user can see how the speaker's mouth moves while speaking as well as being able to hear the correct pronunciation. This is helpful when trying to mimic the sounds. With this program, a Chinese language student can practice and perfect her pronunciation without the need of anyone else's assistance.

What Does Interactive Mean?

A very important question is: what does it mean to say that a video application is interactive? One of the reasons why people have so many different views when trying to define interactive video is that there is no formal definition [6]. There is, therefore, no standard of comparison against which to determine whether a video application is interactive. The dictionary defines interact as "To act upon each other, have reciprocal effect or influence," and interactive as "Mutually or reciprocally active" [7]. It is clear that a person can act upon a video by giving commands such as "play" or "pause," but how can video act upon a user? In the case of video, the two agents involved are the user and the video application. Therefore, to satisfy the dictionary definition of interactive, the video application must influence the user's actions. It could do so by stopping and offering choices of what is to be viewed next.

Almost everyone in America has a VCR. With it, one can watch movies, rewind to watch a segment over again, fast forward to skip over parts, pause the movie, or even watch it in slow motion. People can also watch movies on laser disk players and can go directly to specific segments of the movie. Some might say that these capabilities make the movie interactive because the user can control the video. In some senses this is true, but the interaction is severely limited; for the most part, the user is acting on the video by controlling it. To be truly interactive, the video must present the user with choices of what to see next. While the user is certainly influenced by what is being viewed (for example, a particularly spectacular stunt might cause the user to rewind to watch the scene over again), the video is not taking an active part in stimulating the user to view another segment of video. In fact, all the video is doing is passively and blindly obeying the user's commands. To be "reciprocally active," the user's actions on the video must, in turn, cause the video to offer new choices of what to view next. Playback, rewinding, and pausing are simply responses to the user's actions and do not offer new choices. Thus, these capabilities make the video interactive only in the limited sense that the user provides applied action while the video provides passive action (doing exactly what the user commands). In order for the video to be active, it must respond to the user's actions by providing her with choices that lead to the viewing of different video clips.

For example, a movie might allow the user to determine the main character's actions at a critical point, such as whether to shoot the bad guy or let him escape. This choice will not only determine what video will be shown thereafter, but what future choices the user will be presented with as well. In this way, the user acts on the video, and the video offers the user new choices of what to view next, so it is therefore reciprocally active or interactive.

We propose the following formal definition of interactive video: A video application is interactive if the user affects the flow of the video and that influence, in turn, affects the user's future choices.

This means that the application does not simply feed the user clips of video while the user sits idly by and watches what is presented. While the video is being presented, the user must be given the opportunity to supply input that determines what the program will show next. Furthermore, the choice of video shown, based on the user's input, must affect what future choices the user will have.

A simple example of this would be an application that allows a user to tour a building. The video would be taken so that it shows what a person would see if actually walking through the building. The application might start opening a door and entering a lobby. There are three doors exiting the lobby and the user can choose to enter door number one, door number two, or door number three. The user chooses door number two and the application then shows a video of opening door number two and traveling down a hallway. At the end of the hallway the user sees a hallway to the right and a hallway to the left and has a choice of which direction to travel. If the user had originally chosen door number one, then the video shown would have been a clip of entering a room with one other door in it; the user would then have been given a choice of entering the new door or going back out of the room the way she came in. The user's initial choice determines which video clip the application plays, and that clip, in turn, determines what the user's next list of choices will be. Thus the user and the application each influence each other through their actions. Hence, the application is interactive.
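
The branching structure behind such a tour can be sketched in a few lines. The following fragment is a hypothetical Python stand-in (the programs described in this paper were written in Multimedia Toolbook, not Python), with invented node and clip names:

```python
# Each node names a video clip and the choices offered when it ends.
# The lobby's three doors and the hallway's two directions follow the
# tour described in the text; all file names are invented.
TOUR = {
    "lobby":   {"clip": "lobby.avi",
                "choices": {"door 1": "room 1", "door 2": "hallway",
                            "door 3": "room 3"}},
    "hallway": {"clip": "hallway.avi",
                "choices": {"left": "left hall", "right": "right hall"}},
    "room 1":  {"clip": "room1.avi",
                "choices": {"inner door": "inner room", "go back": "lobby"}},
}

def next_node(current, choice):
    """Return the node reached by picking `choice` at node `current`."""
    return TOUR[current]["choices"][choice]
```

The user's choice selects the next clip, and that clip's node determines the next menu of choices, so the influence runs in both directions.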

Levels of Interactivity

Now that we have a working definition of interactive video, we propose criteria for levels of interactivity. Some video applications are more interactive than others, just as some programs are more user-friendly than others. How can we give an application a degree or measure of interactivity? Below is a list of capabilities that video applications might possess. Each item is given a number that specifies the degree of interactivity that the capability confers on the application: "1" specifies the least degree of interactivity and "5" the greatest.

  1. Can branch to different video segments by presenting the user with a list of choices at certain points (spatial branching).
  2. Can branch to different video segments by allowing the user to tell the program to branch at any time; the program will use the point within the clip at which the user requested the branching to determine the video clip to show (temporal branching).
  3. Employs spatial branching and allows the user to extend the program by adding her own video clips to it.
  4. Employs temporal branching and allows the user to extend the program by adding her own video clips to it.
  5. Total immersion in the video so that it surrounds the user and changes as she changes the position and orientation of her head (video-based virtual reality).

The previously mentioned examples which allow the user to make choices in a movie and tour a building are examples of level one interactivity. At certain points in the presentation, the user is offered the opportunity to make decisions that determine which sequence of video clips will be shown next.

An example of the second level would be an application about the lions of the Serengeti: whenever the user sees or hears something about which she wants more information, she can hit a "more info" button, and the program automatically branches to a segment that gives more detailed information on whatever subject was the main focus at the moment the user pushed the button. In other words, the content of a specified section of the current video corresponds to a certain time interval within that segment. If the user presses the "more info" button within this temporal window, one predetermined clip will be shown next; if the button is pressed within some other time window, a different clip will be shown. This is what makes the branching temporal. As the Serengeti example shows, temporal branching can be employed to create context-sensitive video files.
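
The dispatch behind the "more info" button is small. This hypothetical Python sketch (again, a stand-in for Toolbook code) maps the moment the button is pressed to a destination clip; the window boundaries and clip names are invented for illustration:

```python
# Time windows (in seconds) within the current clip, each mapped to the
# detail clip that a "more info" press during that window should trigger.
MORE_INFO_WINDOWS = [
    (0.0, 20.0, "hunting detail"),   # narrator discussing hunting
    (20.0, 45.0, "pride detail"),    # narrator discussing the pride
]

def branch_on_more_info(position):
    """Pick the detail clip for the moment the button was pressed."""
    for start, stop, destination in MORE_INFO_WINDOWS:
        if start <= position < stop:
            return destination
    return None  # no window active: ignore the press
```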

A new level of interactivity is achieved when the user can extend the application by incorporating her own video clips. Here the structure that represents the different video clips and how they are linked to each other must be dynamic so that it can grow as the user adds to it. This allows the user to actually define the video that is played by the program. The user can make her own clips of whatever she wants to add to the program. A form of interaction takes place when the user makes the video. At this time the user has complete control over the contents of the video clip. The program "Chinese.tbk" that is detailed later in this paper is an example of an application that allows the user to add video clips to pre-prepared video.

Third level interactivity requires both spatial branching and the ability for the user to extend the program. This means that when a user adds a video clip, the program must also let the user tell it how that clip is to be linked to the clips already present. The program must also be able to branch to the new clip when it is appropriate. The user must be able to add new links to the old clips; i.e., change the pre-existing linking structure.
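
The dynamic structure this requires might look like the following. This is a hypothetical Python sketch, not the authors' Toolbook code; it shows only the shape of the bookkeeping: registering a user-supplied clip and rewriting the pre-existing links.

```python
def make_program():
    """An empty program: no clips, no links yet."""
    return {"clips": {}, "links": {}}

def add_clip(program, name, path):
    """Register a user-supplied clip so the program can branch to it."""
    program["clips"][name] = path
    program["links"].setdefault(name, {})  # new clip starts with no links

def add_link(program, source, choice, destination):
    """Add or change a link, altering the pre-existing linking structure."""
    program["links"][source][choice] = destination
```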

Fourth level interactivity is similar to third level except that the program must employ temporal branching instead of, or in addition to, spatial branching, and it must be able to branch out of or into a new clip. Fifth level interactivity is virtual reality. The user is immersed in the video (or so it seems). All the user sees is the video, and this changes as the user moves around; it appears that the user is actually a part of the video. At this time almost all work in virtual reality is done with computer graphics in which a computer-synthesized model of a scene can be recalculated and rendered from any point of view. Video-based virtual reality poses the problem of requiring video clips of every possible user vantage point. It is not likely that we will see any consumer-affordable virtual reality systems using video anytime soon, but some experts believe that three-dimensional television will be in existence shortly after the year 2003, and that this technology might give rise to virtual reality using video images [8].

Some might say that allowing a user to add video clips to a program would be more interactive than a virtual reality application (which would not allow a user to add clips), but this is not so. When a user adds clips, she has total control over the contents of the video clip while it is being shot, but after it is incorporated into the program, the user can no longer affect the contents of the clip. The user can affect how that clip is linked to other clips, but cannot change how the clip will look. All the interaction, in this case, happens while extending the program. In virtual reality, all the interaction happens while the user is using the program. Virtual reality goes beyond just showing sequences of video clips. The scene changes with every movement of the user's head. If the user looks left, looks up, or walks forward, the scene shifts accordingly. In short, virtual reality is almost total interaction between the scene and the user.

Examples

To demonstrate the practical uses of interactive video, two programs were written using Multimedia Toolbook within the Microsoft Windows operating system environment [9]. The first program ("Chinese.tbk") is a Chinese language teaching aid that uses video to help the user learn Mandarin Chinese. The program consists of two "pages." The first page allows the user to specify video clips of native Chinese speakers speaking phrases in Mandarin. These clips will be referred to as "canned clips." The user then adds clips of herself speaking the same phrase. These will be referred to as "user clips." User clips could have been captured earlier or could be captured immediately as the program runs. Each phrase can be assigned a number of different user clips.

The second page of the "Chinese.tbk" program allows the user to view the clips that have been specified on the first page and compare her performance with that of the native speaker. After viewing the canned clip and the user clips for a given phrase, the user might want to capture another clip of herself. By returning to the first page and clicking on the "Vidcap" button, the user can capture another clip and then add it to the program.

Another program ("Vidtool.tbk") is a tool that enables the author of an interactive video application to put video clips together into a temporally linked video file. The author specifies the video clips and how each clip is to be linked to the other clips in the file. The position within each video clip is referenced by the frame numbers that make up the video. Thus, a three-second video clip displayed at thirty frames per second would have a length of 90 frames. A position of 40 would refer to the fortieth video frame. The author specifies a time window by designating a starting position and an ending position. If, for example, a starting position of 40 and an ending position of 90 were to be specified, then all of the frames between frame number 40 and frame number 90 within the clip would constitute one time window. After specifying a time window, the author must designate a destination. The destination is the clip that will be shown next if the person viewing the video file clicks on the playing window while the current clip is within that time window. The current clip is within the time window if the frame being displayed at the moment the user clicks on the window is one of the frames between the starting and stopping positions of the time window. As the author enters the video clips, time windows, and destinations, the program puts all of the information into a single button that can be pasted into other Toolbook programs. "Vidtool.tbk" was used to make the video help function on the first page of the "Chinese.tbk" program.
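
The link table that "Vidtool.tbk" builds can be pictured as follows. This Python sketch is a stand-in for the data the real tool stores inside a Toolbook button; the clip names are invented, but the frame arithmetic (inclusive windows of frame numbers) follows the description above.

```python
# current clip -> list of (start frame, stop frame, destination clip).
# A three-second clip at thirty frames per second spans frames 1-90.
LINKS = {
    "overview.avi": [(1, 45, "fields.avi"),     # first half of the clip
                     (46, 90, "buttons.avi")],  # second half of the clip
}

def destination_for_click(clip, frame):
    """Clip to show next if the user clicks while `clip` is at `frame`."""
    for start, stop, destination in LINKS.get(clip, []):
        if start <= frame <= stop:
            return destination
    return None  # click outside every time window: keep playing
```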


Figure 1. Page one of "Chinese.tbk"


Figure 1 shows the screen for the first page of "Chinese.tbk." The text fields in the center of the screen display the item number, English translation for the phrase, and number of user clips for the phrase. Each phrase is considered an item. When the "Add Item" button is pressed, the program asks for the path names of the canned clips and the English meaning of the phrase before adding the new phrase as another item. The program calls up a dialog box to allow the user to search through various directories to find the file names of the video clips. To enter the English translation, the user must use the keyboard. The "Add Users" button enables the user to add user clips for each item. When the user clicks on this button, the program asks for the item number to which it will add the clip and then uses a dialog box to obtain the path name for the clip. The "Delete Item" button deletes an entire item from the list. The "Load" and "Save" buttons respectively load and save an entire list of items from and to a file. The "Clear" button clears all of the current items from the program. The "Vidcap" button causes Microsoft's Video for Windows Vidcap program to run, enabling the user to capture a new piece of video. A video camera is connected directly to the video capture board and pointed at the user so that she can immediately capture a clip of herself speaking a phrase. When the user exits Vidcap, she is returned to the main program so that she may add and view the new clip. The button labeled "View Clips" takes the user to the second page. The "Help" button will be described below.

The second page of "Chinese.tbk" consists of three sections: the loading area on the bottom, the canned clip viewing area on the upper left side, and the user clip viewing area on the upper right side. Figure 2 shows the screen for this page. The loading area allows the user to load in the canned and user clips. The text field labeled "Sentences" displays text for each phrase. The "User Clip" field displays a number for each user clip that is associated with the selected phrase. The user loads in a clip by double clicking on the appropriate selection within the text field. The canned clip viewing area displays the video clip of the native Chinese speaker. The text fields below the viewing window display the English translation of the phrase and the length of the video clip. The buttons below behave like traditional VCR controls. The user clip viewing area is the same as the canned clip viewing area except that it displays the user clips. The "User Clip" text field displays the number of the user clip for the selected phrase. Clicking on the playing window of a video clip will close the clip, and clicking on the "Add Clips" button will take the user back to the first page.


Figure 2. Page two of "Chinese.tbk"


Figure 3 shows the screen for "Vidtool.tbk." There is only one page for this program. The "Add Clip" button adds a video clip to the program. When the user adds a clip, the number, clip name, and clip path name are put into the appropriate fields at the top right hand side of the screen. The "Make Link" button requests a starting point, a stopping point, and a destination clip to link to. This specifies a time window within the current clip during which it will link to another clip. The current clip is always the clip whose name appears in the "Current Clip" field in the center of the screen. When the "Make Link" button is used, the information is put into the appropriate fields at the bottom right hand side of the screen. The "Clear" button clears all information from the screen. The left hand side of the screen is divided into three sections. The top section is a playing window in which video is displayed. The middle section controls the video being displayed and shows the current position of the video and the length of the clip. Time intervals are specified in terms of numbers indicating the frames within the video clip. The author employs these two upper sections to determine which positions to use as starting and stopping points. The lower section of the left side of the screen contains the buttons for pasting the finished video file into another application. The "Play" and "Close" buttons respectively play and close the video file. When the user clicks on the "Play" button, the video is shown in the playing window. The "Paste" button copies the "Play" and "Close" buttons onto the clipboard so that they can be pasted elsewhere.


Figure 3. Page one of "Vidtool.tbk"


"Vidtool.tbk" was used to make the video help function in "Chinese.tbk." After being pasted, the "Play" button's name was changed to "Help." When "Help" is clicked, a video clip of the program's author appears to describe the elements of the screen. The first clip gives a general description of the functions of the text fields and then the buttons. If the user clicks on the playing window during the first half of the clip when the author is talking about the text fields, the clip is closed and replaced with a clip in which the author gives a more detailed description of the text fields. If the user clicks on the playing field during the second half of the clip in which the author is talking about the buttons, the clip is replaced with one that gives more details about the buttons.

So what ratings would these programs get for interactivity? The "Chinese.tbk" program allows users to add their own clips to the program and the user can choose which clips to view. This alone gives the program an interactivity rating of three by the rating scheme introduced in this paper. The program furthermore contains a temporally linked video file, and this brings the program up to an interactivity rating of four. The "Vidtool.tbk" program also has a fourth level rating because it allows the user to make temporally linked video files and lets her use her own video clips in the files. In addition, the user can stop a video clip at any time in both programs. Hence both of these programs have the highest rating achievable aside from virtual reality.

Other Uses

The purpose for presenting these programs is not only to demonstrate how to use video more interactively, but also to show the many different ways that video can be used in computer applications. The Chinese language teaching aid can be used in teaching any foreign language. It can play any video clip so that if, for example, the user got clips of Japanese people speaking Japanese, then it would be a Japanese language teaching aid.

The "Vidtool.tbk" program has even more potential uses. For instance, it could be used to make video help files, personal video mail files, or training files. In fact this program is really just a prototype for an even more useful program that could be used by people other than programmers. It would be a variation on "Vidtool.tbk" that would have a more elegant user interface designed so that any Windows-literate user can use it. (A system using icons to represent the video clips and lines among them to represent the links would be ideal.) The program would allow the user to specify the video clips to be employed and the links among them. When the user is finished, this information would be saved in a file with an extension like ".hvf" for hyper-video file. A separate program would read and run the hyper-video file. This second program might be called an hvf reader. The hvf reader, hvf file, and video clips would all be required to create the final interactive program. This is obviously the kind of application that would require a CD-ROM medium, so it will not be feasible for the home market until CD-ROM recorders are a household item or until some other large storage medium becomes readily available. It would be extremely practical, however, for use by CD-ROM publishers. A company would not even need a programmer to create hyper-video presentations for whatever application in which they might be needed. Ultimately, this kind of technology could lead to a whole new medium for conveying information: using a video processor to generate hyper-linked video documents just as we now use a word processor to generate text documents.
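
The paper leaves the ".hvf" format unspecified; one plausible layout, purely our own assumption, is a plain-text table of links that an hvf reader could parse. The Python sketch below shows such a reader with invented clip names:

```python
# Hypothetical ".hvf" layout: one link per line, in the form
#   source_clip  start_frame  stop_frame  destination_clip
# Blank lines and lines beginning with "#" are ignored.
import io

def read_hvf(stream):
    """Parse an .hvf stream into {source: [(start, stop, destination)]}."""
    links = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, start, stop, destination = line.split()
        links.setdefault(source, []).append((int(start), int(stop), destination))
    return links

sample = io.StringIO("""\
# source   start  stop  destination
intro.avi  1      45    fields.avi
intro.avi  46     90    buttons.avi
""")
links = read_hvf(sample)
```

An hvf reader built around such a table would play the source clips and, on a click, consult the table to decide which clip to open next.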

Conclusion

It is hoped that in this paper we have posed some provocative questions for people to think about concerning the merging of digital video and computers. What does it mean for video to be interactive? How can we use video interactively? Why should we use video in computer programs at all? We have proposed a definition for interactive video, set forth a rating scheme to determine how interactive an application is, given some examples of programs that use video interactively, and attempted to provide some insight as to how interactive video will be used in the future. The point of the paper, however, is not to try to preach answers to questions that we have just barely started thinking about, but to steer people's minds in new directions when approaching these questions and, hopefully, to inspire new, more creative questions as well as new and creative answers. It is the hope of the authors that the ideas presented here will act as a springboard from which much more research and creative thinking will be applied to this new and exciting field.

Bibliography

[1]
Bangert-Drowns, Robert L., Kulik, James A., and Kulik, Chen-Lin C., Effectiveness of Computer-Based Education in Secondary Schools, Journal of Computer-Based Instruction 12, no. 3 (Summer, 1985), 59-68.
[2]
Kulik, Chen-Lin C., and Kulik, James A., Effectiveness of Computer-Based Education in Colleges, AEDS Journal (Winter/Spring, 1986), 81-108.
[3]
Kulik, Chen-Lin C., and Kulik, James A., Effectiveness of Computer-Based Instruction: An Updated Analysis, Computers in Human Behavior, 7, (1991), 75-94.
[4]
Kulik, James A., Meta-Analytic Studies of Findings on Computer-Based Instruction. In Eva L. Baker and Harold F. O'Neil, Jr., eds., Technology Assessment in Education and Training, Hillsdale, NJ, Lawrence Erlbaum, in press.
[5]
Kulik, James A., Kulik, Chen-Lin C., and Bangert-Drowns, Robert L., Effectiveness of Computer-Based Education in Elementary Schools, Computers in Human Behavior, 1 (1985), 59-74.
[6]
Shneiderman, Ben, Personal E-mail communication, 22 June, 1994.
[7]
Webster's Third New International Dictionary of the English Language, 1971 ed.
[8]
Child, Jeffery, "Real-time Video Compression Poses Challenges to Designers and Vendors Alike," Computer Design, July, 1993, 67-76.
[9]
Asymetrix Corporation, Toolbook User Manual, 1994.

About the Authors

Richard R. Eckert received the B.S. degree in physics in 1964 from Case Institute of Technology, the M.S. degree in experimental solid-state physics in 1966, and the Ph.D. degree in experimental high-energy physics in 1971 from the University of Kansas. He was a Professor of Physics and Computer Science at Catholic University of Puerto Rico from 1971 to 1983 and since then has been an Associate Professor of Computer Science in the Thomas J. Watson School of Engineering and Applied Science at Binghamton University, Binghamton, New York. His professional interests are computer graphics, multimedia computing, microprocessor-based systems, computer architecture, and computer science education. Currently he directs the department's Human Computer Interface Laboratory, whose World Wide Web home page URL is http://hcirisc.cs.binghamton.edu/.

Address: Dept. Computer Science, Thomas J. Watson School of Engineering and Applied Science, Binghamton University, PO Box 6000, Binghamton, New York 13902-6000, USA. Email: reckert@bingsuns.cc.binghamton.edu

Michael Stenzler currently works in New York City as a programmer for Downtown Digital, a division of AT&T. He received his BS (1992) and MS (1994) in Computer Science from Binghamton University. During his graduate schooling he made multimedia systems a special topic of study and wrote his masters thesis on interactive digital video. For more information, his WWW home page can be found at http://interport.net/~stenzler/.
