Vol.30 No.1, January 1998
The question considered here is whether it is possible for humans to program computers that can mimic human mental activities. Vision, as a perceptual process, has been selected for this purpose. To answer this question, a conceptual model called E-B-C is proposed. In the E-B-C model, E, B, and C are three observational points whose metaphorical meaning can be varied. From one aspect, E, B, and C can be understood as the eye, the brain, and consciousness, respectively; from another, they can be seen as an input device, a data-processing system, and information interpretation. In both aspects E and B imply data transformations which are crucial for the information available at C. It is concluded that the process of seeing is a data-reduction process, a pattern-recognition process, a feature-extraction process, as well as a goal-oriented process. These features cannot be described sufficiently in terms of computational processes carried out on present-day computers. A list of some classic papers is given in the references.
A Slice of the Present
An electronic device with a built-in program (software) telling it how to control itself (hardware) is called a computer. Since its first appearance the computer has gone through a number of technological phases which have made it smaller, faster, and cheaper, but the philosophy of its functioning has remained essentially the same. The computer has become an inevitable part of modern society. It is used in many fields of human activity, and it is hard to say where it cannot be used.
First of all, it can be used as a calculating machine. When the computer came along, science got an excellent tool. For the first time, humans were able to solve a great many computational problems by means of the computer. The key to success was not its `smartness' but its speed. Basically, the solutions were already given by humans, for example in the form of mathematical formulas transformed, or translated, into programs. The `mechanized' work was then performed in accordance with the program `injected' into the computer hardware. In many cases programs were written in high-level programming languages that bridge the gap between human language and that of the machine. There exist, however, a number of programming languages that differ in their linguistic power, ranging from high-level languages down to assembly languages.
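To make the point concrete, here is a minimal sketch (not taken from the article) of a mathematical formula `injected' into a program; the function name and the example coefficients are chosen purely for illustration.

    import math

    def quadratic_roots(a, b, c):
        # Solve a*x^2 + b*x + c = 0 for real roots, exactly as the formula prescribes.
        d = b * b - 4 * a * c          # discriminant
        if d < 0:
            return []                  # no real roots
        sqrt_d = math.sqrt(d)
        return [(-b + sqrt_d) / (2 * a), (-b - sqrt_d) / (2 * a)]

    print(quadratic_roots(1, -3, 2))   # [2.0, 1.0]

The `smartness' resides entirely in the formula supplied by the human; the computer contributes only the speed with which it is evaluated.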
A number of books have been written about programming techniques, telling us which approach (method) to choose for a specific problem in order to `squeeze' the most from the computer in terms of memory requirements, execution time, precision of calculation methods, etc. Programming -- problem solving by means of the computer -- requires of humans a complete, detailed, and concise description of the problem under investigation. For a number of problems this kind of programming is sufficient.
There are situations, however, where this `classical' approach to problem solving is inefficient. Humans are simply not able to describe certain kinds of problems using a language. Does this mean that language as a human-to-human interface, and nowadays also as a human-to-machine interface, does not have enough descriptive power? There are many everyday problems that are solved by human intuition, such as the precision with which one can estimate, without knowing any mathematical formulas, the time an approaching vehicle will need to pass us. Evidently, there are problems that are easy for humans to solve and very difficult to describe (program) for the computer.
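For contrast, a hypothetical `programmed' counterpart of that intuition might look like the following sketch, in which every quantity must be measured and supplied explicitly (the names and numbers are illustrative assumptions only).

    def time_until_vehicle_passes(distance_m, closing_speed_mps):
        # The machine needs every quantity spelled out; the pedestrian just `sees' it.
        if closing_speed_mps <= 0:
            return float('inf')        # the vehicle is not approaching
        return distance_m / closing_speed_mps

    print(time_until_vehicle_passes(50.0, 12.5))   # 4.0 seconds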
The capabilities of today's computers and new transdisciplinary sciences, such as robotics, have motivated professionals in technical domains to become involved in questions concerning human mental activities, where vision and speech recognition are of primary interest. In order to accelerate the rate of progress toward more capable computers there is a strong tendency to make computerized devices mimic the `laws' that relate to the ways humans `function,' especially when a perceptual process is involved in solving the problem [14]. To illuminate the underlying perceptual process it is a technical necessity to investigate it by means of models that can, to some extent, simulate it.
One of the most important human perceptual systems is vision -- the window onto humans' external environment. In order to mimic the process of human vision by means of the computer, technical methods are used. Data bearing pictorial information embedded in the optical form of energy must first be converted to an equivalent digital form in order to be processed later in the computer. Concerns about how to construct computer vision, and what principles to use in order to extract a kind of human sense from a cluster of data, lead to questions such as: What mechanisms underlie the experience of seeing? [12] What does `I see' mean? In order for the human system to detect an external signal embedded in the `visual' form of energy (light), the signal must be transformed into the `bio' form of energy. The term `bio' is to be understood as comprising those forms of energy that are intrinsic to the human system. The transformation is performed by the visual system, of which the eye is a part. After this transformation is completed, the question arises as to what is happening in the human system so that one can say `I see'. However, to see objects humans must recognize them. It follows that humans see only those objects that they somehow know, and cannot `see' those they have not seen before. How should the term `see' be understood?
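As a rough illustration of the analogous step in a machine -- not of the visual system itself -- the following sketch samples a continuous intensity function and quantizes it into a digital array; the toy scene and the resolution are arbitrary assumptions.

    def sample_scene(intensity, width, height, levels=256):
        # Sample a continuous intensity function on a grid and quantize each
        # sample to one of `levels' integer values, as an image digitizer would.
        image = []
        for y in range(height):
            row = []
            for x in range(width):
                value = intensity(x / width, y / height)          # brightness in [0, 1]
                row.append(min(levels - 1, int(value * levels)))  # quantization
            image.append(row)
        return image

    # A toy `scene': a smooth horizontal brightness gradient.
    digital_image = sample_scene(lambda u, v: u, width=8, height=4)
    print(digital_image[0])   # [0, 32, 64, 96, 128, 160, 192, 224]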
To illustrate the process of seeing let us take the following metaphor, a simple conceptual model called E-B-C. For the first case assume that E, B, and C represent three different persons, and imagine that you are person C. Person E is to make an image of what (objects) she sees in the environment she is looking at. This image depicts the `visual' surfaces of the objects viewed in the environment. Person B is to make a replica of this image with elements -- constituents -- available to him from his repository of knowledge. For the present, assume that what is available to him are constituents and knowledge (methods) of how the constituents are to be used to assemble the replica. So you, as person C, look at B's replica of the image originally made by E.
Let us now assume that E is the human eye system and B is the brain, in which case C is your consciousness. For the sake of clarity I will call the process at E the E-transformation and the process at B the B-transformation. The E system transforms optical data about the environment into bio signals by means of the E-transformation. The essence of this transformation is to obtain bio signals as possible counterparts of the light signals. However, these `optical' data embedded in bio signals are not apparent to C as long as they remain unprocessed by B. During the B-transformation an ethereal construct is acquired and, as a phenomenon, becomes immediately available to C. The process performed by E can be called the perception or imitation process, and the process performed by B the recognition or reproduction process. The information made available to C obviously depends on E and B.
Having this model in mind we can conclude that C never `sees' reality -- the external environment -- and that C's understanding of the environment depends entirely on E and B. E's output is the truth for B and B's output is the truth for C. It follows that C never knows whether E or B or both of them make a mistake with regard to the image, its replica, or both.
Rephrasing E-B-C in terms of the computer system, E can be represented by an optical input device, and the E-transformation corresponds to the preprocessing of data. The way data are preprocessed may depend on their usage. The E-transformation, also called image analysis, is assumed to be domain independent; that is, the same processing methods may be used on the data regardless of the information they carry. The B-transformation, also known as scene analysis, is domain dependent, which means that certain objects, and presumably certain relations among them, are expected to be recognized and reconstructed from the data. What precisely is encompassed by image analysis or scene analysis depends on the particular implementation of a computer-based vision system. At best, present experience with computer vision systems indicates that there are numerous limitations, and that a great deal of complex software is required for a relatively simple vision task to be performed.
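A deliberately small sketch of this division of labor (purely illustrative; no claim is made that any real vision system works this way): the E-step applies the same domain-independent thresholding to any image, while the B-step embodies a domain-dependent expectation -- here, that the scene contains a single bright blob.

    def e_transformation(image, threshold=128):
        # Domain-independent preprocessing: the same thresholding for any image.
        return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]

    def b_transformation(binary_image):
        # Domain-dependent scene analysis: here we `expect' one bright blob.
        pixels = [(x, y) for y, row in enumerate(binary_image)
                         for x, v in enumerate(row) if v == 1]
        if not pixels:
            return None                # the expected object is not recognized
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        return {"object": "blob", "bounding_box": (min(xs), min(ys), max(xs), max(ys))}

    image = [[10,  10,  10, 10],
             [10, 200, 210, 10],
             [10, 220, 205, 10],
             [10,  10,  10, 10]]
    print(b_transformation(e_transformation(image)))
    # {'object': 'blob', 'bounding_box': (1, 1, 2, 2)}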
In the E-B-C model constituents are consolidated into a unitary whole. Moreover, combinations of these preconstructed elements make the replica possible. Constituents resulting from a B-transformation can be seen in a number of different ways.
In one way, constituents can be viewed as counterparts of geometrical properties of the image. In that case the B-transformation results in geometrical types of constituents. The replica is a geometrical interpretation of the external world. Such a replica is structurally isomorphic to the objects it represents. Assembled with constituents of this type a replica offers an inherently uniform aspect of understanding the external environment. It can be seen as a combination of a limited number of primitive forms. In other words, the external environment as depicted by the replica appears to be highly structured.
In another way, constituents can be seen as counterparts of the logical properties of the image. The replica does not represent a geometrical description of the external environment, as in the previous case, but a `logical' description of the image. For example, constituents provide C with information for spatial orientation. The external environment as `seen' on the image by B is the basis for the B-transformation resulting in logical types of constituents which bear no geometrical information about the external world.
Comprehension of the above two views on constituents, geometrical and logical, rests on the notion that humans use a `qualitative' causal calculus in reasoning about the behavior of the external environment [1]. Driving a car in heavy traffic could serve as an example in which both geometrical and logical space are present. Obviously, geometrical and logical complexities are intertwined. Consequently, constituents can be defined in terms of the quality of information as opposed to its quantity, and the quality of information depends on the E- and B-transformations. The transformations can be seen as data-reduction processes. The B-transformation has a special impact, since details which are not in B's `interest' are not recognized during the process of the replica's emergence.
The replica can be synthesized from a certain number of constituents which can represent `detailed', `general', or both kinds of information about the external environment. The number of constituents corresponds to the maximum information B can assemble on the replica, and can be viewed as C's capacity.
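The data-reduction idea can be caricatured in a few lines: out of all candidate constituents, B keeps only as many as C's capacity allows, ranked by a goal-dependent notion of relevance (the scoring rule and the items below are invented for illustration).

    def assemble_replica(constituents, capacity, relevance):
        # Keep only the constituents B judges most relevant; the rest never reach C.
        ranked = sorted(constituents, key=relevance, reverse=True)
        return ranked[:capacity]

    constituents = ["moving car", "red traffic light", "cloud shape", "shop window", "curb edge"]
    # A goal-dependent relevance score: traffic-related items score higher while driving.
    relevance = lambda c: 2 if ("car" in c or "traffic" in c or "curb" in c) else 1
    print(assemble_replica(constituents, capacity=3, relevance=relevance))
    # ['moving car', 'red traffic light', 'curb edge']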
With regard to the replica as seen by C, two diametrically different views can be considered. In the first, it can be assumed that the B-transformation generates a replica with constituents of a lower quality, that is, constituents which bear general information about the external environment; therefore, E's image is not necessarily required to be `perfect.' The degree of perfection of the E-transformation must be within the limits to which the B-transformation is insensitive, i.e., a change in a `part' of the image causes no change in its replica. This condition postulates that there exists a many-to-one correspondence between E's image and B's replica.
According to the other view, it can be assumed that the B-transformation is good enough to make a trustworthy replica of E's image, in which case the constituents are of adequate quality. This means that there exists a one-to-one correspondence between E's image and B's replica. However, the cause of imperfection in this case can be found in the inadequacy of the E-transformation; in other words, there exists a many-to-one correspondence between the external environment (E's input) and the image (E's output). It follows that in this latter case, but not in the former, a better B-transformation will not lead to any improvement or change in the quality of the replica.
The problem of vision was chosen in an attempt to make a comparison between humans and computers. In order to describe the process of vision, humans are faced with answering questions about the mechanisms that underlie this process. The E-B-C model has been given as an explanation of how the process of vision, as a perception task, can be understood. In this model two constructs, that is, quasi-pictorial entities, are suggested: the image and its replica, as the consequences of the E- and B-transformation, respectively. As opposed to its replica, which is active, the image is impotent. In fact, the image needs an interpreter, and the interpretation is what gives rise to the replica. The interpretation process is the process in which `pattern' recognition takes place. The replica as seen by C has a dual interpretation, both a geometrical and a logical one. Which one is more in focus than the other depends on B's goal.
It was pointed out that there are constraints on the nature of the E- and B-transformations. In essence, transformations capture constraints. Due to these constraints, it is possible that many visual data have no counterparts in the replica, and also that the replica might contain counterparts belonging to non-visual data -- data that do not exist in the external environment at all. In the consecutive process of the E- and B-transformations, the final projection of the external environment appears, narrowly constrained, on the hypothetical canvas.
Studies of human interpretation of the external environment can be found in a number of publications. Research has also been done on computer-based interpretation of the external environment. Simple natural forms can be represented as combinations of volumetric primitives of various sizes [8,9]. The idea of using cylinder-like primitives of different sizes has been maintained for quite a long time. Unfortunately, this kind of representation becomes awkward if complex objects are to be considered. There are, of course, other suggestions in the literature.
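A very schematic sketch of what such a volumetric representation might look like as a data structure (the repertoire, the field names, and the sample object are illustrative assumptions, not the representation proposed in [8,9]):

    from dataclasses import dataclass

    @dataclass
    class Primitive:
        # One volumetric constituent of the representation.
        kind: str         # e.g. 'cylinder', 'block', 'sphere'
        position: tuple   # placement in the object's own coordinate frame
        size: tuple       # characteristic dimensions

    # A crude `tree' assembled from a limited repertoire of primitive forms:
    # a cylindrical trunk topped by a spherical crown.
    tree = [
        Primitive("cylinder", position=(0.0, 0.0, 0.0), size=(0.2, 1.0)),
        Primitive("sphere",   position=(0.0, 0.0, 1.2), size=(0.5,)),
    ]
    print(len(tree), "constituents")

The awkwardness mentioned above shows up quickly: as soon as the object has fine or irregular structure, the list of primitives grows without bound.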
A particular kind of computer image processing to be mentioned at this point is one which in most cases encompasses only the E part of the E-B-C concept. It is found in medical scanning devices such as scanners and computer tomographs. These computer-based devices -- capable of detecting the propagation of electromagnetic waves (of high frequency) caused by radioactive substances -- can take pictures of human inner organs. If a human organ contains radioactive material (introduced into the blood stream), the data about the dissipated radiation are detected by means of a collimator -- a device that produces a beam of parallel rays of light or other radiation -- and fed forward for medical image processing.
Investigations into human vision raise an avalanche of questions whose answers have roots in the human reasoning system -- the driving force of human cognitive action. Vision can be viewed as a consequence of common sense reasoning. The visual input into the system has the purpose of supplying the system with data from the external environment. The process of seeing is no doubt a data reduction process. The apparent complexity of the external environment can be understood as B's limitation -- all the `relevant' facts from the environment cannot be reproduced with a single replica.
To embody the principles of human vision, certain natural conditions must be met to achieve the phenomenon -- a conscious construct -- and nature does have a say about what they are. What mechanisms actually underlie the experience of human vision is not within the author's competence and will be left to the experts. It is believed, however, that questions concerning some human mental activities are too complex to be solved from the perspective of a single professional discipline.
It seems that computer simulations of human cognitive capabilities notably lack the most human `attributes', which are manifest in: (i) the capability to find a path between seemingly unrelated things, such as organized geometrical complexity, which appears confusing during problem solving, and a `logical' solution to the problem; (ii) the ability to draw reliable, trustworthy conclusions from imprecise or partial information. The question is: are the precise operations of computers, as contrasted with the techniques humans use in decision-making, what creates the fundamental difficulties in imitating human mental behavior?
The process analogous to the B-transformation in the E-B-C model appears to be different in humans and in the computer. In the case of humans this process is not a stable one, whereas the execution of an `equivalent' computer program gives stable and a priori determined behavior. Since human reasoning is a goal-oriented activity, some requirements are to be met. In the course of goal-seeking, a particular recognition set might succeed in one situation and fail in another. In case of failure, another recognition set is retrieved, presumably according to certain sub-goals. Each `problem' might be addressed from several points of view; if one fails in a particular situation, another one is chosen. With regard to the computer, when a particular computation fails, the process corresponding to the B-transformation collapses unless it is heavily constrained.
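A sketch of such goal-seeking behavior, under the assumption that each `recognition set' is simply a function that either yields an interpretation or reports failure; the recognizers and feature sets below are invented for illustration.

    def recognize(features, recognition_sets):
        # Try recognition sets in turn; on failure, fall back to the next point of view.
        for name, recognizer in recognition_sets:
            result = recognizer(features)
            if result is not None:
                return name, result
        return None, None              # every point of view failed

    # Illustrative recognizers for a toy scene given as a set of detected features.
    recognizers = [
        ("vehicle", lambda f: "car"        if {"wheels", "windshield"} <= f else None),
        ("person",  lambda f: "pedestrian" if {"head", "legs"}         <= f else None),
    ]
    print(recognize({"head", "legs", "umbrella"}, recognizers))
    # ('person', 'pedestrian')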
Nothing has been said about how the knowledge necessary to carry out the B-transformation is acquired. This question can be generalized to how knowledge is acquired at all [10]. As research stands, most computer applications have built-in knowledge. A computer with built-in know-how is called an expert system; the knowledge from a specific domain of human expertise is given in the form of computer programs and a repository of data. Much expert knowledge is of an ill-defined nature. There are a number of reasons why `transplantations' of knowledge from human experts to computers are not as successful as had been expected. On the one hand, the mechanisms for knowledge description have no provision for dealing with uncertainty; on the other hand, human verbal description of knowledge is inaccurate, and, after all, the transplantation process is too slow. Expert systems usually cannot solve even simple versions of the problems they are designed to solve; they have no common sense and show satisfactory results only when used in narrow applications.
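The narrowness becomes obvious even in a toy rule-based sketch like the following (illustrative only; it does not describe any real expert system): everything outside the explicitly encoded rules is simply invisible to the system.

    # Each rule maps a set of observed facts to a conclusion; nothing else is known.
    rules = [
        ({"fever", "cough"}, "suspect respiratory infection"),
        ({"fever", "rash"},  "suspect measles"),
    ]

    def consult(facts):
        for conditions, conclusion in rules:
            if conditions <= facts:
                return conclusion
        return "no conclusion"         # outside its narrow domain the system is helpless

    print(consult({"fever", "cough", "headache"}))   # suspect respiratory infection
    print(consult({"broken arm"}))                   # no conclusion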
There is no baby computer, nor an adult one. However, a variety of machine-learning theories and applications exist that make computers adaptable to certain kinds of isolated situations [6,15]. Furthermore, not much is known about how humans learn and memorize [4,5,7]. Specifically, how do humans acquire concepts, and is human learning, after all, nothing else but cross-induction? [6,11] Last but not least, perception and reasoning processes in general are strongly influenced by human emotions. Since it is always difficult to explore a system of which one is a part, it is hard to make any kind of educated guess about how far humans can investigate themselves and mimic their behavior on an artifact -- the computer [16,17].
[1] Bobrow, Daniel G. (Ed.), Qualitative Reasoning About Physical Systems. (Elsevier Science Publishers, Amsterdam, 1984.)
[2] Chomsky, Noam, Rules And Representations, The Behavioral and Brain Sciences (1980), 3, 1-61.
[3] Fodor, Jerry, Fixation Of Belief And Concept Acquisition. In Piattelli-Palmarini, M. (Ed.) Language and Learning. (Harvard University Press, 1980.)
[4] Klix, Friedhart, On Recognition Process In Human Memory. In Klix, F. and Hagendorf, H. (Eds.) Human Memory and Cognitive Capabilities. (Elsevier Science Publishers, 1986.)
[5] Mandler, George, Reminding, Recalling, Recognizing: Different Memories? In Klix, F. and Hagendorf, H. (Eds.) Human Memory and Cognitive Capabilities. (Elsevier Science Publishers, 1986.)
[6] Michalski, R. S., Carbonell, J. G. and Mitchell, T. M., Machine Learning: An Artificial Intelligence Approach. (Tioga Publishing Co., 1983.)
[7] Minsky, Marvin, K-Lines: A Theory Of Memory, Cognitive Science 4, 117-133 (1980).
[8] Nishihara, H. K., Intensity, Visible-Surface, And Volumetric Representations, Artificial Intelligence 17 (1981) 265-284.
[9] Pentland, Alex P., Perceptual Organization And The Representation Of Natural Form, Artificial Intelligence, 28 (1986), 293-331.
[10] Piaget, Jean, The Psychogenesis Of Knowledge And Its Epistemological Significance. In Piattelli-Palmarini, M. (Ed.) Language and Learning. (Harvard University Press, 1980).
[11] Putnam, Hilary, What Is Innate And Why? In Piattelli-Palmarini, M. (Ed.) Language and Learning. (Harvard University Press, 1980).
[12] Pylyshyn, Zenon W., Imagery And Artificial Intelligence. In Minnesota Studies in the Philosophy of Science, Vol. 9., (University of Minnesota Press, 1978).
[13] Shepard, Roger N., The Mental Image, American Psychologist, (1978) 2, 125-137.
[14] Simon, Herbert A., Search And Reasoning In Problem Solving, Artificial Intelligence 21 (1983)
[15] Soklic, Milan, Adaptive Model For Decision Making, Pattern Recognition, Vol. 15, No. 6, pp. 485-493, 1982.
[16] Turing, A. M., Computing Machinery And Intelligence, Mind: A Quarterly Review of Psychology and Philosophy, October 1950, 433-460.
[17] Winograd, Terry, What Does It Mean To Understand Language?, Cognitive Science, 4, 209-241 (1980).
Milan Soklic has a Ph.D. in Computer Science and a Ph.D. in Electrical Engineering. He currently teaches courses in the Computer Science Department at Southern Polytechnic State University. Before that, he taught at several universities in the United States and abroad. He has international experience working in industry and research laboratories. He is on the list of experts of the International Atomic Energy Agency. His academic, research, and industrial interests are in the areas of computer architectures, software engineering, real-time systems, and parallel computing.
Milan E. Soklic
Computer Science Department
Southern Polytechnic State University
Marietta, GA, U.S.A.