Neural Networks
Neural network (NN) modeling has developed as a major component of science's attempt to understand the brain. The fundamental question is, how do the brain's formidable information-processing abilities emerge from the self-organizing behavior of a collection of relatively simple neurons? NN modelers aspire to develop artificial systems with some brain-like abilities. Much of the progress in NN modeling has been made by physicists, who have extensive experience formulating and analyzing complicated mathematical models [1-4].
I. Introduction
A neural network is an information-processing system consisting of a collection of simple processing elements or "nodes." If the network has real neurons for nodes, it is a "biological neural network" (for example, the brain); otherwise it is an "artificial neural network" (ANN). The elements are interconnected so that the input to each node is determined by the outputs of some or all of the other nodes, and the whole accomplishes some useful "computation." In contrast with traditional electronic circuits, the nodes in a NN are generally approximately identical, and the collection's "program" is encoded in the architecture (pattern of connectivity) and in the strengths of the inter-node connections. Such an approach to computation is often called "parallel distributed processing" [5].
In neural network modeling, one constructs a mathematical model of such a system and attempts to answer questions like "What behavior is this model capable of producing?" and "How does the displayed behavior depend on the details of the model and its history?" Research usually involves both formal analytic derivation and computer simulation.
Much of the current work in NN modeling attempts to answer specific questions about the practical applicability of ANNs, such as "What is the most efficient algorithm for training an ANN to categorize input patterns into desired classes?" [6] Research also commonly seeks to illuminate the biological basis of brain functioning by matching neural network behavior to observable brain behavior, for example by asking "How does the ability of a network to recall stored 'memories' degrade as inter-node connections are destroyed?" [7]
More fundamentally, research with NN models can address questions like "How do sophisticated-seeming patterns of organization and behavior emerge from the interactions of simple elements acting on limited or local information, without any master blueprint or directing agent?" In the context of this latter sort of question, NNs are seen as merely one example of a class of "complex systems" [8] which includes ecosystems composed of interdependent and co-evolving species [9-11], economies composed of cooperating and competing agents [12], and embryos undergoing morphogenesis, composed of differentiating and structure-forming cells [13]. Research in NN modeling is therefore intimately linked to research in so-called "complex systems theory" or "complexity theory." This is appropriate, since the human brain is the most impressive example we know of sophisticated and organized behavior emerging from the operations of many simple interacting elements.
Complex systems theory is a recent interdisciplinary field of research with contributions from biologists, ecologists, economists, computer scientists, chemists, and physicists [14-16].
Perhaps the most profound "emergent property" of the human brain, and to a lesser extent the brains of other species, is the ability to construct (explicitly or unconsciously) models of itself and its environment. Loosely speaking, "modeling" as used here means the process of observing sensory data, discovering regularities of various sorts within the data stream, inferring general rules from these regularities, and using the predictions of these general rules to guide behavior. The brain in fact seems to "fold back" this process upon itself, abstracting and observing its own rules and rule-making at a higher level of description, discovering regularities at this level, inferring meta-level rules, and so forth for level upon level. It remains to be seen whether a neural network model can be constructed which demonstrates the emergent property of modeling its environment in this way.
If NN models are to demonstrate such sophisticated emergent behavior, we clearly must progress beyond the rigid and limited models currently being studied. We must seek models which allow a wider range of behavior if we are to find less trivial emergent and self-organizational properties. In so doing we will likely sacrifice many of the conceptual approaches and analytic techniques currently in use, so we must attempt to develop new ways of conceptualizing and analyzing NN behavior. Viewing NNs from a complex systems perspective may help us do this.
II. Selected Background
A. Neural Network Dynamics
One can designate a subset of the nodes of a NN as "input nodes," whose states are externally controlled, and another subset as "output nodes," whose states represent the model's observables. One can then attempt to encode an input-output mapping into the network's connection strengths such that when any particular set of input states is applied, the dynamics of the network generates the desired output states at the output nodes. Many tasks we would like NNs to perform, such as pattern recognition and similar input classification tasks, can be represented as an input state to output state mapping, so much of NN research concerns so-called "rule learning" [17, 18]. Networks designed for rule learning are generally (but not necessarily) arranged in a layered, "feed-forward" architecture, in which information propagates in one direction through the network. Such an architecture has no feedback loops anywhere, so that the states of all nodes are determined and stable as soon as signals from the input nodes have had time to propagate through the network. This type of network has no nontrivial dynamics at all, and research is almost exclusively concerned with developing and characterizing algorithms for setting the connection strengths by "training" a network on a set of representative input/output mappings [6, 19-22].
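As a purely illustrative sketch of this kind of rule learning (none of the details below come from the references), the following Python fragment trains a single-layer feed-forward network of threshold nodes with the classical delta rule on a toy input/output mapping; the task, network size, and learning rate are arbitrary choices made for the example.

```python
# Minimal sketch (illustrative only): a single-layer feed-forward network
# trained by the delta rule to learn a toy input -> output "rule".
# Signals pass forward in one step, so there are no nontrivial dynamics.
import numpy as np

rng = np.random.default_rng(0)

# Toy task: 3-bit majority, a simple linearly separable input/output mapping.
inputs = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
targets = (inputs.sum(axis=1) >= 2).astype(int)

w = rng.normal(scale=0.1, size=3)    # connection strengths: the network's "program"
bias = 0.0
lr = 0.2

def forward(x):
    """Threshold node: fire if the weighted input exceeds the (negative) bias."""
    return 1 if w @ x + bias > 0 else 0

# "Training": repeatedly present the representative input/output pairs and
# nudge the connection strengths toward the desired mapping (delta rule).
for epoch in range(50):
    for x, t in zip(inputs, targets):
        y = forward(x)
        w = w + lr * (t - y) * x
        bias = bias + lr * (t - y)

print([forward(x) for x in inputs])   # matches the targets after training
print(list(targets))
```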
In 1982, J.J. Hopfield, a physicist by training, presented the first example of a NN model with nontrivial dynamics (state evolution) and predictable, useful behavior [4]. As an added attraction, his model was biologically plausible (if oversimplified) and quite tractable to formal analysis. Through both mathematical analysis and computer simulation, he demonstrated that a network of two-state nodes with symmetric bidirectional connections between all node pairs and with a simple threshold function for each node's updating rule always evolves to a stable fixed-point attractor. Furthermore, he presented a simple algorithm for determining the link strengths that would encode any desired set of node states ("patterns") as attractor states.
He thereby showed that the network can be used as an "associative memory": the link-strength rule encodes a set of patterns representing memories, and if the network is given an initial "input pattern" sufficiently close in state space to one of the stored patterns--say, the stored pattern corrupted by noise or missing some sections--the network state evolves until it stabilizes at the stored pattern, thus "recalling" the memory from partial information. The network's dynamics is less trivial than that of simple rule-learning input-output networks because the system's location in state space evolves dynamically, traveling from the initial state to a stable fixed-point attractor at the bottom of the appropriate basin of attraction.
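A minimal sketch of such an associative memory, assuming the standard Hebbian connection-strength prescription and asynchronous threshold updates (the network size, number of stored patterns, and noise level are arbitrary illustrative choices):

```python
# Sketch of a Hopfield associative memory (standard formulation; the sizes and
# noise level are illustrative choices, not taken from the essay).
import numpy as np

rng = np.random.default_rng(1)
N = 100                                    # number of two-state (+1/-1) nodes

# Patterns ("memories") to be stored as fixed-point attractors.
patterns = rng.choice([-1, 1], size=(3, N))

# Hopfield's connection-strength rule: symmetric Hebbian weights, no self-coupling.
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

def recall(state, steps=10 * N):
    """Asynchronous threshold updates; with symmetric W the state settles
    into a fixed-point attractor."""
    s = state.copy()
    for _ in range(steps):
        i = rng.integers(N)
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Start near a stored pattern: flip 20% of its bits ("the pattern plus noise").
noisy = patterns[0].copy()
flip = rng.choice(N, size=20, replace=False)
noisy[flip] *= -1

result = recall(noisy)
print("overlap with stored pattern:", (result @ patterns[0]) / N)   # close to 1.0
```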
Because the Hopfield model is formally identical to an Ising spin glass [1], Hopfield's work attracted many physicists from the study of random magnetic systems to NN modeling, with the result that a great deal was learned about the behavior of NNs. In particular, by casting NN models into a form addressable by statistical mechanics, one can derive results indicating the existence and nature of "phase transitions" between useful and useless network behavior as a function of various network parameters; the existence of such phase transitions has strong implications for the maximum number of patterns which can be stored in a network [23].
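The best-known such result concerns storage capacity: for the standard Hebbian prescription, the statistical-mechanics analysis predicts that reliable recall breaks down once the number of stored patterns exceeds roughly 0.138N (see e.g. [23]). A rough numerical probe of this transition, with sizes chosen purely for illustration, might look like the following.

```python
# Sketch: numerically probing the storage-capacity "phase transition" of a
# Hopfield network. Theory puts the breakdown near a load alpha = p/N ~ 0.138;
# the network size and number of update sweeps below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(2)
N = 200

def recall_overlap(p):
    """Store p random patterns, then measure how well the first one is retained
    when the network starts exactly on it (near capacity even stored patterns
    become unstable)."""
    xi = rng.choice([-1, 1], size=(p, N))
    W = (xi.T @ xi) / N
    np.fill_diagonal(W, 0.0)
    s = xi[0].copy()
    for _ in range(20 * N):                 # asynchronous threshold updates
        i = rng.integers(N)
        s[i] = 1 if W[i] @ s >= 0 else -1
    return (s @ xi[0]) / N

for alpha in (0.05, 0.10, 0.14, 0.20):
    print(f"alpha = {alpha:.2f}   overlap = {recall_overlap(int(alpha * N)):.2f}")
# Overlaps stay near 1 at low load and collapse once alpha exceeds roughly 0.14
# (finite-size effects blur the transition at this small N).
```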
If one allows asymmetric connections in a NN model, dynamics more complicated than evolution to a fixed-point attractor can result: the network trajectory can converge to limit cycles or chaotic attractors [24]. Herz et al. [25, 26] have shown that if the Hopfield model is extended so that each connection between nodes has a characteristic signal propagation time, and if the propagation times are chosen so that a suitable distribution of delays is represented in the network, then a straightforward extension of Hopfield's connection-strength rule allows the network to encode sequences or cycles of node-state patterns. Static patterns can still be stored as fixed-point attractors, and pattern cycles are stored as limit cycles of the dynamics. The existence of limit-cycle attractors is made possible by the fact that, when encoding temporal information, the connection-strength rule produces asymmetric connections.
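The following is a drastically simplified sketch of such sequence storage: it drops the distribution of propagation delays used in [25, 26] and keeps only an asymmetric Hebbian term that points each stored pattern toward its successor, updated synchronously. It is meant only to show how asymmetric connections turn a set of stored patterns into a limit cycle.

```python
# Simplified sketch (not the model of [25, 26]): a cycle of patterns stored with
# an asymmetric Hebbian rule and recalled by synchronous threshold updates.
import numpy as np

rng = np.random.default_rng(3)
N, P = 100, 4
xi = rng.choice([-1, 1], size=(P, N))        # pattern cycle xi[0] -> xi[1] -> ...

# Asymmetric couplings: each pattern "points to" its successor in the cycle.
W = sum(np.outer(xi[(mu + 1) % P], xi[mu]) for mu in range(P)) / N

s = xi[0].copy()
for t in range(12):                           # synchronous threshold updates
    s = np.where(W @ s >= 0, 1, -1)
    overlaps = xi @ s / N
    print(f"t = {t + 1}   overlaps with stored patterns: {np.round(overlaps, 2)}")
# The state visits xi[1], xi[2], xi[3], xi[0], ... : the pattern cycle is a
# limit cycle of the dynamics.
```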
B. Dynamical Systems and Emergent Computation
We have been describing NN models in terms of their regime of dynamical behavior: trivial (almost immediate stability), evolution to a fixed-point attractor, evolution to a limit-cycle attractor, or evolution to the discrete-state equivalent of a strange (chaotic) attractor. In complex systems theory, the prevailing view is that a system's regime of dynamical behavior largely determines the system's capacity to process information, perform computations, and generally exhibit sophisticated behavior and self-organization of complex structure [14]. Langton, Packard, and others have argued that cellular automata (CA) with nontrivial computational abilities, including all those that have been proven computation-universal, lie at the "edge of chaos" (EOC), a sort of critical line in CA parameter-space between CA with static or periodic asymptotic behavior and CA with chaotic asymptotic behavior [27-32]. This EOC has many of the properties of a second-order phase transition, including power-law behavior in spatial and temporal correlations and divergent transient times ("critical slowing down").
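The parameter most often used to place a CA rule in this parameter-space is Langton's λ, the fraction of rule-table entries that map to a non-quiescent state [27]. The sketch below merely fixes the definition, computing λ for a few elementary (two-state, radius-1) rules, a far smaller rule space than the ones studied in [27-32]; the particular rules chosen are conventional examples, not taken from those papers.

```python
# Sketch: Langton's lambda parameter for cellular-automaton rule tables,
# illustrated on elementary (two-state, radius-1) CA.
import numpy as np

def rule_table(rule_number):
    """8-entry lookup table for an elementary CA rule (Wolfram numbering)."""
    return np.array([(rule_number >> i) & 1 for i in range(8)])

def langton_lambda(table, quiescent=0):
    """Fraction of rule-table entries that map to a non-quiescent state."""
    return float(np.mean(table != quiescent))

def run(table, width=80, steps=40, seed=4):
    """Space-time diagram from a random initial row (plot it to see the behavior)."""
    rng = np.random.default_rng(seed)
    row = rng.integers(0, 2, size=width)
    history = [row]
    for _ in range(steps):
        left, right = np.roll(row, 1), np.roll(row, -1)
        row = table[4 * left + 2 * row + right]
        history.append(row)
    return np.array(history)

for rule in (32, 90, 110):       # roughly: dies out, chaotic, "complex"
    table = rule_table(rule)
    print(f"rule {rule:3d}   lambda = {langton_lambda(table):.3f}")
```

Even here λ is plainly a crude order parameter for so small a rule space (rule 110, the standard "complex" example, has λ = 0.625); the EOC results cited above concern much larger spaces of rules.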
Preliminary results suggest that, in general, complex systems which demonstrate sophisticated behavior such as information-processing, adaptation to environmental changes, and self-organization of complex structural interrelationships exist at the EOC in an appropriately defined space of dynamical behavior. This is an intuitively reasonable statement. Static or periodic systems demonstrate simple behavior which is relatively robust against external influences. Chaotic systems are extremely sensitive to any sort of noise and are effectively random in their behavior. Only systems on an extended transient trajectory are capable of evolving in a way that encodes significant information. Moreover, strong parallels can be drawn between the behavior of dynamical systems and the properties of computational systems [33-36], and these also point to the significance of the EOC.
The behavior typical of systems at the EOC, evolving along extended transients, is termed "complex," as opposed to static, periodic, or chaotic. Such behavior is typically characterized by the formation of quasi-stable, interacting structures localized in time and space. The analysis of complex behavior is one of the outstanding problems of complex systems theory.
Most NN models studied to date operate in the static or periodic regime of behavior, and encode information in the location of the attractors in state space and the shapes of their basins of attraction; the few exceptions are models whose purpose is to study the existence and nature of chaotic behavior in a NN [24, 37]. Models with static or periodic behavior can be carefully tuned to provide desirable behavior, by "sculpting" state space via the connection strengths so that attractors and basin boundaries are in useful locations. However, such models are inherently limited in the sophistication of the behavior they can exhibit. If we wish to investigate the possibility of designing networks which can self-organize and develop increasingly sophisticated behavior, particularly the ability to interact with a changing environment and "learn" more than a simple input/output mapping or pattern recall, we should consider networks with complex, EOC dynamics.
Designing or analyzing the behavior of a system at the EOC, however, is difficult; no general techniques have been developed. One usually discovers such a system by fortuitous observation, and then develops a phenomenological description of the system in question in terms of the various observed types of local metastable structures and their interactions. One example of this approach is the study of John Conway's "Game of Life" CA, and in particular the proof that this CA is capable of constructing a Turing machine and thus of supporting universal computation [38, 39]. Another example is economics, where one begins with a general economic theory and then spends most of one's time classifying exceptions to the theory [12]. For surface waves in shallow water, an example drawn from nonlinear dynamics, one can study solitons and their interactions; in this case, however, the inverse scattering transform allows a complete theory of nonlinear, interacting modes [40].
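As a small self-contained illustration of the kind of localized, propagating structure on which that proof rests, the fragment below runs the "Game of Life" and follows a glider; the grid size is an arbitrary choice.

```python
# Illustration: Conway's "Game of Life" and a glider, an example of the localized,
# quasi-stable, propagating structures used in the universality constructions.
import numpy as np

def life_step(grid):
    """One synchronous update of the Game of Life on a toroidal grid."""
    neighbors = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

grid = np.zeros((12, 12), dtype=int)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:   # a standard glider
    grid[r, c] = 1

for _ in range(4):
    grid = life_step(grid)
print(np.argwhere(grid))   # after 4 steps the glider has shifted one cell diagonally
```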
Mitchell, Crutchfield, et al. have had some success using genetic algorithms to "evolve" CA systems capable of performing a specified computational task [32, 33, 41]. They found that genetic algorithm methods "discovered" CA rules which operated at the edge of chaos, and which used propagating and interacting "particles" (localized, stable structures) to transmit information and perform computation.
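A toy-scale sketch of this sort of experiment follows: a small genetic algorithm evolving radius-3 binary CA rules for the density-classification task used in [32]. The population size, lattice size, number of generations, and mutation-only reproduction are all drastic simplifications of the published experiments (which also use crossover and far more trials), so the sketch illustrates the setup rather than reproducing the results.

```python
# Toy-scale sketch of evolving CA rules with a genetic algorithm for density
# classification: the CA should relax to all 1s when the initial density of 1s
# exceeds 1/2, and to all 0s otherwise. All sizes here are deliberately small.
import numpy as np

rng = np.random.default_rng(5)
R, WIDTH, STEPS = 3, 59, 100                 # radius-3 rules on a 59-cell ring
TABLE = 2 ** (2 * R + 1)                     # 128-entry lookup table = one "genome"
POP, GENS, TESTS = 20, 15, 20                # toy sizes; real runs are much larger

weights = 2 ** np.arange(2 * R, -1, -1)      # converts a neighborhood to a table index

def step(rule, state):
    neigh = np.stack([np.roll(state, shift) for shift in range(R, -R - 1, -1)])
    return rule[weights @ neigh]

def fitness(rule):
    """Fraction of random initial conditions classified correctly."""
    correct = 0
    for _ in range(TESTS):
        density = rng.uniform()
        state = (rng.uniform(size=WIDTH) < density).astype(int)
        target = int(state.mean() > 0.5)
        for _ in range(STEPS):
            state = step(rule, state)
        correct += int(np.all(state == target))
    return correct / TESTS

population = rng.integers(0, 2, size=(POP, TABLE))
for gen in range(GENS):
    scores = np.array([fitness(rule) for rule in population])
    order = np.argsort(scores)[::-1]
    elite = population[order[:POP // 2]]      # truncation selection: keep the best half
    children = elite.copy()
    for child in children:                    # mutation only (no crossover here)
        child[rng.integers(0, TABLE, size=2)] ^= 1
    population = np.vstack([elite, children])
    print(f"generation {gen:2d}   best fitness {scores[order[0]]:.2f}")
```

Even this toy version tends to discover the trivial "relax to all 0s (or all 1s)" strategies first, loosely echoing the early epochs reported in [32]; the particle-based strategies of the full experiments should not be expected at this scale.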
C. Modeling Emergent Modeling
Crutchfield, Mitchell, and their associates have suggested a framework, the "calculi of emergence," for discussing the modeling capabilities of adaptive dynamical systems with computational abilities [34]. In this framework, one considers a dynamical system as an "adaptive agent" which monitors a data stream representing measurements of its environment, and which attempts to represent or encode the data stream as efficiently as possible by discovering regularities within it. One classifies the modeling abilities of an agent by the computational class of its regularity-detecting algorithm; the computational classes used are an extension of those developed in formal computation theory, e.g. "finite automaton," "stack automaton," "universal Turing machine," etc. The complexity of the data stream, as perceived by the agent, is defined to be a measure of the computational resources required under the agent's current computational scheme.
In this framework, the most significant aspect of "emergent" modeling capabilities is the ability of an agent to "discover" or "innovate" a more powerful modeling algorithm (computational class) when it has insufficient computational resources to model its environment with its current algorithm. Crutchfield does not discuss any particular dynamical system nor propose specific mechanisms by which a system might innovate a new algorithm and develop the required structures; his intent is to provide a language for describing and analyzing the process. One might hope to apply this framework to the modeling abilities of neural networks.
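As a loose illustration only (this is not Crutchfield's construction, and the quantities below are not his complexity measures), the fragment counts how many distinct length-k histories a finite-memory predictor must distinguish in a binary data stream, and how many of those histories fully determine the next symbol, comparing a trivially regular stream with a random one.

```python
# Loose illustration: the "resources" a finite-memory agent needs to model a
# data stream, proxied by the number of distinct length-k histories it must
# track and how predictive each history is. Streams and k are arbitrary choices.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(6)

def predictive_contexts(stream, k):
    """Map each length-k history to counts of the symbol that follows it."""
    counts = defaultdict(lambda: [0, 0])
    for i in range(len(stream) - k):
        history = tuple(stream[i:i + k])
        counts[history][stream[i + k]] += 1
    return counts

periodic = [0, 1, 1] * 2000                        # a stream with a simple regularity
random_stream = rng.integers(0, 2, size=6000).tolist()

for name, stream in [("periodic", periodic), ("random", random_stream)]:
    contexts = predictive_contexts(stream, k=4)
    determined = sum(min(c) == 0 for c in contexts.values())
    print(f"{name:8s}: {len(contexts):2d} histories seen, "
          f"{determined} of them fully determine the next symbol")
# The periodic stream needs only three histories, each fully predictive; the
# random stream needs all sixteen, and none of them predicts anything.
```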
References
(It has recently come to our attention that the reference list for this essay is incomplete, and is missing numbers 35 through 41. Unfortunately, the earliest archived copy we have (1999) is similarly incomplete. Since the piece is over ten years old, and the field has evolved dramatically since then, we don't see much point in investing the time to reconstruct the original list. Apologies for any inconvenience. — IDB, 2005-05-27.)
- Sompolinsky, H., Statistical Mechanics of Neural Networks. Physics Today, 1988. 41(12 (Dec)): p. 70-80.
- Garrido, L., ed. Statistical Mechanics of Neural Networks: Proceedings of the XIth Sitges Conference, Sitges, Barcelona, Spain, 3-7 June 1990. Lecture Notes in Physics, ed. W. Beiglbock. Vol. 368. 1990, Springer-Verlag: New York. 477.
- Gardner, E., The space of interactions in neural network models. Journal of Physics A, 1988. 21: p. 257-270.
- Hopfield, J.J., Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 1982. 79(April): p. 2554-2558.
- McClelland, J.L. and D.E. Rumelhart, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. 1986, Cambridge, MA: MIT Press.
- Watkin, T.L.H., On optimal neural network learning. Physica A, 1993. 200(Proceedings): p. 628-635.
- Brunel, N., Effect of synapse dilution on the memory retrieval in structured attractor neural networks. Journal de Physique I, 1993. 3(August): p. 1693-1715.
- Farmer, J.D., A rosetta stone for connectionism. Physica D, 1990. 42: p. 153-187.
- Kauffman, S.A. and S. Johnsen, Coevolution to the edge of chaos: Coupled fitness landscapes, poised states, and coevolutionary avalanches. Journal of Theoretical Biology, 1991. 149: p. 467-505.
- Bak, P. and K. Sneppen, Punctuated equilibrium and criticality in a simple model of evolution. Physical Review Letters, 1993. 71(24): p. 4083-4086.
- Flyvbjerg, H., K. Sneppen, and P. Bak, Mean field theory for a simple model of evolution. Physical Review Letters, 1993. 71(24): p. 4087-4090.
- Anderson, P.W., K. Arrow, and D. Pines, ed. The Economy as an Evolving Complex System. Santa Fe Institute Studies in the Sciences of Complexity, 1988, Addison-Wesley: Reading, MA.
- Goodwin, B., Development as a robust natural process, in Thinking About Biology, F. Varela and W. Stein, Editor. 1992, Addison-Wesley: Reading, MA.
- Waldrop, M.M., Complexity: The Emerging Science at the Edge of Order and Chaos. 1992, New York: Simon & Schuster. 380.
- Lewin, R., Complexity: Life at the Edge of Chaos. 1992, New York: Collier Books.
- Anderson, P.W., Is complexity physics? Is it science? What is it? Physics Today, 1991(July): p. 9-11.
- Watkin, T.L.H. and A. Rau, The statistical mechanics of learning a rule. Reviews of Modern Physics, 1993. 65(2 (April)): p. 499-556.
- Griniasty, M., "Cavity-Approach" analysis of the neural-network learning problem. Physical Review E, 1993. 47(6): p. 4496-4513.
- Barto, A.G., S.J. Bradtke, and S.P. Singh, Learning to act using real-time dynamic programming. 1993, University of Massachusetts, Dept. of Computer Science:
- Engel, A. and C. Van den Broeck, Systems that can learn from examples: Replica calculation of uniform convergence bounds for perceptrons. Physical Review Letters, 1993. 71(11 (13 Sept.)): p. 1772-1775.
- Engel, A. and C. Van den Broeck, Replica calculation of the Vapnik-Chervonenkis bound for the perceptron. Physica A, 1993. 200(Proceedings): p. 636-643.
- Gullapalli, V. and A.G. Barto. Shaping as a method for accelerating reinforcement learning. in 1992 IEEE International Symposium on Intelligent Control. 1992. Glasgow, U.K.:
- Hertz, J., A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation. Santa Fe Institute Studies in the Sciences of Complexity, Lecture Notes. 1991, Redwood City, California: Addison-Wesley. 327.
- Crisanti, A., M. Falcioni, and A. Vulpiani, Transition from regular to complex behaviour in a discrete deterministic asymmetrical neural network model. Journal of Physics A, 1993. 26: p. 3441-3453.
- Herz, A., et al., The Hebb rule: Storing static and dynamic objects in an associative neural network. Europhysics Letters, 1988. 7(7): p. 663-669.
- Herz, A., R. Kühn, and J.L. van Hemmen, Hebbian learning reconsidered: Representation of static and dynamic objects in associative neural nets. Biological Cybernetics, 1989. 60(6): p. 457-467.
- Langton, C.G., Studying artificial life with cellular automata. Physica D, 1986. 22: p. 120-149.
- Packard, N.H. Adaptation toward the edge of chaos. in Dynamic Patterns in Complex Systems. 1987. Fort Lauderdale, Fl.: World Scientific.
- Wootters, W.K. and C.G. Langton, Is there a sharp phase transition for deterministic cellular automata? Physica D, 1990. 45: p. 95-104.
- Binder, P.-M., Parametric ordering of complex systems. Physical Review E, 1994. 49(3): p. 2023-2025.
- Li, W., N.H. Packard, and C.G. Langton, Transition phenomena in cellular automata rule space. Physica D, 1990. 45: p. 77-94.
- Mitchell, M., P.T. Hraber, and J.P. Crutchfield, Revisiting the edge of chaos: Evolving cellular automata to perform computations. Complex Systems, 1993. 7: p. 89-130.
- Crutchfield, J.P. and M. Mitchell, The evolution of emergent computation. 1994, Santa Fe Institute:
- Crutchfield, J.P., The calculi of emergence: Computation, dynamics, and induction. Physica D, 1994. 75: p. 11-54.