Bruno A. Olshausen

Selected Publications


Olshausen BA, Field DJ (1996). Sparse coding with an overcomplete basis set: A strategy employed by V1? Submitted to Vision Research.

Previously, we have shown that the receptive field properties of simple cells in area V1 may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images (Nature, 381:607-609). Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcomplete, i.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions overlap (i.e., are non-orthogonal and not linearly independent of each other), sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the input-output function will deviate from being purely linear. Interestingly, these deviations from linearity provide a potential explanation for the weak forms of non-linearity observed in the response properties of cortical simple cells. Further predictions of the model and proposed experimental tests are discussed.
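
The deviation from linearity in an overcomplete sparse code can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: it assumes a two-dimensional input, three unit-norm basis functions at 0, 60, and 120 degrees, an L1 sparseness penalty, and inference by iterative soft thresholding (ISTA). The names `sparse_infer`, `Phi`, and `lam`, and all parameter values, are hypothetical choices for the sketch; the paper's own cost function and optimization procedure may differ.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal step for the L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_infer(x, Phi, lam=0.1, n_iter=200):
    """Approximately minimize 0.5*||x - Phi @ a||^2 + lam*||a||_1 with ISTA.

    Assumption for this sketch: an L1 penalty stands in for the sparseness cost.
    """
    L = np.linalg.norm(Phi, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)     # gradient of the reconstruction error
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Overcomplete basis: three unit vectors spanning a 2-D input space.
angles = np.deg2rad([0.0, 60.0, 120.0])
Phi = np.vstack([np.cos(angles), np.sin(angles)])   # shape (2, 3)

x1 = Phi[:, 0]          # input aligned with the first basis function
x2 = Phi[:, 2]          # input aligned with the third basis function

a1 = sparse_infer(x1, Phi)
a2 = sparse_infer(x2, Phi)
a12 = sparse_infer(x1 + x2, Phi)

print("code(x1)            =", np.round(a1, 3))
print("code(x2)            =", np.round(a2, 3))
print("code(x1) + code(x2) =", np.round(a1 + a2, 3))
print("code(x1 + x2)       =", np.round(a12, 3))
```

In this toy basis the second function equals the sum of the first and third, so the code for the summed input recruits that single function instead of the two used for the individual inputs. The code of a sum therefore differs from the sum of the codes, which is the kind of departure from linearity the abstract describes.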

Olshausen BA, Field DJ (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381: 607-609.

The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and bandpass (selective to structure at different spatial scales), comparable to the basis functions of wavelet transforms. One approach to understanding such response properties of visual neurons has been to consider their relationship to the statistical structure of natural images in terms of efficient coding. Along these lines, a number of studies have undertaken to train unsupervised learning algorithms on natural images in the hope of developing receptive fields with similar properties, but none has succeeded in producing a full set that spans the image space and contains all three of the above properties. Here, we investigate the proposal that a coding strategy which maximizes sparseness is sufficient to account for these properties. We show that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the striate cortex. The resulting sparse image code provides a more efficient representation for later stages of processing because it possesses a higher degree of statistical independence among its outputs.

Olshausen BA, Field DJ (1996). Natural image statistics and efficient coding. Presented at the Workshop on Information Theory and the Brain, September 4-5, 1995, University of Stirling, Scotland. Network, vol. 7.

Natural images contain characteristic statistical regularities that set them apart from purely random images. Understanding what these regularities are can enable natural images to be coded more efficiently. In this paper, we describe some of the forms of structure that are contained in natural images, and we show how these are related to the response properties of neurons at early stages of the visual system. Many of the important forms of structure require higher-order (i.e., more than linear, pairwise) statistics to characterize, which makes models based on linear Hebbian learning, or principal components analysis, inappropriate for finding efficient codes for natural images. We suggest that a good objective for an efficient coding of natural scenes is to maximize the sparseness of the representation, and we show that a network that learns sparse codes of natural scenes succeeds in developing localized, oriented, bandpass receptive fields similar to those in the primate striate cortex.

Lee CW, Olshausen BA (1996). A nonlinear Hebbian network that learns to detect disparity in random-dot stereograms. Neural Computation, 8, 545-566.

An intrinsic limitation of linear, Hebbian networks is that they are capable of learning only from the linear pairwise correlations within an input stream. In order to explore what higher forms of structure could be learned with a nonlinear Hebbian network, we have constructed a model network containing a simple form of nonlinearity and we have applied the network to the problem of learning to detect the disparities present in random-dot stereograms. The network consists of three layers, with nonlinear, sigmoidal activation functions in the second layer units. The nonlinearities allow the second layer to transform the pixel-based representation in the input into a new representation based on coupled pairs of left-right inputs. The third layer of the network then clusters patterns occurring on the second layer outputs according to their disparity via a standard competitive learning rule. Analysis of the network dynamics shows that the second-layer units' nonlinearities interact with the Hebbian learning rule to expand the region over which pairs of left-right inputs are stable. The learning rule is neurobiologically inspired and plausible, and the model may shed light on how the nervous system learns to use coincidence detection in general.

Olshausen BA, Koch C (1995). Selective visual attention. In M. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks. MIT Press.

(Review article.)

Olshausen BA, Anderson CH, Van Essen DC (1995). A multiscale routing circuit for forming size- and position-invariant object representations. The Journal of Computational Neuroscience, 2, 45-62.

We describe a neural model for forming size- and position-invariant representations of visual objects. The model is based on a previously proposed dynamic routing circuit (Olshausen et al., 1993, J Neurosci, 13: 4700-4719) that remaps selected portions of an input array into an object-centered reference frame. Here, we show how a multiscale representation may be incorporated at the input stage of the model, and we describe the control architecture and dynamics for a hierarchical, multistage routing circuit. Specific neurobiological substrates and mechanisms for the model are proposed, and a number of testable predictions are described.

Olshausen BA, Anderson CH, Van Essen DC (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. The Journal of Neuroscience, 13(11), 4700-4719.

We present a biologically plausible model of an attentional mechanism for forming position- and size-invariant representations of objects in the visual world. The model relies on a set of control neurons to dynamically modify the synaptic strengths of intracortical connections so that information from a windowed region of primary visual cortex (V1) is selectively routed to higher cortical areas. Local spatial relationships (i.e., topography) within the attentional window are preserved as information is routed through the cortex. This enables attended objects to be represented in higher cortical areas within an object-centered reference frame that is position and scale invariant. We hypothesize that the pulvinar may provide the control signals for routing information through the cortex. The dynamics of the control neurons are governed by simple differential equations that could be realized by neurobiologically plausible circuits. In preattentive mode, the control neurons receive their input from a low-level "saliency map" representing potentially interesting regions of a scene. During the pattern recognition phase, control neurons are driven by the interaction between top-down (memory) and bottom-up (retinal input) sources. The model respects key neurophysiological, neuroanatomical, and psychophysical data relating to attention, and it makes a variety of experimentally testable predictions.
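
The remapping of an attended window into an object-centered frame can be illustrated with a deliberately simplified, static sketch. In the model described above, control neurons set the connection strengths dynamically. Here, as an assumption made only for illustration, the window position and size are passed in by hand, the input is one-dimensional, and the weights perform linear interpolation. The names `routing_weights`, `center`, and `width` are hypothetical.

```python
import numpy as np

def routing_weights(n_out, n_in, center, width):
    """Weights that map an input window of the given center and width onto
    n_out output units (hand-set stand-in for the control neurons)."""
    # Input position sampled by each output unit, kept in left-to-right order
    # so that spatial relationships inside the window are preserved.
    pos = center + np.linspace(-0.5, 0.5, n_out) * width
    W = np.zeros((n_out, n_in))
    for i, p in enumerate(pos):
        lo = int(np.floor(p))
        frac = p - lo
        # Linear interpolation between the two nearest input units.
        if 0 <= lo < n_in:
            W[i, lo] += 1.0 - frac
        if 0 <= lo + 1 < n_in:
            W[i, lo + 1] += frac
    return W

pattern = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n_in, n_out = 32, 5

# The same pattern at two different positions.
scene_a = np.zeros(n_in)
scene_a[3:8] = pattern
scene_b = np.zeros(n_in)
scene_b[20:25] = pattern

# The pattern at twice the size (9 samples spanning positions 10..18).
scene_c = np.zeros(n_in)
scene_c[10:19] = np.interp(np.arange(10, 19), [10, 12, 14, 16, 18], pattern)

out_a = routing_weights(n_out, n_in, center=5, width=4) @ scene_a
out_b = routing_weights(n_out, n_in, center=22, width=4) @ scene_b
out_c = routing_weights(n_out, n_in, center=14, width=8) @ scene_c

print(out_a)
print(out_b)
print(out_c)
```

With the window placed over the pattern in each scene, all three outputs reproduce the same five-element pattern, which is the sense in which the routed representation is position and scale invariant in this sketch.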

Van Essen DC, Olshausen B, Anderson CH, Gallant JL (1991). Pattern recognition, attention, and information bottlenecks in the primate visual system. In: Proc. SPIE Conf. on Visual Information Processing: From Neurons to Chips, 1473, pp. 17-28.

The primate visual system has evolved impressive capabilities for recognizing complex patterns in natural images. This process involves many stages of analysis and a variety of information processing strategies. Here, we concentrate on the importance of "information bottlenecks," which restrict the amount of information that can be handled at different stages of analysis. We believe these steps are crucial for reducing the overwhelming computational complexity associated with recognizing countless objects from arbitrary viewing angles, distances, and perspectives. The process of directed visual attention is an especially important information bottleneck because of its flexibility in determining how information is routed to high-level pattern recognition centers.