Previously, we have shown that the receptive field properties of simple cells
in area V1 may be
accounted for in terms of a strategy for producing a sparse
distribution of output activity in response to natural images
(Nature, 381:607-609). Here, in addition to describing this
work in a more expansive fashion, we examine the neurobiological
implications of sparse coding. Of particular interest is the case
when the code is overcomplete---i.e., when the number of code elements
is greater than the effective dimensionality of the input space.
Because the basis functions overlap (i.e., are non-orthogonal and not
linearly independent of each other), sparsifying the code will recruit
only those basis functions necessary for representing a given input,
and so the input-output function will deviate from being purely
linear. Interestingly, these deviations from linearity provide a
potential explanation for the weak forms of non-linearity observed in
the response properties of cortical simple cells. Further predictions
of the model and proposed experimental tests are discussed.
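The sparse coding scheme summarized above can be sketched in a few lines. Everything concrete below is an illustrative assumption: the toy dictionary, the L1 penalty weight, the step size, and the inference procedure, which uses standard soft-thresholded gradient descent (ISTA) as a stand-in for the model's exact inference dynamics. The point of the sketch is the qualitative behavior described in the abstract: with an overcomplete, non-orthogonal basis, sparsification recruits only the few basis functions needed for a given input.

```python
import numpy as np

def sparse_code(I, Phi, lam=0.1, steps=200, eta=0.05):
    """Infer coefficients a minimizing ||I - Phi a||^2 + lam * sum|a|
    by soft-thresholded gradient descent (ISTA)."""
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        a -= eta * Phi.T @ (Phi @ a - I)                         # reduce reconstruction error
        a = np.sign(a) * np.maximum(np.abs(a) - eta * lam, 0.0)  # sparsify
    return a

# Toy overcomplete dictionary: 8-dimensional inputs, 16 basis functions.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 16))
Phi /= np.linalg.norm(Phi, axis=0)       # unit-norm basis functions
I = 2.0 * Phi[:, 3] - 1.5 * Phi[:, 7]    # input built from two basis functions
a = sparse_code(I, Phi)
# Only a handful of coefficients are recruited; the rest are driven to zero.
```

Because the recruited set of basis functions depends on the input, the resulting input-output map is not purely linear, which is the deviation from linearity the abstract points to.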
The spatial receptive fields of simple cells in mammalian striate
cortex have been reasonably well described
physiologically
and can be characterized as being localized, oriented, and
bandpass (selective to structure at different spatial scales),
comparable to the basis functions of wavelet
transforms. One approach to
understanding such response properties of visual neurons has been to
consider their relationship to the statistical structure of natural
images in terms of efficient
coding.
Along these lines, a number of studies have undertaken to train
unsupervised learning algorithms on natural images in the hope of
developing receptive fields with similar
properties,
but none has succeeded in producing a full set that spans the image
space and contains all three of the above properties. Here, we
investigate the proposal that a coding
strategy which maximizes sparseness is sufficient to account for these
properties. We show that a learning algorithm that attempts to find
sparse linear codes for natural scenes will develop a complete family
of localized, oriented, bandpass receptive fields, similar to those
found in the striate cortex. The resulting sparse image code provides
a more efficient representation for later stages of processing because
it possesses a higher degree of statistical independence among its
outputs.
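The learning algorithm referred to above can be caricatured as alternating between sparse inference and a Hebbian-like update of the basis functions. Every particular below is an assumption for illustration, not the paper's procedure: the patch size, penalty, learning rate, the crude one-shot thresholding used in place of full inference, and the Gaussian noise standing in for whitened natural-image patches.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_basis = 64, 64                   # e.g., 8x8 patches, one basis function per pixel
Phi = rng.standard_normal((n_pix, n_basis))
Phi /= np.linalg.norm(Phi, axis=0)        # start from random unit-norm basis functions

def learn_step(X, Phi, lam=0.1, eta=0.05):
    """One learning step: crude one-shot sparse inference (soft-thresholded
    projections), then a Hebbian-like gradient step on the basis functions."""
    A = Phi.T @ X
    A = np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)   # sparsified coefficients
    R = X - Phi @ A                                     # reconstruction residual
    Phi = Phi + eta * R @ A.T / X.shape[1]              # move basis toward residual
    Phi /= np.linalg.norm(Phi, axis=0)                  # renormalize
    return Phi, np.linalg.norm(R)

X = rng.standard_normal((n_pix, 100))     # stand-in for whitened natural-image patches
errs = []
for _ in range(100):
    Phi, err = learn_step(X, Phi)
    errs.append(err)
# Reconstruction error on the training patches falls as the basis adapts.
```

Trained on actual natural-image patches rather than this Gaussian stand-in, updates of this Hebbian-like form are what yield the localized, oriented, bandpass basis functions described in the abstract.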
Natural images contain characteristic statistical regularities that
set them apart from purely random images. Understanding what these
regularities are can enable natural images to be coded more
efficiently. In this paper, we describe some of the forms of
structure that are contained in natural images, and we show how these
are related to the response properties of neurons at early stages of
the visual system. Many of the important forms of structure require
higher-order (i.e., more than linear, pairwise) statistics to
characterize, which makes models based on linear Hebbian learning, or
principal components analysis, inappropriate for finding efficient
codes for natural images. We suggest that a good objective for an
efficient coding of natural scenes is to maximize the sparseness of
the representation, and we show that a network that learns sparse
codes of natural scenes succeeds in developing localized, oriented,
bandpass receptive fields similar to those in the primate striate
cortex.
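The claim that pairwise statistics are blind to the structure sparseness exploits can be demonstrated directly: two ensembles with identical mean and variance, one Gaussian and one heavy-tailed, are indistinguishable to covariance-based methods such as PCA, yet differ sharply in a higher-order statistic such as kurtosis. The particular distributions and sample size below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
gauss = rng.standard_normal(n)                     # no structure beyond second order
lap = rng.laplace(scale=1 / np.sqrt(2), size=n)    # sparse, heavy-tailed

# Both ensembles have zero mean and unit variance, so any method driven
# purely by pairwise statistics (covariance, PCA) treats them identically.

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0    # zero for a Gaussian, positive for sparse signals

print(excess_kurtosis(gauss))     # near 0
print(excess_kurtosis(lap))       # near 3
```

A sparseness-maximizing objective is sensitive to exactly this higher-order difference, which is why it can find structure that linear Hebbian learning and principal components analysis cannot.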
Lee CW, Olshausen BA (1996). A nonlinear Hebbian network that learns
to detect disparity in random-dot stereograms.
Neural Computation, 8, 545-566.
An intrinsic limitation of linear Hebbian networks is that they are capable
of learning only from the linear pairwise correlations within an input
stream. In order to explore what higher forms of structure could be
learned with a nonlinear Hebbian network, we have constructed a model
network containing a simple form of nonlinearity and we have applied the
network to the problem of learning to detect the disparities present in
random-dot stereograms. The network consists of three layers, with
nonlinear, sigmoidal activation functions in the second layer units. The
nonlinearities allow the second layer to transform the pixel-based
representation in the input into a new representation based on coupled
pairs of left-right inputs. The third layer of the network then clusters
patterns occurring on the second layer outputs according to their disparity
via a standard competitive learning rule. Analysis of the network
dynamics shows that the second-layer units' nonlinearities interact with
the Hebbian learning rule to expand the region over which pairs of
left-right inputs are stable. The learning rule is neurobiologically
inspired and plausible, and the model may shed light on how the
nervous system learns to use coincidence detection in general.
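The interaction described above, a saturating nonlinearity combined with a Hebbian rule that stabilizes coupled pairs of inputs, can be sketched with a single unit receiving one left and one right input. The sigmoid gain, learning rate, weight decay, and input statistics below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
w = rng.normal(0, 0.1, size=2)        # weights from a left-right input pair
eta, decay = 0.05, 0.01               # learning rate and weight decay (assumed)

for _ in range(2000):
    # Correlated left-right pair, as for a dot seen at matching positions.
    s = rng.choice([-1.0, 1.0])
    x = np.array([s, s]) + rng.normal(0, 0.2, size=2)
    y = sigmoid(w @ x) - 0.5          # saturating, zero-centered unit output
    w += eta * y * x - decay * w      # Hebbian update with decay

# The weights become coupled: the unit responds to the left-right
# coincidence rather than to either input alone.
print(w)
```

The weight decay alone would erase any single uncorrelated input; it is the correlation between the paired inputs, amplified through the nonlinearity, that keeps the coupled weights stable, which is the coincidence-detection behavior the abstract describes.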
Olshausen BA, Koch C (1995). Selective visual attention.
In M. Arbib (Ed.), The Handbook of Brain Theory and Neural
Networks. MIT Press.
(Review article.)
Olshausen BA, Anderson CH, Van Essen DC (1995). A multiscale routing
circuit for forming size- and position-invariant object representations.
The Journal of Computational Neuroscience, 2, 45-62.
We describe a neural model for forming size- and position-invariant
representations of visual objects. The model is based on a previously
proposed dynamic routing circuit (Olshausen et al., 1993, J Neurosci,
13: 4700-4719) that remaps selected portions of an input array into an
object-centered reference frame. Here, we show how a multiscale
representation may be incorporated at the input stage of the model,
and we describe the control architecture and dynamics for a
hierarchical, multistage routing circuit. Specific neurobiological
substrates and mechanisms for the model are proposed, and a number of
testable predictions are described.
Olshausen BA, Anderson CH, Van Essen DC (1993). A neurobiological
model of visual attention and invariant pattern recognition based on
dynamic routing of information. The Journal of Neuroscience,
13(11), 4700-4719.
We present a biologically plausible model of an attentional mechanism
for forming position- and size-invariant representations of objects in
the visual world. The model relies on a set of control neurons to
dynamically modify the synaptic strengths of intracortical connections
so that information from a windowed region of primary visual cortex
(V1) is selectively routed to higher cortical areas. Local spatial
relationships (i.e., topography) within the attentional window are
preserved as information is routed through the cortex. This enables
attended objects to be represented in higher cortical areas within an
object-centered reference frame that is position and scale invariant.
We hypothesize that the pulvinar may provide the control signals for
routing information through the cortex. The dynamics of the control
neurons are governed by simple differential equations that could be
realized by neurobiologically plausible circuits. In preattentive
mode, the control neurons receive their input from a low-level
"saliency map" representing potentially interesting regions of a
scene. During the pattern recognition phase, control neurons are
driven by the interaction between top-down (memory) and bottom-up
(retinal input) sources. The model respects key neurophysiological,
neuroanatomical, and psychophysical data relating to attention, and it
makes a variety of experimentally testable predictions.
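The routing idea, control signals gating connection strengths so that a window of the input array is remapped topographically into an object-centered frame, can be illustrated in one dimension. The window parameters and the nearest-neighbor sampling below are assumptions for illustration, not the model's actual connection matrices or control dynamics.

```python
import numpy as np

def route(inp, start, width, out_size):
    """Gate a window [start, start + width) of the input onto a fixed-size
    output array, preserving local topography (a schematic of dynamic
    routing, not the published circuit)."""
    n = len(inp)
    # Control signals set one connection strength per output unit:
    # output j samples input position start + j * width / out_size.
    C = np.zeros((out_size, n))
    for j in range(out_size):
        i = start + int(round(j * width / out_size))
        if 0 <= i < n:
            C[j, i] = 1.0
    return C @ inp

retina = np.arange(16, dtype=float)
# Attend to an 8-sample window beginning at position 4, remapped into an
# object-centered 4-unit output frame.
print(route(retina, start=4, width=8, out_size=4))   # positions 4, 6, 8, 10
```

Changing `start` shifts the attended window and changing `width` rescales it, while the output representation stays in the same frame, which is the sense in which the representation is position and scale invariant.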
Van Essen DC, Olshausen B, Anderson CH, Gallant JL
(1991). Pattern recognition, attention, and information bottlenecks in
the primate visual system. In: Proc. SPIE Conf. on Visual
Information Processing: From Neurons to Chips, 1473, p. 17-28.
The primate visual system has evolved impressive capabilities for
recognizing complex patterns in natural images. This process involves
many stages of analysis and a variety of information processing
strategies. Here, we concentrate on the importance of "information
bottlenecks," which restrict the amount of information that can be
handled at different stages of analysis. We believe these bottlenecks are
crucial for reducing the overwhelming computational complexity
associated with recognizing countless objects from arbitrary viewing
angles, distances, and perspectives. The process of directed visual
attention is an especially important information bottleneck because of
its flexibility in determining how information is routed to high-level
pattern recognition centers.