The kinetic depth effect refers to the perception of three dimensions imparted to a two-dimensional projection of a moving object.
The original experiments that demonstrated this effect used the projected shadow of a wire-frame box.
It is normally referred to as an illusion because the wire-frame shadow is often perceived to change its direction of rotation while it is being observed, even though the box continues to rotate in a single direction.
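Why the shadow leaves the direction of rotation open can be checked with a little arithmetic. The sketch below is my own illustration, not a reconstruction of the original experiments. It assumes an idealized parallel (orthographic) shadow, a cube with corners at ±1, and rotation about the vertical axis. The names `rotate_y`, `shadow`, `mirror_depth` and `shadows_match` are made up for the example. It compares a box turning one way with its depth-mirrored twin turning the other way.

```python
import math

# Wire-frame cube centred on the origin: 8 corners as (x, y, z).
# Assumption: z is depth (distance along the line of sight).
CUBE = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]


def rotate_y(point, angle):
    """Rotate a 3-D point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def shadow(point):
    """Assumed orthographic shadow: keep x and y, discard depth."""
    x, y, _ = point
    return (x, y)


def mirror_depth(point):
    """Flip a point front-to-back (near becomes far)."""
    x, y, z = point
    return (x, y, -z)


def shadows_match(angle, tol=1e-12):
    """Compare the shadow of the cube turned by +angle with the shadow
    of the depth-mirrored cube turned by -angle, corner by corner."""
    one_way = [shadow(rotate_y(p, angle)) for p in CUBE]
    other_way = [shadow(rotate_y(mirror_depth(p), -angle)) for p in CUBE]
    return all(
        abs(ax - bx) < tol and abs(ay - by) < tol
        for (ax, ay), (bx, by) in zip(one_way, other_way)
    )


if __name__ == "__main__":
    for degrees in (0, 30, 60, 90, 135):
        print(degrees, shadows_match(math.radians(degrees)))
```

Under these assumptions the two shadows agree at every angle, while the depths of the two boxes are opposite in sign, so the screen alone cannot say which box is being shown. With a nearby light source (perspective projection) the match would only be approximate.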
. . . . .
More recent examples use a silhouette of a spinning dancer to demonstrate the effect. This seems to have led to a considerable amount of wiki-style apocrypha, which states, with great authority and a "settled science" air, that the effect is caused by our top-down expectations of human forms. Like the wire-box projection, the spinning dancer will seem to change its direction of rotation as it is being viewed.
There are a variety of ways this effect can manifest in normal, everyday experience. Three are presented below, the best known of which is the mask effect (second video).
Above, you may observe the kinetic depth effect when the image frame zooms toward, or away from, still photographs. Here, it is likely that making input stimuli consistent with previous, ubiquitous experiences of a three-dimensional universe is primarily responsible for the effect. Though consistency with three dimensions is the primary cause of the effect in the above video, it probably isn't the only cause. Another ubiquitous perception in our world has been that of the human form. Filtering afferent signaling in order to make it consistent with those previous perceptions may be playing a supporting role in the above video. Because humans are well-known 'objects' in our experience, we make them the objects of focus in the above photographs (or so the theory goes). The following video further promotes the idea that the human form may play a role in this effect.
A contributing cause of the above effect may be top-down modifications of input stimuli to make them consistent with past experiences of a universe that has three dimensions. As put forward in the video, however, the top-down filtering of input stimuli may also work to make stimuli consistent with deeply embedded previous experiences of human facial features (possibly the primary mechanism at work here?).
. . . . .
Much of the above discussion regarding the role of the human form in the perception of the kinetic depth effect is simply my parroting of explanations that others have provided on the subject. In some circles, these explanations have come to be treated as conventional wisdom and, as such, they are seldom questioned or tested.
Does the mask effect really have, as its primary cause, our deeply impressed expectations about the human face? One way to test this assertion might be to try to replicate the mask illusion, but without the human facial features that are so often credited for its effectiveness.
In the above example, the human form has been completely eliminated from the concave drawing, yet the cube still pops out and appears convex when set in motion.
About the only thing that is more ubiquitous and consistently uniform than the appearance of the human face in our lives is the three-dimensional, volumetric nature of every aspect of our universe.
I'm a firm believer in the idea that nothing should ever be considered conclusive at this level of understanding. That said, we can at least accept that this demonstration provides strong support for the notion that it is top-down expectations of the volumetric nature of reality, and not so much of the human form, that are responsible for the perception.
. . . . . . .
One Possible Explanation?
Unlike our conventional mathematical models, biological neural networks do not have the luxury of operating outside the linear flow of time. They must function in real time, and therefore they are designed to commit, as soon as they can, to what is normally a fairly reliable perception/interpretation of reality. This tendency to commit to a given perception can also be seen in static interpretations; for example, when we see a wire drawing of a 3-D cube.
In the above cube, the three-dimensional interpretation is ambiguous, but the neural network will converge on one or the other of the two possibilities. There is always reactive convergence, representing a commitment to a given interpretation, but that commitment can be flipped between the two interpretations.
The cube is perfectly ambiguous between the two choices, which is not always the case for such interpretation distortions (see the blog entry about The McGurk Effect). Because the two choices of how to interpret this image as “having volume” are both equally valid, your brain may flip back and forth between them. As can be seen in the McGurk effect, distortions of this nature, based on real-time (top-down) interpretations of reality as it flows in, may also be seen in multi-modal (more than one sense) experiments.
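The commit-then-flip behaviour described for the ambiguous cube can be imitated with a very small toy model. To be clear, this is my own sketch and not a claim about real neural circuitry: it assumes two units, one per interpretation, that receive the same input and suppress each other. The function name `settle` and the parameters `drive`, `inhibition`, `dt` and `steps` are all hypothetical choices made for the illustration.

```python
def settle(a, b, drive=1.0, inhibition=2.0, dt=0.1, steps=400):
    """Let two mutually suppressing units run until they settle.

    a, b       -- starting activity of the two interpretations
    drive      -- identical input to both units (the ambiguous drawing)
    inhibition -- how strongly each unit suppresses the other (assumed > 1)
    Uses plain Euler steps of size dt.
    """
    for _ in range(steps):
        # Each unit decays toward its input, which is the shared drive
        # minus the other unit's suppression, clipped at zero.
        da = -a + max(0.0, drive - inhibition * b)
        db = -b + max(0.0, drive - inhibition * a)
        a, b = a + dt * da, b + dt * db
    return a, b


if __name__ == "__main__":
    # Same input, tiny head start for one interpretation or the other.
    print(settle(0.34, 0.33))
    print(settle(0.33, 0.34))

    # Start from a committed state, then apply a hypothetical "kick"
    # that briefly favours the other interpretation.
    a, b = settle(0.34, 0.33)
    print(settle(a - 0.9, b + 0.9))
```

With these assumed settings the input never favours either side, yet the model does not rest in the middle: whichever unit starts slightly ahead ends up fully active while the other is silenced, and a large enough disturbance hands the win to the other unit.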
. . . . . . .
It is my convention to refer to this as a perception.
As stated, this is traditionally referred to as an illusion. It is more helpful (imo) to think of this effect as the natural by-product of our brain producing a three-dimensional perception of the world around it. In other words, this effect is simply our brain interpreting (or pre-filtering) its sensory inputs in a way that is consistent with all other previously made observations of its milieu. Or at least as consistent as possible.