It sounds like a great idea to use a depth map to extract information and control the depth depicted in an image or series of images. It sounds great for converting a 2D image into 3D. It sounds like a great tool for plenoptic cameras to interpolate their data into imagery with depth. Alpha channels work well for transparency mapping, so a depth map should be equally useful, shouldn’t it?
Take a look at this depth map:
This is a depth map created from a plenoptic camera shot of a bunch of ice bits. It is a grayscale image whose 256 shades of gray encode which parts of the ice are closer to the camera and which are farther away. That information is then used to create the depth effect: pixels are stretched or compressed (shifted) by an amount that depends on their depth value.
Now check out a rocking animation that uses motion parallax to depict the depth (items closer to you appear to move differently than items that are farther away).
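The stretch-and-compress technique described above is simple enough to sketch in code. The following is a minimal, hypothetical example, not the actual algorithm any camera vendor uses: it shifts each pixel horizontally in proportion to its depth value to synthesize a new viewpoint, and the function name, the linear shift rule, and the `max_shift` parameter are all my own assumptions.

```python
import numpy as np

def shift_by_depth(image, depth, max_shift=8):
    """Synthesize a new viewpoint by shifting each pixel horizontally
    in proportion to its depth value (0 = far, 255 = near).
    A crude sketch: real depth-image-based rendering also has to fill
    the holes this shifting leaves behind."""
    h, w = depth.shape
    out = np.zeros_like(image)
    # Nearer pixels (brighter in the depth map) move farther.
    shifts = np.rint(depth.astype(float) / 255.0 * max_shift).astype(int)
    for y in range(h):
        for x in range(w):
            nx = x + shifts[y, x]
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    return out

# A rocking animation is just this same shift swept back and forth:
# frames = [shift_by_depth(img, depth, s) for s in (-4, -2, 0, 2, 4, 2, 0, -2)]
```

Note that every synthesized view is built from the pixels of one original photograph; no new light information enters the process, which is exactly the limitation discussed below.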
Right away you can notice a few errors in the depth map; for complex images this is typical, and they can be edited and “corrected”. But there is something else. Take a close look at the parts of the image where the depth map is seemingly correct. Sure, you can see the depth, but does it really look like ice? If you are like me, the answer is no. Ice reflects and scatters light in a way that is unique to each perspective. Indeed, there IS binocular rivalry: one eye sees light reflection and distortion that is not present in the other eye’s perspective. This disparity tells us something about the texture and makeup of what we are looking at. Stretching or compressing pixels eliminates that information and provides only depth cues about the spatial position of things. For most people, I suspect, this creates a perception conflict in the brain. There is something perceptually wrong with the image above. It does not look like ice, because the light coming off the two perspectives looks the same. A depth map carries no information about binocular rivalry, and so it creates errors. Errors that can’t be fixed. Herein lies the flaw in using a depth map: it throws away all of the binocular rivalry information. In other words, it throws away the information that differs between perspectives.
In my opinion, depth maps take the life out of an image. They remove important texture information, which, I believe, is gleaned from how light shifts, changes, appears, and disappears as you alter perspective.
This is the secret, fundamental flaw of depth maps. You can subjectively look at the image above and deem it cool and otherwise amazing. That is all well and good, but the truth is that, compared with looking at the real ice, it is fundamentally lacking: it does not depict what you see when you look at the ice in real life.
So, people will ask themselves whether this is important; some will say yes and some will say no, and there are many examples where you could argue either point of view. I have no argument with that. My position is only to point out that this flaw exists and should not be ignored.
It might be time to expand the way we think of human visual perception. What we “see” is a construct of our brain and how it processes the stream of data that is input from our senses. The vast amount of raw data that our brains receive from our eyes (set aside the data from our other senses for now) is not something we typically think about. We open our eyes and see stuff. We’ve spent a lot of time learning about the parts of the eye and their mechanics, but I’m not sure that teaches us very much about “seeing”.
Understanding computers gives us a new way to think about this, specifically the converting of data (the signals our eyes send to the brain) into conscious perception. We aren’t born with all of the “software” needed to perceive the signals coming from our eyes. The “software” is created over time as the brain interprets and learns cause and effect through experience. I believe the brain never stops tweaking that processing, making all sorts of modifications in the same way that computer software gets upgrades that provide desirable new features, ease-of-use improvements, performance enhancements, and so on.
What we see and how we perceive what we see is a function of a snapshot in time of the current version of our vision “software”. Maybe that’s a radical idea, but there is anecdotal evidence that it might be true. I became aware of it when I noticed that each time I looked at a 3D image of an African tribal mask, it looked different from what I remembered. It was the same picture; it had not changed, but how I perceived it did.
The weird thing about the image of the mask was that I did not have the same reaction to a 2D image of it. The 2D image always looked the same. The 3D image always looked slightly different. In my experience, my brain seems to be much more aggressive at tweaking how I perceive images with depth than it is when I look at flat images.
Having said that, it isn’t noticeable for all 3D images. Images that are life size or larger than life size and ones that I have some level of interest in seem to change in a more noticeable way. I’m curious if other 3D enthusiasts have experienced this.
I think it might be more pronounced with a 3D image because it is an illusion with perception conflicts that the brain must reconcile in some way.
Walking the dog a few evenings ago as the sun was going down, I noticed that as it got darker I was experiencing a transition in my human vision processing system. It is something that happens all the time, yet we seldom pay attention to it while it is happening. When the light is good, we pick out details and texture information. Edges define boundaries and the space between things. We interpret distance and notice sudden movements. But as it gets darker and darker, there is a transition to shape-based interpretation. There isn’t enough information to identify details and texture, so it would seem that the brain (mine, anyway) transitions to processing shapes. The mood changed, and sounds seemed to take on a different character that directed my gaze much more than when it was lighter. The experience of walking and looking around definitely transitioned, as it got darker and darker, into something quite different.
This experience got me thinking about all of the different ways that we look at things and experience the space that we occupy. We have vision comfort zones where we are casual observers and don’t pay much attention to what we are looking at. Indeed, we can almost shut off our conscious analysis of the visual data streaming in from our eyes while we engage our attention thinking about something or talking on the phone or listening to music. When it is too dark to make out the informational details, we engage our imagination and try to find shapes and patterns that are familiar in the darkness.
It would seem that there are many modes of seeing. A few of them could be described as experiential, referential, interpretive, imaginative, detail-oriented, abstract, conscious, and subconscious. Each of those modes is affected by the type and amount of illumination; indeed, it would seem to me that sound, smell, and state of mind affect them just as much. Quickly it becomes obvious that what goes on in the human vision processing system is far more complex than we give it credit for. What we perceive and what we see are two different things, dependent upon an exponentially large number of random possibilities. Yet out of what should be a confusing chaos that causes us to lose our minds, we make no special effort to see things; our brains process vision instantly, in ways we don’t even think about. It is simple and automatic. We make all of the transitions effortlessly. Step back for a second, however, and think about how limited our vision processing system is: only two eyes, with limited luminance dynamic range, limited frequency bandwidth, distorted optics and field of view, and persistence of vision that limits how many individual time slices can be interpreted independently. Those limitations don’t occur to us because the brain fills in the blanks, and we are ignorant of what we are missing.
Maybe this explains why we put such a high priority on esthetics and subjective interpretation. We are wired to simplify complexity down to chunks that are easily absorbed and found useful; yet at the same time we are attracted to high levels of detail when we are interested in something. Stereovision helps bridge that gap: it provides a potentially exponential increase in information, on demand when needed, to a processing system simple enough to parse out only the bits of visual information we need, without overwhelming our ability to make sense of what we are looking at. It would seem that our vision system has evolved into a carefully balanced, high-level system of links to different thought processes, interpretations, and emotional experiences that instantly trigger responses at the subconscious level. What is interesting is that we have the ability to go far beyond our current capabilities: to see things in new ways with biomechanical appliances and add dramatically to the capabilities of our vision processing system. Indeed, much will be revealed in the future showing that what we see now isn’t “real” at all. What we see in the year 2013 is magical vision that filters out things our brains haven’t decided are useful… yet.