Estimating perception of scene layout properties from global image features.
Ross, M. G., & Oliva, A. (submitted)
The relationship between image features and scene structure is central to the study of human visual perception and computer vision, but many of the specifics of real-world layout perception remain unknown. We do not know which image features are relevant to perceiving various scene layout properties, whether those features provide the same information for every type of image, or what spatial resolutions are required for perceiving different layout properties. This paper describes an experiment and a computational model that provide new insights into these issues. Humans efficiently perceive the global spatial layout properties of real-world scenes, such as dominant depth, openness, and perspective, from a single image. In this work, we describe an algorithm that reliably predicts human judgments of these properties. The model's predictions generalize beyond the observers it was trained on. Analysis of the results reveals that the optimal spatial resolution for determining layout varies systematically with the content of the space (natural or manufactured) and with the property being estimated: openness is best estimated at a high spatial resolution, dominant depth at a medium spatial resolution, and perspective at a low spatial resolution. Given the reliability and simplicity of estimating the global layout of real-world environments, this model could supply useful prior probabilities to a more detailed three-dimensional scene reconstruction algorithm, providing contextual knowledge to resolve local perceptual ambiguities.