Modeling search for people in 900 scenes
Krista Ehinger*, Barbara Hidalgo-Sotelo*, Antonio Torralba, Aude Oliva
Visual Cognition, Vol. 17, 945-978.
First Published online: 25 June 2009.
How predictable are human eye movements during search in real-world scenes? Here, we recorded the eye movements of 14 observers as they performed a search task (person detection) on 912 outdoor scenes. Observers showed high consistency in fixation locations, even when the target was absent from the scene, and they tended to fixate consistent regions even when those regions were not visually salient. We modeled three sources of search guidance: saliency, target features, and scene context. Each source independently outperformed a smart chance level at predicting human fixations. Models that combined sources of guidance predicted 94% of human agreement, with the scene context module providing the most explanatory power. Critically, none of the models reached the precision and fidelity of a human-based attentional map. This work establishes a benchmark for computational models of search in real-world scenes. Further improvements in modeling should capture the mechanisms underlying the selectivity of observers' fixations during search.
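The combination of guidance maps described above can be sketched as follows. This is a minimal illustration, not the published model: the maps here are random placeholders, and the per-source weights (exponents) are hypothetical parameters, assuming a multiplicative combination of normalized saliency, target-feature, and scene-context maps.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 32, 48  # coarse grid over the scene image

# Placeholder guidance maps (in a real model these would come from a
# saliency algorithm, a person detector, and a contextual prior).
saliency = rng.random((H, W))
target = rng.random((H, W))
context = rng.random((H, W))

def normalize(m):
    """Scale a map so it sums to 1, making it a spatial probability map."""
    return m / m.sum()

# Multiplicative combination with per-source exponents (hypothetical
# weights): a region attracts fixations only if all sources agree.
gamma = {"saliency": 1.0, "target": 1.0, "context": 1.0}
combined = (saliency ** gamma["saliency"]
            * target ** gamma["target"]
            * context ** gamma["context"])
combined = normalize(combined)

# Predicted fixation location: the peak of the combined map.
y, x = np.unravel_index(np.argmax(combined), combined.shape)
print(f"predicted fixation at row {y}, col {x}")
```

Setting an exponent to zero removes that source from the combination, which is one way such a model can be ablated to compare the contributions of saliency, target features, and context.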
Read poster (presented at the Scene Understanding Symposium, January 2009)