Modeling search for people in 900 scenes

Krista Ehinger*, Barbara Hidalgo-Sotelo*, Antonio Torralba, Aude Oliva

Visual Cognition, Vol. 17, 945-978.

First published online: 25 June 2009.

Abstract

How predictable are human eye movements as they search real-world scenes? Here, we recorded 14 observers’ eye movements as they performed a search task (person detection) on 912 outdoor scenes. Searchers demonstrated high consistency of fixation locations, even when the target was absent from the scene. Furthermore, observers tended to fixate consistent regions even when those regions were not visually salient. We modeled three sources of guidance: saliency, target features, and scene context. Each of these sources independently outperformed a smart chance level at predicting human fixations. Models that combine sources of guidance predicted 94% of human agreement, with the scene context module providing the most explanatory power. Critically, none of the models could reach the precision and fidelity of a human-based attentional map. This work establishes a benchmark for computational models of search in real-world scenes. Further improvements in modeling should capture mechanisms underlying the selectivity of observers’ fixations during search.
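To make the combination of guidance sources concrete, the sketch below takes a weighted pointwise product of three per-pixel maps. This is an illustration only: the map values and exponent weights are arbitrary stand-ins, not values from the paper, and DemoModels.m (under Code below) should be consulted for the released implementation.

    % Minimal sketch: combining three per-pixel guidance maps into one prediction map.
    % All values below are placeholders; DemoModels.m shows the actual models.
    salMap     = rand(256, 256);   % stand-in for a saliency map
    targetMap  = rand(256, 256);   % stand-in for a target-features map
    contextMap = rand(256, 256);   % stand-in for a scene-context map

    gammas = [1 1 1];              % per-source weights (arbitrary here)

    % Weighted pointwise product of the three guidance sources.
    combined = (salMap .^ gammas(1)) .* (targetMap .^ gammas(2)) .* (contextMap .^ gammas(3));
    combined = combined ./ sum(combined(:));   % normalize to a probability map over locations

    imagesc(combined); axis image off; colormap gray;
    title('Combined guidance map (illustrative)');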

Read article
Read poster (presented at the Scene Understanding Symposium, January 2009)
View high-resolution figures


Code

DemoModels.m - Show model predictions and search fixations
Tools folder - Required toolbox
ReadMe

contextModelCode - Code for training scene context model

This code requires the matlabPyrTools steerable pyramid toolbox by Eero Simoncelli:
http://www.cns.nyu.edu/~lcv/software.php

Retraining the scene context model uses the LabelMe toolbox and dataset:
http://labelme.csail.mit.edu/
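
Once the code and toolboxes above are downloaded, a typical session might look like the sketch below. The folder names are assumptions about how the archives unpack; the ReadMe documents the actual layout and entry points.

    % Hypothetical setup; folder names are assumptions, check the ReadMe for the real layout.
    addpath(genpath('Tools'));            % required toolbox (linked above)
    addpath(genpath('matlabPyrTools'));   % Simoncelli steerable pyramid toolbox
    addpath(genpath('contextModelCode')); % only needed to retrain the scene context model

    DemoModels;   % show model predictions and recorded search fixations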


Downloads

Data set: Image stimuli
Data set: Eye data
Data set: Context oracle maps
Pre-generated maps:
Target features maps
Saliency maps
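
To relate the pre-generated maps to the eye data, one rough scoring scheme is to count how many fixations fall inside the most activated part of a map. Everything in the sketch below (file name, fixation coordinates, formats) is a hypothetical stand-in; consult the released data sets for the actual formats.

    % Illustrative scoring sketch; file name, coordinates, and formats are hypothetical.
    map = double(imread('exampleMap.jpg'));      % hypothetical pre-generated map
    map = mean(map, 3);                          % collapse to a single channel

    fixX = [120 340 256];                        % hypothetical fixation x-coordinates (columns)
    fixY = [200 180 310];                        % hypothetical fixation y-coordinates (rows)

    % Fraction of fixations falling inside the top 20% most activated pixels of the map.
    vals = sort(map(:));
    thresh = vals(round(0.8 * numel(vals)));
    idx = sub2ind(size(map), round(fixY), round(fixX));
    hitRate = mean(map(idx) >= thresh);
    fprintf('Fixations inside the top 20%% of the map: %.2f\n', hitRate);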



Last updated:
July 3, 2009