Modeling search for people in 900 scenes

Krista Ehinger*, Barbara Hidalgo-Sotelo*, Antonio Torralba, Aude Oliva

Visual Cognition, Vol. 17, 945-978.

First published online: 25 June 2009.

Abstract

How predictable are human eye movements as they search real-world scenes? Here, we recorded 14 observers’ eye movements as they performed a search task (person detection) on 912 outdoor scenes. Searchers demonstrated high consistency of fixation locations, even when the target was absent from the scene. Furthermore, observers tended to fixate consistent regions even when those regions were not visually salient. We modeled three sources of guidance: saliency, target features, and scene context. Each of these sources independently outperformed a smart chance level at predicting human fixations. Models that combine sources of guidance predicted 94% of human agreement, with the scene context module providing the most explanatory power. Critically, none of the models could reach the precision and fidelity of a human-based attentional map. This work establishes a benchmark for computational models of search in real-world scenes. Further improvements in modeling should capture mechanisms underlying the selectivity of observers’ fixations during search.
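To make the combination of guidance sources concrete, the sketch below takes a weighted pointwise product of three per-pixel maps. This is an illustration only: the map values and exponent weights are arbitrary stand-ins, not values from the paper, and DemoModels.m (under Code below) should be consulted for the released implementation.

    % Minimal sketch: combining three per-pixel guidance maps into one prediction map.
    % All values below are placeholders; DemoModels.m shows the actual models.
    salMap     = rand(256, 256);   % stand-in for a saliency map
    targetMap  = rand(256, 256);   % stand-in for a target-features map
    contextMap = rand(256, 256);   % stand-in for a scene-context map

    gammas = [1 1 1];              % per-source weights (arbitrary here)

    % Weighted pointwise product of the three guidance sources.
    combined = (salMap .^ gammas(1)) .* (targetMap .^ gammas(2)) .* (contextMap .^ gammas(3));
    combined = combined ./ sum(combined(:));   % normalize to a probability map over locations

    imagesc(combined); axis image off; colormap gray;
    title('Combined guidance map (illustrative)');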

Read article
Read poster (presented at the Scene Understanding Symposium, January 2009)
View high-resolution figures


Code

DemoModels.m - Show model predictions and search fixations
Tools folder - Required toolbox
ReadMe

contextModelCode - Code for training scene context model

This code requires the matlabPyrTools steerable pyramid toolbox by Eero Simoncelli:
http://www.cns.nyu.edu/~lcv/software.php

Retraining the scene context model uses the LabelMe toolbox and dataset:
http://labelme.csail.mit.edu/
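
Once the code and toolboxes above are downloaded, a typical session might look like the sketch below. The folder names are assumptions about how the archives unpack; the ReadMe documents the actual layout and entry points.

    % Hypothetical setup; folder names are assumptions, check the ReadMe for the real layout.
    addpath(genpath('Tools'));            % required toolbox (linked above)
    addpath(genpath('matlabPyrTools'));   % Simoncelli steerable pyramid toolbox
    addpath(genpath('contextModelCode')); % only needed to retrain the scene context model

    DemoModels;   % show model predictions and recorded search fixations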


Downloads

Data set: Image stimuli
Data set: Eye data
Data set: Context oracle maps
Pre-generated maps:
Target features maps
Saliency maps
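
To relate the pre-generated maps to the eye data, one rough scoring scheme is to count how many fixations fall inside the most activated part of a map. Everything in the sketch below (file name, fixation coordinates, formats) is a hypothetical stand-in; consult the released data sets for the actual formats.

    % Illustrative scoring sketch; file name, coordinates, and formats are hypothetical.
    map = double(imread('exampleMap.jpg'));      % hypothetical pre-generated map
    map = mean(map, 3);                          % collapse to a single channel

    fixX = [120 340 256];                        % hypothetical fixation x-coordinates (columns)
    fixY = [200 180 310];                        % hypothetical fixation y-coordinates (rows)

    % Fraction of fixations falling inside the top 20% most activated pixels of the map.
    vals = sort(map(:));
    thresh = vals(round(0.8 * numel(vals)));
    idx = sub2ind(size(map), round(fixY), round(fixX));
    hitRate = mean(map(idx) >= thresh);
    fprintf('Fixations inside the top 20%% of the map: %.2f\n', hitRate);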



Last updated:
July 3, 2009