Publications (Computer Vision)
Pan, B., Panda, R., Jin, S., Feris, R., Oliva, A., Isola, P., & Kim, Y. (2023).
LangNav: Language as a Perceptual Representation for Navigation.
Submitted
arXiv Paper
Sun, X., Panda, R., Chen, C-F., Wang, N., Pan, B., Oliva, A., Feris, R., & Saenko, K. (2023).
Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths.
In Press, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)
arXiv Paper
Paper
Cascante-Bonilla, P., Shehada, K., Smith, J.S., Doveh, S., Kim, D., Panda, R., Varol, G., Oliva A., Ordonez, V., Feris, R., & Karlinsky, L. (2023).
Going Beyond Nouns With Vision & Language Models Using Synthetic Data.
The 19th International Conference on Computer Vision (ICCV 2023)
Paper
Supplementary Material
Website
News
Fosco, C., Jin, S., Josephs, E., & Oliva, A. (2023).
Leveraging Temporal Context in Low Representational Power Regimes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), pp. 10693–10703
Paper
Supplementary Material
Website
Kim, Y., Mishra, S., Jin, S., Panda, R., Kuehne, H., Karlinsky, L., Saligrama, V., Saenko, K., Oliva, A., & Feris, R. (2022).
How Transferable are Video Representations Based on Synthetic Data?
Conference on Neural Information Processing Systems (NeurIPS 2022) Datasets and Benchmarks Track, 35, 35710–35723
Paper
Supplementary Material
News
Fosco*, C., Josephs*, E., Andonian, A., Lee, A., Wang, X., & Oliva, A. (2022).
Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines.
arXiv:2206.00535.
arXiv Paper
Website
Grauman, K. et al. (2022).
Ego4D: Around the World in 3,000 Hours of Egocentric Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), pp. 18995–19012
arXiv Paper
Paper
Website
Pan, B., Jiang, Y., Panda, R., Wang, Z., Feris, R., & Oliva, A. (2021).
IA-RED²: Interpretability-Aware Redundancy Reduction for Vision Transformer.
Advances in Neural Information Processing Systems (NeurIPS 2021), 34, 24898–24911
arXiv Paper
Paper
Website
Bau*, D., Andonian*, A., Cui, A., Park, Y., Jahanian, A., Oliva, A., & Torralba, A. (2021).
Paint by Word.
arXiv:2103.10951
arXiv Paper
Monfort*, M., Jin*, S., Liu, A., Harwath, D., Feris, R., Glass, J., & Oliva, A. (2021).
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021), pp. 14871–14881.
Paper
Supplementary Material
Website
Chen, C., Panda, R., Ramakrishnan, K., Feris, R., Cohn, J., Oliva, A., & Fan, Q. (2021).
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021), pp. 6165–6175.
arXiv Paper
Bylinskii, Z., Madan, S., Tancik, M., Recasens, A., Zhong, K., Alsheikh, S., Pfister, H.,
Oliva, A., & Durand, F. (2021).
Parsing and Summarizing Infographics with Synthetically Trained Icon Proposals.
In 2021 IEEE 14th Pacific Visualization Symposium (PacificVis), pp. 31–40.
arXiv Paper
Website
Monfort, M., Ramakrishnan, K., Andonian, A., McNamara, B., Lascelles, A., Pan, B., Fan, Q., Gutfreund, D., Feris, R., & Oliva, A. (2021).
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 44(12), 9434–9445.
arXiv Paper
Paper
Website
Andonian*, A., Fosco*, C., Monfort, M., Lee, A., Feris, R., Vondrick, C., & Oliva, A. (2020).
We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos.
Proceedings of the 16th European Conference on Computer Vision (ECCV 2020)
arXiv Paper
Website
News
Meng, Y., Lin, C., Panda, R., Sattigeri, P., Karlinsky, L., Oliva, A., Saenko, K., & Feris, R. (2020).
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition.
Proceedings of the 16th European Conference on Computer Vision (ECCV 2020)
arXiv Paper
Website
Monfort, M., Andonian, A., Zhou, B., Ramakrishnan, K., Adel Bargal, S., Yan, T., Brown, L., Fan, Q., Gutfreund, D., Vondrick, C., & Oliva, A. (2020).
Moments in Time Dataset: One Million Videos for Event Understanding.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 42(2), 502–508.
Paper
Website
Amini, L., Chen, C.-H., Cox, D., Oliva, A., & Torralba, A. (2020).
Experiences and Insights for Collaborative Industry-Academic Research in Artificial Intelligence.
AI Magazine, 41(1), 70–81.
Paper
Ramakrishnan, K., Panda, R., Fan, Q., Henning, J., Oliva, A. & Feris, R. (2020).
Relationship Matters: Relation Guided Knowledge Transfer for Incremental Learning of Object Detectors.
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
(CVPR-W 2020), Continual Learning in Computer Vision (CLVISION)
Paper
Ramakrishnan*, K., Monfort*, M., McNamara, B., Lascelles, A., Gutfreund, D., Feris, R., & Oliva, A. (2019).
Identifying Interpretable Action Units in Deep Networks.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019) Workshop on Explainable AI.
Paper
Monfort, M., Ramakrishnan, K., McNamara, B., Lascelles, A., Gutfreund, D., Feris, R., & Oliva, A. (2019).
Examining Interpretable Feature Relationships in Deep Networks for Action Recognition.
ICML 2019 Workshop on Deep Phenomena.
Paper
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., & Torralba, A. (2018).
Places: A 10 Million Image Database for Scene Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 40(6), 1452–1464.
Paper
Demo
Website
Places CNN Models
Madan, S., Bylinskii, Z., Tancik, M., Recasens, A., Zhong, K., Alsheikh, S., Pfister, H., Oliva, A. & Durand, F. (2018).
Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics.
arXiv:1807.10441.
arXiv Paper
Website
Monfort, M., Johnson, M., Oliva, A., & Hofmann, K. (2017).
Asynchronous Data Aggregation for Training Visual Navigation Networks.
Lifelong Learning: A Reinforcement Learning Approach Workshop at ICML, Sydney, Australia.
Paper
Bylinskii*, Z., Alsheikh*, S., Madan*, S., Recasens*, A., Zhong, K., Pfister, H., Durand, F., & Oliva, A. (2017).
Understanding Infographics through Textual and Visual Tag Prediction.
arXiv:1709.09215.
arXiv Paper
Borkin*, M.A., Bylinskii*, Z., Kim, N.W., Bainbridge, C.M., Yeh, C.S., Borkin, D., Pfister, H., & Oliva, A. (2016).
Beyond Memorability: Visualization Recognition and Recall.
IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis 2015), 22(1), 519–528.
Paper
Supplementary Material
Website
Video
News
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016).
Learning Deep Features for Discriminative Localization.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 2921–2929.
arXiv Paper
Website
Bylinskii, Z., Recasens, A., Borji, A., Oliva, A., Torralba, A., & Durand, F. (2016).
Where should saliency models look next?
Proceedings of the European Conference on Computer Vision (ECCV 2016), Amsterdam.
Paper
Supplementary Material
Poster
Vondrick, C., Pirsiavash, H., Oliva, A., & Torralba, A. (2015).
Learning Visual Biases from Human Imagination.
Advances in Neural Information Processing Systems (NIPS), 28.
Paper
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2015).
Object Detectors Emerge in Deep Scene CNNs.
International Conference on Learning Representations (ICLR 2015).
arXiv Paper
Slides
Visualization Places-CNN
Visualization ImageNet-CNN
Isola, P., Xiao, J., Parikh, D., Torralba, A., & Oliva, A. (2014).
What makes a photograph memorable? IEEE Transactions on Pattern
Analysis and Machine Intelligence (PAMI), 36(7), 1469–1482.
Paper
Zhou, B., Liu, L., Oliva, A., & Torralba, A. (2014).
Recognizing City Identity via Attribute Analysis of Geo-tagged Images.
Proceedings of the 13th European Conference on Computer Vision (ECCV).
Paper
Khosla, A., Xiao, J., Isola, P., Torralba, A., & Oliva, A. (2012).
Image memorability and visual inception.
In SIGGRAPH Asia 2012 Technical Briefs, pp. 1–4.
Paper
Oliva, A. & Torralba, A. (2007). The Role of Context in Object
Recognition. Trends in Cognitive Sciences, 11(12), 520–527.
Paper
Oliva, A. & Torralba, A. (2006). Building the Gist of a Scene: The
Role of Global Image Features in Recognition. Progress in Brain Research:
Visual perception, 155, 23–36.
Paper
Hidalgo-Sotelo, B., Oliva, A., & Torralba, A. (2005). Human Learning
of Contextual Priors for Object Search: Where does the time go? Proceedings of
the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops -
vol. 3, p. 86.
Paper
Torralba, A., & Oliva, A. (2003). Statistics of Natural Image
Categories. Network: Computation in Neural Systems, 14, 391–412.
Paper
Oliva, A., Torralba, A., Castelhano, M., & Henderson, J. (2003). Top-Down Control of Visual Attention in Object Detection.
Proceedings of the IEEE International Conference on Image Processing, vol. 1, pp. 253–256.
Paper
Torralba, A., & Oliva, A. (2002). Depth estimation from image
structure. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI),
24, 1226–1238.
Paper
Oliva, A., & Torralba, A. (2002). Scene-centered description from
spatial envelope properties. Lecture Notes in Computer Science, Proceedings of the
Second International Workshop on Biologically Motivated Computer Vision, Eds: H. Bülthoff, S.W.
Lee, T. Poggio, & C. Wallraven. Springer-Verlag, Tuebingen, Germany, pp. 263–272.
Paper
Guérin-Dugué, A., & Oliva, A. (2000).
Classification of scene photographs from local orientations features.
Pattern Recognition Letters, 21(13–14), 1135–1140.
Paper
Torralba, A. & Oliva, A. (1999).
Semantic organization of scenes using discriminant structural templates.
Proceedings of the Seventh IEEE International Conference on Computer Vision, Vol. 2, pp. 1253–1258.
Paper
Oliva, A., Torralba, A., Guérin-Dugué, A., & Hérault, J. (1999).
Global semantic classification of scenes using power spectrum templates.
In The Challenge of Image Retrieval, pp. 1–11.
Paper
Hérault, J., Oliva, A., & Guérin-Dugué, A. (1997).
Scene categorisation by curvilinear component analysis of low frequency spectra.
In ESANN'97: European symposium on artificial neural networks, pp. 91–96.
Paper
Copyright Notice:
The documents distributed here have been provided as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.