Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

By Ramprasaath R. Selvaraju

Many popular and well-performing models for multi-modal vision and language tasks exhibit poor visual grounding: they fail to appropriately associate words or phrases with the image regions they denote and rely instead on superficial linguistic correlations. For example, answering the question “What color are the bananas?” with yellow regardless of their …

Embodied Amodal Recognition: Learning to Move to Perceive Objects

By Jianwei Yang and Zhile Ren

With the rapid development of computer vision, several technologies such as object detection and image classification are becoming mature and effective. These vision algorithms play important roles in many real-world systems, enabling applications ranging from augmented reality to self-driving cars. The pipeline for designing a typical computer vision system …

Overcoming Large-scale Annotation Requirements for Understanding Videos in the Wild

By Min-Hung Chen, Zsolt Kira and Ghassan AlRegib

Videos have become an increasingly important type of media from which we obtain valuable information and knowledge. This motivates the development of video analysis techniques, which could, for example, provide recommendations or support discovery for different objectives. Given the recent …

Snapshots of ICML 2019

The 36th International Conference on Machine Learning (ICML) is by all accounts a premier conference in the machine learning world. Thousands of papers are submitted, and thousands of people from around the world travel to attend the weeklong conference. This year was no different, with over 6,000 attendees and 2,473 submitted papers. Only 621 papers …

Playing Text-adventure Games with an AI

By Prithviraj Ammanabrolu

People effect change in the world all the time using natural language communication. Grounding such communication in real-world actions is a well-studied and notoriously complex task; even the data-gathering step is difficult. So does there exist a platform on which we could more easily simulate such communication? And the answer …

ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging

By Samarth Brahmbhatt and Charlie Kemp

Paper (CVPR 2019 oral) by Samarth Brahmbhatt, Cusuh Ham, Charlie Kemp, and James Hays, Georgia Institute of Technology

Many times a day, people effortlessly grasp objects, yet human grasping is a complex phenomenon that has proven challenging to emulate and analyze. If robots …

Mixing Frank-Wolfe and Gradient Descent

By Sebastian Pokutta, associate director of ML@GT

TL;DR: This is an informal summary of our recent paper Blended Conditional Gradients with Gábor Braun, Dan Tu, and Stephen Wright, showing how mixing Frank-Wolfe and Gradient Descent gives a new, very fast, projection-free algorithm for constrained smooth convex minimization.

What is the paper about and why you might care

Frank-Wolfe methods [FW] …