Devi Parikh (IC) awarded the 2017 Google Faculty Research Award.

Devi Parikh of the School of Interactive Computing was awarded a Google Faculty Research Award for 2017. Her project, entitled “Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering,” aims to counter language biases and elevate the role of image understanding in Visual Question Answering (VQA). She describes the project as follows:

The complex compositional structure of language makes problems at the intersection of vision and language challenging. But language also provides a strong prior that can result in good superficial performance, without the underlying models truly understanding the visual content. 

The goal of this project is to counter language biases and elevate the role of image understanding in Visual Question Answering (VQA). In VQA, given an image and a free-form natural language question about the image, the machine’s task is to automatically produce a concise, accurate, free-form, natural language answer.

Specifically, this project involves collecting a balanced VQA dataset with significantly reduced language biases, training a VQA model that leverages the balanced dataset and a novel loss function to focus on detailed image understanding, and developing an explanation modality, where the VQA model justifies its answer to an image-question pair by providing a counter-example. 
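To make the balancing idea concrete, here is a minimal sketch of a generic VQA pipeline and the balanced-pair intuition. This is not the project’s actual model or loss; the elementwise-product fusion, the toy dimensions, and all variable names are assumptions chosen purely for illustration. The point it demonstrates is that when the same question is paired with two different (complementary) images, a model that actually uses the image produces different answer distributions, whereas a model relying on language priors alone cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration only).
D_IMG, D_Q, D_H, N_ANSWERS = 8, 8, 8, 5

# A common generic VQA design: embed image and question features,
# fuse them, then classify over a fixed set of candidate answers.
W_img = rng.normal(size=(D_IMG, D_H))
W_q = rng.normal(size=(D_Q, D_H))
W_ans = rng.normal(size=(D_H, N_ANSWERS))

def vqa_forward(img_feat, q_emb):
    """Fuse image and question features (elementwise product here)
    and return a softmax distribution over candidate answers."""
    h = np.tanh(img_feat @ W_img) * np.tanh(q_emb @ W_q)
    logits = h @ W_ans
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Balanced-pair setup: the same question asked about two different
# images, collected so that the correct answers differ. A model that
# ignores the image would have to answer both identically.
q = rng.normal(size=D_Q)            # one question embedding
img_a = rng.normal(size=D_IMG)      # image where the answer is, say, "yes"
img_b = rng.normal(size=D_IMG)      # complementary image where it is "no"

p_a = vqa_forward(img_a, q)
p_b = vqa_forward(img_b, q)

# Different images yield different answer distributions, so a loss on
# a balanced pair rewards attending to the visual content.
print(np.allclose(p_a, p_b))
```

The counter-example explanation modality described above fits the same setup: given `(img_a, q)`, the model could point to `img_b` as a visually similar image for which its answer would change.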
