Extracting and understanding the high-level semantic information in vision and text data is considered as one of the key capabilities of effective artificial intelligence (AI) systems, which has been explored in many areas of AI, including computer vision, natural language processing, machine learning, data mining, knowledge representation, etc. Due to the success of deep representation learning, we have observed increasing research efforts in the intersection between vision and language for a better understanding of semantics, such as image captioning, visual question answering, etc. Besides, exploiting external semantic knowledge (e.g., semantic relations, knowledge graphs) for vision and text understanding also deserves more attention: The vast amount of external semantic knowledge could assist in having a “deeper” understanding of vision and/or text data, e.g., describing the contents of images in a more natural way, constructing a comprehensive knowledge graph for movies, building a dialog system equipped with commonsense knowledge, etc.
This one-day workshop will continue the first workshop on the same topic that was successfully held at IJCAI-2019. The workshop will provide a forum for researchers to review the recent progress of vision and text understanding, with an emphasis on novel approaches that involve deeper and better semantic understanding of vision and text data. The workshop is targeting a broad audience, including the researchers and practitioners in computer vision, natural language processing, machine learning, data mining, etc.