The Second International Workshop on

Bringing Semantic Knowledge into
Vision and Text Understanding

In conjunction with IJCAI-2020, Yokohama, Japan

Time: 1:25 AM -- 3:35 AM (UTC Time), January 8, 2021


Extracting and understanding high-level semantic information in vision and text data is considered one of the key capabilities of effective artificial intelligence (AI) systems, and it has been explored in many areas of AI, including computer vision, natural language processing, machine learning, data mining, and knowledge representation. Owing to the success of deep representation learning, research at the intersection of vision and language, such as image captioning and visual question answering, has been growing in pursuit of a better understanding of semantics. Exploiting external semantic knowledge (e.g., semantic relations, knowledge graphs) for vision and text understanding also deserves more attention: the vast amount of external semantic knowledge could support a “deeper” understanding of vision and/or text data, e.g., describing the contents of images in a more natural way, constructing a comprehensive knowledge graph for movies, or building a dialog system equipped with commonsense knowledge.

This one-day workshop will continue the first workshop on the same topic, which was successfully held at IJCAI-2019. The workshop will provide a forum for researchers to review recent progress in vision and text understanding, with an emphasis on novel approaches that involve deeper and better semantic understanding of vision and text data. The workshop targets a broad audience, including researchers and practitioners in computer vision, natural language processing, machine learning, and data mining.

Workshop Topics

Image and Video Captioning

Visual Question Answering and Visual Dialog

Scene Graph Generation from Visual Data

Video Prediction and Reasoning

Scene Understanding

Knowledge Graph Construction

Knowledge Graph Embedding

Representation Learning

Question Answering over Knowledge Bases

Dialog Systems using Knowledge Graph

Adversarial Generation of Language & Images

Graphical Causal Models

Multimodal Representation and Fusion

Transfer Learning across Vision and Text

Pretrained Models and Meta-Learning

Explainable Text and Vision Understanding

Submission Guidelines

Three types of submissions are invited to the workshop: long papers (up to 7 pages, including all content and references), short papers (up to 4 pages, including all content and references), and demo papers (up to 4 pages, including all content and references).

All submissions should be formatted according to the IJCAI'2020 Formatting Instructions and Templates. Authors are required to submit their papers electronically in PDF format to the Microsoft CMT submission site.

Reviewing for the IJCAI-Tusion workshop is double blind (reviewers do not know the authors' identities, and vice versa). Papers should not contain the names or affiliations of the authors.

At least one author of each accepted paper must register for the workshop; registration information can be found on the IJCAI-2020 website. The authors of accepted papers should present their work at the workshop.

As in previous years, IJCAI does not have formal proceedings for workshop papers. All accepted papers will be made available to the workshop participants.

For any questions regarding paper submission, please email us: [AT] or [AT]

Date: January 8th, 2021, 1:25AM--3:35AM UTC time
Location: Zoom meeting on the virtual chair platform
1:25AM--1:30AM Welcome from Organizers
1:30AM--2:00AM Invited Talk I: Physics Guided Machine Learning: A New Paradigm for Scientific Knowledge Discovery

Xiaowei Jia (University of Pittsburgh)

2:00AM--2:15AM Oral 1: Reinforced Learning for History Selection in Machine Reading Comprehension (PDF)

Minghui Qiu, Xinjing Huang, Cen Chen, Feng Ji, and Yin Zhang

2:15AM--2:30AM Oral 2: Integrating Image Captioning with Rule-based Entity Masking (PDF)

Aditya Mogadala, Xiaoyu Shen, and Dietrich Klakow

2:30AM--3:00AM Invited Talk II: Multi-View Learning for Time-Series Classification: From Perspectives of Fusion and Attention

Zhiqiang Tao (Santa Clara University)

3:00AM--3:15AM Oral 3: Entity-aware ELMo: Learning Contextual Entity Representation for Entity Disambiguation (PDF)

Hamed Shahbazi, Xiaoli Fern, Reza Ghaeini, Rasha Obeidat, and Prasad Tadepalli

3:15AM--3:30AM Oral 4: COBRA: Contrastive Bi-Modal Representation Learning (PDF)

Vishaal Udandarao, Abhishek Maiti, Deepak Srivatsav, Suryatej Vyalla, Yifang Yin, and Rajiv Shah

3:30AM--3:35AM Closing Remarks

  • Submission Deadline: September 15, 2020 (11:59PM UTC-12)
  • Notification: November 15, 2020 (11:59PM UTC-12)
  • Camera Ready: December 15, 2020 (11:59PM UTC-12)

    Sheng Li

    Assistant Professor
    University of Georgia


    Yaliang Li

    Research Scientist
    Alibaba Group


    Jing Gao

    Associate Professor
    Purdue University


    Yun Fu

    Northeastern University

    Program Committee

    Daoyuan Chen, Alibaba

    Yang Deng, The Chinese University of Hong Kong

    Jiashi Feng, National University of Singapore

    Vishrawas Gopalakrishnan, IBM

    Xiaodong Jiang, Facebook

    Chaochun Liu, JD Finance AI Lab

    Zhaoyang Liu, Alibaba

    Jiasen Lu, Allen Institute for AI

    Khorrami Pooya, MIT Lincoln Laboratory

    Saed Rezayi, University of Georgia

    Huan Sun, Ohio State University

    Mingming Sun, Baidu Research

    Zhiqiang Tao, Northeastern University

    Zhaowen Wang, Adobe Research

    Yuexiang Xie, Peking University

    Tong Yu, Samsung

    Chenwei Zhang, Amazon

    Handong Zhao, Adobe Research

    Ronghang Zhu, University of Georgia