The First International Workshop on

Bringing Semantic Knowledge into
Vision and Text Understanding

In conjunction with IJCAI-2019, Macao, China

Time: August 11, 2019 (full day)

Location: Sicily 2503

Extracting and understanding high-level semantic information in vision and text data is considered one of the key capabilities of effective artificial intelligence (AI) systems, and it has been explored in many areas of AI, including computer vision, natural language processing, machine learning, data mining, and knowledge representation. Owing to the success of deep representation learning, research at the intersection of vision and language, such as image captioning and visual question answering, has grown steadily in pursuit of a better understanding of semantics. Exploiting external semantic knowledge (e.g., semantic relations, knowledge graphs) for vision and text understanding also deserves more attention: the vast amount of external semantic knowledge can support a “deeper” understanding of vision and/or text data, e.g., describing the contents of images in a more natural way, constructing a comprehensive knowledge graph for movies, or building a dialog system equipped with commonsense knowledge.

This workshop will provide a forum for researchers to review recent progress in vision and text understanding, with an emphasis on novel approaches that involve deeper and better semantic understanding of vision and text data. The workshop targets a broad audience, including researchers and practitioners in computer vision, natural language processing, machine learning, and data mining.

Workshop Topics

Image and Video Captioning

Visual Question Answering and Visual Dialog

Scene Graph Generation from Visual Data

Video Prediction and Reasoning

Scene Understanding

Knowledge Graph Construction

Knowledge Graph Embedding

Representation Learning

Question Answering over Knowledge Bases

Dialog Systems using Knowledge Graphs

Adversarial Generation of Language & Images

Graphical Causal Models

Multimodal Representation and Fusion

Transfer Learning across Vision and Text

Submission Guidelines

Three types of submissions are invited to the workshop: long papers (up to 7 pages), short papers (up to 4 pages) and demo papers (up to 4 pages).

All submissions should be formatted according to the IJCAI'2019 Formatting Instructions and Templates. Authors are required to submit their papers electronically in PDF format to the Microsoft CMT submission site.

At least one author of each accepted paper must register for the workshop; registration information is available on the IJCAI-2019 website. The authors of accepted papers are expected to present their work at the workshop.

As in previous years, IJCAI does not publish formal proceedings for workshop papers. All accepted papers will be made available to the workshop participants.

For any questions regarding paper submission, please email us: [AT] or [AT]

Date: August 11, 2019
Location: Sicily 2503
9:05AM--9:20AM Welcome from Organizers
9:20AM--9:40AM Oral 1: Discovering Medical Entity Relations from Texts using Dependency Information (PDF)

Ying Shen; Jiyue Huang; Jin Zhang; Min Yang; Kai Lei

9:40AM--10:00AM Oral 2: Automatic Query Correction for POI Retrieval using Deep and Statistical Collaborative Model (PDF)

Canxiang Zhu; Zhiming Chen; Yang Liu; Juan Hu; Shujuan Sun; Bixiao Cheng; Zhendong Yang; Li Ma; Hua Chai

10:00AM--10:30AM Coffee Break
10:30AM--11:30AM Keynote I: Video and Language

Jiebo Luo (University of Rochester)

11:30AM--11:50AM Oral 3: Transfer Learning with Domain-aware Attention Network for Item Recommendation in E-commerce (PDF)

Minghui Qiu; Bo Wang; Cen Chen; Xiaoyi Zeng; Jun Huang

11:50AM--12:10PM Oral 4: Compressive Multi-document Summarization with Sense-level Concepts (PDF)

Xin Shen; Wai Lam; Xunying Liu; Piji Li

12:10PM--2:00PM Lunch Break
2:10PM--3:10PM Keynote II: Learning Generalized Transformation Equivariant Representations: A Novel Paradigm of Deep Learning

Guojun Qi

3:10PM--3:30PM Oral 5: Vehicle Semantic Understanding for Automated Driving using Deep Vision-based Features (PDF)

Vijay John; Seiichi Mita

3:40PM--4:00PM Oral 6: Pedestrian Detection via Combined Cascades (PDF)

Yujia Tang

4:00PM--4:30PM Oral 7: Meanings of "Data" and "Rules" Emerge as Actions through Auto-Programming for General Purposes (PDF)

Juyang Weng

4:30PM--4:50PM Oral 8: Conformal-Cycle-Consistent Adversarial Model for Video Prediction with Action Control (PDF)

Zhihang Hu; Jason T. L. Wang

4:50PM--5:10PM Oral 9: System Demo for Transfer Learning across Vision and Text using Domain Specific CNN Accelerator for On-Device NLP Applications (PDF)

Baohua Sun; Lin Yang; Michael Lin; Wenhan Zhang; Patrick Dong; Charles Young; Jason Dong

5:10PM--5:20PM Closing Remarks

  • Submission Deadline: April 18, 2019 (11:59PM UTC-12)
  • Notification: May 10, 2019 (11:59PM UTC-12)
  • Camera Ready: June 10, 2019 (11:59PM UTC-12)

    Sheng Li

    Assistant Professor
    University of Georgia


    Yaliang Li

    Research Scientist
    Alibaba Group


    Jing Gao

    Associate Professor
    University at Buffalo


    Yun Fu

    Northeastern University

    Program Committee

    Daoyuan Chen, Peking University

    Yang Deng, Peking University

    Jiashi Feng, National University of Singapore

    Vishrawas Gopalakrishnan, SUNY at Buffalo

    Xiaodong Jiang, University of Georgia

    Chaochun Liu, JD Finance AI Lab

    Jiasen Lu, Georgia Institute of Technology

    Pooya Khorrami, MIT Lincoln Laboratory

    Minghui Qiu, Alibaba

    Huan Sun, Ohio State University

    Mingming Sun, Baidu Research

    Zhiqiang Tao, Northeastern University

    Zhaowen Wang, Adobe Research

    Chenwei Zhang, University of Illinois at Chicago

    Handong Zhao, Adobe Research