Papers from International Conferences in 2020 [Multimodal Learning Edition]


Papers Accepted at International Conferences

Multimodal research is widely pursued in the fields of machine learning and deep learning.

A large number of papers on multimodal learning have been published.

This article introduces papers related to multimodal learning that were accepted in 2020 at international conferences connected to deep learning, namely:

  • International Conference on Machine Learning (ICML)
  • International Conference on Learning Representations (ICLR)
  • Conference and Workshop on Neural Information Processing Systems (NeurIPS)
  • International Joint Conference on Artificial Intelligence (IJCAI)
  • Conference on Computer Vision and Pattern Recognition (CVPR)
  • Association for Computational Linguistics (ACL)

List of Multimodal Papers

ICML (International Conference on Machine Learning)

Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure

ICLR (International Conference on Learning Representations)

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling

NeurIPS (Conference and Workshop on Neural Information Processing Systems)

Self-Supervised MultiModal Versatile Networks

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies

Labelling unlabelled videos from scratch with multi-modal self-supervision

Deep Multimodal Fusion by Channel Exchanging

Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence

Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

CoMIR: Contrastive Multimodal Image Representation for Registration

CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Learning Representations from Audio-Visual Spatial Alignment

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

See, Hear, Explore: Curiosity via Audio-Visual Association

IJCAI (International Joint Conference on Artificial Intelligence)

EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings

Embodied Multimodal Multitask Learning

Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets

Interpretable Multimodal Learning for Intelligent Regulation in Online Payment Systems

A Similarity Inference Metric for RGB-Infrared Cross-Modality Person Re-identification

Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering

Set and Rebase: Determining the Semantic Graph Connectivity for Unsupervised Cross-Modal Hashing

Modeling Dense Cross-Modal Interactions for Joint Entity-Relation Extraction

CVPR (Conference on Computer Vision and Pattern Recognition)

Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior (overview video)

Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing (overview video)

Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

Semantically Multi-modal Image Synthesis

Cross-Modal Deep Face Normals With Deactivable Skip Connections

Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge (overview video)

Cross-Modal Pattern-Propagation for RGB-T Tracking

A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation

End-to-End Adversarial-Attention Network for Multi-Modal Clustering

CoverNet: Multimodal Behavior Prediction Using Trajectory Sets

Where, What, Whether: Multi-Modal Learning Meets Pedestrian Detection

Multimodal Categorization of Crisis Events in Social Media

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA (overview video)

Modality Shifting Attention Network for Multi-Modal Video Question Answering (overview video)

Hypergraph Attention Networks for Multimodal Learning

MMTM: Multimodal Transfer Module for CNN Fusion

What Makes Training Multi-Modal Classification Networks Hard?

Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather (overview video)

Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation (overview video)

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion (overview video)

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text (overview video)

EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle (overview video)

Multi-Modality Cross Attention Network for Image and Sentence Matching

nuScenes: A Multimodal Dataset for Autonomous Driving

Discriminative Multi-Modality Speech Recognition

ACL (Association for Computational Linguistics)

A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation

A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks

CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality

Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer

Integrating Multimodal Information in Large Pretrained Transformers

MMPE: A Multi-Modal Interface for Post-Editing Machine Translation

Multimodal Neural Graph Memory Networks for Visual Question Answering

MultiQT: Multimodal learning for real-time question tracking in speech

Reasoning with Multimodal Sarcastic Tweets via Modeling Cross-Modality Contrast and Semantic Association

Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis

Towards Emotion-aided Multi-modal Dialogue Act Classification

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting

Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts

Multimodal and Multiresolution Speech Recognition with Transformers

Multimodal Quality Estimation for Machine Translation

Multimodal Transformer for Multimodal Machine Translation

ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents

GAIA: A Fine-grained Multimedia Knowledge Extraction System

Adaptive Transformers for Learning Multimodal Representations

Cross-media Structured Common Space for Multimedia Event Extraction

Cross-modal Coherence Modeling for Caption Generation

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

Cross-Modality Relevance for Reasoning on Language and Vision
