Single-Stream Multi-level Alignment for Vision-Language Pretraining

Bibliographic Details
Published in: ECCV (17th : 2022 : Tel Aviv; Online) Computer vision – ECCV 2022 ; Part 36
Main author: Khan, Zaid (author)
Other authors: Vijay Kumar, B. G. (author), Yu, Xiang (author), Schulter, Samuel (author), Chandraker, Manmohan (author), Fu, Yun (author)
Pages: 2022
Language: eng
Published: 2022
Title Year Author
Object-Centric Unsupervised Image Captioning 2022 Meng, Zihang
Learning Linguistic Association Towards Efficient Text-Video Retrieval 2022 Fang, Sheng
Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing 2022 Boecking, Benedikt
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input 2022 Guo, Qingpei
Video Graph Transformer for Video Question Answering 2022 Xiao, Junbin
Rethinking Data Augmentation for Robust Visual Question Answering 2022 Chen, Long
Word-Level Fine-Grained Story Visualization 2022 Li, Bowen
Webly Supervised Concept Expansion for General Purpose Vision Models 2022 Kamath, Amita
Unifying Event Detection and Captioning as Sequence Generation via Pre-training 2022 Zhang, Qi
Fine-Grained Visual Entailment 2022 Thomas, Christopher
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection 2022 Hong, Joanna
Language-Driven Artistic Style Transfer 2022 Fu, Tsu-Jui
Explicit Image Caption Editing 2022 Wang, Zhen
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks 2022 Cai, Zhaowei
Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding 2022 Hao, Jiachang
Generative Negative Text Replay for Continual Vision-Language Pretraining 2022 Yan, Shipeng
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly 2022 Whitehead, Spencer
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds 2022 Jain, Ayush
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling 2022 Yang, Zhengyuan
SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding 2022 Heisler, Morgan