MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

Abstract

Though Multi-modal Large Language Models (MLLMs) have recently achieved significant progress, they often face various problems while handling inter-object relations, i.e., the interaction or association among distinct objects. This constraint largely stems from insufficient training and evaluation data for relation understanding, which has greatly impeded MLLMs in various vision-language generation and reasoning tasks. We attempt to address this challenge by introducing Multi-Modal Relation Understanding (MMRel), a benchmark that features large-scale, high-quality, and diverse data on inter-object relations. MMRel features three distinctive attributes: (i) It contains over 22K question-answer pairs, spanning three distinct domains and covering three relation categories, ensuring both scale and diversity; (ii) it provides manually verified, high-quality labels to ensure exceptional annotation accuracy; (iii) it includes adversarial cases with highly unusual relations, offering a challenging setting for evaluating relation hallucination. These features make MMRel ideal for evaluating MLLMs on relation understanding, as well as for fine-tuning MLLMs to enhance relation comprehension capability. Extensive experiments verify the effectiveness of MMRel in evaluating and enhancing MLLMs' relation understanding capabilities.

Existing Benchmark

Though several benchmarks on inter-object relations have been created, they were not intended for assessing MLLMs' relation understanding capabilities. Specifically, most existing benchmarks suffer from obvious limitations in data scales, relation categories, and data diversity. We address this issue by creating a comprehensive benchmark on inter-object relations, aiming to gauge and enhance MLLMs' relation understanding capability in various multimodal tasks.

MMRel

We introduce a Semi-automatic Data Collection pipeline (SemiDC), which is capable of annotating large-scale existing images and generating a substantial amount of high-quality synthetic images. As discussed in paper, re-labeling existing images is essential since their original labels are incompatible with MLLMs. To this end, we design SemiDC to generate high-quality relation annotations via GPT-4V for large-scale VG benchmark. This process is divided into three stages: (i) Pre-processing: We selectively exclude images featuring complex scenes that pose challenges for GPT-4V in generating accurate annotations; (ii) Re-labeling via GPT-4V: We employ the in-context learning paradigm to use GPT-4V to generate relation annotations. GPe text prompt; (iii) Human verification: We manually assess and correct the annotations that are generated by GPT-4V, to ensure the quality of the collected inter-object relation data.

Table shows the statistics of MMRel. Specifically, MMRel comprises around 22,500 question-answer pairs (15K Yes/No, and 7.5K Open-ended) across 7 subsets, spanning 3 domains and 3 categories of relations. Thanks to the open-vocabulary capability of GPT-4V, MMRel guarantees a diverse range of objects and action relations.

Fine-Tuning with MMRel

As Table shows, fine-tuning with MMRel improves the capabilities of relation understanding significantly and consistently across all data domains and relation categories. In addition, fine-tuning improves the relation understanding of the adversarial subset as well.

Citation

@article{nie2024mmrel,
title={MMRel: A Relation Understanding Benchmark in the MLLM Era},
author={Nie, Jiahao and Zhang, Gongjie and An, Wenbin and Tan, Yap-Peng and Kot, Alex C and Lu, Shijian},
journal={arXiv preprint arXiv:2406.09121},
year={2024}
}

Jiahao Nie * Nanyang Technological University	Gongjie Zhang * Alibaba DAMO Academy	Wenbin An Xi'an Jiaotong University
Yap-Peng Tan Nanyang Technological University	Alex C. Kot Nanyang Technological University	Shijian Lu Nanyang Technological University

MMRel: A Relation Understanding Benchmark in the MLLM Era
arXiv 2024

Paper

Benchmark

Abstract

Existing Benchmark

MMRel

Evaluation on MMRel

Fine-Tuning with MMRel

Citation

Acknowledgements

MMRel: A Relation Understanding Benchmark in the MLLM Era arXiv 2024

Paper

Benchmark

Abstract

Existing Benchmark

MMRel

Evaluation on MMRel

Fine-Tuning with MMRel

Citation

Acknowledgements

MMRel: A Relation Understanding Benchmark in the MLLM Era
arXiv 2024