LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Figure: Overall framework of the LOKI benchmark.

Abstract

With the rapid advancement of AI-generated content, the future internet may become saturated with synthetic media, making it increasingly difficult to discern truth and trust information. Synthetic data detection has therefore attracted widespread attention, and the performance of large multimodal models (LMMs) on this task is of particular interest. On one hand, these models can provide natural-language explanations for their authenticity judgments, paving the way for more explainable synthetic content detection. On the other hand, distinguishing real from synthetic data tests the perception, knowledge, and reasoning abilities of LMMs, capabilities essential for progress toward more robust Artificial General Intelligence (AGI). In response, we introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities. LOKI covers the video, image, 3D, text, and audio modalities, comprising 13K carefully curated questions across 28 subcategories with clear difficulty levels. The benchmark includes coarse-grained true/false questions, in-domain multiple-choice questions, and fine-grained anomaly explanation questions, effectively assessing models on both synthetic data detection and anomaly explanation. We evaluated 15 open-source LMMs and 3 closed-source models (including GPT-4 and Gemini) on LOKI, highlighting their potential as synthetic data detectors while also revealing current limitations such as imbalanced capabilities across modalities and weak logical reasoning.
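To make the three question formats concrete, below is a minimal, hypothetical Python sketch of how such questions might be represented and rendered into prompts for an LMM. The class and field names (LokiQuestion, build_prompt, media_path, etc.) are illustrative assumptions for this post, not the benchmark's actual schema or evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical representation of the three LOKI question formats described
# in the abstract: coarse-grained true/false, in-domain multiple-choice,
# and fine-grained anomaly explanation. Field names are illustrative only.

@dataclass
class LokiQuestion:
    modality: str               # e.g. "image", "video", "3d", "text", "audio"
    media_path: str             # path to the real or synthetic sample
    question_type: str          # "true_false" | "multiple_choice" | "anomaly_explanation"
    prompt: str                 # natural-language question posed to the LMM
    choices: Optional[List[str]] = None  # only used for multiple-choice questions
    answer: Optional[str] = None         # ground-truth label, hidden at inference time

def build_prompt(q: LokiQuestion) -> str:
    """Render a question into a single text prompt for an LMM."""
    if q.question_type == "true_false":
        return f"{q.prompt}\nAnswer with 'Real' or 'Synthetic'."
    if q.question_type == "multiple_choice":
        options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(q.choices or []))
        return f"{q.prompt}\n{options}\nAnswer with the option letter only."
    # Anomaly explanation: ask for free-form reasoning about suspected artifacts.
    return f"{q.prompt}\nDescribe any artifacts or anomalies suggesting the sample is synthetic."

if __name__ == "__main__":
    q = LokiQuestion(
        modality="image",
        media_path="samples/portrait_001.png",  # hypothetical path
        question_type="true_false",
        prompt="Is the attached image a real photograph or AI-generated?",
    )
    print(build_prompt(q))
```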

Publication
arXiv preprint
Tong WU 吴桐
