Deep Network Representations as Reliable Indicators of Synthetic Content in Audiovisual and Clinical Contexts
AI-Driven Innovation and Smart Systems: 46AI 2025
This study introduces an interpretable framework for detecting synthetic audiovisual content using deep neural representations, applied to the DeepFake RealWorld (DFRW) dataset (46 371 clips; 77% with audio). Visual, acoustic, and cross-modal embeddings from ResNet, Vision Transformer, SlowFast, Wav2Vec2, and ECAPA-TDNN were evaluated with frequency-based…