Voice Veritas: Fake Voice Detection Using Deep Learning

Abstract

The rise of deepfake audio technology has raised serious concerns about the authenticity of voice recordings and created a need for reliable detection methods. This paper introduces Voice Veritas, a deep learning system that detects fake voices using a hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architecture. The system operates on Mel spectrograms of audio signals: CNN layers identify spatial patterns in the frequency domain, and LSTM layers track how those patterns change over time. The model is trained and tested on the ASVspoof 2019 dataset (a collection of 10,000 real and synthetic audio samples), with the audio preprocessed through standardization, Mel spectrogram conversion, and label encoding. The hybrid CNN-LSTM architecture achieves high accuracy and outperforms traditional methods that rely on manual feature engineering. Experimental results show that the model reliably distinguishes genuine from fake audio across diverse spoofing techniques. Key contributions include the integration of CNN and LSTM layers to capture both spatial and temporal detail, a streamlined preprocessing workflow, and benchmark performance that sets a new standard in the field. This work addresses gaps in current audio forensics and offers a practical solution for real-world applications. Future research could explore adversarial training to improve robustness and lightweight models for deployment on edge devices.
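The preprocessing steps named in the abstract (audio standardization, Mel spectrogram conversion, label encoding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The sample rate, clip length, FFT size, hop length, number of Mel bands, and the integer label coding are all assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
SAMPLE_RATE = 16000
CLIP_SECONDS = 2.0
N_FFT = 512
HOP = 160
N_MELS = 64
LABELS = {"bonafide": 0, "spoof": 1}  # hypothetical integer encoding


def standardize(wave, n_samples=int(SAMPLE_RATE * CLIP_SECONDS)):
    """Truncate or zero-pad to a fixed length, then peak-normalize."""
    wave = np.asarray(wave, dtype=np.float32)[:n_samples]
    wave = np.pad(wave, (0, n_samples - len(wave)))
    peak = np.max(np.abs(wave))
    return wave / peak if peak > 0 else wave


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Triangular filters spaced evenly on the Mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1), dtype=np.float32)
    for i in range(n_mels):
        lo, center, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wave):
    """Return a log-power Mel spectrogram of shape (N_MELS, n_frames)."""
    n_frames = 1 + (len(wave) - N_FFT) // HOP
    # Index matrix that slices the waveform into overlapping frames.
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(N_FFT)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, N_FFT // 2 + 1)
    mel = mel_filterbank() @ power.T                  # (N_MELS, n_frames)
    return np.log(mel + 1e-6)


# Usage: a one-second 440 Hz tone stands in for a real recording.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clip = standardize(0.5 * np.sin(2 * np.pi * 440 * t))
features = log_mel_spectrogram(clip)
target = LABELS["spoof"]
```

In a CNN-LSTM design of the kind the abstract describes, an array like `features` would typically be treated as a single-channel image by the convolutional layers, with the resulting feature maps read frame by frame along the time axis by the LSTM.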

DOI URL: https://doi.org/10.64820/AEPJMLDL.21.41.45.62025