Progress in machine learning applied to signal processing opens promising avenues for technological development, but it also raises ethical and security concerns. For example, audio deepfakes are now being used to orchestrate scams. Although the tools used to generate deepfake audio perform well and can sometimes produce recordings that are mistaken for real audio, these recordings can still be detected. Many detection methods exist, in particular the analysis of acoustic parameters that can attest to the authenticity of an audio excerpt. These parameters include energy, power, pitch, the signal spectrum, and cepstral coefficients, among others. However, these acoustic parameters are numerous, and not all of them are suitable for detecting deepfake audio. This paper presents a comparative review of the acoustic parameters useful for detecting deepfake audio. Among them, we highlight the relevance of cepstral parameters such as Mel-frequency cepstral coefficients (MFCC) compared with other acoustic parameters such as mel-spectrograms. The objective is to provide reliable leads for the detection of deepfake audio.
Deep learning, deepfake audio, deepfakes, detection, ethics, Mel-frequency cepstral coefficient, mel-spectrogram, MFCC, reliability, security
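
As a minimal sketch of how the two feature families compared in this paper relate, the snippet below derives cepstral coefficients from a single frame of log-mel energies with a type-II discrete cosine transform, which is commonly the final step of MFCC extraction. The function names, the HTK-style mel formula, the orthonormal scaling, and the toy input frame are assumptions made for illustration; they are not taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mfcc_from_log_mel(log_mel_frame, n_coeffs=13):
    """Orthonormal DCT-II of one frame of log-mel filterbank energies.

    log_mel_frame: list of log energies, one per mel band (one column
    of a log-scaled mel-spectrogram).
    Returns the first n_coeffs cepstral coefficients.
    """
    n = len(log_mel_frame)
    n_coeffs = min(n_coeffs, n)  # a frame of n bands yields at most n coefficients
    coeffs = []
    for k in range(n_coeffs):
        # Project the frame onto the k-th cosine basis function.
        total = sum(
            x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(log_mel_frame)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


if __name__ == "__main__":
    # Hypothetical log-mel energies for one frame (8 mel bands).
    toy_frame = [-2.1, -1.4, -0.9, -1.2, -2.5, -3.0, -3.8, -4.1]
    print(mfcc_from_log_mel(toy_frame, n_coeffs=4))
```

The mel-spectrogram keeps one value per mel band and per frame, whereas the MFCC keep only the first few transform coefficients, which gives a more compact and decorrelated description of the spectral envelope.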