What is a Deepfake Attack?
A deepfake attack uses artificial intelligence to create synthetic audio, video, or images that convincingly impersonate real people. In a cybersecurity context, attackers use deepfakes to clone executive voices for vishing attacks, create fake video calls to authorize transactions, or generate realistic images to support social engineering pretexts.
How Deepfake Attacks Work
Modern AI voice cloning tools can replicate a person's voice from as little as three seconds of audio, easily obtained from earnings calls, conference presentations, podcasts, or social media videos. An attacker clones the CEO's voice, calls the CFO, and requests an urgent wire transfer. Because the voice sounds authentic, the target complies. Video deepfakes follow the same principle: an attacker joins a video call using a synthetic video feed of a trusted executive.
Why Deepfake Attacks Matter
In 2024, 40% of executives reported being targeted by deepfake-enabled attacks. The engineering firm Arup lost $25 million when an employee was deceived by a deepfake video call impersonating senior management. AI voice cloning has become accessible and inexpensive: tools that produce convincing clones in minutes are freely available, so the barrier to launching a deepfake attack is now minimal.
How to Protect Against Deepfake Attacks
- Run deepfake voice and video simulations to train employees to question synthetic media
- Establish verbal authentication protocols (codewords, callback numbers) for sensitive requests
- Limit the amount of executive audio and video publicly available
- Implement multi-person approval for financial transactions regardless of who requests them
- Train employees that voice and video can no longer be trusted as proof of identity
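The multi-person approval control above can be made concrete in code. The sketch below is illustrative only: the names (`TransferRequest`, `APPROVALS_REQUIRED`) are hypothetical, not a real API, but it shows the two rules that blunt a voice-clone request, namely that the requester cannot approve their own transfer and that approvals must come from distinct people.

```python
# Minimal sketch of a multi-person approval rule for financial transactions.
# All names here are hypothetical, for illustration only.

APPROVALS_REQUIRED = 2  # approvals must come from distinct people


class TransferRequest:
    def __init__(self, amount, requester):
        self.amount = amount
        self.requester = requester
        self.approvers = set()

    def approve(self, approver):
        # The requester cannot approve their own transfer, and repeated
        # approvals from the same person do not count twice (a set dedupes).
        if approver != self.requester:
            self.approvers.add(approver)

    def can_execute(self):
        return len(self.approvers) >= APPROVALS_REQUIRED


# A cloned "CEO voice" requesting a transfer cannot push it through alone:
req = TransferRequest(25_000_000, requester="caller-claiming-to-be-ceo")
req.approve("cfo")
print(req.can_execute())   # one approval is not enough
req.approve("cfo")         # duplicate approval is ignored
req.approve("controller")
print(req.can_execute())   # two distinct approvers: transfer may proceed
```

The key design choice is that the rule holds "regardless of who requests" the transfer, exactly as the bullet above states: even a perfectly convincing impersonation of one person still needs independent sign-off from others.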
Frequently Asked Questions
How much audio is needed to clone someone's voice?
Modern deepfake tools can create convincing voice clones from just three to ten seconds of audio. This is easily gathered from public sources like earnings calls, podcasts, YouTube videos, or LinkedIn content.
What's the difference between voice cloning and voice synthesis?
Voice synthesis creates a new voice from scratch, while voice cloning replicates an existing person's voice. Voice cloning is more dangerous because it produces authentic-sounding impersonations that listeners often cannot distinguish from the real person.
Can video deepfakes be detected?
Detection tools exist but are imperfect. Most rely on identifying artifacts like unnatural eye movements, inconsistent lighting, or audio-visual desynchronization. Attackers actively improve deepfake quality to evade detection, creating an arms race.
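One of the artifacts mentioned above, audio-visual desynchronization, can be illustrated with a toy alignment check. The sketch below uses entirely synthetic per-frame signals standing in for audio energy and mouth openness; real detectors extract these tracks from the media itself, and the function name `best_lag` is hypothetical.

```python
# Toy illustration of an audio-visual desynchronization check.
# The per-frame signals below are synthetic stand-ins for audio energy
# and lip-openness tracks that a real detector would extract from media.

def best_lag(audio, mouth, max_lag=5):
    """Return the frame shift that best aligns two per-frame signals."""
    def score(lag):
        # Sum of products where both signals overlap at this shift.
        return sum(audio[i] * mouth[i + lag]
                   for i in range(len(audio))
                   if 0 <= i + lag < len(mouth))
    return max(range(-max_lag, max_lag + 1), key=score)


audio = [0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
in_sync = audio[:]                  # genuine feed: tracks align at lag 0
shifted = [0, 0, 0] + audio[:-3]    # deepfake-like feed: video lags audio

print(best_lag(audio, in_sync))   # 0 -> consistent, no desync detected
print(best_lag(audio, shifted))   # 3 -> audio leads video by 3 frames
```

A nonzero best-alignment lag is only one weak signal, which is why, as the answer above notes, production detectors combine many such cues and still lose ground as deepfake quality improves.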
What should you do if you receive a suspicious call claiming to be an executive?
Never authorize sensitive actions during an inbound call. End the call and use a known contact number to verify the request independently. Implement verification protocols requiring in-person approval or confirmation from multiple people through independently established channels.