Cybersecurity researchers have developed a framework that uses artificial intelligence to clone a person's voice in real time, requiring only a few minutes of audio. The technique significantly lowers the barrier to sophisticated voice phishing, or "vishing," attacks, raising the threat level for corporations and individuals alike.
Key Takeaways
- Researchers at NCC Group created an AI model that can clone voices for live conversations, a major advancement over previous technologies.
- The system requires only a few minutes of a target's voice, often obtainable from public sources, and can be trained in hours.
- This technology makes highly convincing social engineering attacks more accessible to threat actors with moderate resources.
- Recent security incidents involving major companies like Cisco and Salesforce highlight the growing effectiveness of vishing campaigns.
- Experts recommend multi-factor authentication and enhanced employee training to counter the rising threat of AI-powered vishing.
A New Era of Voice-Based Scams
Security researchers from NCC Group have demonstrated a new capability that could fundamentally change the landscape of social engineering attacks. Their team developed a framework that can replicate a person's voice and use it in a live conversation, effectively allowing an attacker to speak as someone else in real time.
This development marks a significant departure from previous voice deepfake technologies. Older methods were often limited to offline processing, meaning they could create a fake recording but couldn't be used for a live, interactive phone call. Other systems that relied on text-to-speech (TTS) models often introduced unnatural delays, which could alert a victim that something was wrong.
What is Vishing?
Vishing, a term combining "voice" and "phishing," is a type of cyberattack where criminals use phone calls to deceive people into revealing sensitive information. Attackers often impersonate trusted figures, such as bank representatives, IT support staff, or even family members, to steal credentials, financial details, or corporate data.
The new framework was built to remove both obstacles. "The main objective of our research was therefore to try to overcome these limitations by developing a framework capable of real-time voice cloning," the NCC Group researchers wrote in their report. The result is a natural, flowing conversation in which the attacker's words are converted into the target's cloned voice with no perceptible delay.
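NCC Group has not published implementation details, but the core engineering constraint is easy to illustrate: in a live call, each incoming audio frame must be converted faster than the next one arrives, or delay accumulates and the conversation stops sounding natural. The Python sketch below makes that budget concrete. Note that `convert_chunk` is a hypothetical placeholder rather than the researchers' model, and the 20 ms frame size is a common choice for live audio, not a figure from the report.

```python
import time
import numpy as np

SAMPLE_RATE = 16_000                             # samples per second
CHUNK_MS = 20                                    # common frame size for live audio
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000   # 320 samples per frame

def convert_chunk(chunk: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a voice-conversion model (identity here)."""
    return chunk

# Time one frame through the (placeholder) converter.
frame = np.zeros(CHUNK_SAMPLES, dtype=np.float32)
start = time.perf_counter()
_ = convert_chunk(frame)
elapsed_ms = (time.perf_counter() - start) * 1000

# If conversion takes longer than the frame duration, latency grows with
# every frame: the telltale lag that betrayed older TTS-based systems.
print(f"Budget per frame: {CHUNK_MS} ms; conversion used: {elapsed_ms:.3f} ms")
```

A production system would run this loop against a live microphone stream; the point here is only that "real-time" is a hard per-frame deadline, not a quality setting.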
Lowering the Barrier for Sophisticated Attacks
One of the most concerning findings from the research is the accessibility of this technology. The model was trained using just a few minutes of publicly available audio of the target individual. Furthermore, the hardware and software required are not specialized or prohibitively expensive.
"This was all possible using hardware, audio sources and audio processing software that were all 'good enough', rather than being exceptional," the researchers stated. This implies that threat actors without state-level resources could develop and deploy similar tools.
The NCC Group team successfully used their framework in practical tests against real organizations. They were able to obtain confidential information by convincing employees to perform actions on their behalf, demonstrating the tool's effectiveness in real-world scenarios.
To prevent misuse, the company has chosen not to release the technical specifics of their framework. However, they caution that it is reasonable to assume that sophisticated threat actors may have already developed similar capabilities independently.
A Growing Trend in Corporate Breaches
While AI-powered vishing is an emerging threat, traditional vishing is already a proven and effective attack vector. Several high-profile incidents this year have highlighted its danger.
Recent Vishing Incidents
- August: Cisco reported a data breach that originated from a vishing attack targeting one of its employees.
- June: A financially motivated group impersonated IT support staff in phone calls to trick employees of Salesforce customers into giving up their credentials.
- May: The 3AM ransomware group began using vishing calls to gain initial access to victim networks before deploying their malware.
These attacks show that criminals are increasingly using phone calls to bypass technical security controls by targeting the human element. The addition of convincing, real-time voice cloning is expected to make these attacks even more difficult to detect.
"We anticipate a rise in both broad campaigns leveraging well-known personas and highly targeted attacks aimed at specific organizations," the researchers noted. This could include attackers impersonating a CEO to instruct the finance department to make an urgent wire transfer or posing as an IT administrator to gain remote access to an employee's computer.
Defending Against AI-Powered Impersonation
The emergence of realistic voice cloning challenges traditional security practices, especially those that rely on voice for authentication. As the line between real and simulated voices blurs, organizations and individuals must adapt their defenses.
Experts stress that technology alone is not a complete solution. A multi-layered approach combining technical safeguards with human awareness is essential.
Recommended Security Measures
NCC Group and other security experts recommend several key strategies to mitigate the risk of AI-powered vishing:
- Implement Multi-Factor Authentication (MFA): Requiring a second form of verification makes it much harder for an attacker to gain access, even if they successfully steal credentials.
- Enhance Employee Training: Staff should be trained to be skeptical of unusual or urgent requests, even if the voice on the phone sounds familiar. This includes verifying such requests through a separate communication channel, like an internal messaging app or a direct call to a known number.
- Establish Verification Protocols: For sensitive transactions like financial transfers or system access changes, companies should use pre-established code words or secondary verification methods that cannot be easily socially engineered (a minimal sketch of one such check follows this list).
- Limit Public Voice Exposure: Executives and other public-facing employees should be mindful of their publicly available audio, such as in interviews, podcasts, and conference calls, as this material can be used to train voice-cloning models.
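As a rough illustration of the verification-protocol recommendation above, the sketch below implements a challenge-response check built on a pre-shared secret, using only the Python standard library. The secret-distribution scheme, challenge length, and function names are assumptions for the example, not a published standard or anything described in the NCC Group report.

```python
import hashlib
import hmac
import secrets

# Illustrative only: a pre-shared secret distributed out-of-band
# (e.g., in person), never spoken over the phone itself.
SHARED_SECRET = b"replace-with-a-securely-distributed-secret"

def issue_challenge() -> str:
    """Generate a fresh random challenge to read to the caller."""
    return secrets.token_hex(8)

def expected_response(challenge: str) -> str:
    """The short code a legitimate caller's device would compute."""
    digest = hmac.new(SHARED_SECRET, challenge.encode(), hashlib.sha256)
    return digest.hexdigest()[:8]

def verify_caller(challenge: str, response: str) -> bool:
    """Constant-time comparison guards against timing side channels."""
    return hmac.compare_digest(expected_response(challenge), response)

# A cloned voice can sound like the CEO, but it cannot compute the
# correct response without the secret held on an enrolled device.
challenge = issue_challenge()
print("Challenge:", challenge)
print("Verified: ", verify_caller(challenge, expected_response(challenge)))
```

The design point is that verification shifts from "does the voice sound right" to "does the caller possess something a clone cannot fake," which is the same principle behind multi-factor authentication.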
Ultimately, the researchers warn that any process reliant on human judgment is vulnerable. "Any process in which a person can make an exception is vulnerable to social engineering, whether it is traditional vishing, AI voice cloning, phishing or deepfakes," they concluded. This underscores the need for robust, consistently enforced security policies.