Microsoft plans to let Teams users clone their voices so they can sound like themselves when speaking other languages during meetings.
At Microsoft Ignite 2024 on Tuesday, the company announced Interpreter in Teams, a tool for Microsoft Teams that provides “real-time voice-to-voice” interpretation during meetings. Starting in early 2025, people using Teams for meetings will be able to use Interpreter to simulate their voices in up to nine languages: English, French, German, Italian, Japanese, Korean, Portuguese, Chinese, and Spanish.
“Imagine sounding just like you in another language,” Microsoft CMO Jared Spataro wrote in a blog post shared with TechCrunch. “Interpreter in Teams provides real-time voice translation during meetings, and you can choose to simulate your own speaking voice for a more personal and engaging experience.”
Microsoft has provided few specific details about the feature, which will be available only to Microsoft 365 subscribers. It did say, however, that the tool does not store any biometric data, does not add emotion beyond what is “naturally present” in the audio, and can be disabled through Teams settings.
“Interpreter is designed to reproduce a speaker's message as faithfully as possible, without making assumptions or adding extraneous information,” a Microsoft spokesperson told TechCrunch. “Voice simulation can only be enabled if the user consents through a notification during the meeting or by enabling 'voice simulation consent' in settings.”
A number of companies have developed technology for digitally mimicking voices that sound fairly natural. Meta recently announced that it is piloting a translation tool that can automatically translate audio in Instagram Reels, while ElevenLabs offers a robust platform for multilingual speech generation.
AI translations tend to be less lexically rich than those of human interpreters, and AI translators often struggle to accurately convey colloquialisms, analogies, and cultural nuances. Still, the cost savings are attractive enough to make the trade-off worthwhile for some. According to Markets and Markets, the market for natural language processing technologies, including translation, could be worth $35.1 billion by 2026.
However, AI voice clones also come with security challenges.
Deepfakes have spread like wildfire across social media, making it difficult to separate truth from disinformation. So far this year, deepfakes featuring President Joe Biden, Taylor Swift, and Vice President Kamala Harris have racked up millions of views and reshares. Deepfakes are also used to target individuals, for example by impersonating a loved one. According to the FTC, losses from impersonation scams topped $1 billion last year.
Just this year, a team of cybercriminals staged a Teams meeting with deepfaked company executives that was so persuasive the targeted company reportedly wired $25 million to the criminals.
Partly because of these risks (and the optics), OpenAI decided earlier this year not to release Voice Engine, its voice cloning technology.
From what we've learned so far, Interpreter in Teams is a relatively narrow application of voice cloning. Still, that doesn't mean the tool can't be abused. One could imagine a bad actor feeding Interpreter a misleading recording (for example, someone asking for bank account information) to get a translation in the language of their target.
We hope to get a better idea of the safety measures Microsoft will add to Interpreter in the coming months.