Nvidia wants you to know that your strangest audio whims are now possible. The company’s latest project, alongside its AI NPC and gaming chatbot, is a text-to-audio AI tool called Fugatto. Like other generative models, Nvidia’s audio AI can create tracks from simple descriptions, but the company also claims it can generate “sounds that have never been heard before,” such as a “howling saxophone,” whatever that might mean.
In a blog post, Nvidia called Fugatto its “Swiss army knife for sound,” capable of modifying existing audio or creating entire soundscapes from scratch. Fugatto is short for the unwieldy name “Fundamental Generative Audio Transformer Opus 1.” It can process voices, music, and background noise, blending them into a single audio track, or modify existing sources.
Calling anything “a sound you’ve never heard before” is a bold claim, especially when it’s AI-generated. Ultimately, AI audio output is the result of algorithms trained on existing data to approximate a user’s prompt. Nvidia says Fugatto is unique because it combines instructions that were separated during training to “create soundscapes it has never encountered before.” For example, the company demonstrated a train sound morphing into an orchestral score and rain fading into the distance.
These capabilities seem unprecedented. Beyond enabling “electronic music with barking dogs in rhythm,” Nvidia claims its tool offers “fine control” over generated soundscapes. The company also stated that the voiceover in its video was an AI-generated version of Nvidia CEO Jensen Huang, although if Fugatto was behind its slightly unnatural delivery, the tool may need refinement before it’s ready for broader use in creative projects.
Numerous AI audio tools already turn text prompts into audio tracks. Adobe unveiled its Project Music GenAI Control for musicians, while big tech companies like Meta are marketing their audio models to industries like film. Meta recently debuted Movie Gen, which can generate soundtracks for AI-created films.
Nvidia quoted AI researcher Rohan Badlani, who said Fugatto made him “feel a bit like an artist,” though the tool relies on vast amounts of pre-existing musical and audio data. Nvidia didn’t share details about its training dataset, noting only that it consists of “millions of audio samples.” The full Fugatto model has 2.5 billion parameters and was trained on Nvidia’s H100 AI GPUs.
This could spell bad news for foley artists, who’ve turned audio mimicry into an art form. Nvidia claims Fugatto could be a helpful tool for ad agencies, game developers, or musicians exploring creative changes with minimal effort. On the flip side, it could enable users to create “new assets,” potentially adding to the growing pile of AI-generated clutter.
Fugatto could have uses beyond replacing human sound designers in films. Nvidia says it can add or remove instruments from existing music, isolate and modify specific sounds, or create entirely new compositions. While you might generate empty drum beats over a dreamy synth track, a full AI-generated soundtrack likely isn’t why most people buy a movie ticket.