Podcastle, a podcast recording and editing platform, is joining the AI text-to-speech race with the release of its own model, Asyncflow v1.0. The company will also offer an API so developers can integrate the text-to-speech model directly into their applications.
With the new model, the company can offer more than 450 AI voices that can narrate your text. The startup said it built the technology and the model so that training and inference costs stay low, which it says gives it an edge over competitors.
With this move, Podcastle joins a number of startups, including ElevenLabs, Speechify, and WellSaid, that have developed AI models to turn any kind of text into an AI-narrated audio clip. The technology is used in marketing, advertising, content creation, education, and corporate training.
Podcastle founder Arto Yeritsyan told TechCrunch that the company has always wanted to build a text-to-speech model, but the training costs and data requirements were very high.
“We’ve wanted to build a robust text-to-speech model since our founding. However, the development costs were very high. With recent major developments in speech models, we were able to achieve a breakthrough last year to get to a place where we could build a high-quality voice model without needing a ton of data,” Yeritsyan said.
The $13.5 million Series A round that the company raised last year also helped.
Yeritsyan says that Podcastle charges about $40 for 500 minutes of text-to-speech, while ElevenLabs charges $99 for the same amount.
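To put the figures quoted above on a per-minute basis, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the prices are the approximate numbers cited by Podcastle's founder, the helper name `price_per_minute` is made up for this example, and the plans may bundle other features that a simple division does not capture.

```python
def price_per_minute(price_usd: float, minutes: int) -> float:
    """Return the cost in US dollars of one minute of generated speech."""
    return price_usd / minutes


# Figures as quoted in the article ("about $40" is approximate).
podcastle = price_per_minute(40, 500)
elevenlabs = price_per_minute(99, 500)

# How many times more expensive the ElevenLabs figure is per minute.
ratio = elevenlabs / podcastle

print(f"Podcastle:  ${podcastle:.3f} per minute")
print(f"ElevenLabs: ${elevenlabs:.3f} per minute")
print(f"Ratio:      {ratio:.2f}x")
```

On these numbers, Podcastle works out to roughly 8 cents per minute against roughly 20 cents for ElevenLabs, or about two and a half times cheaper.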
Podcastle’s voice cloning feature is also getting an update that speeds up the training process.
Previously, training required you to read about 70 different sentences aloud. Now, a few seconds of recorded audio is enough to create a clone of your voice. The new process also uses Podcastle’s Magic Dust AI, which was released last year, to improve the quality of the recording.
In our testing, the voice created by the new process sounded somewhat robotic, although it mimicked our intonation. The company says it will improve the feature over time. You can also train the clone on different samples of your voice to get different results.
Podcastle says that, cost aside, having tools for audio, video, podcasts, and voiceovers on a single platform will give it an edge over the competition. Yeritsyan says that while most users come to Podcastle for audio content, video is catching up as well.