Meta has released an “open” implementation of the viral podcast generation feature in Google's NotebookLM.
The project, called NotebookLlama, unsurprisingly uses Meta's own Llama model for much of its processing. Similar to NotebookLM, it can generate back-and-forth, podcast-style digests of uploaded text files.
NotebookLlama first creates a transcript from a file (such as a PDF of a news article or blog post). It then adds “further dramatization” and interruptions before feeding the transcript into an open text-to-speech model.
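For readers who want to picture that flow, here is a minimal Python sketch of a three-stage pipeline: draft a transcript, rewrite it for drama, then voice each turn. It is an illustration under stated assumptions, not NotebookLlama's actual code. Every name in it (`make_podcast`, `parse_turns`, `Turn`, and the three callables standing in for models) is hypothetical, and it assumes the script uses simple `Speaker: text` lines.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for real models: any function with these
# shapes can be plugged in (an LLM call, a TTS call, or a test stub).
TextModel = Callable[[str], str]            # prompt text -> generated text
SpeechModel = Callable[[str, str], bytes]   # (speaker, line) -> audio bytes


@dataclass
class Turn:
    speaker: str
    text: str


def parse_turns(script: str) -> List[Turn]:
    """Split 'Speaker: text' lines into turns; lines without a label are skipped."""
    turns = []
    for line in script.splitlines():
        # Split on the first colon only, so colons inside the spoken text survive.
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip() and text.strip():
            turns.append(Turn(speaker.strip(), text.strip()))
    return turns


def make_podcast(
    source_text: str,
    write_transcript: TextModel,
    dramatize: TextModel,
    speak: SpeechModel,
) -> List[bytes]:
    """Run the three stages in order and return one audio clip per turn."""
    draft = write_transcript(source_text)   # stage 1: text -> draft transcript
    script = dramatize(draft)               # stage 2: add drama and interruptions
    return [speak(t.speaker, t.text) for t in parse_turns(script)]  # stage 3: TTS
```

Passing the models in as plain functions keeps the sketch runnable without any particular model or speech library; swapping in a stronger text-to-speech model would only change the `speak` argument.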
The results don't seem to be as good as NotebookLM's. In the NotebookLlama sample I listened to, the voices have a decidedly robotic quality and tend to talk over each other in strange ways.
But the Meta researchers behind the project say quality could be improved with more powerful models.
“Text-to-speech models are limited in how natural this can sound,” they write on NotebookLlama's GitHub page. “[Also,] another approach to creating a podcast is to have two agents discuss topics of interest and create an outline for the podcast. Currently, we use a single model to create podcast outlines.”
NotebookLlama is not the first attempt to recreate NotebookLM's podcast functionality. Some projects have been more successful than others. But none, not even NotebookLM itself, has managed to solve the hallucination problem that plagues all AI. That means AI-generated podcasts are bound to contain some fabricated material.