NSU has developed an AI service for creating audio versions of scientific books.

Translation. Region: Russian Federation

Source: Novosibirsk State University

An important disclaimer is at the bottom of this article.

Novosibirsk State University has launched a service that automatically creates audio versions of books from the digitized collection of the University's Scientific Library. The project is based on developments by the NSU Research Center for Artificial Intelligence (AI Center) and is currently being tested. If the pilot succeeds, the team plans to roll the technology out to other libraries.

At this stage, the project covers books from the university press and materials posted in the electronic library, converted into audio with the permission of the copyright holders: about seven thousand titles in total.

The audio is generated by a neural network: the text is extracted from a PDF, pre-processed, and then synthesized into speech. "In the future, we plan to convert all books available in the NSU e-library into audio format. Currently, this number is around 7,000," said Evgeny Pavlovsky, a leading researcher at the NSU Center for Artificial Intelligence and Candidate of Physical and Mathematical Sciences. According to him, the service is not meant to replace traditional reading but to offer an alternative way of accessing the text.

"We don't create a voiceover that fully replicates the original. It's an additional way to work with the book. For mass use, it's important that the solution isn't resource-intensive: one book takes about half an hour of processing on a 16-core processor, without even a graphics card," he explained.

The service is based on the Kappa framework, developed at the NSU AI Center. Kappa is designed to manage datasets and artificial intelligence models and to test and evaluate them before they are deployed in workflows. The framework makes it possible to verify that models behave correctly and to reduce the risk of errors, including so-called AI "hallucinations." In the new project, Kappa is used to prepare training data for speech synthesis and to collect feedback on the quality of the results.

The first hundred books have already been voiced in pilot mode, and the team is now awaiting feedback from the library and its users.

NSU emphasizes that the project is being treated as a technology trial. Once the technology itself and the procedures for working with the library have been refined, the service may be offered to other universities and public libraries through a partner platform or in other formats. According to the developers, their computing resources would allow the entire collection to be voiced within a month, but organizational preparation and verification of the results may take up to a year.
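As a rough check on the one-month estimate, the figures quoted in the article (about 7,000 books, about half an hour per book) can be combined in a short calculation. This sketch adds two assumptions that the article does not state: that "half an hour of processor time" means half an hour of wall-clock time on one 16-core machine, and that a month means 30 days of round-the-clock processing. The function name `machines_needed` is purely illustrative.

```python
import math

# Figures quoted in the article
BOOKS = 7_000          # approximate number of titles in the NSU e-library
HOURS_PER_BOOK = 0.5   # about half an hour per book on a 16-core CPU, no GPU

# Assumption (not from the article): a 30-day month of continuous processing
HOURS_IN_MONTH = 30 * 24


def machines_needed(books: int, hours_per_book: float, window_hours: float) -> int:
    """Smallest number of identical machines, run in parallel, that finishes
    `books` jobs of `hours_per_book` each within `window_hours`."""
    total_hours = books * hours_per_book
    return math.ceil(total_hours / window_hours)


total = BOOKS * HOURS_PER_BOOK
print(f"Total processing time: {total:.0f} machine-hours")
print(f"Days on a single machine: {total / 24:.1f}")
print(f"Machines for a 30-day month: {machines_needed(BOOKS, HOURS_PER_BOOK, HOURS_IN_MONTH)}")
```

Under these assumptions, the collection amounts to about 3,500 machine-hours, or roughly five months on a single 16-core server, so finishing within a month would mean running about five such machines in parallel. The article does not say how much hardware NSU actually uses.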

Please note: This information is raw content obtained directly from the source. It represents an accurate account of the source's assertions and does not necessarily reflect the position of MIL-OSI or its clients.