news.bridge in a nutshell
news.bridge is about making audiovisual/broadcasting content available in virtually any language. We are building a platform that integrates the best off-the-shelf tools in the field of automatic transcription, translation, summarization, and voice-over generation. To help you pick the right NLP tool for your tasks, we are also working on a benchmarking service. Last, but not least, there will be an editor to manually improve output — because we all know that NLP hasn’t reached perfection yet.
Once fully up and running, the news.bridge platform will
- recognize and transcribe speech from audiovisual sources in various languages
- translate transcriptions into various languages
- produce voiceovers of transcriptions
- produce summaries of transcriptions
- offer a benchmarking service to help users find the right tools for their task
- allow users to apply an individual mix of tools
- allow users to edit their transcripts and translations
- allow publication of audiovisual content with subtitles in the source language
- allow publication of audiovisual content with subtitles in the target language
- allow publication of audiovisual content with voiceover in the target language
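To make the intended flow a little more concrete, here is a minimal Python sketch of how the steps listed above could be chained. It is an illustration only, not the actual news.bridge implementation: the function names, the `PipelineResult` type, and the stub services are all assumptions made up for this example, and no real NLP service is called. The optional `edit` hook stands in for the manual editing step.

```python
from dataclasses import dataclass
from typing import Callable


def no_edit(text: str) -> str:
    """Default editing hook: leave the machine output untouched."""
    return text


@dataclass
class PipelineResult:
    transcript: str
    translation: str
    summary: str
    voice_over: bytes


def run_pipeline(
    media: bytes,
    target_lang: str,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    summarize: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
    edit: Callable[[str], str] = no_edit,
) -> PipelineResult:
    """Chain the steps; every service is passed in, so tools can be mixed freely."""
    transcript = edit(transcribe(media))                    # speech -> text, then manual fix-up
    translation = edit(translate(transcript, target_lang))  # text -> target language
    summary = summarize(transcript)                         # summary of the transcript
    voice_over = synthesize(translation, target_lang)       # translated text -> audio
    return PipelineResult(transcript, translation, summary, voice_over)


# Hypothetical stand-ins for real services (assumptions, for illustration only).
def stub_transcribe(media: bytes) -> str:
    return media.decode("utf-8")


def stub_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def stub_summarize(text: str) -> str:
    return text.split(".")[0] + "."


def stub_synthesize(text: str, target_lang: str) -> bytes:
    return text.encode("utf-8")


result = run_pipeline(
    b"Hello world. More text.",
    "de",
    stub_transcribe,
    stub_translate,
    stub_summarize,
    stub_synthesize,
)
print(result.summary)
```

Passing each service in as a plain callable keeps the sketch independent of any particular vendor, which mirrors the idea of letting users apply an individual mix of tools.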
news.bridge consortium and funding
news.bridge is a pan-European project that involves four partners:
Deutsche Welle (DW) is Germany’s international public service broadcaster, offering news in 30 languages across a multitude of channels. DW’s Research & Cooperation team is the main coordinator of news.bridge and also oversees beta testing and content provision.
The Latvian News Agency (LETA) is Latvia’s main information agency. It offers multifaceted content and custom-made solutions for professional information users in Latvia, the Baltics, and beyond. LETA’s primary role in news.bridge is technical integration, which is handled by a special team of project managers and developers.
The Laboratoire d’Informatique de l’Université du Maine (LIUM) is a public computer science research lab and part of the University of Maine (Le Mans Université). LIUM’s role in news.bridge is to provide transcription and translation tools and enhancements via its renowned Language and Speech Technologies (LST) team.
Priberam is a Portuguese SME that offers cutting-edge semantic search and natural language understanding technologies to IT and media clients in Portugal, Spain, and Brazil. Its research unit Priberam Labs contributes to news.bridge by creating a summarization tool and supporting commercialization of the platform.
news.bridge is funded by Google’s Digital News Initiative (DNI).
news.bridge builds upon the results of speech.media, another DNI project in which DW and LETA created a prototype platform that integrated different building blocks based on APIs for language processing components.
speech.media was already able to automatically process audiovisual content and generate a transcript and/or set of subtitles in the source language. Based on this, the prototype created a translation into a preselected target language, which in turn served as the basis for a synthetic voice-over. The speech.media results were surprisingly good, but there were also shortcomings, which will be addressed in news.bridge.
The consortium found, for example, that most speech recognition tools lack (decent) punctuation, which makes the transcript hard to read and results in poorly synchronized subtitles and an insufficient base for the next step, i.e. automated translation. This is why project news.bridge is building a rock-solid editor that lets users fix errors after each output step.
Another important insight was that different NLP tools are good at different tasks. To get the best possible results, users must always apply a mix of different services. Tool X may be great when it comes to initial audio recognition and transcription, but Tool Y is better at translation. And if you change your source or target language, you might want to consider using Tool Z. This is why project news.bridge tries to include as many NLP services as possible.
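The idea of mixing tools per task and language can be pictured as a simple lookup, sketched below in Python. This is only an illustration under stated assumptions: "Tool X", "Tool Y", and "Tool Z" are the placeholders from the paragraph above, the Latvian-to-English pair is a made-up example, and the `DEFAULTS`, `OVERRIDES`, and `pick_tool` names are hypothetical rather than part of the platform.

```python
from typing import Optional

# Default tool per task (placeholder names taken from the text above).
DEFAULTS = {
    "transcription": "Tool X",
    "translation": "Tool Y",
}

# Language-specific exceptions: (task, source language, target language) -> tool.
# The "lv" -> "en" pair is an invented example, not a statement about real tools.
OVERRIDES = {
    ("translation", "lv", "en"): "Tool Z",
}


def pick_tool(task: str, source_lang: str, target_lang: Optional[str] = None) -> str:
    """Return the tool for a task, preferring a language-specific override."""
    if task not in DEFAULTS:
        raise KeyError(f"unknown task: {task}")
    return OVERRIDES.get((task, source_lang, target_lang), DEFAULTS[task])


print(pick_tool("transcription", "de"))      # falls back to the task default
print(pick_tool("translation", "de", "en"))  # falls back to the task default
print(pick_tool("translation", "lv", "en"))  # uses the language-specific override
```

In a setup like this, adding another service only means adding an entry to one of the two tables, which is one reason to keep the choice of tool separate from the processing steps themselves.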
Both news.bridge and its forerunner speech.media have their roots in two other NLP projects which not only produced important insights and technical infrastructure, but also helped form the current consortium:
EUMSSI was an EU-funded Research Project that ran from late 2013 to late 2016 and featured project partners DW and LIUM.
SUMMA is an EU-funded Research and Innovation Action that will run until early 2019 and features project partners DW, LETA, and Priberam.
Another piece in the puzzle is the upcoming, Google-funded DNI project Sixth Sense Retrieval, which will be kicked off in February 2018.
There are plans to combine technology from SUMMA, news.bridge, and Sixth Sense Retrieval and create an even more sophisticated, all-in-one NLP SaaS platform, but let’s not get carried away at this point.