Posts

DW uses news.bridge prototype for automated subtitling in day-to-day news operations


Right from the start, project news.bridge has been about creating a language technology platform fit for daily production workflows at modern media organizations. Our goal is to provide a collection of transcription, translation, subtitling and voice-over services that reporters and editors are able to use hassle-free. In the last couple of months, we’ve taken another big step towards this goal: DW newsrooms have started to work with the news.bridge prototype.

Kudos for pioneering spirit go to both the Hindi and Portuguese language teams, who were the first to use our (still unfinished) platform for the automated production of subtitles for program items on an actual publishing schedule. Needless to say, there were a couple of hiccups, but all in all the subtitling process went really well, and the results speak for themselves.

DW Hindi used news.bridge to turn English audio into Hindi subtitles in a short web video on activists fighting against female genital mutilation in Guinea:

https://twitter.com/dw_hindi/status/1068876286259531778


DW Portuguese for Africa worked with news.bridge to create Portuguese subtitles from German audio for an episode of Euromaxx’s “Baking Bread”. The video focuses on pão de milho, Portugal’s famous cornbread:

https://www.dw.com/pt-002/baking-bread-o-que-nos-ensina-a-broa-de-milho-sobre-portugal/av-48283900


By now, a team of editors from different newsrooms has started adapting long-form videos from German and English. A series of documentaries on the celebrated Bauhaus art school will soon be published with Russian and Brazilian Portuguese subtitles, for instance. The Turkish, Indonesian and Swahili desks have also experimented with news.bridge.

Just a couple of days ago, the DW Brasil department announced they’re preparing the production of subtitles for no fewer than 12 (German-language) instalments of DW’s Reporter series. We think news.bridge will be just right for the heavy lifting in the translation process.

5 questions with… Peggy van der Kreeft

Peggy van der Kreeft discussing DW’s human language technology projects at Global Media Forum 2017

 

The news.bridge consortium consists of four partners: Deutsche Welle (DW), the Latvian News Agency (LETA), the Laboratoire d’Informatique de l’Université du Mans (LIUM), and Priberam. We are all part of the media and tech industry in some way, we’re all fascinated by what cutting-edge human language technology (HLT) can do, and we’re all dedicated to this DNI project. Other than that, we’re actually pretty different organizations and people with a wide range of interests and a distinctive set of skills. So we figured it would be a good idea to sit down, ask the four team leaders a couple of questions, and give you some more insights on who is doing what exactly, and why it’s all worthwhile.

For the first part of this series of posts, we’ve talked to Peggy van der Kreeft, an experienced linguist and innovation manager, who is running news.bridge at DW.

Peggy, when and how did you first get in touch with human language technology (HLT)?

That was in the early 1980s, believe it or not, during my postgraduate studies in “Translating at European Level” at the University of Louvain. I tried out early machine translation (MT) systems like SYSTRAN. Later on, in the 90s, I used Babel Fish while working at an American translation and documentation center. The first research project at Deutsche Welle that focused on MT was CoSyne, which ran from 2010 to 2013. Among other things, we experimented with translating DW’s Today in History. The quality wasn’t good enough for direct publication, but there was always post-editing, and we already succeeded in making the translation process a little more efficient. My first experience with automated speech recognition (ASR) and speech-to-text (STT) was in the scope of project AXES (2011-2015). This one was about finding novel ways to explore and interact with audiovisual libraries. The platform itself wasn’t sophisticated enough for use in a real production environment, but it certainly showed the power of HLT, which has been a focus topic of our department (DW Research & Cooperation Projects) ever since.

What is the most fascinating aspect about news.bridge?

There are many fascinating aspects, so it’s really hard to choose one. Perhaps the most striking thing is that the platform is so powerful, even though it’s based on a fairly simple concept. news.bridge covers virtually any language, and through the use of external tools, it remains state of the art.

What is the project’s biggest challenge?

Well, the overall challenge is to make news.bridge stable and scalable, and to provide a smooth workflow for the entire process of creating subtitles (and voice-overs). A challenge we’re currently focusing on is the seamless ingestion of existing scripts, which are rarely standardized. However, using original scripts (instead of ASR/STT) always leads to the best results, so we need to work on this.

Who’s in your team and what are they currently working on?

Here at DW, we’re currently four people: Ruben Bouwmeester, Hina Imran, Alexander Plaum, and myself. Ruben, who joined the project very recently, works on the project’s business development and marketing plan. He also makes sure we always have professionally designed dissemination material. Hina is our developer. She works on customized user interfaces for HLT applications and output, coordinating technical issues with LETA and within DW. She also manages and maintains a local test installation of news.bridge. Alex is in charge of communication and dissemination. He runs our website, this blog, and our Twitter account. He also works on brand design and sometimes acquires new partners. As for me, I am the HLT lead at our department and also the main coordinator of news.bridge. That means I take care of operations, oversee and report progress, organize user testing and plan implementation.

Where do you see news.bridge in five years?

news.bridge has already attracted quite a bit of attention. Many broadcasters, news producers and language technology providers are interested in implementing it as soon as possible. Multilingual content has become very important, and news.bridge will significantly speed up production workflows, with modest investment costs. We hope we’ll be able to offer a reliable service, both as local installations and as software as a service (SaaS), sometime in 2019. But we are not waiting for that. Our first major test case is here at Deutsche Welle, with its many newsrooms and its international orientation. We’ve started beta testing the platform for automated translation and subtitling of videos in different languages, and we’re taking it from there. It would be great if news.bridge became a standard HLT platform by 2023; it certainly has the potential. In order for that to happen, we need to find the right exploitation partners and strategies, of course. news.bridge is not a startup, but a media innovation project. When it’s finished, the platform will lead its own life.

Insights from our first user testing sessions


Getting early input from the people you are designing for is absolutely essential, which is why we invited about a dozen colleagues to give the latest beta version of news.bridge a test run at the DW headquarters in Bonn last month. We had two really inspiring sessions with journalists, project managers and other media people working for DW and associated companies, and we gained a number of useful insights. While some of them are too project-specific to share (news.bridge is not in public beta yet), there are also more general lessons that should make for an interesting journo tech blog post. Here we go:

Infrastructure and preparation

When inviting people to simultaneously stream and play around with news videos, make sure you have enough bandwidth. This may sound trivial, but it’s important, especially in Germany (which doesn’t even make the Top 20 when it comes to internet connection speed).

To document what your beta testers have to say as quickly and conveniently as possible, we recommend preparing digital questionnaires (e.g. Google Forms) and sending out a link well before the end of the session. That way, you get solid feedback from everyone. It’s also a good idea to add a screenshot/comment feature (e.g. html2canvas) to the platform that is being tested. In addition, open discussions and interview-type interactions provide very useful feedback.

Testing automatic speech recognition (ASR) tools

Thanks to artificial neural networks, ASR services have become incredibly sophisticated in the last couple of years and deliver very decent results. Basically all of our test users said the technology will significantly speed up the tiresome transcription process when producing multilingual news videos.

However, ASR still has trouble when:

  • people speak with a heavy dialect and/or in incomplete sentences (like some European football coaches who shall not be named)
  • people speak simultaneously (which frequently happens at press conferences, for example)
  • complicated proper names occur (Aung San Suu Kyi, Hery Rajaonarimampianina)
  • homophones occur (merry, marry, Mary)
  • there is a lot of background noise (which is often interpreted as language and transcribed to gibberish)

As a result, journalists will almost certainly have to do thorough post-editing for a while and also correct (or add) punctuation, which is crucial for the subsequent translation.
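To illustrate why punctuation matters for the translation step, here is a minimal sketch in Python. It assumes, purely for illustration, that text is handed to a translation service one sentence at a time and that sentences are detected by their closing punctuation; `split_sentences` and the sample transcripts are hypothetical and not part of news.bridge.

```python
import re
from typing import List


def split_sentences(transcript: str) -> List[str]:
    """Split a transcript into sentence-like segments.

    A segment ends at '.', '!' or '?' followed by whitespace.
    This is a deliberately naive rule, used only for illustration.
    """
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [part for part in parts if part]


# Hypothetical raw ASR output without punctuation or casing.
raw_asr = "the activists met the minister on monday she promised a new law"

# The same passage after a journalist has post-edited it.
post_edited = "The activists met the minister on Monday. She promised a new law."

print(split_sentences(raw_asr))      # one long, unsegmented chunk
print(split_sentences(post_edited))  # two clean sentences
```

Without punctuation, the whole passage arrives as a single run-on segment; after post-editing, each sentence can be handled on its own.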

Testing machine translation (MT) tools

What has been said about ASR also applies to MT: The tech has made huge leaps, but results are not perfect yet, especially when you are a professional editor and thus have high standards. Something really important to remember:

The better and more structured your transcript (or uploaded original script),
the better the translation you end up with.

As for the limits of machine translation during our test run, we found that “exotic” languages like Pashto (which is really important for international broadcasters like DW) are not supported very well. Few services cover them, and the translation results are subpar. This is not a big surprise, of course, as the corpus used to train the algorithms is so much smaller than that of a major Western language like French or German. This also means that it is up to projects like news.bridge to improve MT services by feeding the algorithms high-quality content, e.g. articles from DW’s Pashto news site.

While MT tools are in general very useful when producing web videos — you need a lot of subtitling in the era of mobile social videos on muted phones — there are some workflows that are hard to improve or speed up. For example: How do you tap into digital information carriers that are an individually branded, hard-coded part of a video created in software like Adobe Premiere? Well, for now we can’t, but we’re working on solutions. In the meantime, running news.bridge in a fixed tab and copy-pasting your translated script bits is an acceptable workaround.

Testing speech synthesis

Sometimes, computer voices are indispensable. For example, when you’re really curious about this blog post, but can’t read it because you’re on a bike or in a (traditional) car.

In news production, however, artificial readers/presenters are merely a gimmick, at least for the time being. That’s because once your scripts are finished, reading/recording them isn’t that time-consuming and will provide much nicer results. Besides, synthetic voices aren’t yet available in all languages (once again, Pashto is a case in point).

Nevertheless, news.bridge beta testers told us that the voices work fairly well, and even sound pretty natural in some cases. They can be trained, by the way, which is an interesting exercise we will try out at some point.

HLT services and news production in a nutshell

If we had to sum up the assessment of our beta testers in just a few sentences, they would read something like this:

HLT services and tools are useful (or very useful) in news productions these days: They get you decent results and save you a lot of time.

news.bridge is a promising, easy-to-use mash-up platform, especially when it comes to transcribing, translating and creating subtitles (another relevant use case is gisting).

news.bridge is not about complete automation. It’s about supporting journalists and editors. It’s about making things easier.