5 Questions with… Renārs Liepiņš

We’ve talked to Peggy van der Kreeft (who manages news.bridge for Deutsche Welle), and we’ve talked to Yannick Estève (who represents the LIUM computer scientists involved in the project). Now it’s time to have a chat with Renārs Liepiņš – without whom news.bridge would be little more than a white paper and a collection of wireframes. Renārs is a Senior Research Scientist at IMCS UL and LETA, founder and CEO of MindFlux, and project lead for LETA.

1) Renārs, when and how did you first get in touch with human language technology?

I first learned about HLT when I was working on tools for the semantic web during my PhD years (2010-2015). At the time, HLT was reaching a new level: the output was starting to become genuinely useful. After my PhD, in early 2016, I began to work on the SUMMA project, which is about combining multiple HLT modules into a unified pipeline for automated media monitoring. The success of SUMMA made me think about other interesting combinations of HLT tools, and thus the idea for news.bridge was born.

2) What is the most fascinating aspect of news.bridge?

Well, first of all, it’s great to build a platform that saves people a lot of cumbersome routine work. Videos, audio tracks, and scripts processed with news.bridge aren’t perfect – but they require only minor tweaks. It’s also fascinating to explore the options of human-computer cooperation. news.bridge depends on smart algorithms and smart editors. We get the best results when they work in tandem.

3) What is the project’s biggest challenge?

It’s actually a combination of challenges: We need to scale the system so it can be used in a production environment of a big broadcaster and extend the UI to handle more workflows – all while keeping the platform as simple to use as possible.

4) Who’s in your team and what are they currently working on?

The LETA team consists of Roberts Dargis, Didzis Gosko, Mikus Grasmanis, and myself. Roberts and Didzis take care of the backend development and the integration of new HLT modules from internal and external partners. Mikus is responsible for the UI and does most of the front-end coding. I’m the project lead, which means I handle coordination with other partners as well as overall system architecture and integration.

5) Where do you see news.bridge in five years?

I hope that news.bridge will have become a mature platform that helps media companies expand their markets and provide truly multilingual news all around the world.

From ASR to xml:tm – Human Language Technology Abbreviations Spelled Out!

Let’s face it: Media tech people live in a language filter bubble. Every community uses its own lingo – which already makes communication somewhat difficult. To make matters worse, communities go on to create a whole bunch of abbreviations, as their favorite terms are long and unwieldy. HLT pros – that’s human language technology professionals – are no exception here. The news.bridge consortium would like to try and fix the problem (or at least a part of it). That’s why we’ve created the following list, which spells out the cryptic language we use in our daily business.

Please note that all included abbreviations have a strict focus on speech, language, and translation. So even though deep neural networks (DNN) and hidden Markov models (HMM) are very important, we didn’t put them on the list, because they’re not exclusive to HLT. We also decided to go without full definitions (a lot of terms are actually self-explanatory once they’re spelled out) and instead include a number of links as well as information on whether an abbreviation represents a general term, a paradigm, or a file format. OK, end of introduction. Here’s our list; we hope it’s useful:

  • ASR = Automatic speech recognition
  • CAT = Computer-assisted translation
  • DNT = Do not translate (because the term is a proper name, a trademark, etc.)
  • EBMT = Example-based machine translation (an MT paradigm)
  • G11n = Globalization (“G”, followed by 11 letters, followed by “n”)
  • GILT = Globalization, internationalization, localization, translation
  • HPMT = Hierarchical phrase-based machine translation (a statistical MT approach)
  • ISG = the Industry Specification Group for localisation industry standards (= the successor of LISA) at the European Telecommunications Standards Institute (ETSI)
  • I18n = Internationalization (“i”, followed by 18 letters, followed by “n”)
  • L10n = Localization (“l”, followed by 10 letters, followed by “n”)
  • LIS = Localisation Industry Standard(s)
  • LISA = Localization Industry Standards Association
  • LSP = Language service provider
  • MT = Machine translation
  • NMT = Neural machine translation (an MT paradigm)
  • NER = Named-entity recognition (= the extraction of the names of persons, organizations, locations, etc. from a text)
  • NLG = Natural language generation
  • NLP = Natural language processing
  • OLIF = Open Lexicon Interchange Format (= an open standard for the exchange of terminological and lexical data)
  • PBMT = Phrase-based machine translation (an MT paradigm)
  • POS tagging = Part-of-speech tagging (= the identification of words as nouns, verbs, adjectives, adverbs, etc.)
  • SBMT = Syntax-based machine translation (a statistical MT approach)
  • SLU = Spoken language understanding
  • SMT = Statistical machine translation (an MT paradigm)
  • SRX = Segmentation Rules eXchange (an enhancement of the TMX standard)
  • STT = Speech-to-text
  • T9n = Translation (“T”, followed by 9 letters, followed by “n”)
  • TBX = TermBase eXchange (a standard for exchanging terminological data)
  • TEP = Translate, edit, proofread
  • TM = Translation memory (= a database that stores translated sentences, paragraphs etc.)
  • TMM = Translation memory manager (= software tapping into a TM)
  • TMS = Translation memory system
  • TMX = Translation memory eXchange (= a standard that enables the interchange of translation memories between translation suppliers)
  • TQA = Translation quality assurance
  • TransWS = Translation web services (= a framework for the automation of localization processes via the use of web services)
  • TTS = Text-to-speech
  • TU = Translation unit (= a segment of text treated as a single unit of meaning)
  • UTX = Universal terminology eXchange (= a standard specifically designed for user dictionaries of MT)
  • WBMT = Word-based machine translation (a statistical MT approach)
  • WER = Word error rate
  • XLIFF = XML Localization Interchange File Format (= an XML-based language tech industry standard for exchanging localizable data between tools)
  • xml:tm = XML-based text memory (= an approach to translation memory based on the concept of “text memory”, which is a combination of author and translation memory)

This list is a work in progress. If you feel something important is missing, please drop us a line.

5 questions with… Yannick Estève

Four partners, four areas of expertise, four teams with a distinctive set of skills. For the second part of this series of posts on the people behind news.bridge, we’ve talked to Yannick Estève, Professor of Computer Science at the University of Le Mans, and project lead for LIUM.

Yannick, when and how did you first get in touch with human language technology?

Well, first of all, I’ve always been a fan of science fiction. So most certainly, books and movies like “2001” had a big influence on me. When I became a student in the 1990s, I was fascinated by computer science, but also by the humanities. Working on human language technology seemed like an excellent way to satisfy both interests.

What is the most fascinating aspect about news.bridge?

In my opinion, the most fascinating aspect is that you can handle complex and powerful technologies like speech recognition, machine translation, speech generation, and summarization through a very simple user interface. The platform offers easy access to global information in a vast number of languages, and that’s really fantastic!

What is the project’s biggest challenge?

The biggest challenge is probably related to integration. We need to manage heterogeneous technologies and services from several companies – and come up with one smart, unified application.

Who’s in your team and what are they currently working on?

We have four core members in this project: Sahar Ghannay is a post-doc researcher and an expert on deep learning for speech and natural language processing. Antoine Laurent is an assistant professor; his focus is on speech recognition. Natalia Tomashenko is a research engineer; she’s all about deep learning and acoustic model adaptation for speech recognition. Well, and I’m the professor and project lead; my expertise lies in speech and language technologies and deep learning.

We’re all members of LIUM, which can safely be called an HLT stronghold. For the last five years, our main research interest has been deep learning applied to media technology. Currently, we mainly work on neural end-to-end approaches for different tasks related to speech and language. End-to-end neural means that a single neural model processes the input (for example an audio signal containing speech) to generate the output (text), whereas in the “classical” pipeline, we apply different sequential systems and models between the input and the final output.

Where do you see news.bridge in five years?

In five years, news.bridge will have even better integrated services, cover even more languages, and offer new functionalities, like the smooth extraction of semantic information. Progress in HLT is very fast, and we still haven’t realized the full potential of the deep learning paradigm. Increasing computation power and training data is just a first step here.

LIUM publishes new papers on named entity extraction and speech recognition

news.bridge partner LIUM has released two new academic papers on computation and language: End-to-end named entity extraction from speech and TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation. The papers were collaboratively written by Antoine Caubrière, Yannick Estève, Sahar Ghannay, Antoine Laurent, Natalia Tomashenko (University of Le Mans), François Hernandez and Vincent Nguyen (Ubiqus), and Emmanuel Morin (University of Nantes).

End-to-end named entity extraction from speech shows that it’s possible to recognize named entities in speech with a deep neural network that directly analyzes the audio signal. This new approach is an alternative to the classical pipeline, which applies an automatic speech recognition (ASR) system first and then analyzes the transcriptions. LIUM’s new method not only makes it possible to handle speech recognition and entity recognition simultaneously, it also makes it possible to extract the named entities only and ignore all other words. The approach is interesting for at least two reasons:

  1. The system is easier to deploy (because you only need to set up a neural net).
  2. Performance will most likely be better (because the neural net is optimized for named entity extraction, whereas in the classical pipeline the different tools are not jointly optimized for the same task).
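
To make the “named entities only” output more tangible, here is a minimal, purely illustrative sketch of how entity-tagged output from such a system could be post-processed. The bracketed tag format, the example sentence, and the function name are our own assumptions for illustration; they are not the notation used in the paper.

```typescript
// An end-to-end model can be trained to emit entity-tagged text directly.
// The bracketed tag format below is purely illustrative (not the paper's notation).
interface NamedEntity {
  type: string; // e.g. "pers", "org", "loc"
  text: string; // the entity surface form
}

// Extract entities from a hypothetical tagged hypothesis such as:
//   "the chancellor [pers angela merkel] visited [org nato] headquarters in [loc brussels]"
function extractEntities(taggedHypothesis: string): NamedEntity[] {
  const entities: NamedEntity[] = [];
  const pattern = /\[(\w+)\s+([^\]]+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(taggedHypothesis)) !== null) {
    entities.push({ type: match[1], text: match[2].trim() });
  }
  return entities;
}

// Example output: [{ type: "pers", text: "angela merkel" }, { type: "org", text: "nato" }, { type: "loc", text: "brussels" }]
console.log(
  extractEntities("the chancellor [pers angela merkel] visited [org nato] headquarters in [loc brussels]")
);
```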

End-to-end named entity extraction from speech is available on arXiv via this link.

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation describes (and provides) a new LIUM TED talk corpus and documents experiments with it.

TED-LIUM basically pursues two aims: train and improve acoustic models, and fix flawed TED talk subtitles along the way. Using its own ASR system, tailor-made for processing TED talks, LIUM creates new transcriptions of the original audio. These transcriptions are then compared to the old, often inferior subtitles. LIUM keeps reliable segments, discards unreliable material, applies some heuristics, and finally provides the subtitles in file formats used by the international speech recognition community.
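
As a rough illustration of the segment-filtering idea (not the paper’s actual heuristics), here is a minimal sketch that compares the new ASR hypothesis with the old subtitle via word error rate (WER) and keeps only segments below an assumed threshold. The threshold and helper names are made up for illustration.

```typescript
// Minimal word error rate (WER): word-level edit distance divided by reference length.
function wer(reference: string, hypothesis: string): number {
  const ref = reference.trim().toLowerCase().split(/\s+/);
  const hyp = hypothesis.trim().toLowerCase().split(/\s+/);
  // dp[i][j] = edit distance between the first i reference words and the first j hypothesis words
  const dp: number[][] = Array.from({ length: ref.length + 1 }, () =>
    new Array<number>(hyp.length + 1).fill(0)
  );
  for (let i = 0; i <= ref.length; i++) dp[i][0] = i;
  for (let j = 0; j <= hyp.length; j++) dp[0][j] = j;
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,               // deletion
        dp[i][j - 1] + 1,               // insertion
        dp[i - 1][j - 1] + substitution // substitution or match
      );
    }
  }
  return dp[ref.length][hyp.length] / ref.length;
}

// Keep a segment only if the new ASR hypothesis and the old subtitle agree closely.
// The 10% threshold is an assumption for illustration, not the paper's actual heuristic.
const MAX_SEGMENT_WER = 0.1;

function isReliableSegment(oldSubtitle: string, asrHypothesis: string): boolean {
  return wer(oldSubtitle, asrHypothesis) <= MAX_SEGMENT_WER;
}
```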

One of the (rather surprising) insights of the paper is that when transcribing oral presentations like TED talks, augmenting the training data from 207 hours to 452 hours (roughly 2.2 times as much) doesn’t significantly affect a state-of-the-art ASR system (i.e. hidden Markov models coupled with deep neural networks, using a pipeline of different processes: speaker adaptation, acoustic decoding, language model rescoring). The word error rate (WER) dropped by a mere 0.2 percentage points, from (an already low) 6.8% to 6.6%. The system seems to have reached a plateau.

However, training data augmentation absolutely benefits emergent ASR systems (fully neural end-to-end architecture, only one process, no speaker adaptation, no heavy language model rescoring). In this case, the same augmentation of training data led to a drop of 4.8 percentage points in the WER. At 13.7%, it’s still rather high, but significantly lower than the 18.5% achieved by a Markov-based system at a comparable development stage in 2012.

Conclusion: Emergent fully neural ASR systems aren’t bad at all, are very sensitive to training data augmentation, and can probably be pushed further. The big question in this context: How much data does it take to reach competitive results?

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation is available on arXiv via this link and has also been accepted at SPECOM 2018.

5 questions with… Peggy van der Kreeft

Peggy van der Kreeft discussing DW’s human language technology projects at Global Media Forum 2017

The news.bridge consortium consists of four partners: Deutsche Welle (DW), the Latvian News Agency (LETA), the Laboratoire d’Informatique de l’Université du Mans (LIUM), and Priberam. We are all part of the media and tech industry in some way, we’re all fascinated by what cutting-edge human language technology (HLT) can do, and we’re all dedicated to this DNI project. Other than that, we’re actually pretty different organizations and people with a wide range of interests and a distinctive set of skills. So we figured it would be a good idea to sit down, ask the four team leaders a couple of questions — and give you some more insights into who is doing what exactly, and why it’s all worthwhile.

For the first part of this series of posts, we’ve talked to Peggy van der Kreeft, an experienced linguist and innovation manager, who is running news.bridge at DW.

Peggy, when and how did you first get in touch with human language technology (HLT)?

That was in the early 1980s, believe it or not, during my postgraduate programme on “Translating at European Level” at the University of Louvain. I tried out early machine translation (MT) systems like SYSTRAN. Later on, in the 90s, I used Babel Fish while working at an American translation and documentation center. The first research project at Deutsche Welle that focused on MT was CoSyne, which ran from 2010 to 2013. Among other things, we experimented with translating DW’s Today in History. The quality wasn’t good enough for direct publication, but there was always post-editing, and we already succeeded in making the translation process a little more efficient. My first experience with automatic speech recognition (ASR) and speech-to-text (STT) was within the scope of the AXES project (2011-2015). This one was about finding novel ways to explore and interact with audiovisual libraries. The platform itself wasn’t sophisticated enough for use in a real production environment, but it certainly showed the power of HLT — which has been a focus topic of our department (DW Research & Cooperation Projects) ever since.

What is the most fascinating aspect about news.bridge?

There are many fascinating aspects, so it’s really hard to choose one. Perhaps the most striking thing is that the platform is so powerful, even though it’s based on a fairly simple concept. news.bridge covers virtually any language, and through the use of external tools, it remains state of the art.

What is the project’s biggest challenge?

Well, the overall challenge is to make news.bridge stable and scalable, and to provide a smooth workflow for the entire process of creating subtitles (and voice-overs). A challenge we’re currently focusing on is the seamless ingestion of existing scripts — which are rarely standardized. However, using original scripts (instead of ASR/STT) always leads to the best results, so we need to work on this.

Who’s in your team and what are they currently working on?

Here at DW, we’re currently four people: Ruben Bouwmeester, Hina Imran, Alexander Plaum, and myself. Ruben, who joined the project very recently, works on the project’s business development and marketing plan. He also makes sure we always have professionally designed dissemination material. Hina is our developer. She works on customized user interfaces for HLT applications and output, coordinating technical issues with LETA and within DW. She also manages and maintains a local test installation of news.bridge. Alex is in charge of communication and dissemination. He runs our website, this blog, and our Twitter account. He also works on brand design and sometimes acquires new partners. As for me, I am the HLT lead at our department and also the main coordinator of news.bridge. That means I take care of operations, oversee and report progress, organize user testing and plan implementation.

Where do you see news.bridge in five years?

news.bridge has already attracted quite a bit of attention. Many broadcasters, news producers and language technology providers are interested in implementing it as soon as possible. Multilingual content has become very important, and news.bridge will significantly speed up production workflows — with modest investment costs. We hope we’ll be able to offer a reliable service — local installations and software as a service (SaaS) — sometime in 2019. But we are not waiting for that. Our first major test case is here at Deutsche Welle, with its many newsrooms and its international orientation. We’ve started beta-testing the platform for automated translation and subtitling of videos in different languages, and we’re taking it from there. It would be great if news.bridge became a standard HLT platform by 2023; it certainly has the potential. In order for that to happen, we need to find the right exploitation partners and strategies, of course. news.bridge is not a startup, but a media innovation project. When it’s finished, the platform will lead its own life.

Insights from our first user testing sessions

Getting early input from the people you are designing for is absolutely essential – which is why we invited about a dozen colleagues to give the latest beta version of news.bridge a test run at the DW headquarters in Bonn last month. We had two really inspiring sessions with journalists, project managers and other media people working for DW and associated companies — and we have gained a number of useful insights. While some of them are too project-specific to share (news.bridge is not in public beta yet), there are also more general learnings that should make for an interesting journo tech blog post. Here we go:

Infrastructure and preparation

When inviting people to simultaneously stream and play around with news videos, make sure you have enough bandwidth. This may sound trivial, but it’s important, especially in Germany (which doesn’t even make the Top 20 when it comes to internet connection speed).

To document what your beta testers have to say as quickly and conveniently as possible, we recommend preparing digital questionnaires (e.g. Google Forms) and sending out a link well before the end of the session. That way, you get solid feedback from everyone. It’s also a good idea to add a screenshot/comment feature (e.g. html2canvas) to the platform that is being tested. In addition, open discussions and interview-type interactions provide very useful feedback.
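
For the screenshot/comment feature, a minimal sketch could look like the snippet below. It assumes the html2canvas library plus a hypothetical /api/feedback endpoint; the endpoint and payload shape are made up for illustration and are not part of news.bridge.

```typescript
// A tiny feedback helper: capture the current page with html2canvas and
// send the screenshot along with the tester's comment.
import html2canvas from "html2canvas";

// NOTE: "/api/feedback" and the payload shape are assumptions for illustration.
async function sendFeedback(comment: string): Promise<void> {
  const canvas = await html2canvas(document.body);  // render the visible page to a canvas
  const screenshot = canvas.toDataURL("image/png"); // encode it as a base64 PNG
  await fetch("/api/feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ comment, screenshot, url: window.location.href }),
  });
}
```

Wiring sendFeedback() to a small comment form gives testers a one-click way to report an issue with the exact screen state attached.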

Testing automatic speech recognition (ASR) tools

Thanks to artificial neural networks, ASR services have become incredibly sophisticated in the last couple of years and deliver very decent results. Basically all of our test users said the technology will significantly speed up the tiresome transcription process when producing multilingual news videos.

However, ASR still has trouble when:

  • people speak with a heavy dialect and/or in incomplete sentences (like some European football coaches who shall not be named)
  • people speak simultaneously (which frequently happens at press conferences, for example)
  • complicated proper names occur (Aung San Suu Kyi, Hery Rajaonarimampianina)
  • homophones occur (merry, marry, Mary)
  • there is a lot of background noise (which is often interpreted as language and transcribed to gibberish)

As a result, journalists will almost certainly have to do thorough post-editing for a while and also correct (or add) punctuation, which is crucial for the subsequent translation.

Testing machine translation (MT) tools

What has been said about ASR also applies to MT: The tech has made huge leaps, but results are not perfect yet, especially if you are a professional editor and thus have high standards. Something really important to remember:

The better and more structured your transcript (or uploaded original script),
the better the translation you end up with.

As for the limits of machine translation during our test run, we found that “exotic” languages like Pashto (which is really important for international broadcasters like DW) are not supported particularly well. Few services cover them, and the translation results are subpar. This is not a big surprise, of course, as the corpus used to train the algorithms is so much smaller than that of a major Western language like French or German. This also means that it is up to projects like news.bridge to improve MT services by feeding the algorithms high-quality content, e.g. articles from DW’s Pashto news site.

While MT tools are in general very useful when producing web videos — you need a lot of subtitling in the era of mobile social videos on muted phones — there are some workflows that are hard to improve or speed up. For example: How do you tap into digital information carriers that are an individually branded, hard-coded part of a video created in software like Adobe Premiere? Well, for now we can’t, but we’re working on solutions. In the meantime, running news.bridge in a fixed tab and copy-pasting your translated script bits is an acceptable workaround.

Testing speech synthesis

Sometimes, computer voices are indispensable. For example, when you’re really curious about this blogpost, but can’t read it because you’re on a bike or in a (traditional) car.

In news production, however, artificial readers/presenters are merely a gimmick, at least for the time being. That’s because once your scripts are finished, reading/recording them isn’t that time-consuming and will provide much nicer results. Besides, synthetic voices aren’t yet available in all languages (once again, Pashto is a case in point).

Nevertheless, news.bridge beta testers told us that the voices work fairly well, and even sound pretty natural in some cases. They can be trained, by the way, which is an interesting exercise we will try out at some point.

HLT services and news production in a nutshell

If we had to sum up the assessment of our beta testers in just a few sentences, they would read something like this:

HLT services and tools are useful (or very useful) in news productions these days: They get you decent results and save you a lot of time.

news.bridge is a promising, easy-to-use mash-up platform, especially when it comes to transcribing and translating and creating subtitles (another relevant use case is gisting).

news.bridge is not about complete automation. It’s about supporting journalists and editors. It’s about making things easier.