From ASR to xml:tm – Human Language Technology Abbreviations Spelled Out!


Let’s face it: Media tech people live in a language filter bubble. Every community uses its own lingo – which already makes communication somewhat difficult. To make matters worse, communities go on to create a whole bunch of abbreviations, because their favorite terms are long and unwieldy. HLT pros – that’s human language technology professionals – are no exception. The news.bridge consortium would like to fix the problem (or at least a part of it). That’s why we’ve created the following list, which spells out the cryptic abbreviations we use in our daily business.

Please note that all included abbreviations have a strict focus on speech, language, and translation. So even though deep neural networks (DNN) and hidden Markov models (HMM) are very important, we didn’t put them on the list, because they’re not exclusive to HLT. We also decided to go without full definitions (a lot of terms are actually self-explanatory once they’re spelled out) and instead include a number of links as well as information on whether an abbreviation represents a general term, a paradigm, or a file format. OK, end of introduction. Here’s our list; we hope it’s useful:


  • ASR = Automatic speech recognition
  • CAT = Computer-assisted translation
  • DNT = Do not translate (used for proper names, trademarks, etc.)
  • EBMT = Example-based machine translation (an MT paradigm)
  • G11n = Globalization (“G”, followed by 11 letters, followed by “n”)
  • GILT = Globalization, internationalization, localization, translation
  • HPMT = Hierarchical phrase-based machine translation (a statistical MT approach)
  • ISG = the Industry Specification Group for localisation industry standards (= the successor of LISA) at the European Telecommunications Standards Institute (ETSI)
  • I18n = Internationalization (“I”, followed by 18 letters, followed by “n”)
  • L10n = Localization (“L”, followed by 10 letters, followed by “n”)
  • LIS = Localisation Industry Standard(s)
  • LISA = Localization Industry Standards Association
  • LSP = Language services provider
  • MT = Machine translation
  • NMT = Neural machine translation (an MT paradigm)
  • NER = Named-entity recognition (= the extraction of the names of persons, organizations, locations, etc. from a text)
  • NLG = Natural language generation
  • NLP = Natural language processing
  • OLIF = Open Lexicon Interchange Format (= an open standard for the exchange of terminological and lexical data)
  • PBMT = Phrase-based machine translation (an MT paradigm)
  • POS tagging = Part-of-speech tagging (= the identification of words as nouns, verbs, adjectives, adverbs, etc.)
  • SBMT = Syntax-based machine translation (a statistical MT approach)
  • SLU = Spoken language understanding
  • SMT = Statistical machine translation (an MT paradigm)
  • SRX = Segmentation Rules eXchange (an enhancement of the TMX standard)
  • STT = Speech-to-text
  • T9n = Translation (“T”, followed by 9 letters, followed by “n”)
  • TBX = TermBase eXchange (a standard for exchanging terminological data)
  • TEP = Translate, edit, proofread
  • TM = Translation memory (= a database that stores translated sentences, paragraphs etc.)
  • TMM = Translation memory manager (= software tapping into a TM)
  • TMS = Translation memory system
  • TMX = Translation memory eXchange (= a standard that enables the interchange of translation memories between translation suppliers)
  • TQA = Translation quality assurance
  • TransWS = Translation web services (= a framework for the automation of localization processes via the use of web services)
  • TTS = Text-to-speech
  • TU = Translation unit (= a segment of text treated as a single unit of meaning)
  • UTX = Universal terminology eXchange (= a standard specifically designed for user dictionaries of MT)
  • WBMT = Word-based machine translation (a statistical MT approach)
  • WER = Word error rate
  • XLIFF = XML localization interchange file format (= an XML-based language tech industry standard for exchanging localizable data between tools)
  • xml:tm = XML-based text memory (= an approach to translation memory based on the concept of “text memory”, which is a combination of author and translation memory)
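
The numeronyms in the list (the entries for globalization, internationalization, localization, and translation) all follow the same rule: first letter, then the number of letters in between, then the last letter. Here is a minimal Python sketch of that rule; the function name `numeronym` is our own choice for illustration and not part of any standard or library.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + count of inner letters + last letter."""
    if len(word) < 3:
        # Too short to have any inner letters worth counting; return unchanged.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


# The four terms from the list above.
for term in ("globalization", "internationalization", "localization", "translation"):
    print(f"{term} -> {numeronym(term)}")
```

The sketch keeps the casing of the input, so lowercase words give lowercase numeronyms; capitalizing the first letter, as in the list, is purely a matter of convention.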

This list is a work in progress. If you feel something important is missing, please drop us a line.