Machine Translation in NLP: Examples, Flow & Models


There are over 6,500 recognized languages in the world. Understanding written resources across cultures has become a need of the time. To that end, many old books are translated into local languages and preserved for reference.

Sanskrit, for example, the ancient language of Hindu heritage, is said to hold a wealth of information from ancient ages. Because very few people know Sanskrit, seeking information from its scriptures and manuscripts is likely to depend on some translation mechanism.

Many times we want computers to understand natural language. The good thing about computers is that they can calculate faster than humans. However, learning a natural language is very difficult to replicate in a computational model.

Machine Translation

The term ‘machine translation’ (MT) refers to computerized systems responsible for producing translations with or without human assistance. It excludes computer-based translation tools that support translators by providing access to online dictionaries, remote terminology databanks, transmission and reception of texts, etc.

Computer programs for the automatic translation of text from one language to another were developed well before the current AI era. In recent years, AI has been tasked with handling the fluidity of human languages and the variety of their scripts, dialects, and variations. Machine translation is challenging given the inherent ambiguity and flexibility of human language.

What is NLP?

Natural Language Processing (NLP) is one of the branches of Artificial Intelligence (AI). This discipline is concerned with the creation of computational models that process and understand natural language. NLP models let the computer handle tasks such as the semantic grouping of objects (e.g., the words “cat” and “dog” are semantically much closer than the words “cat” and “bat”, even though the latter pair is closer in spelling), text to speech, language translation, and so on.

Natural Language Processing (NLP) lets a computer system use, interpret, and understand human languages and speech, such as English, German, or another “natural language”. A range of NLP applications is seen in practice today.

They are typically grouped by use case, such as speech recognition, dialog systems, information retrieval, question answering, and machine translation, and they have started to reshape the way people identify, retrieve, and make use of information.

NLP Examples

  • Voice/speech recognition systems, or query systems like Siri, take a spoken question and return an answer. Here you feed voice to a computer, and it understands your message.
  • Computer programs that read financial reports in plain English and produce numbers (e.g., inflation rate).
  • Job portals that retrieve candidate details, auto-construct a resume, and apply to jobs matching the candidate’s skills.
  • Google Translate processes the text in the input string, identifies its language, and translates it on the fly.
  • Google-like search engines return documents after you type a subject word into the search box. For example, when you search for Taj Mahal, Google gives you documents about the Taj Mahal as a monument and even about a “Taj Mahal” brand. Here, English synonyms and English plural patterns are taken into consideration.

NLP Flow

Natural Language Processing is a kind of Artificial Intelligence. If you want to build an NLP program, you can start writing rules like “ignore an s on the end of a word”. This is the old school way of doing things, and it’s called the “rule-based” approach.
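As a rough illustration of the rule-based approach, the sketch below hard-codes the “ignore an s on the end of a word” rule. The function names and the two guard conditions are assumptions made for this example, not part of any particular NLP library.

```python
def strip_plural_s(word):
    """Hypothetical hand-written rule: drop one trailing 's' so 'cats' matches 'cat'."""
    lowered = word.lower()
    # Guards added for illustration: skip very short words and words ending in 'ss'.
    if len(lowered) > 3 and lowered.endswith("s") and not lowered.endswith("ss"):
        return lowered[:-1]
    return lowered


def normalize(sentence):
    """Apply the rule to every whitespace-separated token."""
    return [strip_plural_s(token) for token in sentence.split()]


print(normalize("Cats chase dogs across the glass"))
```

A rule like this misfires easily (it turns “this” into “thi”), so a rule-based system needs a growing list of exceptions for every language it supports.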

However, the more advanced techniques use statistical learning, where you program your computer to learn patterns in English. If you do this, you could even write your program only once and train it to work in many human languages.

The objective of NLP is to make human languages intelligible so that a programmed mechanism can interpret and understand the manuscripts. Here, we call the programmed mechanism a machine, and the manuscript is the language script fed to the program. The computerized program thus extracts the linguistic data in the form of digital knowledge.

The machine then applies a rule-based or statistical approach to these language attributes in order to address specific problems and perform the language-processing task.

In many older systems, particularly those of the ‘direct translation’ type, the components of analysis, transfer, and synthesis were not always clearly separated. Some of them also mixed data (dictionary and grammar) and processing rules and routines.

Newer systems exhibit various degrees of modularity, so that system components, data, and programs can be adapted and changed without damage to overall system efficiency. A further stage in some recent systems is the reversibility of analysis and synthesis components, i.e., the data and transformations used in the analysis of a particular language are applied in reverse when generating texts in that language.

Evolution of Machine Translation

Considerable research in machine translation was conducted up to the late 1980s, when the first Statistical Machine Translation (SMT) systems were developed.

Classically, rule-based systems were used for this task; they were replaced in the 1990s by statistical methods. More recently, deep neural network models arrived and achieved state-of-the-art results in a field aptly termed neural machine translation.

Statistical machine translation replaced classical rule-based systems with models that learn to translate from examples.
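To make “learning to translate from examples” concrete, here is a simplified sketch in the spirit of the early word-based statistical models (often called IBM Model 1), without the usual NULL word. The three-sentence German-English corpus and all function names are assumptions chosen for illustration. The code uses expectation-maximization to estimate how likely each target word is as a translation of each source word.

```python
# Toy parallel corpus (hypothetical data): (source sentence, target sentence).
corpus = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]
pairs = [(src.split(), tgt.split()) for src, tgt in corpus]


def train_word_translation(pairs, iterations=20):
    """Estimate t(target | source) for co-occurring word pairs with EM."""
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform guess for every word pair seen in the same sentence pair.
    t_prob = {(t, s): 1.0 / len(target_vocab)
              for src, tgt in pairs for t in tgt for s in src}

    for _ in range(iterations):
        count = {key: 0.0 for key in t_prob}   # expected (target, source) links
        total = {}                             # expected links per source word
        for src, tgt in pairs:
            for t in tgt:
                # E-step: split one unit of evidence for t across the source words.
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    share = t_prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] = total.get(s, 0.0) + share
        # M-step: renormalise the expected counts into probabilities.
        for (t, s), c in count.items():
            t_prob[(t, s)] = c / total[s]
    return t_prob


def best_translation(t_prob, source_word):
    """Return the most probable target word for one source word."""
    candidates = {t: p for (t, s), p in t_prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


t_prob = train_word_translation(pairs)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(t_prob, word))
```

No dictionary is supplied: the pairing of “haus” with “house” emerges only from which words co-occur across the sentence pairs.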

Neural machine translation models fit a single model instead of a fine-tuned pipeline and currently achieve state-of-the-art results. Since the early 2010s, the field has largely abandoned statistical methods and shifted to neural networks for machine learning.

Several notable early successes of statistical methods in NLP arrived in machine translation, owing especially to work at IBM Research. These systems were able to take advantage of existing multilingual text corpora produced by the Parliament of Canada and the EU as a result of laws requiring the translation of all governmental proceedings into all official languages of the corresponding systems of government.

However, many other systems depended on corpora that were developed specifically for the tasks these systems implemented, which was, and continues to be, a major restriction on their development. Therefore, a great deal of research went into methods of learning effectively from limited data.

The term Neural Machine Translation (NMT) emphasizes that deep learning-based approaches to machine translation directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word alignment and language modeling that were used in statistical machine translation (SMT). Google started using such a model in production for Google Translate in late 2016.

Sequence to Sequence Model

Normally, the sequence-to-sequence model comprises two parts: an encoder and a decoder. They are two different neural network models working hand-in-hand as one big network.

Broadly, the encoder network’s function is to read and analyze the input sequence to capture its meaning and then generate a low-dimensional representation of the input string. The model then forwards this representation to the decoder network.

The decoder part of the model then generates a mapped sequence as the output. In an encoder-decoder approach to neural machine translation, the encoder thus encodes the entire input sentence into a fixed-length vector, from which the translation gets decoded.

The Encoder-Decoder LSTM is a recurrent neural network designed to address sequence-to-sequence problems, sometimes called seq2seq. Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in deep learning.

For example, when the input sequence is “What is this place,” the encoder-decoder network parses it and synthesizes the output string using LSTM blocks (a type of RNN architecture). The decoder then generates one word of the output sequence at every step of its iteration.

After the full loop of iterations, the output sequence is constructed, say something like “This place is Pune.” The LSTM network is thus suited to analyzing and processing the input and making predictions based on the training examples.
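The encoder-decoder flow described above can be sketched in a few lines. This is a structural illustration only: it uses a plain tanh recurrent step instead of LSTM blocks, the weights are random and untrained, and the vocabulary, sizes, and function names are all assumptions. The decoded words are therefore meaningless; the point is that the encoder compresses an input of any length into one fixed-length vector, and the decoder emits one word per step from it.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible
HIDDEN = 4      # assumed size of the fixed-length sentence vector
VOCAB = ["<sos>", "<eos>", "what", "is", "this", "place", "pune"]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(token):
    return [1.0 if word == token else 0.0 for word in VOCAB]


def make_rnn():
    # Hypothetical, randomly initialised weights; a real model learns these.
    return {"w_in": rand_matrix(HIDDEN, len(VOCAB)),
            "w_hid": rand_matrix(HIDDEN, HIDDEN)}


def rnn_step(rnn, hidden, token):
    """One recurrent step: combine the current token with the previous state."""
    from_input = matvec(rnn["w_in"], one_hot(token))
    from_state = matvec(rnn["w_hid"], hidden)
    return [math.tanh(a + b) for a, b in zip(from_input, from_state)]


encoder, decoder = make_rnn(), make_rnn()  # two separate networks
w_out = rand_matrix(len(VOCAB), HIDDEN)    # maps a decoder state to word scores


def encode(tokens):
    """Read the whole input and return the final hidden state."""
    hidden = [0.0] * HIDDEN
    for token in tokens:
        hidden = rnn_step(encoder, hidden, token)
    return hidden


def decode(context, max_len=6):
    """Generate one word per step, starting from the encoder's vector."""
    hidden, token, output = context, "<sos>", []
    for _ in range(max_len):
        hidden = rnn_step(decoder, hidden, token)
        scores = matvec(w_out, hidden)
        token = VOCAB[scores.index(max(scores))]  # greedy choice
        if token == "<eos>":
            break
        output.append(token)
    return output


context = encode("what is this place".split())
print(len(context), decode(context))
```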

Attention Model

The “attention” model highly improved the quality of machine translation systems. Attention allows the model to focus on the relevant parts of the input sequence as needed.

An attention model differs from a classic sequence-to-sequence model in two main ways:

  • The encoder passes a lot more data to the decoder. Instead of passing the encoding stage’s last hidden state, the encoder passes all the hidden states to the decoder.
  • An attention decoder does an extra step before producing its output.
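The extra step can be sketched as follows, assuming dot-product scoring (one of several scoring functions used in practice) and made-up two-dimensional hidden states. The decoder scores every encoder hidden state against its own current state, turns the scores into weights with a softmax, and builds a context vector as the weighted sum.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return attention weights and the resulting context vector."""
    # Score each encoder hidden state against the current decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    # Context vector: weighted sum of ALL encoder hidden states.
    size = len(decoder_state)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(size)]
    return weights, context


# Hypothetical hidden states, one per input word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

Here the second encoder state receives the largest weight because it points in the same direction as the decoder state, so it contributes most to the context vector.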

Transformer Model

Sequential computation, as in RNNs, cannot be parallelized since we have to wait for the previous step to finish before moving on to the next one. This lengthens both the training time and the time it takes to run inference. One way around the sequential dilemma is to use Convolutional Neural Networks (CNNs) instead of RNNs. The transformer takes another route: it is a model that uses attention to boost the speed. More specifically, it uses self-attention. Here, each encoder consists of two sub-layers:

  • Self-attention
  • A Feed Forward Neural Network

Transformers rely on attention rather than on recurrent or convolutional layers for machine translation. They are a type of neural network architecture that has been gaining popularity. Transformers were recently used by OpenAI in their language models and by DeepMind for AlphaStar, their program to defeat a top professional StarCraft player. Transformers outperform the Google Neural Machine Translation model in specific tasks.


In a nutshell, the self-attention mechanism allows the inputs to interact with each other (“self”) and lets them decide which of the others they should pay more attention to (“attention”). The processed outputs are thus aggregates of these interactions, weighted by attention scores.
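A minimal sketch of this idea is scaled dot-product self-attention. As a simplifying assumption, each input vector serves directly as its own query, key, and value; a real transformer first applies learned projections to obtain them. The input vectors are made up for illustration.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(inputs):
    """Let every input attend to every input, itself included."""
    dim = len(inputs[0])
    outputs, all_weights = [], []
    for query in inputs:
        # Interaction scores between this input and all inputs, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in inputs]
        weights = softmax(scores)
        # Output: aggregate of all inputs, weighted by the attention scores.
        mixed = [sum(w * value[i] for w, value in zip(weights, inputs))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical word vectors.
inputs = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
outputs, all_weights = self_attention(inputs)
for row in all_weights:
    print([round(w, 3) for w in row])
```

Each output row depends only on the inputs, not on a previous output, so all rows can be computed in parallel, which is what removes the sequential bottleneck of RNNs.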

A fundamental understanding of MT in NLP helps data analysts and data scientists prepare themselves to undertake prestigious projects in the NLP discipline of AI. Training courses in the subject, from providers such as upGrad, help take the journey ahead. upGrad is an online higher education platform providing a vast range of industry-relevant programs.

