Evolution of Learning Automata in Natural Language Processing (NLP)


NLP — Natural Language Processing is a generic word that relates to technology that alters natural language, including such as speech and text, autonomously. The theory of automata is important in giving answers to a variety of difficulties in NLP. Speech recognition, information retrieval, and spelling correction are just a few examples. Because the modeling of facts using rules has several features for language modeling, finite-state approaches are very effective in processing natural language. The theoretical model of a finite state automaton is relatively intelligible; data can be expressed in a condensed manner, and it permits the automated synthesis of system components. Transducers offer some output response for the given input, FSM (both DFA and NFA) give string approval and denial decision. As a result, the two machines are extremely beneficial for language processing jobs. Finite state automata are useful for determining whether or not a particular word refers to a specific language. Words can even be parsed and synthesized using transducers in their lexical form.

Related work

Linguistics and automata theory used to be inextricably linked. Markov used finite-state processes to anticipate vowel and consonant patterns in Pushkin’s works quite early on.

Significant work in automata theory, languages, including natural language processing (NLP), on the other hand, began to break afterward. Linguists went the opposite way, while automata theorists investigated several theory-driven generalizations.

Formalism was rejected in favor of a more organic approach. For a period, computational linguistics concentrated on CFG extensions, many of which were Turing equivalent. During the Speech recognition research, the researchers went back to FSAs in the 1970s. These formal devices turned out to be remarkably successful. The reason behind the success was because they have associated algorithms that were efficient enough for practical computers at the time, so it could be easily implemented.

Since the introduction of tree automata, it has reawakened the attention of computational linguists in the modern era. Techniques like these became more prevalent as we started dealing with problems related to language translation, where the task was sensitive towards the syntactic structure. To aid research, generic tree automata toolkits have been developed.

To demonstrate how finite-state transducers are utilized in natural language processing, two examples are provided.

1) Transliteration:

Transliteration is defined as the process of borrowing technical terms from one language to another. It can be considered a trivial task for a few language pairs. For instance, the names of celebrities like Will Smith will appear the same in a Spanish or an English newspaper. While the word movie in English gets translated to ‘la película’ which is a consistent and predictable pattern. When the language pairs use separate character sets and sound systems, the task becomes substantially more difficult.

Let’s take an example of transliteration of Katakana (A Japanese syllabary) to English.

The task is hard for humans as the number of potential transliterations is quite high. The employment of finite-state automata can help us deal with the combinatorial explosion. A single finite-state transducer can be considered at the first step. With something like a transformation chance of P(E|K), the whole transducer converts channels of Katakana signs K into streaming of English letters E. At the last step, the most likely E selected:

The corresponding transducer design will be looking like this:

Katakana ⇒ WFST ⇒ English

The famous Bayes’ law could be used to distinguish the probability of a well-formed E from the probability of conversion between K and E:

The corresponding transducer design now will be looking like this:

WFSA ⇒ English ⇒ WFST ⇒ Katakana

This diagram can be viewed as an explanation for similar cases where we use Katakana strings. “Generative tales” is a term used to describe these explanations.

In the following design, we break the first transducer into a chain of three transducers.

The design is justified because of the usage of conditional probability chain rule. Throughout this case, the statistical concept is split into such a network of discrete distributed rather than a single distribution.

The probability model equation then becomes:

Now it would be simple enough to build them after the division of one complex automation into a chain of automata.

  1. Translation:

While the challenge of automatic translation has not been addressed, significant progress has been achieved in recent years. The automatic analysis of massive manually translated papers, such as those generated each year by the United Nations and the European Union, is responsible for most of this development.

The following design is taken for translation:

Because each word must be interpreted in the context of all the other words, this design is problematic once again. As a result, we use the noisy-channel model technique.

Grammatical hypotheses can be rewarded when this set is intersected with the English WFSA. It’s worth noting that the WFSA can assist with both word choice and word ordering and that it can be trained on large amounts of monolingual English literature.

The transducer cascade is used to convert Spanish to English in the reverse direction, rather than in the forward direction. At the start, Spanish input is sent backward through WFST E to get a variety of re-

orderings, including what we believe is an English-like ordering. Finally, the results are compared to WFSA A, a program that favors well-formed English.

Other applications of Weighted String Automata include Speech Recognition, Lexical Processing, Tagging, Summarization, OCR, etc.

In day-to-day life, we encounter numerous issues while dealing with customers. Handling such a huge number of complaints at once is a difficult task to accomplish. Issues like understanding a large set of text through social media, handling large data, which is available in unstructured form, can be handled by machines. Machines can execute such tasks as it takes help from NLP algorithms. These algorithms can perform tasks like text preprocessing, speech recognition, sentiments analysis, and many more to better understand our text recognition, sentiments analysis, and many more to better understand our text.

Proposed Methodology

We must process natural language using automata in this article, we have used various machines to contribute to natural processing language acceptance. To get through this let us first see the tools of automata to process natural language.

Suppose we have a sentence of English like Bili is a cat

The dog is not a Bili

In the English language, only a certain sequence of words in a sentence is fine.

By reading this one can check the sentence according to the rules of English grammar and various constraints, day by day when we read various sentences, we are able to develop an idea of speaking and getting how to differentiate a language from other

For determining the correctness of the sequence of words in a sentence we have a finite state machine, we are giving the sequence in FSM using finite automata states. first, let us discuss finite state machine FSM. In this, we have a set of states and special states consisting of starting state and an ending state and several connections called transitions which will take us from one state to another state. by making use of several FSM, we design a machine that may have output associated either in states or in transition.

this we have a set of states and special states consisting of starting state and an ending state and several connections called transitions which will take us from one state to another state. by making use of several FSM, we design a machine that may have output associated either in states or in transition.

If we give a sequence of input to the FSM, then it will move from the starting state till the end state using transition and covering the states. Let us see this with an example suppose, you went to an animal pet shop, which sells cats by accepting payment in the form of money in dollars and diamonds (cost is 150 dollars per diamond made of aluminum)

, and gold coin ( cost is 75 dollars made of silver ) and a gift card ( cost is 25 dollars per gift card ) .and if a user wants to buy a big cat he has to pay 250 dollars. to pay this he/she can use any of the methods to pay

250 dollars, now in this FSM you can understand, get a big cat you can pay 250 dollars you will directly reach from start state to final (end) state, or you can first use a diamond of 150 dollars to reach 3rd state and then use gold to reach the final state

to buy the cat. Or by using the third way you can use the gift card and then gold and then gift and then again gift card to buy the cat.

In this, we have restrictions on the use of resources to buy the big cat.

The cats live inside the house on their own. The cats live on their own.

In deterministic FSM, we have a unique transition from one state to another state while in non-deterministic we may have more than one transition for the same input.

The cats are smaller than dogs. The cats are cuter than dogs.

Fig. Non-Deterministic FSM visualization


Language Recognizer

Many jobs need the use of a language recognition system. For instance, a lexical analyzer, morphological analysis, and language recognition are just a few examples. As a language identification system, deterministic and non-deterministic finite machines are extremely effective. An NFA that identifies a certain word can be readily created. Figure 1 depicts that NFA for such word’s ‘bat’ and ‘boy.’ Similarly, an NFA may be created for each word, as well as several NFAs can be integrated to make a language’s spellchecker or thesaurus compilations.

Figure-1. NFA for the words ‘bat’ and ‘boy’.

Finite automata are capable of recognizing inflectional morphology. We may create a distinct NFA for every type of keyword and afterward merge them via transitions. For example, some NFA could recognize nouns and their plurals, while another NFA can recognize verbs and their various forms, as well as the two NFA, may then be integrated. The NFA for several words, as well as their morphological modifications, are shown in Figure 2.

Figure-2. Several words including its morphological versions are NFA

Morphological generation and parsing

The morphological parser was indeed a technology that generates the language’s semantic framework or splits the term down among stems and enhancements or identifies the phrases with an identifiable label. The term ‘books’ can also be translated as s + books & then as N + book + PL, whereas the term went could be processed as V + go + PAST, and so on. The reversal of parsing is generation., in which the semantic version of the word is combined to make a word. For instance, N + book + Pl yields the phrase ‘boxes.’ In morphological parsing, finite-state transducers are very effective. Imagine the usual derivational ‘girl +N +PL’ in its lexical form. For the sake of simplicity, let's assume that x tends the word ‘girl’. Because the source and destination of a transducer would be the same for any normal singular noun, a variable x may be used to use for a noun. Any ordinary noun, such as

‘boy,’ can be substituted for the word ‘girl.’

Fig. 3. Generation of word from its lexical form through the transducer.

Orthographic norms could be utilized for uncommon inflections and equations, and non-deterministic finite automata can be generated for each term and its changes inside a system.


We have shown the sequence of words of the English language accepted using FSM. We have recognized language using finite automata.


An overview of finite automata implementations in natural language processing is presented in this work. Several instances of morphological analysis and parsing of Simple language have been provided. Finally, finite automata were used to accept and recognize several English words, which is a crucial aspect of regulation machine translation.


1. https://www.cs.rochester.edu/u/james


2. http://citeseerx.ist.psu.edu/viewdoc/do



3. https://aclanthology.org/W02-0110.pdf

4. https://www.researchtrend.net/ijet/pdf


5. https://citeseerx.ist.psu.edu/viewdoc/d






Love podcasts or audiobooks? Learn on the go with our new app.

Auto-Sklearn: Scikit-Learn on Steroids

Applied RL: Customization of RL policies using StableBaselines3

Reinforcement Learning — Monte-Carlo for policy evaluation.

Accelerating ML Models using Microsoft’s HummingBird

Ideas on how to fine-tune a pre-trained model in PyTorch

Key Criteria's for Machine Learning Solution Evaluation

Tackling Big Data with Acceleration and High Accuracy

Transfer Learning — Self-taught Learning

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Anshumaan Phukan

Anshumaan Phukan

More from Medium

Ethics in Natural Language Processing

Ethics in Natural Language Processing

Basic Difference between Word Embeddings and Word Vectors

List of Machine Learning dataset from different domain

State of Transfer Learning in NLP