AI can self-learn human language norms and patterns

At the start of this year, scientist Gary Marcus told CNBC that the most important AI breakthrough in 2022 “will likely be one that the world doesn’t immediately see”. The suspense in Marcus’s statement rests on AI’s ability to learn on its own, and it has only grown with each new AI discovery this year.

The field of AI has seen a steady stream of advances in 2022.

For example, Meta researchers have recently developed artificial intelligence that, by analyzing brainwaves, can “hear” what people are hearing.

To the despair of human artists, an AI-created artwork won ‘the first-place blue ribbon’ and a $300 prize. AI is creating art, music, and articles, and is set to take over the

A couple of days earlier, Google’s DeepMind trained virtual bots to play matches of 2v2 football with one another in a bid to get AI to work together in teams.

Most recently, researchers at MIT, Cornell University, and McGill University have taken a step further in this direction by developing an AI system that can learn the rules and patterns of human languages on its own.

According to the findings published in Nature Communications, when the machine-learning model is given words and examples of how those words change in a language to express grammatical functions such as tense, case, or gender, it generates rules that explain why the forms of those words change. For example, it might learn that the letter “a” must be added to the end of a word in Serbo-Croatian to turn the masculine form into the feminine.
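As a rough illustration of what such a rule looks like when written as a program, the sketch below encodes the Serbo-Croatian example as a small Python function and checks it against a few word pairs. The adjective pairs and the helper names (make_suffix_rule, explains) are assumptions chosen for illustration; they do not come from the paper.

```python
def make_suffix_rule(suffix):
    """Return a rule (a function) that appends `suffix` to a word."""
    def rule(word):
        return word + suffix
    return rule


def explains(rule, pairs):
    """True if the rule turns every base form into its observed inflected form."""
    return all(rule(base) == inflected for base, inflected in pairs)


# Illustrative (masculine, feminine) adjective pairs; assumed for this sketch.
PAIRS = [("mlad", "mlada"), ("star", "stara"), ("nov", "nova")]

feminine = make_suffix_rule("a")
print(explains(feminine, PAIRS))               # True
print(explains(make_suffix_rule("e"), PAIRS))  # False
```

A rule that adds “a” accounts for every pair, while a rule that adds “e” does not, which is the kind of check a learned rule has to pass.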

This model can also learn higher-level linguistic patterns that can be used across multiple languages, enhancing its performance.

The model was trained and tested on problems from linguistics textbooks that covered 58 different languages. Each problem included a set of words and the corresponding word-form changes. The model came up with a correct set of rules to explain the word-form changes in 60% of the cases.

“One of the motivations of this work was our desire to study systems that learn models of datasets that are represented in a way that humans can understand”, said Kevin Ellis, an assistant professor of computer science at Cornell University and the paper’s lead author.

To develop an AI system that could automatically generate a model from many related datasets, the researchers chose to analyze the relationship between phonology (the study of sound patterns) and morphology (the study of word structure).

The researchers devised a model that could learn a grammar, or a set of rules for assembling words, using a machine-learning technique known as Bayesian Program Learning. With this approach, the model solves a problem by writing a computer program.

In this case, the program is the grammar that the model considers the most likely explanation for the words and meanings in a linguistics problem. The researchers built the model using Sketch, a well-known program synthesizer developed at MIT by Armando Solar-Lezama.
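The idea of treating a grammar as a program and keeping the most plausible one can be sketched in a few lines of Python. This toy example is not the researchers' model and does not use Sketch. It only considers hypothetical "add a suffix" rules, scores each one by a simplicity prior plus how well it fits the data, and returns the best-scoring rule. The function names, the alphabet size, and the noise value are all assumptions made for illustration.

```python
import math


def candidate_suffixes(pairs, max_len=3):
    """Enumerate candidate programs: here, just suffixes seen in the data."""
    candidates = {""}
    for _, inflected in pairs:
        for k in range(1, min(max_len, len(inflected)) + 1):
            candidates.add(inflected[-k:])
    return sorted(candidates)


def log_prior(suffix, alphabet_size=30):
    """Simplicity prior: every extra symbol in the rule makes it less probable."""
    return -(len(suffix) + 1) * math.log(alphabet_size)


def log_likelihood(suffix, pairs, noise=1e-6):
    """Pairs the rule reproduces are likely; pairs it misses are very unlikely."""
    total = 0.0
    for base, inflected in pairs:
        hit = (base + suffix == inflected)
        total += math.log(1 - noise) if hit else math.log(noise)
    return total


def map_rule(pairs):
    """Return the suffix with the highest posterior score (prior + likelihood)."""
    return max(
        candidate_suffixes(pairs),
        key=lambda s: log_prior(s) + log_likelihood(s, pairs),
    )


# Illustrative (base form, inflected form) pairs; assumed for this sketch.
DATA = [("mlad", "mlada"), ("star", "stara"), ("nov", "nova")]
print(map_rule(DATA))  # a
```

The prior pushes toward short rules and the likelihood pushes toward rules that fit every example, so the single-letter suffix wins over both the empty rule and longer suffixes that fit only one word. A real program synthesizer searches a far richer space of rules than this enumeration does.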

When the model was tested on 70 textbook problems, it found a grammar that matched the entire set of words in 60% of the cases and correctly matched most of the word-form changes in 79% of the cases.

The model frequently produced surprising results. In one instance, it found the expected answer to a Polish language problem and also a second correct answer that exploited a mistake in the textbook. In Ellis’ view, this shows that the model can “debug” linguistic analyses.

In the future, the researchers hope to use this method to find unexpected solutions to problems in other academic fields. They might apply the technique in other situations where higher-level knowledge can be shared across related datasets. For instance, according to Ellis, they might develop a method to infer differential equations from data on the motion of different objects.

Through sustained research, the continued development of AI is producing, one after another, the kind of significant breakthroughs that Marcus had in mind.