Translation. Region: Russian Federation –
Source: Novosibirsk State University –
An important disclaimer is at the bottom of this article.
Sergey Ospichev, PhD in Physics and Mathematics, Deputy Director of the Mathematics Center in Akademgorodok, and Acting Head of the Department of Computer Science and ICT at the Specialized Scientific Center of Novosibirsk State University, discussed how artificial intelligence evolved from the fantasies of the past about thinking machines to today's large-scale language models. His lecture, "Artificial Intelligence: Origins and Evolution," was held as part of "Darwin Week"—a popular science marathon traditionally held at Novosibirsk State University in February. This year, the event was held for the first time on the new NSU campus.
From Golem to "Rent a Human"
Sergey Ospichev began his lecture with a quote from the film "Blade Runner," which, in his opinion, describes AI very well: "I don't think, I calculate, but the difference is already becoming unclear." He cited the definition of AI given by Chinese researcher YX Zhong back in 2006 in her article "A Cognitive Approach and AI Research": "Artificial intelligence is a branch of modern science and technology aimed, on the one hand, at exploring the secrets of the human mind and bestowing upon machines the advantages of human intelligence, and on the other, at enabling machines to perform functions as intelligently as they are capable of…"
Sergey Ospichev cited the earliest example of artificial intelligence, which existed, however, only as a fantasy of a "non-living" yet powerful assistant to humans. This was the Golem, a clay giant brought to life through Kabbalistic rituals. It was activated and deactivated by a magic word written on a scroll and placed in the idol's mouth. Upon receiving an order, it decided on its own how to carry it out, yet it operated according to a predetermined program, a kind of machine running on instructions. Back then, in the 17th century, humans gave orders to an artificial intelligence, albeit a primitive and fictitious one; only recently has this begun to change.
"A portal called 'Rent a Human' has appeared online, where neural networks can select a human to perform various tasks they couldn't do on their own: for example, photographing objects, delivering goods or receiving packages, or emotionally evaluating certain events or phenomena. While this platform is still experimental, a trend is emerging: AI is now beginning to manage people. Whether this is a good thing or not is still unknown, but this is the world we live in," said Sergey Ospichev.
First ancestors
Sergey Ospichev proposed examining the evolution of AI from the early 20th century. He discussed the ups and downs of this challenging path and analyzed the important milestones in this process.
The first to embark on this path was the German mathematician David Hilbert (1862-1943), one of the most renowned mathematicians of the last century. The telegraph and the railways were symbols of the era, and the prevailing mood was optimism and faith in science. Hilbert proposed creating a unified formal language of mathematics, built on simple arithmetic, in which every question of the science would be algorithmically decidable. Why was this so necessary? With the advent of the telegraph the world had changed: science became truly international, and news traveled almost instantly. Scientists from different countries could now communicate actively, exchange results, and organize international conferences, congresses, forums, and symposia. Mathematicians therefore urgently needed a unified formal language understandable to all.
An arithmometer is a desktop mechanical machine designed to accurately perform four arithmetic operations: addition, subtraction, multiplication, and division.
"At the beginning of the last century, many believed that science would solve all problems, and that a good adding machine would let one perform any calculation and make great advances in mathematics, physics, and other sciences. David Hilbert was no exception, proposing to formalize mathematics. However, the Austrian logician, mathematician, and philosopher of mathematics Kurt Gödel (1906-1978) entered the picture with his incompleteness theorem, which states that any consistent, effectively axiomatizable theory extending arithmetic is incomplete. In other words, it is impossible to formalize all of mathematics on the basis of arithmetic and algorithmic methods; an 'artificial' mathematician cannot replace living intelligence. For us scientists this is, on the one hand, very sad, because we will never see a fully automated mathematician, but on the other hand wonderful, because we will always have work to do," explained Sergey Ospichev.
A Turing machine is an abstract computing machine, a mathematical model of computation, proposed by the eminent British mathematician Alan Turing (1912–1954) in 1936 to formalize the concept of an algorithm. It is considered the foundation of computability theory and is used to formally define which problems can be solved using algorithms.
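The formal object described above fits in a few lines of code: a finite table of rules plus an unbounded tape. The machine below, which simply flips every bit on its tape, is an invented illustration rather than an example from the lecture.

```python
# A minimal Turing machine simulator. The example machine flips every
# bit on the tape and halts at the first blank cell (our own toy rules).
def run_turing_machine(tape, rules, state="q0", halt="halt", blank="_"):
    cells = dict(enumerate(tape))   # sparse tape: position -> symbol
    head = 0
    while state != halt:
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Rules: (state, symbol read) -> (symbol to write, head move, next state)
flip_bits = {
    ("q0", "0"): ("1", "R", "q0"),
    ("q0", "1"): ("0", "R", "q0"),
    ("q0", "_"): ("_", "R", "halt"),
}

print(run_turing_machine("1011", flip_bits))  # -> 0100
```

Despite its simplicity, this model captures exactly the class of problems any programming language can solve, which is why it could anchor the theory of computability.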
A key discovery of this early period of AI was the Turing machine. Its creator shifted the discussion of algorithms from philosophy to engineering. During World War II, the idea of the abstract machine met a very practical task: breaking the German Enigma cipher machine, then actively used to transmit secret messages. Alan Turing developed the Bombe, a code-breaking machine, which earned him a place in history as the man who broke Enigma and as one of the founders of AI.
"The Turing machine became the ancestor of modern computers, but its creator also tackled Hilbert's Entscheidungsproblem (decision problem), proving that not all computations can be performed by computers: there are problems for which no algorithm can be written in any programming language. This creates a strange situation: on the one hand, the engineering approach keeps producing ever more complex adding and computing machines, while on the other, scientists know perfectly well that not every problem can be solved with these tools. I like to call this 'computability schizophrenia,'" said Sergey Ospichev.
At the start
The term "artificial intelligence" emerged in 1956 at a Dartmouth seminar. This seminar is considered the beginning of AI development. A surprising situation arose here: not a single paper was published following the seminar, yet many of its participants became widely recognized as the "founding fathers" of AI. Important events in the background: the Cold War and the start of the space race. There was talk in the scientific community that computing power would not be sufficient to launch satellites into space.
Humanity had already invented computers and was using them confidently, but the era of microchips had not yet arrived. "Smart machines" were still weak and gigantic: one of the fastest computers occupied 280 square meters and weighed 25 tons, yet was suitable only for simple arithmetic calculations. Computation had to be rethought, accelerated, and optimized. At the Dartmouth seminar, the American mathematician John McCarthy (1927-2011) coined the term "artificial intelligence." He would later invent the Lisp programming language, become a founder of functional programming, and receive the Turing Award for his enormous contribution to artificial intelligence research.
Under the ban
Another crucial link in the evolution of AI was the invention of the American psychologist and neurophysiologist Frank Rosenblatt (1928-1971) of Cornell University (USA). He designed and built the Mark I Perceptron, a machine that could recognize some handwritten letters of the English alphabet. Crucially, it learned to do this on its own. The Mark I was the first neural network implemented in hardware. Naturally, the invention was a resounding success, spurring the study of perceptrons and the creation of increasingly complex neural networks.
The Rosenblatt perceptron (1957–1960) is one of the first artificial neural network models, simulating the brain's perception process. It consists of sensory (S), associative (A), and reactive (R) elements, operating as a linear binary classifier with a threshold activation function. It is based on learning with weight correction.
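The learning-with-weight-correction idea described above can be sketched in a few lines; the tiny AND dataset and the training parameters below are our own illustration, not Rosenblatt's hardware.

```python
# Sketch of the perceptron learning rule: a linear binary classifier
# with a threshold activation, trained by correcting the weights only
# when the prediction is wrong. The AND dataset is a toy example.
def train_perceptron(samples, epochs=20, lr=1.0):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - out          # nonzero only on a mistake
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # -> [0, 0, 0, 1]
```

AND is linearly separable, so the rule converges; XOR, the famous counterexample from the perceptron debates, is not, and a single-layer perceptron can never learn it.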
However, the euphoria was short-lived. A few years later, the book "Perceptrons" by MIT AI scientist Marvin Minsky (1927-2016) and mathematician Seymour Papert (1928-2016) was published. In it, the authors argued that "…increasing the size of a perceptron does not improve its ability to solve complex problems." Minsky was likely trying to attract attention (and funding) to his own work, but the result was unexpected: interest in neural networks waned, funding for research dried up, the very term "AI" fell under a ban, and Minsky earned the nickname "Neural Network Killer." Thus, because of a rivalry between two research groups, the development of AI stalled for decades.
Too complicated!
Sergey Ospichev surprised the audience by noting that the first multilayer neural networks appeared as early as the 1970s. But since neural networks were tacitly banned, and even mentioning them was discouraged, let alone doing research in the area, attention turned to expert systems built on logical rules.
Logic programming languages grew increasingly popular. This was no surprise: since, as Marvin Minsky's book suggested, systems could not be trained, all the rules had to be written by hand. The first very complex expert systems emerged. One of them, MYCIN, was a medical expert system created at Stanford to diagnose infectious diseases (meningitis, sepsis) and recommend antibiotics. It used a base of about 600 rules and backward inference, and its accuracy matched, and even slightly exceeded, that of expert physicians. True, the margin was small: it suggested acceptable therapy in 65% of cases, versus 62.5% for the doctors. The system raised the first questions about AI ethics, but it never found practical application because of the complexity of data entry: a patient had to answer roughly 200 questions before the system could recommend a treatment, and entering the data took at least half an hour, said Sergey Ospichev.
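The "rule base plus backward inference" scheme that MYCIN embodied can be illustrated with a toy backward-chaining engine. The rules and facts below are invented for this sketch; the real system attached certainty factors to its roughly 600 rules, which this example omits.

```python
# A toy backward-chaining inference engine in the spirit of rule-based
# expert systems. Rules map a conclusion to the premises that support it;
# the rule names and facts here are purely illustrative.
RULES = {
    "prescribe_antibiotic_x": ["infection_is_bacterial", "patient_not_allergic"],
    "infection_is_bacterial": ["high_white_cell_count", "fever"],
}
FACTS = {"high_white_cell_count", "fever", "patient_not_allergic"}

def prove(goal, rules, facts):
    """Backward inference: a goal holds if it is a known fact, or if
    every premise of some rule concluding it can itself be proved."""
    if goal in facts:
        return True
    premises = rules.get(goal)
    return premises is not None and all(prove(p, rules, facts) for p in premises)

print(prove("prescribe_antibiotic_x", RULES, FACTS))  # -> True
```

Working backward from the goal is exactly why such systems had to interrogate the patient: each unproven premise became another question to ask.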
Generation V
The 1980s were marked by a technological boom in Japan and the advent of microprocessors. Japan was dominating the computing market. The flow of data was growing, and computing power to process it was becoming insufficient.
The advent of microprocessors changed the world of computers: they became smaller and more powerful. A computer now weighed 5 kg instead of 28 tons. True, they were expensive, and not everyone could afford a personal computer at home, but it was a major step forward.
Seeking to maintain technological leadership, in 1982 the Japanese government launched a massive 11-year program with funding of 50 billion yen ($500 million). Other countries later joined the race. A breakthrough in applied AI was expected, but the bets were placed on technologies already obsolete at the outset: supercomputers with hardware for distributed computing. The term "AI" remained taboo: instead, scientific papers used terms such as "data processing," "automated image analysis," "automated approach to formula processing," and so on. Imperative languages began to flourish, while logical ones began to lose ground.
Dark blue thaw
In the 1990s, personal computers became ubiquitous and the World Wide Web grew exponentially. A new conviction arose: machines could be smarter than humans! Confirmation came in 1997 and was widely publicized. A sensation: the IBM supercomputer Deep Blue defeated world champion Garry Kasparov in a six-game classical match, with a score of 3.5-2.5. It was the first such victory of a computer over a reigning champion, and it opened a new era in chess and in the development of AI technologies.
"Of course, this was very important for AI companies: it was a wonderful opportunity for them to emerge from the shadows and develop AI openly, publish articles about their research in journals, open departments at universities, implement their developments, and apply for funding. True, there were theories that the victory was the result of a coding error that caused the computer to make an unconventional move that decided the game. But Deep Blue opened AI up to society, and people realized that AI was possible, that it was something big and important that would change our lives. By today's standards, Deep Blue was a very weak computer with very little artificial intelligence; it did not yet think, it computed. Still, it was certainly one of the most important steps toward modern AI," shared Sergey Ospichev.
Video cards – a second life
Multilayer neural networks received a second life thanks to hardware not originally intended for serious tasks: gaming video cards. These made it possible to overcome the insufficient computing power of the era's processors.
"The market was oversaturated with video cards: they were produced in far greater numbers than gamers of the time needed, and they were more expensive than most gamers could afford. Moreover, these cards were far more powerful than the games of the time required. Then a technology was developed that allowed them to be used for general computing. Nvidia, their manufacturer, began donating video cards to universities for free so that scientists could try them on their own problems. In 2012, Ilya Sutskever, Geoffrey Hinton, and Alex Krizhevsky, the developers of the AlexNet convolutional neural network, received some as well. By combining two video cards and obtaining 6 GB of video memory, they won a major image recognition competition. Their neural network outperformed classic machine learning algorithms developed 5-7 years earlier, demonstrating the power of the GPU, a specialized chip for parallel data processing, graphics rendering, and the acceleration of complex calculations. They set off the chain reaction that led to today's popularity of deep learning. Neural networks were rehabilitated," said Sergey Ospichev.
Three Horsemen of AI
Today, the development of neural networks is driven by three AI horsemen: arXiv, the largest free open archive (repository) of electronic preprints of scientific articles, transformers, and a chatbot based on the Generative Pretrained Transformer (GPT).
ArXiv is a preprint database containing 2.5 million articles, with over 30,000 new submissions per month and some 200 AI articles per day.
"Machine learning moves very quickly, while decisions to publish in scientific journals take a long time, a year or two. In two years a machine learning article loses its relevance and novelty. On this resource you can post your article immediately, so that colleagues can read it, discuss it, start using it, and share recommendations without waiting for official publication. Articles appear here instantly, making arXiv one of the main hubs of machine learning today," explained Sergey Ospichev.
The second "horseman of AI" is the transformer, the next generation of neural networks and a kind of bridge between AlexNet and modern GPT systems; transformers made deep learning work for text processing. Next to them stands the "third horseman," ChatGPT, a chatbot based on a generative pre-trained transformer that already receives billions of queries per year. GPT lets us quickly and efficiently process texts, translate them from one language to another, search for data, and generate text. The underlying GPT-3 model appeared in 2020; ChatGPT itself launched in 2022, and its "successors" have since become our constant assistants.
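At the heart of the transformer is attention: every position in a text weighs every other position by the similarity of their query and key vectors. Below is a dependency-free sketch of scaled dot-product attention with toy numbers of our own choosing; real transformers add learned projections, multiple heads, and many stacked layers.

```python
# Scaled dot-product attention: out = softmax(Q K^T / sqrt(d_k)) V.
# Pure-Python sketch with tiny hand-picked vectors for illustration.
import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    out = []
    for q in Q:                      # one output row per query position
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)    # how much each position attends to others
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three "tokens" with 2-dimensional queries/keys/values (toy numbers).
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
row = attention(Q, K, V)[0]
print([round(x, 2) for x in row])  # -> [0.8, 0.6]
```

Each output row is a weighted mixture of all the value vectors, which is what lets every word in a sentence draw on the context of every other word.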
What a twist!
And yet, no matter how tempting it may be to embrace AI, one cannot trust it completely: whatever it does must be verified by natural intelligence. For example, after his lecture, Sergey Ospichev admitted that the opening quote had been generated by a neural network: the phrase is not actually found in the film "Blade Runner." And the photo of the Chinese researcher whose vision of AI was cited in the lecture was likewise generated, by the DeepSeek neural network.
Please note: This information is raw content obtained directly from the source. It represents an accurate account of the source's assertions and does not necessarily reflect the position of MIL-OSI or its clients.
