Translation. Region: Russian Federation
Source: Novosibirsk State University –
An important disclaimer is at the bottom of this article.
The first two academic degree certificates have been awarded by the Dissertation Council for Technical Sciences of the Faculty of Information Technologies at Novosibirsk State University. Both PhD theses are devoted to computational linguistics. Dmitry Morozov developed a system for assessing text complexity with machine learning methods, using Russian as his case study, and Davlater Mengliev developed a hybrid algorithm for named entity recognition in Uzbek. Another PhD thesis, on applying mathematical modeling methods in geophysics, will be defended in August.
"We are seeing high demand for the Scientific Council for Technical Sciences established at our faculty. Its requirements for dissertation defenses are less formalized than those of the Higher Attestation Commission (HAC), but its requirements for the quality of publications are higher. For these reasons, our Council will be in demand among employees of both research organizations and high-tech companies, for whom the defense procedure we have established will be more convenient. However, one should not assume that it is easy. Our first two applicants can confirm this: they submitted all the required documents to the Council, successfully completed all the established and strictly regulated procedures, presented their work several times at seminars before the scientific community, and received high marks for the quality of their work from specially formed commissions that included experts from our Dissertation Council as well as external experts from several regions of Russia and from neighboring countries. We are glad that Dmitry Morozov and Davlater Mengliev passed all these tests, and their PhD diplomas have the same status as those issued by the Higher Attestation Commission," said Mikhail Lavrentyev, Dean of the NSU Faculty of Information Technologies and Corresponding Member of the Russian Academy of Sciences.
Vladimir Barakhnin, Doctor of Technical Sciences, head of the Department of Mathematical Modeling at the NSU Faculty of Mechanics and Mathematics, and professor in the Department of Informatics Systems and the Department of General Informatics at the NSU Faculty of Information Technologies, noted that it is no coincidence that the first two defenses for the degree of candidate of sciences are in computational linguistics: this is evidence of how topical the field is.
"As neural networks and large language models develop, so-called glitches become increasingly apparent. The sheer volume of data fed into these models inevitably brings a wider range of false information with it, and the models are simply unable to judge whether information is true. That is why information-processing methods based on classical approaches, whether used on their own or combined with other techniques, remain important. Many specialists believe that it is precisely these methods that will be able to correct the output of large language models. Dmitry Morozov and Davlater Mengliev used such approaches in their work. To keep the development of neural networks and large language models from reaching a dead end, we need to draw on classical methods of computational linguistics, which rely on knowledge about language. In this context, such knowledge amounts to modeling human thinking. Neural networks model the connections between neurons in the human brain, but not thinking itself, and so they implement a purely mechanistic approach to information processing. Yet information processing is unthinkable without human participation, because humans are both the producers and the end consumers of all information. Therefore, language processing should incorporate an understanding of how language is structured rather than amount to mechanically accumulating information in large language models," explained Vladimir Barakhnin, scientific supervisor of both degree candidates.
Dmitry Morozov's research is particularly relevant because it aims to match a text to its potential reader. As Vladimir Barakhnin explained, there is currently a wide generation gap: many words that seem perfectly clear to older people are completely incomprehensible to young people. In most cases these are obsolete words, and schoolchildren have to consult dictionaries to understand them. The algorithms developed by Dmitry Morozov aim to ensure that readers receive information suited to their level of education, so that their development and the enrichment of their vocabulary proceed gradually. The value of these algorithms lies in genuinely adapting to the characteristics of the reader and taking the reader's abilities into account. Expert assessment is largely subjective and therefore not very reliable, whereas the objective assessment methods developed in Dmitry Morozov's dissertation make for a more thorough educational process in the humanities.
"The topic of my dissertation is 'Text Complexity Assessment Using Machine Learning Methods: The Case of the Russian Language.' It addresses how well a text will be understood by its reader, or how well prepared a reader must be to understand what is written. This is needed to assess the complexity of various kinds of instructions. Such texts should be understandable to people without specialized education or training. The problem is that they are written by people with specialized knowledge of the subject, so much of what is incomprehensible to outsiders seems obvious to them, and it is difficult for them to assess their own text objectively. On the other hand, a person without that knowledge who is asked to assess the text's complexity must read it in full and then give their own assessment, which takes a lot of time. This leaves a great deal of room for automation. A variety of pretrained large language models are available that can be used within different algorithmic approaches to assess text complexity automatically. My dissertation describes in detail how to use them to build a description of a text that can then be converted into an assessment of its linguistic complexity," said Dmitry Morozov.
The young scientist's work can be applied to writing instructions for complex products. It is also proposed to use the system to compile collections of texts that schoolchildren of different ages can understand. Such collections would help linguists study schoolchildren's vocabulary further, because the texts schoolchildren read are an important source of new words for them. In this way, linguists will be able to compile different word lists and predict which words schoolchildren know and which they do not, relying on objective data rather than subjective experience.
According to his scientific supervisor Vladimir Barakhnin, the research of the second degree candidate, Davlater Mengliev, is pioneering for Uzbek computational linguistics, a field that began developing relatively recently. Barakhnin added that an entire scientific school is now taking shape at NSU, with several postgraduate students from the Republic of Uzbekistan working in this area.
"I devoted my PhD thesis to developing a hybrid algorithm for named entity recognition in the Uzbek language. The algorithm extracts key information from a text and identifies it. Similar systems already exist for other languages, but for Uzbek, as for all Turkic languages in general, such work had not yet been done. My work gains additional relevance from its hybrid approach, which combines modern neural networks with traditional rule-based algorithms; together with the use of several architectures, this helped achieve good results. My development has already been deployed in various organizations in the Republic of Uzbekistan, in particular in the reception office of the governor of the Khorezm region. There, the algorithm extracts key information from the requests and applications the office receives, which are then forwarded to the relevant divisions and departments. Since Uzbek has many dialects, my work in this area is not yet complete," explained Davlater Mengliev.
Alexander Vlasov, secretary of the NSU FIT scientific seminar at which preliminary dissertation defenses are held, is confident that these first two candidate dissertation defenses mark the beginning of a long journey for the faculty, for NSU, and for Akademgorodok as a whole.
Material prepared by: Elena Panfilo, NSU press service
Please note: This information is raw content obtained directly from the source of the information. It is an accurate report of what the source claims and does not necessarily reflect the position of MIL-OSI or its clients.
