The NSU team took first place in SemEval-2026, a prestigious international competition in computational linguistics.

Translation. Region: Russian Federation

Source: Novosibirsk State University

An important disclaimer is at the bottom of this article.

A team of researchers from Novosibirsk State University won first place in the international scientific competition SemEval-2026 Task 8, "MTRAGEval: Evaluating Multi-Turn RAG Conversations." The team was led by Ivan Bondarenko, Associate Professor at the Faculty of Mechanics and Mathematics of NSU and a research fellow at the Laboratory of Applied Digital Technologies. The results of the competition will be presented at ACL, the world's largest conference on computational linguistics, in the summer of 2026.

The competition was hosted by IBM and consisted of three tracks. The NSU team participated in Task B, which required generating answers to user questions based on provided reference documents and the history of a multi-turn dialogue. Of the 26 participating teams, the NSU team took first place with a quality metric of 0.7827 (conditioned harmonic mean), exceeding the organizers' best baseline (0.6390) by 14.4 percentage points.

SemEval (Semantic Evaluation) is an annual international workshop on methods and algorithms for computational semantics that has been held for over 20 years. The event hosts competitions in various areas of computational linguistics; this year, SemEval offered participants 13 challenging research problems. One of the most interesting and significant was Task 8, which assessed the performance of RAG (Retrieval-Augmented Generation) systems in multi-turn dialogues. Such systems address a key limitation of modern large language models: their "knowledge" is restricted to the training set and does not include up-to-date or domain-specific information, which makes them difficult to adapt to specialized subject areas. RAG integrates language models with external knowledge bases, enabling them to find and use relevant information when generating responses.
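The retrieve-then-generate idea behind RAG can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the team's system: the bag-of-words similarity and the helper names (`embed`, `retrieve`, `build_prompt`) are invented for this sketch, and a production pipeline would use a dense neural encoder and a vector index instead.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG systems use neural encoders.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    # Rank the reference documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # The retrieved passages are placed in the context window so the
    # language model can ground its answer in external knowledge.
    context = "\n".join(retrieve(query, documents, k=2))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The key design point is that retrieval happens at query time, so the model can use information that was never in its training set.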

"Our team proposed three key approaches that secured our victory in the competition. The first involved iteratively improving the system prompt using an LLM agent. We developed a multi-agent system in which a large Gemini neural network analyzes the model's performance and suggests improvements to the system prompt. This process is repeated iteratively until a plateau is reached. The second approach involved using in-context learning, in which the model learns to perform a task based on several examples of correct solutions to the problem provided in the input context. For each problem category, the researchers selected the most typical examples using the medoid method in a metric embedding space. These examples were added to the prompt to demonstrate the correct behavior of the model. This approach consistently demonstrated the best results," explained Ivan Bondarenko.

The researchers built several systems using both approaches and evaluated their strengths before deciding to combine them. Among a variety of ensemble methods, they chose one based on a judge neural network that selects the best response for each query. The team combined seven disparate language models (Gemini-3-Pro-Preview, GLM-4.6, Llama-3.3-70B-Instruct, Qwen3-235B-A22B-Instruct, Claude 4.5 Haiku, Qwen2.5-32B-Instruct, and their own model, Meno-Lite-0.1) and used GPT-4o-mini as the judge. The diversity of models and approaches provided an additional boost in quality.
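The judge-based ensemble can be sketched generically. This is a minimal sketch, not the competition code: the member models and the judge are represented as plain callables (in the team's system, API calls to the seven LLMs and to GPT-4o-mini), and the function name `ensemble_answer` is an assumption.

```python
from typing import Callable

def ensemble_answer(question: str,
                    models: list[Callable[[str], str]],
                    judge: Callable[[str, list[str]], int]) -> str:
    # Each member model independently drafts an answer to the question.
    candidates = [model(question) for model in models]
    # The judge model compares the candidates and returns the index
    # of the one it considers best for this particular question.
    best = judge(question, candidates)
    return candidates[best]
```

Because the judge picks per question, a small specialized model can win on the queries it handles well while stronger generalists cover the rest, which is where the diversity gain comes from.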

"The Meno-Lite-0.1 model, our team's own development based on Qwen2.5-7B-Instruct, deserves special attention. This compact model with 7 billion parameters was specifically retrained for use in RAG pipelines on a corpus of Russian- and English-language educational data. Despite its small size, Meno-Lite demonstrated performance comparable to significantly larger models and contributed to the ensemble's final result," explained Ivan Bondarenko.

The NSU team that participated in the competition included current and former NSU students and staff: Mikhail Kulakov, a master's student in the machine learning program run jointly with the School of Data Analysis and the Faculty of Mechanics and Mathematics of NSU; Ivan Chernov, a fourth-year student at the NSU Institute of Intelligent Robotics; Mikhail Komarov, a graduate of the NSU Institute of Intelligent Robotics and chief engineer of the RAGU open-source project; Oleg Sedukhin, a graduate of the NSU Faculty of Information Technology; and Roman Derunets, a graduate of the NSU Institute of Intelligent Robotics and a participant in the Meno project.

A scientific paper describing the proposed solution has been submitted for peer review and will be presented at the ACL (Association for Computational Linguistics) conference, the world's largest scientific forum on computational linguistics. Ivan Bondarenko emphasized that the results obtained are already being used in the development of the university's internal project, Meno, an intelligent system based on RAG technologies. The methods developed by the team can be used to improve the quality of dialogue systems that work with external knowledge bases, including corporate and educational applications.

Material prepared by: Elena Panfilo, NSU press service

Please note: This information is raw content obtained directly from the source. It represents an accurate account of the source's assertions and does not necessarily reflect the position of MIL-OSI or its clients.