Translation. Region: Russian Federation
Source: People's Republic of China in Russian
An important disclaimer is at the bottom of this article.
Source: People's Republic of China – State Council News
Beijing, December 29 (Xinhua) — Imagine a chemistry exam where your opponents are not your fellow students, but the world's most advanced artificial intelligence systems – GPT, Gemini, DeepSeek. This is precisely the unusual situation that confronted 174 chemistry students at Peking University, considered one of the top two chemistry universities in China, and the results of this virtual competition challenged our understanding of AI capabilities.
The Department of Chemistry and Molecular Engineering at Peking University, together with the university's Computing Center, Department of Computer Science, and Yuanpei College, recently presented its latest development: SUPERChem. This tool served as the basis for the "exam" used in the aforementioned virtual confrontation between AI and humans.
SPECIAL MID-TERM EXAM
When you open the SUPERChem question bank, you immediately feel the pressure build.
Intricate analysis of crystal structures, detailed deductions of reaction mechanisms, quantitative calculations of physicochemical properties… These 500 questions are not taken from a publicly available online question bank, but are deeply adapted from complex exam problems and materials from leading professional publications.
Why was it necessary to spend so much effort on creating completely new questions?
"Because large models are too good at 'cramming' text," explained the research team. Most tests available online have already been "read" by AI models during training on massive datasets. And chemistry is precisely the kind of discipline where memorization alone isn't enough: it combines rigorous logical deduction with a rich spatial understanding of the microworld. "We are very interested in whether one-dimensional next-token prediction in large language models can solve complex inference problems in two- or even three-dimensional space," the researchers added.
Creating a set of problems that AI hasn't seen before and that require genuine reasoning abilities is extremely difficult. However, this is precisely the unique advantage of Peking University's Department of Chemistry. Around a hundred faculty and students, including many Olympiad winners, have teamed up to develop a high-barrier exam for AI that emphasizes reasoning and is immune to cheating. They want to test whether AI truly "understands" chemistry.
RESULTS OF THE FIGHT
This carefully designed test demanded sophisticated scientific intuition. As a baseline, undergraduate students from Peking University's Department of Chemistry who took the test achieved an average accuracy of 40.3%. This figure alone speaks to the high difficulty of the tasks.
So how did the AI perform? Even the leading models tested showed results only comparable to the average level of junior students.
The team was surprised by the confusion caused by visual information. The language of chemistry is visual: key data is contained in diagrams of molecular structures and reaction mechanisms. Yet for some models, adding images decreased accuracy rather than improving it. This shows that modern AI still faces a clear cognitive barrier in converting visual information into chemical semantics.
Even if the correct answer is chosen, the reasoning behind the solution may not stand up to scrutiny. Therefore, the team developed detailed scoring rules for each task. Under SUPERChem's "microscope," it becomes clear at a glance whether the AI truly understands the material or is just pretending to.
Researchers have found that AI reasoning chains often break down on complex tasks such as predicting product structure, determining reaction mechanisms, or analyzing structure-property relationships. Modern advanced models, despite possessing vast amounts of knowledge, still demonstrate insufficient capabilities when solving complex chemical problems that require rigorous logic and deep understanding.
A SMALL STEP TOWARDS GENERAL AI
The creation of SUPERChem fills a gap in existing tools for assessing multimodal deep-reasoning capabilities in the chemical sciences. The published results are intended not to expose the imperfections of AI, but to stimulate its further development. SUPERChem is like a road sign reminding us that there is a long way to go before a general-purpose chatbot becomes a competent scientific assistant capable of relating the structure of substances to their properties and inferring the mechanisms of chemical reactions. The developers emphasize the transition from simply "reproducing information" to "understanding the laws of the physical world."
The SUPERChem project is now fully open source. The team hopes that this "exam," administered at Peking University, will become a shared asset for the global scientific and AI communities and serve as a catalyst for the next wave of technological breakthroughs. Perhaps, in the near future, an AI retaking this set of problems will achieve a perfect score. That would be a welcome surprise for both chemistry and AI.
Please note: This information is raw content obtained directly from the source. It represents an accurate account of the source's assertions and does not necessarily reflect the position of MIL-OSI or its clients.
