Thematic Forums

Multimodal Large Models: Basic Theory and Application

Host: Qin Chuan

Host Affiliation: Computer Network Information Center, Chinese Academy of Sciences
Bio: Qin Chuan, Ph.D., is currently an associate researcher at the Computer Network Information Center, Chinese Academy of Sciences. He previously served as a senior researcher at research institutes of publicly listed internet companies such as Baidu. He received his bachelor's and doctoral degrees from the School of Computer Science, University of Science and Technology of China in 2015 and 2021, respectively. His current main research directions include knowledge computing, cognitive computing, scientific data mining, and large models. He has published over 50 papers in important international journals and conferences such as Proceedings of the IEEE (PIEEE), ACM TOIS, IEEE TKDE, KDD, SIGIR, ICDE, WWW, NeurIPS, AAAI, and IJCAI, and holds over 40 authorized and published patents domestically and internationally. He has received numerous honors including the CAS President's Excellence Award, the KDD 2018 Best Student Paper Award, and the Hot Paper Award from Science China Information Sciences.

Speaker 1: Fu Chaoyou

Speaker: Fu Chaoyou
Speaker Affiliation: Nanjing University
Title: Research and Prospects of Multimodal Large Language Models
Abstract: In recent years, multimodal large language models have received widespread attention from scholars and industry worldwide due to their powerful generalization and reasoning capabilities. This report will briefly review the development history of multimodal large language models and elaborate on aspects such as data, evaluation, architecture, training, and applications of multimodal large language models, discussing existing problems and future development directions.
Bio: Fu Chaoyou is a researcher, assistant professor, and doctoral supervisor at the School of Artificial Intelligence, Nanjing University, and was selected for the China Association for Science and Technology Young Talent Support Program. He received his Ph.D. from the Institute of Automation, Chinese Academy of Sciences in 2022; from 2022 to 2024, he was a senior researcher at Tencent, recruited through the "Tech Expert-T10" program; he joined Nanjing University in September 2024. His research direction is multimodal intelligence. He has published over 20 papers with over 4,000 Google Scholar citations, and the open-source projects he owns on GitHub have accumulated over 20,000 stars. Representative works include the VITA multimodal large model series (first author of VITA-1.0 and VITA-1.5; corresponding author of Long-VITA and VITA-Audio; 3,000 GitHub stars), the MME multimodal evaluation benchmark series (first author of MME and Video-MME; over 1,000 citations), and the Awesome-MLLM multimodal community (owner; over 10,000 GitHub stars). He has received honors including the CAS President's Special Award, the IEEE Biometrics Council Best Doctoral Dissertation Award, Beijing Outstanding Doctoral Dissertation, CAS Outstanding Doctoral Dissertation, the Xiaomi Young Scholar Technology Innovation Award, Nanjing University Zijin Scholar, and CVPR 2023 Outstanding Reviewer.

Speaker 2: Hu Xuming

Speaker: Hu Xuming
Speaker Affiliation: The Hong Kong University of Science and Technology (Guangzhou)
Title: Research and Defense Methods for Hallucination Phenomena in Multimodal Large Models
Abstract: This report will deeply analyze the causes of hallucination phenomena generated by large language models and evaluate the impact of these phenomena on model reliability. The report will introduce uncertainty-aware model alignment (U2Align) and retrieval-augmented generation (RAG) methods, which aim to improve the accuracy and credibility of model outputs. Additionally, the report will explore watermark defense technologies against external attacks, including robust semantic watermarks and publicly verifiable watermarks, to enhance the model's defense capabilities when facing malicious attacks. Through the introduction and analysis of these defense methods, this report will provide new insights into the reliability and security of large language models.
Bio: Hu Xuming is an associate researcher, assistant professor, and doctoral supervisor at the Artificial Intelligence Thrust, The Hong Kong University of Science and Technology (Guangzhou). He received his Ph.D. from Tsinghua University. His main research directions are natural language processing, large models, and their applications; he is committed to exploring trustworthy large models and integrating multimodal data into large models to achieve more comprehensive general artificial intelligence. He has led Guangdong Province young talent projects, Guangzhou high-level talent projects, and a series of industry-funded projects, and has participated in major and key projects of the National Natural Science Foundation of China as well as key R&D program projects of the Ministry of Science and Technology. His research results have been applied in multiple scenarios, including intelligent Q&A and intelligent search at Alibaba and AWS Glue. In the past five years, Dr. Hu has published over 10 first-author articles in top international journals and conferences in the large model field such as ICLR, ACL, EMNLP, NAACL, TKDE, and SIGIR, with over 2,000 citations. He serves as an area chair for top international conferences such as ACL, EMNLP, NAACL, and EACL, and as an action editor for ACL Rolling Review. He organized the 2023 International Big Data Competition and the 2022 Chongqing Artificial Intelligence Competition, which attracted over 3,000 teams from around the world. His honors include third place globally in all tracks of the KDD Cup, the Chinese Information Processing Society of China Doctoral Dissertation Incentive Program, Beijing Outstanding Graduate, Tsinghua University Outstanding Graduate, and Tsinghua University Outstanding Doctoral Dissertation.

Speaker 3: Cao Shaosheng

Speaker: Cao Shaosheng
Speaker Affiliation: Xiaohongshu
Title: Technological Innovation and Practical Applications of Multimodal Large Models
Abstract: Large model technology is advancing rapidly. Focusing on practical problems in industry, this talk will give a detailed introduction to our team's recent technological innovations and deployment results. First, I will share technological innovations and product deployment experience in emotional companionship, including iPET, an active-memory dialogue method driven by agent world logs, and PaRT, a framework for personalized AI search-based dialogue generation. Next, I will present the technical details of Xiaohongshu's translation large models, including MT-R1-Zero, which uses reinforcement learning to quickly elicit the translation capabilities of large models through thinking and reasoning; RedTrans, a translation large model for social lifestyle content; and MT3, an image-text translation model based on multimodal, multi-task reasoning. Finally, I will introduce SNS-Bench, a benchmark of practical tasks for large models in the social lifestyle domain; Vision-R1, a multimodal reasoning large model; and RedOne, a domain-specific large language model.
Bio: Cao Shaosheng is a senior engineer and Head of NLP Algorithms at Xiaohongshu, responsible for post-training of large language models and multimodal large models and for deploying large models in scenarios such as search, recommendation, advertising, translation, customer service, and emotional companionship. He has published over 30 papers with over 4,000 citations and holds over 100 authorized patents. He received the ICDE 2023 Best Industry Paper Award, a CIKM most cited paper award (2015-2020), and an AAAI 2016 most influential paper recognition. Additionally, he received the China Invention Association Innovation Achievement First Prize (ranked 1st) and the Chinese Association for Artificial Intelligence Wu Wenjun Science and Technology Progress Second Prize (ranked 1st), has been selected among the top 100 of the AI-2000 Rising Stars list for consecutive years and as an Elsevier China Highly Cited Scholar, and has been interviewed and featured by CCTV-13's "Live News Room".

Speaker 4: Du Changde

Speaker: Du Changde
Speaker Affiliation: Institute of Automation, Chinese Academy of Sciences
Title: Cognitive Mechanism Analysis of Large Models' Object Concept Representation
Abstract: Large models have demonstrated superior performance on various tasks. However, it remains unclear whether the mechanisms by which large models represent object concepts resemble those of the human brain. Previous studies have addressed this question by quantifying correlations between large model and human brain responses, but simple correlation analysis cannot reveal the similarities and differences between the two along the core dimensions of object concept cognition. Here, we use experimental paradigms from cognitive psychology and brain imaging to deeply analyze the human-like and brain-like characteristics of the internal representations of various large models. We have also conducted neural encoding and decoding research based on different types of large models, achieving new breakthroughs in prediction accuracy, generation quality, and interpretability.
Bio: Du Changde is an associate researcher and master's supervisor at the Institute of Automation, Chinese Academy of Sciences, mainly engaged in interdisciplinary research on brain cognition and artificial intelligence. He has published over 50 high-level papers in neural encoding and decoding, multimodal neural computation, large model mechanism analysis, and brain-machine fusion intelligence, including publications in Nature Machine Intelligence, IEEE TPAMI, ICLR, ICML, etc. He has led/participated in multiple projects including National Natural Science Foundation and National Key R&D Programs. He has long served as a reviewer for journals such as Nat. Hum. Beh. and TPAMI. He has received honors including IEEE ICME 2019 Best Paper Award (Runner-up), 2021 AI Chinese Rising Stars Top 100, and his research results have been reported by MIT Technology Review. Personal homepage: https://changdedu.github.io/.

Large Model Inference and Reinforcement Learning

Host: He Junxian

Host Affiliation: The Hong Kong University of Science and Technology
Bio: He Junxian is an assistant professor in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology. He received his Ph.D. in Natural Language Processing from Carnegie Mellon University's School of Computer Science in 2022. His recent research focuses on large model reasoning. He serves as an area chair for ICLR, ACL, and EMNLP. His representative works include Unify-PEFT, C-Eval, CodeIO, SimpleRL, etc.

Speaker 1: He Junxian

Speaker: He Junxian
Speaker Affiliation: The Hong Kong University of Science and Technology
Title: Large Model Reasoning -- From Intermediate Training to Reinforcement Learning
Abstract: The complex reasoning capabilities of large models are not only a key component in their application to complex tasks but also an important indicator for measuring model intelligence levels. In this report, we will systematically introduce the main methods for improving large model reasoning capabilities and related research progress, and share our latest work in enhancing complex reasoning capabilities, including (1) CodeIO: a method that improves model general reasoning capabilities through synthetic data and intermediate training stages; (2) Laser: using reinforcement learning to effectively compress chain-of-thought length, thereby improving reasoning efficiency; (3) SynLogic: based on large-scale synthetic, verifiable logical reasoning data, further enhancing multiple reasoning capabilities of models in reinforcement learning.
Bio: He Junxian is an assistant professor in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology. He received his Ph.D. in Natural Language Processing from Carnegie Mellon University's School of Computer Science in 2022. His recent research focuses on large model reasoning. He serves as an area chair for ICLR, ACL, and EMNLP. His representative works include Unify-PEFT, C-Eval, CodeIO, SimpleRL, etc.

Speaker 2: Ding Ning

Speaker: Ding Ning
Speaker Affiliation: Tsinghua University
Title: Reinforcement Learning-Driven Reasoning Models: Dense Rewards, Policy Entropy, and Self-Evolution
Abstract: The emergence of reasoning models reveals another exploration-centered scaling trend, with reinforcement learning as the core technology. Although reinforcement learning has a rigorous theoretical framework, the generalization dimension introduced by reasoning models still brings huge research space. This report will introduce the speaker's recent series of work on reinforcement learning-driven reasoning models, including the construction and application of dense supervision, test-time reinforcement learning, and some unpublished research work (Implicit PRM, PRIME, TTRL, etc.), while providing prospects for this field.
Bio: Ding Ning is an assistant professor in the Department of Electronic Engineering at Tsinghua University. His research focuses on artificial intelligence, particularly the theory, algorithms, and systems underlying general intelligence and professional reasoning capabilities, with a commitment to applying them to innovative scientific discovery. He has published multiple papers in artificial intelligence conferences and journals such as Nature Machine Intelligence, ICLR, NeurIPS, ICML, and ACL, with over 7,000 Google Scholar citations, and open-source projects he leads have received over 25,000 stars on GitHub. He has been selected for the China Association for Science and Technology Young Talent Support Program, and his honors include the ACL Best System Demonstration Paper Award, the World Artificial Intelligence Conference Young Outstanding Paper Award and Yunfan Award, the China National Computer Congress Best Academic Paper Award, Tsinghua University Outstanding Doctoral Dissertation, the Baidu Scholarship, and inclusion in Stanford's Global Top 2% Scientists list. His recent work includes PRIME, a reinforcement learning method integrating dense rewards, and TTRL, a test-time reinforcement learning method.

Speaker 3: Liu Qian

Speaker: Liu Qian
Speaker Affiliation: A Company in Singapore
Title: SimpleTIR: Large Models Can Autonomously Think and Multi-step Reason with Code
Abstract: Training large language models for multi-step tool-integrated reasoning (TIR) under zero reinforcement learning (Zero RL) settings often faces challenges of training instability and dependence on cold-start data. In this talk, we introduce the SimpleTIR framework, a method for training end-to-end multi-step reasoning models. SimpleTIR introduces a simple and efficient data filtering mechanism that successfully stabilizes the zero reinforcement learning training process for multi-step reasoning. This framework encourages models to autonomously generate and execute code, seamlessly integrating execution results into subsequent reasoning chains. Experiments on mathematical reasoning tasks show that SimpleTIR achieves state-of-the-art zero reinforcement learning performance in both single-step and multi-step settings, with stable and significant improvements in key metrics such as code generation frequency, chain-of-thought length, and overall performance, providing a stable and efficient path for directly enhancing multi-step reasoning capabilities based on foundation models.
Bio: Liu Qian is currently a research scientist at a company in Singapore. Previously, he was a joint doctoral student at Beihang University and Microsoft Research Asia. His main research directions are natural language processing, primarily code generation and natural language reasoning. He has published dozens of papers at top conferences such as ICLR, NeurIPS, and ICML. His first-author paper "Reasoning Like Program Executors" received the Microsoft MLADS 2022 AI Symposium Outstanding Contribution Award, and he contributed to StarCoder 1 and 2, well-known open-source code generation models. He received a 2020 Baidu Scholarship nomination, was selected for KAUST Rising Stars in AI 2024, and received a Beijing Outstanding Doctoral Dissertation nomination in 2023. Additionally, he is a co-founder of the MLNLP community and served as program committee chair for the first MLNLP conference.

Speaker 4: Feng Yiren

Speaker: Feng Yiren
Speaker Affiliation: The Hong Kong University of Science and Technology
Title: Large Model Reasoning: More Diverse, Knowledge-Rich, and Rigorous
Abstract: This report will explore how to enhance the reasoning capabilities of large language models in diverse, knowledge-intensive, and mathematically rigorous scenarios. First, we propose the Multirole-R1 framework, which enhances the diversity and accuracy of subjective questions through multi-role perspectives, combined with reinforcement learning to optimize reasoning diversity. Second, for multimodal retrieval-augmented generation, we propose an end-to-end optimization method based on global reward backpropagation, efficiently integrating heterogeneous knowledge and improving factuality. Finally, we introduce a Hybrid Reasoning framework that significantly improves mathematical problem-solving capabilities through hybrid reasoning combining natural language and formal language, breaking through the limitations of traditional natural language reasoning. Experiments show that these methods achieve leading levels in diverse reasoning, knowledge enhancement, and mathematical rigor, providing new insights for high-level reasoning in large models.
Bio: Feng Yiren (Yi R. (May) Fung) is an assistant professor in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology and an emerging scholar in artificial intelligence, natural language processing, and computational social science. She received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2024, under the supervision of Professor Heng Ji. Her research focuses on human-centered trustworthy artificial intelligence, particularly key issues such as information integrity, foundation model knowledge boundary awareness, and multimodal, multilingual socially contextualized reasoning. With an H-index of 23, she received Outstanding Paper Awards at ACL 2024 and NAACL 2024, as well as the NAACL 2021 Best Demo Paper Award. She has led and participated in multiple major national projects in the United States and has served as an area chair for top conferences such as ACL and NeurIPS. Her work on trustworthy artificial intelligence and privacy protection has achieved breakthrough progress in areas such as dialogue system privacy protection, large language model knowledge boundary detection, and refusal response training.

Speaker 5: Li Yafu

Speaker: Li Yafu
Speaker Affiliation: Shanghai AI Lab
Title: Evolution Path of Large Model Reasoning Capabilities: From Off-policy Reinforcement to Test-time Adaptive Optimization
Abstract: With the rapid development of large language models in mathematical reasoning and complex tasks, promoting the continuous evolution of their reasoning capabilities and better aligning with user preferences has become an important research direction. This report focuses on two core paths: first, LUFFY (Learning to Reason Under Off-policy Guidance), which achieves capability leaps in large models under off-policy reinforcement learning by introducing external strong trajectories, significantly improving model performance in mathematical and general reasoning tasks, and effectively breaking through the limitations of weak models; second, TPO (Test-time Preference Optimization), which proposes a test-time adaptive optimization method based on text feedback, allowing models to iteratively correct and flexibly align with user preferences during reasoning without parameter updates, demonstrating excellent performance in multiple evaluations. The report will systematically introduce the above progress and explore cutting-edge paths for large model reasoning capabilities from training-time capability breakthroughs to inference-time adaptive optimization.
Bio: Li Yafu is a researcher at Shanghai AI Lab, with main research directions including large language model reasoning, trustworthy artificial intelligence, and machine translation. He received his Ph.D. from the joint training program of Zhejiang University and Westlake University, and previously obtained a Master's degree in Artificial Intelligence from the University of Edinburgh and a Bachelor's degree in Electronic Information Engineering from Wuhan University. He has published multiple research results at top international conferences such as ACL, EMNLP, ICLR, and ICML, received ACL 2023 Best Paper Nomination, and serves as an area chair for ACL and EMNLP and as a reviewer for multiple international conferences and journals. During his doctoral studies, he received the National Scholarship, was selected for the Tencent Rhino-Bird Elite Talent Program, and received the Outstanding Scholarship.

General Swarm Intelligence

Host: Bai Lei

Host Affiliation: Shanghai AI Lab
Bio: Bai Lei is a young scientist at Shanghai AI Lab and director of the AI for Science Center. He received his Ph.D. from the University of New South Wales and subsequently served as a postdoctoral researcher at the University of Sydney. His main research directions include scientific multimodal large models and general scientific discovery systems. He has published over 100 academic papers in top-tier journals and conferences in the artificial intelligence field such as Nature sub-journals, IEEE TPAMI, NeurIPS, CVPR, and KDD, and has long served as a reviewer or program committee member for related journals and conferences. Based on his research work, he has been selected for national and Shanghai talent programs, received the 2024 IEEE TCSVT Best Paper Award, 2022 World Artificial Intelligence Conference Yunfan Award, 2020 University of New South Wales Engineering Research Excellence Award, 2019 Google Doctoral Fellowship, and other honors.

Host: Wang Siwei

Host Affiliation: Intelligent Game and Decision Laboratory
Bio: Wang Siwei is an assistant researcher at the Intelligent Game and Decision Laboratory (State Key Laboratory). His main research directions include large-scale multimodal data analysis and large model multi-agent systems. He has published over 30 papers in top international conferences and journals such as NeurIPS, ICML, ICLR, CVPR, ICCV, IEEE TPAMI, TIP, and TKDE, with over 5,000 academic citations and 4 ESI highly cited papers. He serves as an area chair for CCF-A conferences such as NeurIPS, ICML, ICLR, CVPR, AAAI, IJCAI, and ACMMM, and as an editorial board member for the first-tier journal Pattern Recognition. He has led and participated in multiple projects from the Science and Technology Commission, Ministry of Science and Technology, and National Natural Science Foundation.

Speaker 1: Hao Jianye

Speaker: Hao Jianye
Speaker Affiliation: Tianjin University; Huawei Noah's Ark Lab
Title: Embodied Intelligence
Abstract: This report will first introduce the technical background and foundations of large models, then present embodied intelligence technology in the era of large models, sharing the key challenges embodied intelligence faces in achieving scaling laws, along with the latest industry progress, from three aspects: data, models, and optimization and reasoning.
Bio: Hao Jianye, Ph.D., is a professor at the School of Intelligent Computing, Tianjin University, and director of the Decision Reasoning Laboratory at Huawei Noah's Ark Lab. His main research directions are reinforcement learning, embodied intelligence, and multi-agent systems. He has published over 100 papers in CCF-A international conferences and journals in the artificial intelligence field and has authored 3 monographs. He has received funding from over 10 projects, including the National Natural Science Foundation Outstanding Youth Fund, the Ministry of Science and Technology's Science and Technology Innovation 2030 AI Major Project, and a National Natural Science Foundation AI major cultivation project. His research results have won 3 international conference best paper awards and 4 NeurIPS competition championships, and related achievements have been widely applied in domestic industrial basic software intelligence, autonomous driving, game AI, internet advertising and recommendation, 5G network optimization, industrial logistics scheduling, and other fields.

Speaker 2: Chen Weineng

Speaker: Chen Weineng
Speaker Affiliation: South China University of Technology
Title: Group Evolution Methods and Applications for Consensus Optimization
Abstract: Swarm intelligence is an important direction in the development of new-generation artificial intelligence; as Academician Li Wei pointed out, "Internet-based swarm intelligence is one of the core research areas of new-generation artificial intelligence." Consensus evolution and optimization in distributed networked swarm systems is a fundamental, core problem for multi-agent and swarm intelligence systems, and traditional gradient-descent-based multi-agent distributed optimization methods face bottlenecks on non-convex, black-box problems. This report will explore combining the naturally parallel, distributed character of swarm intelligence optimization with networked distributed multi-agent consensus theory, proposing a theory and methodology of multi-agent distributed evolutionary optimization and constructing a "guidable, scalable, trustworthy" distributed multi-agent swarm intelligence optimization methodology. It will examine consensus evolution mechanisms at three levels: consensus evolution based on dynamics and incentive mechanisms, consensus evolution based on learning, and large-model-driven consensus evolution, and will introduce related applications.
Bio: Chen Weineng is a professor, doctoral supervisor, and associate dean at the School of Computer Science and Engineering, South China University of Technology. His main research direction is swarm intelligence, evolutionary computation, and their applications. He has published over 200 papers in international journals and international conferences, including over 90 IEEE Transactions long papers; he has led the National Science and Technology Innovation 2030—"New Generation Artificial Intelligence" major project, National Natural Science Foundation Enterprise Innovation Joint Fund key support project, National Key R&D Program international cooperation and exchange project, National Natural Science Foundation-Royal Society Newton Fund project, etc., and serves as the head of the Guangdong-Hong Kong Joint Innovation Platform for Big Data and Computational Intelligence. He received the National Outstanding Youth Science Fund in 2016 and the Guangdong Outstanding Youth Science Fund in 2015; he received the Huo Yingdong Young Teacher Award in 2018. He currently serves as vice chairman of the IEEE Guangzhou Section, chairman of the IEEE SMC Guangzhou Branch, standing committee member of the China Computer Federation Collaborative Computing Professional Committee, and committee member of the Artificial Intelligence and Pattern Recognition Professional Committee. He serves as associate editor for international journals IEEE TEVC, IEEE TNNLS, and Complex & Intelligent Systems.

Speaker 3: Yin Zhenfei

Speaker: Yin Zhenfei
Speaker Affiliation: Shanghai AI Lab
Title: Building an AI Society with Agents: Finding the Scaling Law of Agents
Abstract: AI agents based on LLMs or VLMs have already demonstrated their exceptional ability to solve complex problems, and increasingly, these models are being extended to a wide range of downstream applications, such as workflow automation on operating systems, scientific research and discovery, and embodied AI. The integration of foundation models like VLM, VLA, and generative models, combined with external scaffolds like memory mechanisms, system prompts, external knowledge bases, and toolkits, has enabled the emergence of systematic agents capable of tackling complex, long-sequence tasks. However, human society is a complex system formed by diverse organizations, where multiple individuals collaborate and compete within a set of environmental rules to achieve unified goals or indirectly influence the environment's state. Thus, we also envision that multi-agent systems, built upon the aforementioned foundation models, will exhibit the potential to scale from individual agents to organizational entities. This talk will review the history of AI agents, briefly discuss the architectures of foundation model-based single agents in various fields, and focus on swarm intelligence for multi-agent task completion. Finally, we will explore how, as these agents are deployed, they form collective intelligence, creating a coexistence between humans and AI agents within society.
Bio: Yin Zhenfei is a Xingqi researcher at Shanghai AI Lab and a visiting scholar at Oxford University. Her research directions include multimodal foundation models, multi-agent systems, and embodied intelligence. She has initiated and led multiple representative open-source projects, covering key components from underlying models (Intern), system platforms (CAMEL, MASWorks), social simulation (OASIS) to embodied collaboration (MARS), systematically constructing large model intelligent agent infrastructure with collaboration, adaptation, and generalization capabilities. She has published over 20 papers at top conferences such as NeurIPS, ICLR, ICML, and ICCV, and has long served as a reviewer for top conferences and journals such as ICLR, NeurIPS, ICML, ARR, and TPAMI.

Speaker 4: Chen Siheng

Speaker: Chen Siheng
Speaker Affiliation: Shanghai Jiao Tong University
Title: Collective Intelligence Across Digital and Physical Spaces
Abstract: This talk investigates how collective intelligence—emerging from coordinated multi-agent systems—can enhance capabilities in both digital and physical domains. In digital environments, we explore how large language model (LLM)-based agents, when organized into collaborative multi-agent systems, demonstrate emergent abilities in general-purpose task-solving, code generation, and scientific reasoning. By engaging in task decomposition, debate, and consensus, these agents can solve complex problems that exceed the capabilities of any single model. In parallel, physical environments offer compelling evidence for the power of embodied collective intelligence. We highlight vehicle-road cooperative autonomous driving as a representative case, where multi-agent coordination among vehicles, infrastructure, and cloud systems significantly enhances real-time perception, planning, and decision-making. These systems showcase how collaboration in dynamic, uncertain environments can improve safety, efficiency, and adaptability.
Bio: Chen Siheng is an associate professor and doctoral supervisor at the School of Artificial Intelligence, Shanghai Jiao Tong University. He received his Ph.D. from Carnegie Mellon University (CMU) and was selected for a national-level young talent program. He previously worked in the Uber ATG autonomous driving division in the United States. He has undertaken research projects including the National Natural Science Foundation's Original Exploration Project and General Project, the Ministry of Science and Technology's AI 2030 Major Project, and a Shanghai Science and Technology Commission AI special project. His research focuses on multi-agent systems, and he has published over 100 papers in journals and conferences such as Nature Communications, Nature Computational Science, IEEE TPAMI, NeurIPS, ICML, ICLR, and CVPR, with over 10,000 Google Scholar citations. He has received honors including the IEEE Signal Processing Society Best Young Author Paper Award, the ASME Structural Health Monitoring Best Paper Runner-Up Award, the GlobalSIP 2018 Best Paper Award, and the Mitsubishi Electric Research Laboratories President's Award.

Speaker 5: Qian Chen

Speaker: Qian Chen
Speaker Affiliation: Shanghai Jiao Tong University
Title: Routing Efficiency Mechanisms for Large Model Group Collaboration
Abstract: In the era of continuously evolving large models and intelligent agents, group collaboration has become a key pathway to unleashing computational power and intelligence potential. Multi-agent collaboration not only breaks through the capability boundaries of individual agents but also endows systems with unprecedented scalability and broad prospects. However, efficient collaboration is not a "free lunch": as collaboration scale and complexity grow rapidly, problems such as redundant information exchange, inefficient collaborative routing, and difficulty in reusing experience become increasingly prominent, emerging as the main bottlenecks limiting overall performance. This report will systematically review three types of core costs in agent collaboration and, targeting these bottlenecks, propose three optimization directions—efficient interaction, efficient routing, and efficient reasoning—helping to create a more cost-effective and resilient new paradigm for agent group collaboration.
Bio: Qian Chen is a doctoral supervisor with research directions including large language models, autonomous intelligent agents, and multi-agent systems. He previously conducted research at Tsinghua University's postdoctoral station and Tencent's AI Platform Department, and was selected for Tsinghua University's "Shuimu Scholar" and Tencent's "Tech Expert" programs. He has led the development of multi-agent collaboration framework ChatDev, agent mutual learning technology Co-Learning, large-scale group collaboration and emergence mechanism research MacNet, avatar collaboration iAgents, and other related achievements.

Speaker 6: Yang Cheng

Speaker: Yang Cheng
Speaker Affiliation: Beijing University of Posts and Telecommunications
Title: Research on Efficient Communication Protocols for Large Model Multi-Agent Systems
Abstract: Large language models (LLMs) now demonstrate many human-like capabilities such as reasoning, planning, and tool use, and can serve as the brain of intelligent agents to automatically handle various complex tasks. However, whether these LLM-based agents can learn to communicate and collaborate effectively like humans, completing tasks faster and better, remains an open question. Starting from the design of efficient inter-agent communication protocols, this report will introduce the latest progress in LLM agent collaboration research.
Bio: Yang Cheng is an associate professor and doctoral supervisor at Beijing University of Posts and Telecommunications. He has long been engaged in research on data mining and natural language processing, and has published over 40 CCF-A papers in related fields, with over 15,000 Google Scholar citations. His work has received provincial and ministerial awards including the 2020 Ministry of Education Natural Science First Prize (ranked fourth). He has received the Chinese Information Processing Society of China Outstanding Doctoral Dissertation Award and the Chinese Association for Artificial Intelligence Wu Wenjun Young Scientist Award, was selected for the China Association for Science and Technology "Young Talent Support Program," and has been included in Stanford University's Global Top 2% Scientists list for three consecutive years.

Youth Talent Forum

Chair: Zhao Dawei

Chair: Zhao Dawei
Chair Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Bio: Zhao Dawei is a researcher, doctoral supervisor, national-level young talent, Taishan Scholar Young Expert, and leader of a Shandong Province Science and Technology Innovation Team. He currently serves as Associate Director of the School of Computing, Qilu University of Technology (Shandong Academy of Sciences), Associate Director of Shandong Computer Center (National Supercomputing Center in Jinan), and Associate Director of the Key Laboratory of Computing Internet and Information Security, Ministry of Education. He has published over 100 papers in well-known domestic and international journals and conferences and holds over 20 authorized invention patents. He has led over 20 projects, including the National Key R&D Program, National Natural Science Foundation General and Youth projects, a Science and Technology Innovation 2030 "New Generation Artificial Intelligence" major project (task), a Science-Education-Industry Integration Innovation major project, and Shandong Province Natural Science Foundation projects. As the lead contributor, he has received 1 Shandong Province Science and Technology Progress Second Prize and 1 Shandong Province Natural Science Academic Innovation Award, and he participated in work that received 2 further Shandong Province Science and Technology Progress Second Prizes. His main research directions cover offensive and defensive technologies for complex network systems such as computing networks, industrial control systems, and social networks, including vulnerability mining, intrusion detection, security situation assessment, and attack response, as well as network structure analysis, network robustness, and network propagation dynamics.

Host: Liang Yan

Host: Liang Yan
Host Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Bio: Liang Yan, female, Master's degree, Director of Human Resources Department at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences). She has long been dedicated to personnel, talent, and organizational cadre work, and is familiar with national, provincial, and municipal personnel and talent policies.

Speaker 1: Liu Rui

Speaker: Liu Rui
Speaker Affiliation: Inner Mongolia University
Title: Emotionally and Intellectually Capable Human-Machine Speech Dialogue
Abstract: Dialogue speech generation is a key task in human-machine speech dialogue, with broad applications in human-computer interaction, the metaverse, and other fields, and has attracted joint attention from academia and industry in recent years. Empathy is regarded as an ultimate goal of artificial intelligence, and how to construct dialogue speech generation models that are both emotionally and intellectually capable is a key problem that urgently needs to be solved. This report will introduce the team's research on dialogue speech generation from several perspectives: heterogeneous graph context modeling, a generative dialogue speech generation framework, and a chain-of-thought inspired chained understanding and generation framework. These approaches improve emotional understanding and expression capabilities while enhancing the interpretability of emotional understanding and expression in human-machine speech dialogue scenarios.
Bio: Liu Rui is a professor and doctoral supervisor at the School of Computer Science (School of Software) and School of Artificial Intelligence, Inner Mongolia University. She was selected for the China Association for Science and Technology Young Talent Support Program and Inner Mongolia Outstanding Youth, is a member of the 7th China Young Science and Technology Workers Association, and is a senior member of the China Computer Federation (CCF); she serves as an expert reviewer for the National Natural Science Foundation and the China Scholarship Council. She has led over 10 national, provincial, and ministerial projects, including a National Natural Science Foundation General Project, a National Natural Science Foundation Youth Project, an Inner Mongolia Autonomous Region Outstanding Youth Foundation Project, an Inner Mongolia Autonomous Region Key R&D and Achievement Transformation Plan Project, and the Inner Mongolia Autonomous Region Grassland Talent Program. Her main research direction is multilingual human-machine speech interaction, with related results published as first or corresponding author in CCF-A/CAAI-A conferences and Chinese Academy of Sciences Zone 1 Top journals such as IEEE-TASLP, IEEE-TAFFC, Neural Networks, Information Fusion, ACL, ACM MM, AAAI, ICASSP, and INTERSPEECH. She serves as an editorial board member for the international journals IEEE TAFFC, ACM TALLIP, and INNFUS.

Speaker 2: Fu Kexue

Speaker: Fu Kexue
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: Exploration of Computational Pathology and Image Segmentation Tasks Driven by Language Models
Abstract: With the continuous development of large language model technology, the application scope of language models has extended beyond text processing, showing good prospects in visual information processing and other fields. Focusing on the core idea of being "language model-driven," this report explores cutting-edge progress and application value in two major vision tasks: computational pathology and image segmentation. It analyzes how to quickly adapt existing language models or vision-language models to traditional computer vision tasks and enhance the generalization ability and accuracy of visual models across different scenarios, and finally summarizes research ideas and open challenges for the in-depth application of language models in visual information processing.
Bio: Fu Kexue, Ph.D., is a Distinguished Researcher at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Shandong Province Taishan Scholar Young Expert, Executive Committee Member of the China Computer Federation Digital Medicine Branch, and Director of the China Association of Chinese Medicine Informatics Integrated Traditional Chinese and Western Medicine Surgery Intelligent Diagnosis and Treatment Branch. He has long been engaged in research in computer vision, medical image processing, and embodied intelligence. As first author, he has published over 30 high-level papers in top international journals and conferences such as IEEE TPAMI, CVPR, ICCV, NeurIPS, and AAAI, with 1 ESI highly cited paper; he has proposed a series of high-precision point set registration methods based on deep graph matching, 2D-3D registration, multimodal registration, unconstrained registration, and multiple vision-language models, which have been widely cited by domestic and international peers; he has served as a reviewer for top international conferences such as CVPR, ICCV, ECCV, AAAI, MICCAI, ACM-MM, journals such as TPAMI/TVCG, and as forum chair for the Second Shandong Computer Vision Conference.

Speaker 3: Gao Yongbiao

Speaker: Gao Yongbiao
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: Enhanced Label Distribution Learning Research
Abstract: Label Distribution Learning (LDL) is a key machine learning paradigm for solving ambiguous learning tasks. However, traditional LDL methods perform poorly on sequential ambiguous tasks and are of limited use when label distributions are imbalanced. Approaching the problem from the perspective of sequential decision-making in reinforcement learning, this report explains how to integrate the two: using reinforcement learning to solve dynamic decision problems in LDL, and using LDL to resolve ambiguity in reinforcement learning tasks. It also introduces methods that use dynamic decoupling and momentum allocation mechanisms to address imbalanced label distribution learning.
Bio: Gao Yongbiao, Ph.D., is a Distinguished Associate Professor and master's supervisor at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences). His main research directions include machine learning, artificial intelligence, computer vision, and multimodal/language large models. Related research results have been published in top international journals and conferences such as TNNLS, TMM, IJCAI, and ICASSP. He has been invited to serve as AC, Meta Reviewer, or reviewer for international conferences and journals such as ICML, NeurIPS, ICLR, CVPR, IJCAI, ICME, UAI, TNNLS, TKDE, and TAI. He has led the development of "Nuclear Shadow Intelligence Analysis," "Sky Observation Intelligence Solution," and "Wisdom Health Ark" vertical application domain multimodal/language large models. He has led multiple research projects including National Natural Science Foundation and Shandong Province Natural Science Foundation.

Speaker 4: Li Jiachen

Speaker: Li Jiachen
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: 3D Spatial Perception for Augmented Reality
Abstract: In the digital wave, augmented reality technology is profoundly transforming many fields. This report focuses on 3D spatial perception for augmented reality, emphasizing the integration and application of 3D object tracking, 3D reconstruction, and augmented reality. Through high-precision 3D object tracking technology, we can accurately capture dynamic changes in objects; based on 3D reconstruction algorithms, we integrate multi-source spatial data to construct accurate models of real scenes. On this basis, augmented reality applications can achieve precise information overlay and interaction, bringing immersive experiences to industries, education, culture, and other fields.
Bio: Li Jiachen, Ph.D., postdoctoral researcher at Zhejiang University, Distinguished Associate Researcher and master's supervisor at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Shandong Province Taishan Scholar Young Expert, and research backbone of the Information Strategy and Standards Research Team. He has led 1 National Natural Science Foundation Youth Science Fund project, 1 Shandong Province Key R&D Plan (Major Science and Technology Innovation Project) project, and 1 Shandong Province Natural Science Foundation Youth Project, and has published over 10 SCI/EI papers.

Speaker 5: Liu Chensheng

Speaker: Liu Chensheng
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: Power Stealth Attack Detection and Localization Based on Spatiotemporal Networks
Abstract: Due to the complex coupling between power information systems and physical systems, timely and accurate detection and localization of network attacks is of great significance for ensuring stable system operation. Considering the problem of stealth attack detection and localization under random disturbances from renewable energy, this work starts from mining the spatiotemporal correlations in power measurement data and designs an attack detection and identification framework based on spatiotemporal networks, achieving detection and identification of multiple types of attacks under unknown random disturbances and dynamic grid topology and significantly improving the system's resilience to attacks.
Bio: Liu Chensheng, Ph.D. from Shanghai Jiao Tong University, conducted postdoctoral research at the University of Alberta, Canada, and East China University of Science and Technology from 2018-2021, served as Distinguished Researcher at East China University of Science and Technology from 2021-2024, and is currently a researcher and master's supervisor at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences). His research directions include artificial intelligence security, cyber-physical system security, and smart grid optimization and control. He has led National Natural Science Foundation major project sub-projects and National Natural Science Foundation general projects; he has published over 20 research papers in top journals in the field such as IEEE Trans. Smart Grid and IEEE Trans. Power Systems. He was selected as Shandong Province Taishan Scholar Young Expert in 2024, selected for the Postdoctoral Innovation Talent Support Program and Shanghai Super Postdoctoral Incentive Program in 2019, and received the China Automation Society Natural Science Second Prize in 2023.

Speaker 6: Song Weizhao

Speaker: Song Weizhao
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: Data-Driven and Event-Triggered Cooperative Control of Heterogeneous Multi-Agent Systems
Abstract: Cooperative control of Multi-Agent Systems (MAS) has broad applications in aircraft formation, robot cooperation, and sensor fusion. Addressing challenges in MAS cooperative control such as high system dynamic complexity, heavy communication load and energy constraints, and the vulnerability of communication topologies, this report reviews our series of research on fully distributed event-triggered control and model-free adaptive control of heterogeneous MAS, as well as recent work on distributed cooperative control under DoS attacks. The core of this research runs through event-triggered communication, system robustness, and the fully distributed implementation of MAS.
Bio: Song Weizhao, Ph.D., is a Distinguished Associate Researcher at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences). His research directions include swarm systems, data-driven control, event-triggered control, and system security and their applications. He has published 15 high-level academic papers in academic journals and conferences such as IEEE Trans. Cybern., IEEE Trans. Neural Netw. Learn. Syst., IEEE Trans. Syst., Man, Cybern., Syst., and Inf. Sci.

Speaker 7: Tong Fenghua

Speaker: Tong Fenghua
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: Transformer-Based Image Compressed Sensing Model
Abstract: Convolutional neural networks dominate the field of image processing but are limited by their local inductive bias, whereas Transformers with self-attention mechanisms can capture global context and thus address this limitation. How to inherit and integrate the advantages of both to enhance reconstruction quality is currently a hot topic in deep compressed sensing. In this report, we introduce a hybrid architecture based on Transformers and dynamic convolution that significantly improves image compressed sensing reconstruction quality by integrating the representation capabilities of local and global features.
Bio: Tong Fenghua, Ph.D., is a Distinguished Associate Researcher and master's supervisor at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), and leader of the Shandong Province Higher Education Young Innovation Team. She has long been dedicated to research in compressed sensing theory and applications, publishing over 20 high-level academic papers in well-known domestic and international journals in signal and information processing and artificial intelligence fields such as IEEE Trans Inf Theory, IJCAI, and EAAI, including 9 CCF-A or Chinese Academy of Sciences Zone 1 journals; she holds over 10 authorized invention patents; she has led 1 National Natural Science Foundation Youth Fund project, 1 Shandong Province Natural Science Foundation Youth project, and 1 National Key Laboratory Open Project.

Speaker 8: Wang Changwei

Speaker: Wang Changwei
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: Exploration of Multimodal Large Models Balancing Efficiency and Reliability
Abstract: This report focuses on innovative exploration of multimodal large models in terms of efficiency and reliability. To address inference speed bottlenecks, the AASD framework is proposed, achieving 2x acceleration without accuracy loss through KV cache compression and target-draft attention mechanisms. To address visual hallucination, the DuCAR method is designed around dual-modal collaborative attention reinforcement, combined with visual CLS-driven sampling and cross-modal dynamic sampling strategies, effectively suppressing interference from irrelevant information and improving interaction efficiency. Experiments show that AASD significantly improves inference efficiency in mainstream MLLMs without compromising accuracy or reliability, while DuCAR achieves SOTA hallucination mitigation on the POPE and CHAIR benchmarks and also improves operational efficiency by removing interfering tokens. The two works break through the efficiency and reliability limitations of multimodal models from the perspectives of inference acceleration and hallucination removal, respectively, providing new ideas for constructing efficient and reliable multimodal systems.
Bio: Wang Changwei, Ph.D., is a Distinguished Associate Researcher and master's supervisor at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Shandong Province Taishan Scholar Young Expert, and recipient of the Chinese Academy of Sciences President's Special Award. His main research directions include multimodal learning, embodied intelligence, and model lightweighting. He has published over 40 high-level academic papers in CCF-A/Chinese Academy of Sciences Zone 1 journals (IEEE TPAMI, IEEE TIP, IEEE TNNLS, IEEE TMM) and conferences (ICCV, CVPR, ICML, NeurIPS, AAAI, DAC), including 20 CCF-A/Chinese Academy of Sciences Zone 1 papers as first/corresponding (including joint) author, 3 ESI highly cited papers, 1 CCF-B international conference best paper finalist, and 1 IEEE Transactions cover article.

Speaker 9: Wang Xin

Speaker: Wang Xin
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: FedSaaS: Class-Consistency Federated Semantic Segmentation via Global Prototype Supervision and Local Adversarial Harmonization
Abstract: Cross-client data heterogeneity in federated learning leads to knowledge fusion bias, causing consensus distortion and ambiguous category representations, and constraining the collaborative improvement of the model's generalization and personalization capabilities. Existing methods rely on static alignment strategies and struggle to adaptively harmonize global consensus with local characteristics. This report addresses two types of data heterogeneity—category heterogeneity and domain shift—and proposes a corresponding federated learning framework for each:
1. FedMate achieves dynamic fusion through bilateral optimization mechanisms: on the server side, it integrates sample scale, parameter state, and prediction uncertainty to construct dynamic global prototypes, fine-tuning classifiers to maintain global consistency; on the client side, it designs complementary classification fusion modules for advantageous discriminative training, combined with cost-aware feature transmission to balance performance and communication overhead.
2. FedSaaS focuses on semantic segmentation tasks and proposes a class-consistency representation framework: class prototype alignment models unified representations based on local/global class samples, using server-side prototype supervision for client global branches; adversarial harmonization mechanisms dynamically coordinate global and local branch contributions, supplemented by multi-level contrastive loss to strengthen semantic space representation consistency.
Together, the two frameworks demonstrate that dynamic representation alignment is a key path to overcoming the heterogeneity dilemmas of federated learning. They improve model robustness while preserving data privacy, providing reliable technical support for the secure deployment of distributed intelligent systems and cross-domain collaborative data applications.
Bio: Wang Xin, Ph.D. from Zhejiang University, is a Distinguished Researcher and master's supervisor at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Shandong Province Taishan Scholar Young Expert, and leader of the Shandong Province Higher Education Young Innovation Team. His main research directions include distributed artificial intelligence, AI security and privacy protection, and large-small model collaboration. In recent years, he has led 12 projects including National Natural Science Foundation Youth Fund, National Key R&D Program sub-projects, and Shandong Province Natural Science Foundation General and Youth projects; he has published over 50 high-level academic papers in well-known journals/conferences such as IEEE TIFS, TMC, TSP, IJCAI, and AAAI, including over 20 papers as first author/corresponding author; he has received multiple honors including Shandong Province Outstanding Youth Science Fund, Shandong Province Taishan Scholar Young Expert, and China Computer Federation Outstanding Doctoral Dissertation Award.

Speaker 10: Zhang Shuhui

Speaker: Zhang Shuhui
Speaker Affiliation: Qilu University of Technology (Shandong Academy of Sciences)
Title: Multimodal Large Model-Driven Intelligent Perception and Interaction
Abstract: With the rapid development of artificial intelligence technology, multimodal large models have become a hot research topic in the field of artificial intelligence. This report will introduce the latest research progress of our team in multimodal large model-driven intelligent perception and interaction, including multimodal data fusion, cross-modal understanding and generation, and intelligent interaction system design. We will explore how to use multimodal large models to achieve more natural and intelligent human-computer interaction, and discuss the challenges and future development directions in this field.
Bio: Zhang Shuhui, Ph.D., is a Distinguished Associate Researcher and master's supervisor at the School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), and leader of the Shandong Province Higher Education Young Innovation Team. Her main research directions include multimodal learning, computer vision, and artificial intelligence. She has published over 30 high-level academic papers in well-known domestic and international journals and conferences such as IEEE TPAMI, IEEE TIP, CVPR, ICCV, and AAAI, including over 15 papers as first author/corresponding author; she has led multiple research projects including National Natural Science Foundation Youth Fund and Shandong Province Natural Science Foundation projects; she has received honors including Shandong Province Taishan Scholar Young Expert and Shandong Province Outstanding Youth Science Fund.