Forum on Large Models for Code

Hosts: Wanxiang Che (Harbin Institute of Technology), Ge Li (Peking University)

Speaker 1: Hui Liu

Title: Code Refactoring and Optimization Based on Large Models
Abstract: This talk explores the potential and challenges of large-model technology for code optimization, comparing the difficulties of large-model-based code optimization with those of code generation and analyzing the prospects of large models in this area. Taking software refactoring as an example, it examines automatic code optimization based on large models and discusses the key technical challenges and potential strategies for addressing them.
Personal Profile: Hui Liu is a professor at Beijing Institute of Technology and Secretary-General of the CCF Technical Committee on Software Engineering. He has long conducted research on software development environments and has published over 30 academic papers at venues including ICSE, ESEC/FSE, ASE, ISSTA, IEEE TSE, and ACM TOSEM. Some of his work has been adopted and integrated into mainstream IDEs such as Eclipse. He has received the ESEC/FSE 2023 Distinguished Paper Award, the ICSE 2022 Distinguished Paper Award, the RE 2021 Best Paper Award, and the IET Premium Award (2016).

Speaker 2: Lin Shi

Title: Large Model Code Generation Based on Interactive Requirement Clarification
Abstract: With the significant advancement of large AI models, software development is entering a new era of intelligence. However, writing a clear and comprehensive prompt is not easy for developers: vague requirement descriptions make it difficult for large models to identify developers' true intentions, which is one of the major obstacles that large-model code generation encounters in practice. This presentation introduces our latest research on improving code generation capabilities, exploring interactive requirement clarification to help large models better understand user intent and thereby improve the effectiveness of large-model code generation.
Personal Profile: Lin Shi is a professor at Beihang University and a senior member of CCF. His research interests lie in intelligent software engineering, including code intelligence, intelligent requirements engineering, open-source software, and trustworthy AI. He has published over 50 papers at top international conferences in artificial intelligence and software engineering such as IJCAI, ICSE, FSE, and ASE, and has received three paper awards: the ACM SIGSOFT Distinguished Paper Award (ASE 2021) and two consecutive Outstanding Paper Awards at the International Requirements Engineering Conference (RE 2021, RE 2020). He has led and participated in multiple national projects and key cooperative projects with leading enterprises, and serves as a reviewer for prestigious international conferences and journals including ICSE, ASE, FSE, and TOSEM.

Speaker 3: Shuai Lu

Title: Trustworthy Code Generation
Abstract: In recent years, large language models have demonstrated remarkable code generation capabilities. However, they cannot guarantee the correctness of the generated code; for complex algorithm implementations or engineering code in particular, it is often difficult to produce a correct program in a single attempt. To address this issue, this presentation discusses how to bring software engineering practices such as program testing and formal verification into the era of large models. By leveraging the powerful generation capabilities of large models to enable self-verification, it aims to improve the trustworthiness of generated code. It also examines how to automate the formal verification of programs with large models, verifying code reliability from the perspective of theoretical proof.
Personal Profile: Shuai Lu is a researcher at Microsoft Research Asia. He graduated from Peking University in 2021, specializing in code intelligence and natural language processing. His research focuses on leveraging deep learning technologies for automating software development to empower programmers. His primary research interests include code autocompletion, code generation, and programming language pretraining models. His research contributions have been published in top AI and software engineering conferences such as NeurIPS, ICLR, ACL, ICSE, FSE, with over three thousand citations on Google Scholar.

Speaker 4: Tao Yu

Title: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Abstract: With the advancement of Vision-Language Models (VLMs), the emergence of autonomous digital agents promises to revolutionize human-computer interaction, enhancing accessibility and productivity. These multimodal agents autonomously perform complex reasoning and decision-making and execute multi-step action plans across different environments. In this talk, I will introduce OSWorld, a real computer environment designed to advance the development of agents capable of executing a wide range of digital tasks across operating systems, interfaces, and applications. I will share insights into how cutting-edge VLMs perform on open-ended tasks in the OSWorld environment. I will also discuss some of the latest work in this direction, including fine-tuning retrievers to adapt to diverse environments and enhancing LLM capabilities through tool integration. The presentation will conclude with a discussion of current and future research directions in this rapidly evolving field.
Personal Profile: Tao Yu is an Assistant Professor of Computer Science at the University of Hong Kong, specializing in natural language processing. He obtained his Ph.D. from Yale University and was a postdoctoral researcher at the University of Washington (UW NLP). His research aims to build language model agents that translate language instructions into executable code or actions in real-world environments, including databases, web applications, and the physical world. These agents form the core of next-generation natural language interfaces that interact with and learn from the real world through dialogue, facilitating human interaction with data analysis, web applications, and robot instruction. He has received the Google Research Scholar Award and the Amazon Research Award.

Speaker 5: Qingfu Zhu

Title: Multilingual Code Models
Abstract: In recent years, code model technology has developed rapidly, with ever more programming-language data being aggregated into large models and code generation tasks expanding from a single programming language to many. Meanwhile, since roughly 95% of the world's population are not native English speakers, extending code generation to multiple natural languages is equally important. This presentation will compare the performance of code models across different programming languages and natural languages, introduce methods for improving performance on low-resource languages, and explore attempts to leverage the multilingual capabilities of code models to improve downstream task performance.
Personal Profile: Qingfu Zhu is an Assistant Professor at Harbin Institute of Technology and was a joint Ph.D. student at the University of California, Santa Barbara. His research focuses on natural language processing and code generation. He has published multiple papers at top international conferences in natural language processing, including ACL, AAAI, and EMNLP. He has led and participated in several projects funded by the National Natural Science Foundation of China and the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Program.

Speaker 6: Lixing Li

Title: Intelligent Software Development Applications Based on aiXcoder Code Model
Abstract: AI-driven intelligent development based on large models is currently a hot topic and a major trend in software development technology and tooling. Enterprise demand for AI-driven software development applications built on code models is growing, yet many challenges remain. The aiXcoder team has been exploring and practicing in this field for over ten years, pioneering AI-based intelligent development and driving its advancement. This presentation will focus on aiXcoder's latest progress on code models and discuss its explorations and reflections on putting large-model-based, AI-driven software development technologies and paradigms into practice.
Personal Profile: Lixing Li is the Chief Operating Officer of aiXcoder, with a Ph.D. in Computer Software and Theory from Peking University/Chinese Academy of Sciences. He previously served as algorithm lead of Alibaba's Youku search team and as co-founder and CIO of a medical AI startup, accumulating over 15 years of experience in AI algorithm research and team management. He currently leads the research, development, and application deployment of aiXcoder's intelligent software development system.