Evaluation Tasks
The Twenty-third China National Conference on Computational Linguistics (CCL24-Eval)
Technical Evaluation Task Release
Conference Website: http://cips-cl.org/static/CCL2024/en/index.html
The 23rd China National Conference on Computational Linguistics (CCL 2024) will be held in Taiyuan, Shanxi Province, from July 25 to 28, 2024, organized by the Chinese Information Processing Society of China and hosted by Shanxi University.
This conference will continue to host the Chinese language processing technology evaluation, CCL24-Eval. Following the initial solicitation of evaluation tasks, the CCL24-Eval organizing committee has confirmed 10 tasks, covering research directions such as semantic parsing, Classical Chinese analysis, essay fluency evaluation, sign language translation, and multimodal understanding. Researchers are welcome to participate in the evaluation competition. Each evaluation task will award several first, second, and third prizes based on the competition results, and the Chinese Information Processing Society of China will issue official honorary certificates.
Computing Power Sponsorship Information
The computing power for this evaluation is generously sponsored by Beijing Paratera Technology Co., Ltd., which provides two GPU configurations (each team chooses one) and 500 yuan of free computing power per team.
Configuration 1: GPU type N40-4090-24G, configured as follows:
CPU: AMD EPYC 7402 (48C) @ 2.8 GHz
Memory: 512 GB
GPU: 8 × NVIDIA GeForce RTX 4090
GPU Memory: 8 × 24 GB (936.2 GB/s)
Node Interconnect: RoCE 2 × 25 Gbps (RDMA protocol)
Operating System: CentOS 7.9
Billing Model: On-demand, 4.8 RMB/card/hour
Each participating team receives 300 GB of disk storage free of charge; additional capacity is billed at 2,000 yuan/TB/year.
Configuration 2: GPU type N26-V100-32G, configured as follows:
CPU: Intel Xeon Platinum 82-series (80 vCPU) @ 2.5 GHz
Memory: 320 GB
GPU: 8 × NVIDIA Tesla V100 SXM2
GPU Memory: 8 × 32 GB (897 GB/s)
NVLink: 300 GB/s bidirectional
Operating System: CentOS 7.8
Billing Model: On-demand, 5.3 RMB/card/hour
Each participating team receives 300 GB of disk storage free of charge; additional capacity is billed at 58 yuan/GB/month.
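For budgeting purposes: at the on-demand rates above, the 500 yuan grant corresponds to roughly 500 / 4.8 ≈ 104 card-hours on the 4090 cluster (about 13 hours with all 8 cards in use), or 500 / 5.3 ≈ 94 card-hours on the V100 cluster (about 11.8 hours with all 8 cards).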
Thanks to Beijing Paratera Technology Co., Ltd. for their generous sponsorship. We welcome outstanding teams from all walks of life to actively sign up for the competition!
Notes:
1. Teams must register in the name of a teacher.
2. The data of each team account will be saved for one year.
3. Each participating team's account includes 500 yuan of free computing power; usage beyond this amount will be restricted.
4. The 4090 cluster is activated by default. If V100 or other resources are needed, please negotiate separately.
5. Accounts default to 8-card permissions. Additional permissions need to be negotiated separately.
6. Accounts require the following information: (1) name and phone number; (2) school and department; (3) email address for account creation.
Forum Invited Talks
Invited Speaker 1: Qi Zhang
Speaker: Professor Qi Zhang (Fudan University)
Title: Reflections on Evaluation Methods for Large Models
Abstract: Since 2023, large language models have seen rapid development and have demonstrated unprecedented capabilities in various fields. However, comprehensive evaluation of these large models faces numerous challenges, including broad scope, high workload, and significant evaluation costs. This report will discuss the difficulties in evaluating large models, potential solutions, and systematically review current evaluation methods and frameworks.
Personal Profile: Qi Zhang is a Professor at the School of Computer Science and Technology, Fudan University, and a PhD supervisor. He serves as the Deputy Director of the Shanghai Key Laboratory of Intelligent Information Processing. He is also a board member of the Chinese Information Processing Society of China (CIPS), a standing committee member of the CCF Large Model Forum, a committee member of the CIPS Large Model Committee, and a standing committee member of the Young Workers Committee of the Artificial Intelligence Society. He has served as the Program Chair, Area Chair, and Tutorial Chair at important international and domestic conferences such as ACL, EMNLP, COLING, SIGIR, and the National Information Retrieval Conference. He has published over 150 papers, holds 4 US patents, and authored books including "Introduction to Natural Language Processing" and "Large Scale Language Models: From Theory to Practice." He has developed the Fudan MouSi Multimodal Large Model, the world's first robustness evaluation platform for natural language processing, TextFlint, and the world's first unified jailbreak attack framework, EasyJailbreak. His research achievements have received numerous awards, including the WSDM 2014 Best Paper Nomination, COLING 2018 Area Chair Recommendation, NLPCC 2019 Outstanding Paper Award, COLING 2022 Outstanding Paper Award, NeurIPS 2023 Instruction Workshop Best Paper Award, and ICLR 2024 Highlight Paper Award. He has been supported by the Shanghai "Morning Light" Talent Program and Fudan University's "Excellence 2025" Talent Cultivation Program. He has received awards such as the Qian Weichang Chinese Information Processing Science and Technology First Prize, Hanwang Youth Innovation First Prize, Shanghai Science and Technology Progress Second Prize, Ministry of Education Science and Technology Progress Second Prize, ACM Shanghai Rising Star Nomination Award, and IBM Faculty Award.
Invited Speaker 2: Deyi Xiong
Speaker: Professor Deyi Xiong (Tianjin University)
Title: Exploration, Practice, and Reflections on Evaluating Chinese Large Models
Abstract: Evaluating large models is crucial for their practical deployment, serving as a vital tool for measuring the boundaries of model capabilities and identifying potential risks. This report will provide an overview of black-box and white-box evaluation methods for large models, analyze the strengths and weaknesses of various evaluation approaches, and discuss the concepts and frameworks for evaluating the full lifecycle of Chinese large models. It will summarize the practical experiences and findings of TJUNLP in evaluating Chinese large models over the past two years, including the construction of evaluation benchmarks, platforms, systems, and standards. Additionally, it will offer reflections and outlooks on the development and safety governance of large model capabilities based on evaluations.
Personal Profile: Deyi Xiong is a Professor and PhD supervisor at the School of Intelligence and Computing, Tianjin University, and the head of the Natural Language Processing Laboratory. He is the Director of the Tianjin Belt and Road Joint Laboratory for Language Intelligence and Technology. His primary research areas include natural language processing, with a focus on large language models, machine translation, AI alignment, commonsense reasoning, and cognitive computing. He has published over 150 papers in prominent international journals and conferences such as IEEE TPAMI, AI, AAAI, ACL, and has authored one Chinese and one English monograph. He has filed and been granted over 30 invention patents and has participated in the formulation of multiple standards related to large models. He has received funding from more than 20 projects, including the National Key R&D Program "Intergovernmental International Science and Technology Innovation Cooperation," the Newton Advanced Fellowship of the Royal Society, the Industrial Innovation Task of the Ministry of Industry and Information Technology, and the Key R&D Program of Yunnan Province. His awards include the Beijing Science and Technology Award Second Prize, the Chinese Information Processing Society Youth Innovation First Prize, and several other prestigious accolades. He has served as Co-Chair of the Program Committee for IALP 2012 & 2021, Co-Chair of the Program Committee for CWMT 2017, and as (Senior) Area Chair, Sponsorship Chair, and Demo Chair for numerous renowned international conferences, including NeurIPS, ACL, EMNLP, NAACL, COLING, and AACL. He is the Executive Editor of TACL and CL, Associate Editor of ACM TALLIP, and Section Editor for Data in Brief. He has led the development of the Renwen Fuxi Large Model and the OpenEval Large Model Open Evaluation Platform.
Evaluation Tasks
Task 1: The Second Chinese Frame Semantic Parsing
Task Overview
Frame Semantic Parsing (FSP) is a fine-grained semantic analysis task based on frame semantics. Its goal is to extract frame semantic structures from sentences to achieve a deep understanding of events or situations in the sentence. Frame semantic parsing is of great significance for downstream tasks such as reading comprehension, text summarization, and relation extraction.
In natural language, meaning is mostly conveyed word by word, but there are also many cases in which word meanings aggregate into a new phrasal meaning. For example, the phrase "爱买不买" (literally "love buy not buy") conveys that the speaker does not care whether the other party buys. In frame semantic parsing, this phrase should activate an "emotional response" frame as a whole; if the individual verbs "爱" (love) and "买" (buy) were taken as target words, activating frames such as Liking and Buying, the phrase's distinctive emotional coloring would be lost.
Construction Grammar argues that language is composed of fixed, meaningful units called constructions, which can be simple words or phrases, as well as complex sentences or discourses. For example, in the phrase "爱买不买" (literally "love buy not buy"), the corresponding construction is "爱V不V" ("love V not V"). This construction is a holistic expression of semantics, indicating indifference or nonchalance towards a certain action, and should be activated as a whole to evoke the corresponding frame.
To enhance the capability of frame semantic parsing and further achieve deep understanding of language, we have introduced the second Chinese Frame Semantic Parsing Evaluation, which includes frame semantic parsing data with constructions as "target words".
This evaluation includes the following three sub-tasks (a sketch of the expected parse structure follows the list):
- Frame Identification: Identify the frames activated by the given target words or constructions in the sentence.
- Argument Identification: Identify the boundary range of arguments dominated by the given target words or constructions in the sentence.
- Role Identification: Predict the semantic role label of each argument span identified in argument identification.
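To make the target output concrete, here is a minimal sketch of what a frame-semantic parse for a construction-triggered instance might look like. The carrier sentence, field names, frame name, and role label are illustrative assumptions, not the official data format; consult the task website for the real schema.

```python
# Illustrative only: the sentence, field names, frame name, and role label
# are hypothetical; see the task website for the official data format.
example_parse = {
    "sentence": "你爱买不买，我无所谓。",             # hypothetical carrier sentence
    "target": {"span": (1, 5), "text": "爱买不买",
               "construction": "爱V不V"},            # construction as "target word"
    # Sub-task 1 (frame identification): the frame evoked by the construction
    "frame": "情感反应",                              # "emotional response"
    # Sub-task 2 (argument identification): spans governed by the target
    # Sub-task 3 (role identification): a role label for each identified span
    "arguments": [
        {"span": (0, 1), "text": "你", "role": "经验者"},  # hypothetical role label
    ],
}
```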
This evaluation includes two tracks: open and closed. In the open track, participating teams may use large models such as ChatGPT for inference, but fine-tuning is prohibited and the prompt templates used must be submitted. In the closed track, the parameter scale of participating models will be restricted.
Organizers and Contact Persons
- Organizers: Ru Li, Hongye Tan (Shanxi University); Baobao Chang (Peking University); Xinyu Dai (Nanjing University)
- Task Leader: Zhichao Yan (Ph.D. student at Shanxi University, 202312407023@email.sxu.edu.cn)
- Task Contact: Juncai Li (Ph.D. student at Shanxi University, 202312407010@email.sxu.edu.cn)
Task Awards
For each track, the following awards will be given:
- First Prize: 0-2 teams; prize: 2 laptops in total;
- Second Prize: 0-2 teams; prize: 1,200 RMB in total;
- Third Prize: 0-2 teams; prize: 800 RMB in total.
Sponsorship
- The laptops are sponsored by Baixin Information Technology Co., Ltd.;
- The evaluation prize money is jointly sponsored by Song Xiaomin, head of Sitonholy (Tianjin) Technology Co., Ltd., and by Jiehui Technology of Taiyuan.
Task Website
https://github.com/SXUCFN/The-2nd-Chinese-Frame-Semantic-Parsing
Task 2: Chinese Parataxis Graph Parsing
Task Overview
Chinese Parataxis Graph (CPG) is an event-centered semantic representation formalized as a single-rooted directed graph: nodes correspond to units carrying events, entities, and attributes, while directed edges represent the semantic relations between those units.
CPG is designed to match human cognition of language while remaining practical to apply. It is hierarchically structured to support the design of downstream semantic analysis pipelines, aiming at a representation scheme that is both universal and extensible. By hierarchy, a CPG can be decomposed into sub-parts; it consists of two main parts, event structure and entity structure:
Event structure is divided into internal and external structures. The internal structure includes argument structure centered around event words, modality structure, and spatiotemporal structure. The external structure comprises a relational event structure formed by multiple events.
Entity structure consists of internal and external structures. The internal structure includes entity attributes and attribute value structure, while the external structure comprises an entity relational event structure formed by multiple entities.
The 2024 Chinese Parataxis Graph Semantic Parsing Evaluation only requires generating the sentence-level parataxis graph framework: the input is a sentence, and the output is the graph's framework structure. Fine-grained internal semantic classifications (refined entity structures, modality structures, spatiotemporal structures, etc.) need not be produced; systems only need to determine which structural component an element belongs to, and the provided corpus is likewise annotated at this coarse granularity.
For example, in the sentence "他哭肿了眼睛" (He cried and his eyes swelled), the task is to automatically parse out the following set of triplets:
{(他,哭,A0), (眼睛,肿,A0), (他,眼睛,EntityRel), (了,哭,Time), (了,肿,Time), (哭,因果关系,原因事件), (肿,因果关系,结果事件), (哭,ROOT,CoreWord)}
The dataset for this evaluation comes from manually annotated international Chinese educational reading texts and the Penn Treebank corpus. The evaluation is an open test, allowing the use of external resources.
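Since the target output is a set of triples, scoring naturally reduces to comparing predicted and gold triple sets. The sketch below assumes the metric is exact-match triple-level F1; the announcement does not specify the official scorer, so treat this as illustrative only.

```python
def triple_f1(predicted, gold):
    """Precision/recall/F1 over exact-match triples (illustrative sketch,
    not the official CPG scorer)."""
    pred, ref = set(predicted), set(gold)
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Gold triples for "他哭肿了眼睛", taken from the example above:
gold = {("他", "哭", "A0"), ("眼睛", "肿", "A0"), ("他", "眼睛", "EntityRel"),
        ("了", "哭", "Time"), ("了", "肿", "Time"),
        ("哭", "因果关系", "原因事件"), ("肿", "因果关系", "结果事件"),
        ("哭", "ROOT", "CoreWord")}
```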
Organizers and Contact Persons
Organizers: Endong Xun (Language Resources High-Level Specialized Center, Beijing Language and Culture University), Gaoqi Rao (International Chinese Studies Institute, Beijing Language and Culture University), Gongbo Tang (School of Information Science, Beijing Language and Culture University)
Contact Persons: Mengxi Guo (Master's student, Beijing Language and Culture University, guo_mengxi@foxmail.com), Meng Li (Ph.D. student, Beijing Language and Culture University)
Task Awards
This evaluation will award first, second, and third prizes, with a total prize pool of 7,000 yuan.
Task Website
https://github.com/gertrude95/Chinese-Parataxis-Graph-Parsing
Task 3: The Fourth Chinese Spatial Cognition Evaluation (SpaCE 2024)
Task Overview
Spatial expression describes the spatial orientation relationship between objects, which is a common phenomenon in natural language. To accurately understand the semantics of spatial expressions in text, it is necessary to not only have linguistic knowledge but also to invoke spatial cognitive abilities, construct spatial scenes, and make inferences related to spatial orientation information based on world knowledge.
The Spatial Cognition Evaluation (SpaCE) aims to test how well machines understand Chinese spatial semantics, and has been held annually since 2021. Results from previous editions show that machines still fall well short of average human performance in Chinese spatial semantic understanding, especially on tasks requiring higher-level spatial cognitive processing. Spatial semantic understanding remains a highly challenging task for NLP systems, including large language models.
To further enhance the machine's understanding of spatial semantics, we have launched the fourth Chinese Spatial Cognition Evaluation (SpaCE 2024). Compared to the previous three editions, this evaluation pays more attention to testing the spatial semantic understanding ability of large language models. The aim is to assess the comprehensive ability of machines to understand Chinese spatial semantics on a test dataset.
Therefore, SpaCE 2024 will no longer divide tasks into tracks; instead, it will test the following five levels of spatial semantic understanding in the form of multiple-choice questions (a sample item format is sketched after the list):
- Recognition of Spatial Information Anomalies: Selecting language expressions in the text that represent abnormal spatial information.
- Recognition of Spatial Information Entities: Selecting the referents or targets of spatial information in the text.
- Recognition of Spatial Information Roles: Selecting the semantic roles of spatial information in the text, or selecting the spatial expression corresponding to the given semantic role.
- Inference of Spatial Orientation Information: Making inferences based on spatial knowledge and selecting the inference results.
- Discrimination of Spatial Semantic Relationships: Selecting the spatial words or phrases that determine whether two differently worded expressions have similar or different meanings.
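Because all five levels are tested as multiple-choice questions, system output reduces to one selected option per item and scoring reduces to accuracy. A hypothetical item layout and scorer follow; the field names are assumptions, not the official SpaCE 2024 schema.

```python
# Hypothetical multiple-choice item; field names are illustrative only.
item = {
    "qid": "space2024-0001",
    "text": "...",                      # passage containing spatial expressions
    "question": "...",                  # one of the five question levels above
    "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
    "answer": "B",
}

def accuracy(predictions, gold):
    """predictions, gold: dicts mapping qid -> option letter."""
    correct = sum(predictions.get(qid) == ans for qid, ans in gold.items())
    return correct / len(gold)
```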
Organizers and Contact Persons
- Organizers: Weidong Zhan, Zhifang Sui (Peking University)
- Task Contact: Liming Xiao (Ph.D. student, Peking University, lmxiao@stu.pku.edu.cn)
Task Awards
- First Prize: 0-1 teams, total prize money of 12,000 RMB;
- Second Prize: 0-2 teams, total prize money of 12,000 RMB;
- Third Prize: 0-4 teams, total prize money of 12,000 RMB.
Sponsorship
The prize money for this evaluation is sponsored by Huawei.
Task Website
https://2030nlp.github.io/SpaCE2024
Task 4: The Fourth Chinese Abstract Meaning Representation Parsing Evaluation (CAMRP 2024)
Task Overview
Abstract Meaning Representation (AMR) is a semantic representation method that has emerged in recent years. It abstracts the semantic structure of a sentence into a single-rooted, directed acyclic graph.
Chinese Abstract Meaning Representation (CAMR) adapts the AMR framework to the characteristics of the Chinese language: it retains AMR's strong ability to represent whole-sentence semantics while adding annotations for concept alignment and relation alignment. The task of CAMRP 2024 is to parse Classical Chinese sentences and output CAMR semantic graphs that include concept-alignment and relation-alignment information. Model performance will be evaluated by F1 score under the Align-smatch metric. Compared to previous editions, this year's evaluation uses 2,500 Classical Chinese sentences as validation and test sets, focusing on parsing performance on Classical Chinese, while the training set from previous years, containing 16,576 Modern Chinese sentences, is used to observe the transfer ability of parsing systems to Classical Chinese.
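As in the original smatch metric, Align-smatch F1 can be understood as triple matching between the predicted and gold graphs (here including the added alignment information): with precision P the fraction of predicted triples matched and recall R the fraction of gold triples matched under the best variable mapping, F1 = 2PR / (P + R). This is a gloss of the metric; see the task website for the official definition and scorer.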
Organizers and Contact Persons
- Organizers: Bin Li, Minxuan Feng, Weiguang Qu, Junsheng Zhou (Nanjing Normal University)
- Task Contact: Zhixing Xu (Ph.D. student at Nanjing Normal University, xzx0828@live.com)
Task Awards
This evaluation will award first, second, and third prizes, with a total prize pool of 7,000 RMB.
Task Website
https://github.com/GoThereGit/Chinese-AMR/
Task 5: Classical Chinese Event Detection
Task Overview
Event extraction identifies and extracts event information from natural language text. Because Classical Chinese has complex syntax and semantics and a limited scope of use, information extraction for it remains challenging. We have constructed a hierarchical, logic-based event type system for Classical Chinese, consisting of 9 major categories and 67 subcategories. Based on this type system and the corpus of "The Twenty-Four Histories," we created the Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection (CHED), which contains 8,122 annotated event instances (including trigger words and event types). The task evaluates algorithms for detecting historical events in Classical Chinese and includes two subtasks:
- Subtask 1: Trigger Identification — identify event trigger words in the text and mark their positions. Trigger words are mainly monosyllabic words that best signal the occurrence of an event, typically the predicate verb of a sentence (though other sentence components are possible).
- Subtask 2: Event Type Classification — determine the event type of each trigger word according to our event type system (see the task website).
For example, in the sentence "进军建德,擒贼帅赵桑干." ("Advance to Jiande and capture the bandit leader Zhao Sanggan."), the word "进军" ("advance") represents the event of "dispatching troops to Jiande," and the word "擒" ("capture") represents the event of "capturing the enemy leader Zhao Sanggan." Therefore, in this sentence, the trigger words are "进军" and "擒," representing the event types "Military-Preparation-Deployment" and "Military-Operation-Capture" respectively.
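A minimal sketch of what system output for this sentence might look like, using character offsets for trigger positions; the field names and the half-open offset convention are assumptions, not the official submission format (see the task website).

```python
# Illustrative output for the example sentence; field names and the
# half-open character-offset convention are assumptions.
sentence = "进军建德,擒贼帅赵桑干."
events = [
    # Subtask 1 gives the trigger span; Subtask 2 assigns one of the
    # 9 major / 67 sub event types from the CHED schema.
    {"trigger": "进军", "span": (0, 2), "type": "Military-Preparation-Deployment"},
    {"trigger": "擒", "span": (5, 6), "type": "Military-Operation-Capture"},
]
for e in events:
    start, end = e["span"]
    assert sentence[start:end] == e["trigger"]
```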
Organizers and Contact Person
- Organizers: Yanqiu Shao, Wei Li (Beijing Language and Culture University)
- Task Contact: Zhenbing Feng (Master's student, Beijing Language and Culture University, blcu_lcclab@163.com)
Task Website
https://github.com/NLPInBLCU/CHED2024
Task 6: Chinese Essay Rhetoric Recognition and Understanding
Task Overview
In the learning process of primary and secondary school students, rhetorical devices are not only a core part of reading comprehension and writing skills but also an indispensable element of excellent literary writing. Identifying and understanding the rhetoric used in student essays can help improve students' writing, guiding them toward higher-quality narration and description; doing so manually, however, requires substantial effort, which poses challenges for teachers in essay assessment and teaching. With the development of education and the spread of the internet, many researchers and institutions have begun exploring computer technology for automatic essay grading, in which the use of rhetorical devices is an important assessment criterion.
The evaluation will focus on the "Understanding of Rhetorical Devices in Elementary and Middle School Essays" task, categorizing rhetorical devices into metaphor, simile, exaggeration, and parallelism. It will further classify these four rhetorical devices in detail, including their objects and contents, such as:
- Identification of rhetorical form types in elementary and middle school essays
- Identification of rhetorical content types in elementary and middle school essays
- Extraction of rhetorical components in elementary and middle school essays
These form three tracks in total, providing richer criteria for understanding rhetorical devices in elementary and middle school essays.
Organizers and Contact Persons
- Organizers: Nuowei Liu, Xinhao Chen, Yupei Ren, Man Lan, Xiaopeng Bai, Yuanbin Wu (East China Normal University), Shaoguang Mao, Yan Xia (Microsoft Research Asia)
- Task Contact: Nuowei Liu (East China Normal University, nwliu@stu.ecnu.edu.cn)
Task Awards
The organizers of the task will provide a prize of 10,000 RMB to the winning team.
Task Website
https://github.com/cubenlp/CERRU
Task 7: Chinese Essay Fluency Evaluation
Task Overview
The Chinese Essay Fluency Evaluation (CEFE) task aims to identify and correct errors that affect the fluency of essays. Existing work typically treats fluency evaluation as a standalone natural language processing task, without systematic integration across levels and perspectives. Unlike errors in rule-generated data or in interlanguage data from learners of Chinese, and unlike grammatical errors made by other native speakers in speech or writing, the errors in elementary and middle school students' essays are more diverse and involve more complex grammatical knowledge. We therefore systematically define fine-grained error types that affect essay fluency from lexical, syntactic, semantic, and other perspectives, and provide correction suggestions. Compared to last year, this year's evaluation adds an essay fluency rating task and 1,200 additional sentences in the training set, so as to evaluate essay fluency more comprehensively. The evaluation comprises the following three tracks:
- Identification of Types of Grammatically Incorrect Sentences in Elementary and Middle School Essays: Recognize different types of grammatically incorrect sentences in essays.
- Rewriting Grammatically Incorrect Sentences in Elementary and Middle School Essays: Rewrite grammatically incorrect sentences in essays to make them correct.
- Fluency Rating of Elementary and Middle School Essays: Evaluate the fluency level of essays.
Organizers and Contact Persons
- Organizers: Xinlin Zhuang, Xinshu Shen, Hongyi Wu, Yupei Ren, Xiaopeng Bai, Man Lan, Yuanbin Wu (East China Normal University), Shaoguang Mao, Yan Xia, Tao Ge (Microsoft Research Asia)
- Task Contact: Xinlin Zhuang (East China Normal University, zhuangxinlin2022@163.com)
Task Awards
The task organizers will provide a prize of 10,000 RMB to the winning team.
Task Website
https://github.com/cubenlp/2024CCL_CEFE
Task 8: Evaluation on Commonsense Reasoning and Moral Understanding in Children's Stories (CRMU)
Task Overview
The Evaluation on Commonsense Reasoning and Moral Understanding in Children's Stories (CRMU) aims to evaluate Chinese pretrained language models and large-scale language models from multiple perspectives in commonsense reasoning and moral understanding. This evaluation includes the following two subtasks:
- Commonsense Reasoning: Given a story and a commonsense question, select the correct answer from the provided candidates.
- Moral Matching: Based on a given story, select the most appropriate moral from multiple candidate morals.
The data used in this evaluation consists of classic fables collected from the web. For the commonsense reasoning task, questions and options are manually annotated and cover commonsense types including temporal, spatial, biological, physical, and social knowledge. For the moral matching task, questions and options are produced by a combination of manual annotation and automatic generation.
Organizers and Contact Persons
- Organizers: Hongye Tan, Ru Li, Hu Zhang (Shanxi University); Kui Yu (Hefei University of Technology)
- Task Leader: Yaxin Guo (Ph.D. student, Shanxi University, 202112407002@email.sxu.edu.cn)
- Task Contact: Guohang Yan (Master's student, Shanxi University, yanguohang@qq.com)
Task Website
https://github.com/SXU-YaxinGuo/CRMU
Task 9: Chinese Vision-Language Understanding Evaluation
Task Overview
Chinese Vision-Language Understanding Evaluation (CVLUE) aims to comprehensively evaluate the modeling and understanding capabilities of Chinese multimodal pretrained models on tasks such as image-text retrieval, visual question answering, visual grounding, and visual dialog. This evaluation includes the following five subtasks (a scoring sketch for the retrieval subtasks follows the list):
- Image Retrieval: Retrieve the corresponding image from a set of candidates based on a given text description.
- Text Retrieval: Retrieve the corresponding text description from a set of candidates based on a given image.
- Visual Question Answering: Answer questions based on a given image using phrases.
- Visual Grounding: Given an image and a text description, locate the corresponding entity in the image.
- Visual Dialog: Select the most appropriate response text from a set of candidates based on a given image and dialog history.
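For the two retrieval subtasks, a common setup, assumed here for illustration since the announcement does not fix a metric, is to rank the candidate set by embedding similarity and report Recall@K. The encoders mentioned in the comments are hypothetical placeholders for any Chinese multimodal model's image and text towers.

```python
import numpy as np

# encode_image(...) and encode_text(...) are hypothetical placeholders for a
# multimodal model's image/text encoders; they are not part of any official API.

def recall_at_k(query_vec, candidate_vecs, gold_index, k=5):
    """Rank candidates by cosine similarity; count a hit if the gold
    candidate appears in the top k."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    ranking = np.argsort(-(c @ q))      # candidate indices, most similar first
    return int(gold_index in ranking[:k])
```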
Organizers and Contact Persons
- Organizers: Zhiguo Wan, Yuxuan Wang (Zhejiang Lab); Wanxiang Che (Harbin Institute of Technology)
- Task Contact: Yijun Liu (Harbin Institute of Technology, yijunliu@ir.hit.edu.cn)
Task Website
https://github.com/WangYuxuan93/CVLUE
Task 10: Quality Evaluation of Sign Language Avatars Translation (QESLAT)
Task Overview
With the advancement of technology, Sign Language Avatars have become an important tool for facilitating communication between the deaf community and society. Sign Language Avatars provide real-time translation services for the deaf by simulating sign language gestures, helping to break down language barriers and enhance the social participation of the deaf community. To ensure that Sign Language Avatars can provide accurate, natural, and easily understandable sign language translations, it is crucial to evaluate their translation quality. This evaluation aims to assess the naturalness and accuracy of Sign Language Avatars translating Chinese into Chinese Sign Language, ensuring that they conform to sign language grammar rules and are understandable and acceptable to the deaf community.
This evaluation is conducted under the guidance of the Sign Language Research and Promotion Committee of the China Association of the Deaf and Hard of Hearing. The evaluation team, consisting of deaf individuals and professional sign language interpreters certified by that committee, will manually evaluate the translation results of the Sign Language Avatars. The evaluation will focus on the accuracy of sign language grammar, the naturalness and readability of expression, and whether the output matches the deaf community's understanding; it will also consider factors such as the clarity of gestures, fluency, and semantic consistency with the original Chinese text. Specifically, the evaluation covers the following four criteria:
- Sign Language Grammar Accuracy: Whether the Sign Language Avatar follows the word order rules of Chinese Sign Language, the accuracy of gestures, and whether grammar markers are correctly expressed in the translation.
- Naturalness: Evaluating the coherence of gestures, whether the translation conforms to the daily expression habits of the deaf community, and whether non-verbal elements such as facial expressions, body posture, and spatial layout are naturally integrated into the translation.
- Readability: Evaluating the performance of the Sign Language Avatar in terms of clarity, consistency, and adaptability.
- Cultural Adaptability: Whether cultural differences and social context adaptability are considered in the translation, and whether the emotional nuances of the original text are accurately conveyed.
Organizers and Contact Persons
- Organizers: Dengfeng Yao (Beijing Union University/Tsinghua University, tjtdengfeng@buu.edu.cn), Guowei Yang (Henan University of Economics and Law/Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing), Peng Jin (Leshan Normal University Special Education Language Intelligence Sichuan Provincial Key Laboratory of Philosophy and Social Sciences), Yidong Chen (Xiamen University), Cong Xu (Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing/China National Center for the Promotion of Sign Language by Huaxia Publishing House), Haixu Wang (Qinghai Radio and Television Station/China Braille and Sign Language Research and Application Center), Bin Chen (Zhuzhou Voice of Hand Information Technology Co., Ltd.), Li Quan Wu (Shenzhen Information Accessibility Research Association), Gang Shen (Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing), Huaming Chen (Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing), Chunda Liu (Beijing Sign Language Research Association), Yanli Ding (Beijing Union University National Language and Text Promotion Base), Ke Hu (Beijing Union University), Lan Chen (Shenzhen Link Accessibility Co.,Ltd.), and Tiantian Yuan (Tianjin University of Technology).
- Task Contact: Yuan Zhao (Master's student at Beijing Union University, 1398396428@qq.com)
Task Website
https://github.com/ann-yuan/QESLAT-2024
Overall Schedule for Technical Evaluation
- Deadline for task solicitation: January 31, 2024
- Task online release time: February 4, 2024
- Overall evaluation end time: May 31, 2024
- Task organizers are required to determine and publish the scores and rankings of participating teams before this deadline
- Submission of technical reports in Chinese or English: May 31, 2024
- Technical reports allow task organizers to understand the methods of participating teams and are one of the considerations for awards. Teams that fail to submit a technical report will be disqualified from the awards.
- Feedback on Chinese or English technical reports: June 5, 2024
- Task organizers will conduct initial evaluations of the technical reports and provide feedback.
- Formal submission of Chinese and English evaluation papers: June 10, 2024
- This includes the overview papers written by task organizers and the outstanding Chinese and English technical reports they recommend (please revise these according to the organizers' suggestions before submission); submissions enter double-blind review.
- Announcement of winners: June 15, 2024
- Notification of paper acceptance: June 25, 2024 (Technical reports are an important consideration for awards, but not necessarily accepted for publication)
- Camera-ready paper submission: July 1, 2024
- Accepted papers will be included in the ACL/CCL Anthology
- CCL 2024 Evaluation Workshop: July 25-28, 2024
- Task organizers will present an Overview report, present awards, and moderate sessions. Winning teams will present technical reports.
Please contact the task organizers or evaluation chairs if you have any questions.
February 5, 2024