The 23rd China National Conference on Computational Linguistics (CCL 2024)
Technical Evaluation (CCL24-Eval) Task Release

The 23rd China National Conference on Computational Linguistics (CCL 2024) will be held in Taiyuan, Shanxi Province, from July 25 to 28, 2024, organized by the Chinese Information Processing Society of China and hosted by Shanxi University.

This conference will continue to organize the Chinese language processing technology evaluation, CCL24-Eval. Following the initial call for evaluation tasks, the CCL24-Eval organizing committee has confirmed 10 tasks, covering research directions such as semantic parsing, Classical Chinese analysis, essay fluency evaluation, sign language translation, and multimodal understanding. Researchers are welcome to take part in the evaluation competition. Each task will award several first, second, and third prizes based on the competition results, and the Chinese Information Processing Society of China will issue official honorary certificates.

Computing Power Sponsorship Information

The computing power for this evaluation is generously sponsored by Beijing Paratera Technology Co., Ltd., which provides two GPU configurations (teams choose one of the two) and 500 yuan of free computing credit per team.

Configuration 1: GPU instance type N40-4090-24G, configured as follows:
CPU: AMD EPYC 7402 (48 cores) @ 2.8 GHz
Memory: 512 GB
GPU: 8 × NVIDIA GeForce RTX 4090
GPU Memory: 8 × 24 GB (936.2 GB/s)
Node Interconnect: RoCE 2 × 25 Gbps (RDMA protocol)
Operating System: CentOS 7.9
Billing: On-demand, 4.8 RMB/card/hour
Each participating team receives 300 GB of disk storage free of charge; additional capacity is billed at 2,000 yuan/TB/year.

Configuration 2: GPU instance type N26-V100-32G, configured as follows:
CPU: Platinum 82-series (80 vCPU) @ 2.5 GHz
Memory: 320 GB
GPU: 8 × NVIDIA Tesla V100 SXM2
GPU Memory: 8 × 32 GB (897 GB/s)
NVLink: 300 GB/s bidirectional
Operating System: CentOS 7.8
Billing: On-demand, 5.3 RMB/card/hour
Each participating team receives 300 GB of disk storage free of charge; additional capacity is billed at 58 yuan/GB/month.

We thank Beijing Paratera Technology Co., Ltd. for its generous sponsorship, and we welcome outstanding teams from all sectors to sign up for the competition!

Notes:
1. Teams must register under the name of a faculty member.
2. Each team account's data will be retained for one year.
3. Each participating team account includes 500 yuan of free computing credit; usage beyond this amount will be restricted.
4. The 4090 cluster is enabled by default; access to V100 or other resources must be arranged separately.
5. Accounts are granted 8-card permissions by default; additional permissions must be arranged separately.
6. Account applications must provide: (1) name and phone number; (2) school and department; (3) an email address for account creation.

Evaluation Tasks

Task 1: The Second Chinese Frame Semantic Parsing Evaluation

Task Overview

Frame Semantic Parsing (FSP) is a fine-grained semantic analysis task based on frame semantics. Its goal is to extract frame semantic structures from sentences to achieve a deep understanding of events or situations in the sentence. Frame semantic parsing is of great significance for downstream tasks such as reading comprehension, text summarization, and relation extraction.

In natural language, meaning is mostly conveyed word by word, but there are also many cases where word meanings aggregate to form new meanings at the phrase level. For example, the phrase "爱买不买" (literally "love buy not buy") conveys that the speaker does not care whether the other party buys something. In frame semantic analysis, this phrase should activate the "emotional response" frame as a whole; if the individual verbs "爱" (love) and "买" (buy) were taken as target words, activating frames such as Liking and Buying, the phrase's distinctive emotional coloring would be lost.

Construction Grammar holds that language is composed of fixed, meaningful units called constructions, which range from simple words and phrases to complex sentences and discourses. For the phrase "爱买不买" (literally "love buy not buy"), the corresponding construction is "爱V不V" ("love V not V"). This construction expresses its semantics holistically, indicating indifference or nonchalance toward a certain action, and should be activated as a whole to evoke the corresponding frame.

To enhance the capability of frame semantic parsing and further achieve deep understanding of language, we have introduced the second Chinese Frame Semantic Parsing Evaluation, which includes frame semantic parsing data with constructions as "target words".

This evaluation includes the following three sub-tasks:

  • Frame Identification: Identify the frames activated by the given target words or constructions in a sentence.
  • Argument Identification: Identify the boundary spans of the arguments governed by the given target words or constructions in a sentence.
  • Role Identification: Predict the semantic role labels of the argument spans produced in the argument identification subtask.
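
To make the input and output concrete, the sketch below shows what a single parsed instance might look like. This is an illustration only: the field names, span convention, and role label are hypothetical, not the official data schema (see the task website for the actual format).

```python
# Hypothetical frame semantic parsing instance (illustrative only).
# Spans are (start, end) 0-based character indices, end-exclusive.
instance = {
    "sentence": "他爱买不买",
    # The construction "爱V不V" is treated as the target as a whole.
    "target": {"span": (1, 5), "text": "爱买不买", "construction": "爱V不V"},
    # Subtask 1 (frame identification): the frame evoked by the target.
    "frame": "情感反应",  # the "emotional response" frame from the example above
    # Subtasks 2-3 (argument and role identification): argument spans
    # governed by the target, each with a semantic role label.
    "arguments": [
        {"span": (0, 1), "text": "他", "role": "认知者"},  # role label is hypothetical
    ],
}
```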

This evaluation has two tracks, open and closed. In the open track, participating teams may use large language models such as ChatGPT for inference, but fine-tuning is prohibited and the prompt templates used must be submitted. In the closed track, the parameter counts of participating models are restricted.

Organizers and Contact Persons

  • Organizers: Ru Li, Hongye Tan (Shanxi University); Baobao Chang (Peking University); Xinyu Dai (Nanjing University)
  • Task Leader: Zhichao Yan (Ph.D. student at Shanxi University, 202312407023@email.sxu.edu.cn)
  • Task Contact: Juncai Li (Ph.D. student at Shanxi University, 202312407010@email.sxu.edu.cn)

Task Awards

For each track, the following awards will be given:

  • First Prize: 0-2 teams, with a total prize of two laptops;
  • Second Prize: 0-2 teams, with a total prize of 1,200 RMB;
  • Third Prize: 0-2 teams, with a total prize of 800 RMB.

Sponsorship

  • The laptops are sponsored by Baixin Information Technology Co., Ltd.;
  • The evaluation prize money is jointly sponsored by Song Xiaomin, head of Sitonholy (Tianjin) Technology Co., Ltd., and by Jiehui Technology of Taiyuan.

Task Website

https://github.com/SXUCFN/The-2nd-Chinese-Frame-Semantic-Parsing

Task 2: Chinese Parataxis Graph Parsing

Task Overview

The Chinese Parataxis Graph (CPG) is an event-centered semantic representation formalized as a single-rooted directed graph. Nodes in the graph correspond to units carrying events, entities, and attributes, while directed edges represent the semantic relationships between those units.

CPG adheres to human cognition of language while remaining operable in practical applications. It is hierarchically structured to facilitate the design of downstream semantic analysis pipelines, aiming at a semantic representation scheme that is both universal and extensible. By hierarchy, CPG decomposes into two main parts, event structure and entity structure:

Event structure is divided into internal and external structures. The internal structure includes argument structure centered around event words, modality structure, and spatiotemporal structure. The external structure comprises a relational event structure formed by multiple events.

Entity structure likewise consists of internal and external structures. The internal structure includes entity attributes and the attribute-value structure, while the external structure comprises the relational event structure formed by multiple entities.

The 2024 Chinese Parataxis Graph Semantic Parsing Evaluation only requires generating sentence-level parataxis graph frameworks: the input is a sentence, and the output is its parataxis graph framework structure. Fine-grained internal semantic classifications (refined entity structure, modality structure, spatiotemporal structure, etc.) need not be generated; systems only need to determine whether a unit belongs to a given structural component, and the provided corpus is likewise annotated at this coarse granularity.

For example, in the sentence "他哭肿了眼睛" (He cried and his eyes swelled), the task is to automatically parse out the following set of triplets:

{(他,哭,A0), (眼睛,肿,A0), (他,眼睛,EntityRel), (了,哭,Time), (了,肿,Time), (哭,因果关系,原因事件), (肿,因果关系,结果事件), (哭,ROOT,CoreWord)}
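
For illustration, such a framework can be encoded directly as a set of triples and compared against the gold annotation; the encoding below mirrors the example above and is not the official release format.

```python
# Hypothetical encoding of the CPG framework for "他哭肿了眼睛".
# Each triple is (dependent, head, relation), as in the set above.
gold_triples = {
    ("他", "哭", "A0"), ("眼睛", "肿", "A0"),
    ("他", "眼睛", "EntityRel"),
    ("了", "哭", "Time"), ("了", "肿", "Time"),
    ("哭", "因果关系", "原因事件"),  # cause event of the causal relation
    ("肿", "因果关系", "结果事件"),  # result event of the causal relation
    ("哭", "ROOT", "CoreWord"),     # the core event word attaches to ROOT
}
```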

The dataset for this evaluation consists of manually annotated reading texts for international Chinese education and data from the Penn Chinese Treebank. The evaluation is an open test; external resources are allowed.

Organizers and Contact Persons

Organizers: Endong Xun (Language Resources High-Level Specialized Center, Beijing Language and Culture University), Gaoqi Rao (International Chinese Studies Institute, Beijing Language and Culture University), Gongbo Tang (School of Information Science, Beijing Language and Culture University)

Contact Persons: Mengxi Guo (Master's student, Beijing Language and Culture University, guo_mengxi@foxmail.com), Meng Li (Ph.D. student, Beijing Language and Culture University)

Task Awards

This evaluation will award first, second, and third prizes, with total prize money of 7,000 yuan.

Task Website

https://github.com/gertrude95/Chinese-Parataxis-Graph-Parsing

Task 3: The Fourth Chinese Spatial Cognition Evaluation (SpaCE 2024)

Task Overview

Spatial expressions describe the spatial orientation relationships between objects and are a common phenomenon in natural language. Accurately understanding the semantics of spatial expressions in text requires not only linguistic knowledge but also spatial cognitive abilities: constructing spatial scenes and drawing inferences about spatial orientation information from world knowledge.

The Spatial Cognition Evaluation (SpaCE) tests how well machines understand Chinese spatial semantics and has been held for three consecutive years since 2021. Existing results show that machines still fall significantly short of the average human level in Chinese spatial semantic understanding, especially on tasks requiring high-level spatial cognitive processing. Spatial semantic understanding remains a highly challenging task for NLP systems, including large language models.

To further enhance the machine's understanding of spatial semantics, we have launched the fourth Chinese Spatial Cognition Evaluation (SpaCE 2024). Compared to the previous three editions, this evaluation pays more attention to testing the spatial semantic understanding ability of large language models. The aim is to assess the comprehensive ability of machines to understand Chinese spatial semantics on a test dataset.

Therefore, SpaCE 2024 will no longer divide tasks into tracks but will instead test the following five levels of spatial semantic understanding in the form of multiple-choice questions:

  • Recognition of Spatial Information Anomalies: Selecting language expressions in the text that represent abnormal spatial information.
  • Recognition of Spatial Information Entities: Selecting the referents or targets of spatial information in the text.
  • Recognition of Spatial Information Roles: Selecting the semantic roles of spatial information in the text, or selecting the spatial expression corresponding to the given semantic role.
  • Inference of Spatial Orientation Information: Making inferences based on spatial knowledge and selecting the inference results.
  • Discrimination of Spatial Semantic Relationships: Given two expressions with different forms, selecting the spatial words or phrases that make their meanings similar or different.
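
For illustration only, one of these multiple-choice items could be encoded as below; the fields, question, and options are hypothetical and not drawn from the official dataset.

```python
# Hypothetical SpaCE 2024-style multiple-choice item (illustrative only).
item = {
    "level": "Inference of Spatial Orientation Information",
    "context": "书在桌子上，笔在书的左边。",  # "The book is on the table; the pen is left of the book."
    "question": "笔在哪里？",                # "Where is the pen?"
    "options": {"A": "桌子上", "B": "桌子下面", "C": "书的右边", "D": "抽屉里"},
    "answer": "A",  # the pen, left of the book, is also on the table
}
```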

Organizers and Contact Persons

  • Organizers: Weidong Zhan, Zhifang Sui (Peking University)
  • Task Contact: Liming Xiao (Ph.D. student, Peking University, lmxiao@stu.pku.edu.cn)

Task Awards

  • First Prize: 0-1 teams, total prize money of 12,000 RMB;
  • Second Prize: 0-2 teams, total prize money of 12,000 RMB;
  • Third Prize: 0-4 teams, total prize money of 12,000 RMB.

Sponsorship

The prize money for this evaluation is sponsored by Huawei.

Task Website

https://2030nlp.github.io/SpaCE2024

Task 4: The Fourth Chinese Abstract Meaning Representation Parsing Evaluation (CAMRP 2024)

Task Overview

Abstract Meaning Representation (AMR) is a semantic representation method that has emerged in recent years. It abstracts the semantic structure of a sentence into a single-rooted, directed acyclic graph.

Chinese Abstract Meaning Representation (CAMR) adapts the AMR framework to the characteristics of Chinese: it retains AMR's strong capacity for representing whole-sentence semantics while adding concept-alignment and relation-alignment annotations. The CAMRP 2024 task is to parse Classical Chinese sentences and output CAMR semantic graphs that include concept- and relation-alignment information. Systems are ranked by F1 score under the Align-smatch evaluation metric. Unlike previous editions, this year's evaluation uses 2,500 Classical Chinese sentences as validation and test sets, focusing on parsing performance on Classical Chinese; the 16,576-sentence Modern Chinese training set from previous years is retained to observe the transfer ability of parsing systems to Classical Chinese.
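
The announcement does not spell out the formula, but smatch-style metrics standardly compute an F1 over matched triples between the predicted and gold graphs. A minimal sketch under that assumption (the official Align-smatch tool additionally scores concept- and relation-alignment information):

```python
def align_smatch_f1(num_matched: int, num_pred: int, num_gold: int) -> float:
    """Smatch-style F1: num_matched is the number of triples shared by the
    predicted and gold CAMR graphs under the best variable mapping;
    num_pred and num_gold are the triple counts of each graph.
    Sketch only -- not the official Align-smatch implementation."""
    if num_pred == 0 or num_gold == 0:
        return 0.0
    precision = num_matched / num_pred
    recall = num_matched / num_gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```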

Organizers and Contact Persons

  • Organizers: Bin Li, Minxuan Feng, Weiguang Qu, Junsheng Zhou (Nanjing Normal University)
  • Task Contact: Zhixing Xu (Ph.D. student at Nanjing Normal University, xzx0828@live.com)

Task Awards

This evaluation will award first, second, and third prizes, with total prize money of 7,000 RMB.

Task Website

https://github.com/GoThereGit/Chinese-AMR/

Task 5: Classical Chinese Event Detection

Task Overview

Event extraction identifies and extracts event information from natural language text. Because Classical Chinese has complex syntax and semantics and a limited scope of use, information extraction for Classical Chinese remains highly challenging. We have constructed a hierarchical, logic-based event type system for Classical Chinese, consisting of 9 major categories and 67 subcategories. Based on this type system and the corpus of the Twenty-Four Histories, we created the Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection (CHED), containing 8,122 annotated event instances (trigger words and event types). The task evaluates algorithms for detecting historical events in Classical Chinese and includes two subtasks:

  • Subtask 1: Trigger Identification. Identify event trigger words in the text and mark their positions. Trigger words are mainly monosyllabic words that best signal the occurrence of an event, typically the predicate verb of a sentence (though other sentence components are possible).

  • Subtask 2: Event Type Classification. Determine the event type of each trigger word according to our event type system (see the task website).

    For example, in the sentence "进军建德,擒贼帅赵桑干." ("Advance to Jiande and capture the bandit leader Zhao Sanggan."), the word "进军" ("advance") represents the event of "dispatching troops to Jiande," and the word "擒" ("capture") represents the event of "capturing the enemy leader Zhao Sanggan." Therefore, in this sentence, the trigger words are "进军" and "擒," representing the event types "Military-Preparation-Deployment" and "Military-Operation-Capture" respectively.
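
For concreteness, the annotations for this example sentence might be represented as below; the offset convention and field names are illustrative, not the official CHED release format.

```python
# Hypothetical CHED-style annotation for the example sentence.
# Offsets are (start, end) 0-based character indices, end-exclusive.
sentence = "进军建德,擒贼帅赵桑干."
events = [
    # Subtask 1 output: trigger word and its position;
    # Subtask 2 output: its event type from the 9/67-category system.
    {"trigger": "进军", "offset": (0, 2), "type": "Military-Preparation-Deployment"},
    {"trigger": "擒",   "offset": (5, 6), "type": "Military-Operation-Capture"},
]
```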

Organizers and Contact Person

  • Organizers: Yanqiu Shao, Wei Li (Beijing Language and Culture University)
  • Task Contact: Zhenbing Feng (Master's student, Beijing Language and Culture University, blcu_lcclab@163.com)

Task Website

https://github.com/NLPInBLCU/CHED2024

Task 6: Chinese Essay Rhetoric Recognition and Understanding

Task Overview

In primary and secondary education, rhetorical devices are a core part of reading comprehension and writing skills and an indispensable element of excellent literary works. Identifying and understanding the rhetoric in student essays can help improve students' writing, guiding them toward higher-quality narration and description; doing this manually, however, requires substantial effort, posing challenges for teachers in essay assessment and teaching. With the development of education and the spread of the internet, many researchers and institutions have begun exploring computer technology for automatic essay grading, in which the use of rhetorical devices plays an important role in assessment.

The evaluation focuses on understanding rhetorical devices in elementary and middle school essays, categorizing them into metaphor, simile, exaggeration, and parallelism, and further classifying these four devices in detail, including their objects and contents. It comprises three tracks:

  • Identification of rhetorical form types in elementary and middle school essays
  • Identification of rhetorical content types in elementary and middle school essays
  • Extraction of rhetorical components in elementary and middle school essays

Together, the three tracks provide multiple criteria for understanding rhetorical devices in elementary and middle school essays.

Organizers and Contact Persons

  • Organizers: Nuowei Liu, Xinhao Chen, Yupei Ren, Man Lan, Xiaopeng Bai, Yuanbin Wu (East China Normal University), Shaoguang Mao, Yan Xia (Microsoft Research Asia)
  • Task Contact: Nuowei Liu (East China Normal University, nwliu@stu.ecnu.edu.cn)

Task Awards

The organizers of the task will provide a prize of 10,000 RMB to the winning team.

Task Website

https://github.com/cubenlp/CERRU

Task 7: Chinese Essay Fluency Evaluation

Task Overview

The Chinese Essay Fluency Evaluation (CEFE) task aims to identify and correct errors that affect the fluency of essays. Existing work typically treats fluency evaluation as a standalone natural language processing task, lacking systematic integration across levels and perspectives. Unlike errors in rule-generated data, in interlanguage data from learners of Chinese, or in the spoken and written language of other native speakers, the errors in elementary and middle school students' essays are more diverse and involve more complex grammatical knowledge. We therefore systematically define fine-grained error types that affect essay fluency from lexical, syntactic, semantic, and other perspectives, and provide correction suggestions. Compared to last year, to evaluate essay fluency more comprehensively, this year's evaluation adds an essay fluency rating task and 1,200 additional sentences in the training set. The evaluation is designed with the following three tracks:

  • Identification of Types of Grammatically Incorrect Sentences in Elementary and Middle School Essays: Recognize different types of grammatically incorrect sentences in essays.
  • Rewriting Grammatically Incorrect Sentences in Elementary and Middle School Essays: Rewrite grammatically incorrect sentences in essays to make them correct.
  • Fluency Rating of Elementary and Middle School Essays: Evaluate the fluency level of essays.
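
As an illustration only (not the official data format), a single instance could carry supervision for all three tracks, e.g.:

```python
# Hypothetical CEFE instance (illustrative; see the task website for the
# actual data format and label inventory).
instance = {
    "sentence": "我非常很喜欢读书。",  # redundant adverbs "非常" + "很"
    "error_types": ["成分赘余"],       # Track 1: error type label (name hypothetical)
    "correction": "我非常喜欢读书。",   # Track 2: rewritten fluent sentence
    "fluency_rating": 2,              # Track 3: fluency level (scale hypothetical)
}
```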

Organizers and Contact Persons

  • Organizers: Xinlin Zhuang, Xinshu Shen, Hongyi Wu, Yupei Ren, Xiaopeng Bai, Man Lan, Yuanbin Wu (East China Normal University), Shaoguang Mao, Yan Xia, Tao Ge (Microsoft Research Asia)
  • Task Contact: Xinlin Zhuang (East China Normal University, zhuangxinlin2022@163.com)

Task Awards

The task organizers will provide a prize of 10,000 RMB to the winning team.

Task Website

https://github.com/cubenlp/2024CCL_CEFE

Task 8: Evaluation on Commonsense Reasoning and Moral Understanding in Children's Stories (CRMU)

Task Overview

The Evaluation on Commonsense Reasoning and Moral Understanding in Children's Stories (CRMU) aims to evaluate the commonsense reasoning and moral understanding of Chinese pretrained language models and large language models from multiple perspectives. This evaluation includes the following two subtasks:

  • Commonsense Reasoning: Given a story and a commonsense question, select the correct answer from the provided candidates.
  • Moral Matching: Based on a given story, select the most appropriate moral from multiple candidate morals.

The data for this evaluation are classic fables collected from websites. For the commonsense reasoning task, questions and options are manually annotated and cover commonsense types including temporal, spatial, biological, physical, and social knowledge. For the moral matching task, questions and options are created through a combination of manual annotation and automatic generation.
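
The announcement does not state a metric, but both subtasks are multiple-choice selection, for which accuracy is the natural score; a minimal sketch under that assumption:

```python
def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of multiple-choice questions answered correctly.
    Stated as an assumption -- see the task website for official scoring."""
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```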

Organizers and Contact Persons

  • Organizers: Hongye Tan, Ru Li, Hu Zhang (Shanxi University); Kui Yu (Hefei University of Technology)
  • Task Leader: Yaxin Guo (Ph.D. student, Shanxi University, 202112407002@email.sxu.edu.cn)
  • Task Contact: Guohang Yan (Master's student, Shanxi University, yanguohang@qq.com)

Task Website

https://github.com/SXU-YaxinGuo/CRMU

Task 9: Chinese Vision-Language Understanding Evaluation

Task Overview

Chinese Vision-Language Understanding Evaluation (CVLUE) aims to comprehensively evaluate the modeling and understanding capabilities of Chinese multimodal pretrained models in tasks such as Image-Text Retrieval, Visual Question Answering, Visual Grounding, and Visual Dialog. This evaluation includes the following five subtasks:

  • Image Retrieval: Retrieve the corresponding image from a set of candidates based on a given text description.
  • Text Retrieval: Retrieve the corresponding text description from a set of candidates based on a given image.
  • Visual Question Answering: Answer questions about a given image using short phrases.
  • Visual Grounding: Identify the entity in an image that corresponds to a given text description.
  • Visual Dialog: Select the most appropriate response text from a set of candidates based on a given image and dialog history.
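
The announcement does not specify metrics, but image-text retrieval tasks are commonly scored with Recall@K; a minimal sketch under that assumption:

```python
def recall_at_k(ranked_candidates: list[list[str]], gold: list[str], k: int = 5) -> float:
    """Fraction of queries whose gold item appears among the top-k ranked
    candidates. A common retrieval metric, shown here as an assumption --
    see the task website for the official evaluation protocol."""
    if not gold:
        return 0.0
    hits = sum(g in ranks[:k] for ranks, g in zip(ranked_candidates, gold))
    return hits / len(gold)
```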

Organizers and Contact Persons

  • Organizers: Zhiguo Wan, Yuxuan Wang (Zhejiang Lab); Wanxiang Che (Harbin Institute of Technology)
  • Task Contact: Yijun Liu (Harbin Institute of Technology, yijunliu@ir.hit.edu.cn)

Task Website

https://github.com/WangYuxuan93/CVLUE

Task 10: Quality Evaluation of Sign Language Avatars Translation (QESLAT)

Task Overview

With the advancement of technology, Sign Language Avatars have become an important tool for facilitating communication between the deaf community and society. Sign Language Avatars provide real-time translation services for the deaf by simulating sign language gestures, helping to break down language barriers and enhance the social participation of the deaf community. To ensure that Sign Language Avatars can provide accurate, natural, and easily understandable sign language translations, it is crucial to evaluate their translation quality. This evaluation aims to assess the naturalness and accuracy of Sign Language Avatars translating Chinese into Chinese Sign Language, ensuring that they conform to sign language grammar rules and are understandable and acceptable to the deaf community.

This evaluation is conducted under the guidance of the Sign Language Research and Promotion Committee of the China Association of the Deaf and Hard of Hearing. An evaluation team of deaf individuals and professional sign language interpreters certified by the committee will manually evaluate the translation output of the Sign Language Avatars, focusing on the accuracy of sign language grammar, the naturalness and readability of expression, and whether the output is understandable to the deaf community, while also considering factors such as gesture clarity, fluency, and semantic consistency with the original Chinese text. Specifically, the evaluation covers the following four criteria:

  • Sign Language Grammar Accuracy: Whether the Sign Language Avatar follows the word-order rules of Chinese Sign Language, whether its gestures are accurate, and whether grammatical markers are correctly expressed in the translation.
  • Naturalness: Evaluating the coherence of gestures, whether the translation conforms to the daily expression habits of the deaf community, and whether non-verbal elements such as facial expressions, body posture, and spatial layout are naturally integrated into the translation.
  • Readability: Evaluating the performance of the Sign Language Avatar in terms of clarity, consistency, and adaptability.
  • Cultural Adaptability: Whether cultural differences and social context adaptability are considered in the translation, and whether the emotional nuances of the original text are accurately conveyed.

Organizers and Contact Persons

  • Organizers: Dengfeng Yao (Beijing Union University/Tsinghua University, tjtdengfeng@buu.edu.cn), Guowei Yang (Henan University of Economics and Law/Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing), Peng Jin (Leshan Normal University Special Education Language Intelligence Sichuan Provincial Key Laboratory of Philosophy and Social Sciences), Yidong Chen (Xiamen University), Cong Xu (Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing/China National Center for the Promotion of Sign Language by Huaxia Publishing House), Haixu Wang (Qinghai Radio and Television Station/China Braille and Sign Language Research and Application Center), Bin Chen (Zhuzhou Voice of Hand Information Technology Co., Ltd.), Liquan Wu (Shenzhen Information Accessibility Research Association), Gang Shen (Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing), Huaming Chen (Sign Language Research and Promotion Committee, China Association of the Deaf and Hard of Hearing), Chunda Liu (Beijing Sign Language Research Association), Yanli Ding (Beijing Union University National Language and Text Promotion Base), Ke Hu (Beijing Union University), Lan Chen (Shenzhen Link Accessibility Co., Ltd.), and Tiantian Yuan (Tianjin University of Technology).
  • Task Contact: Yuan Zhao (Master's student at Beijing Union University, 1398396428@qq.com)

Task Website

https://github.com/ann-yuan/QESLAT-2024

Overall Schedule for Technical Evaluation

  • Deadline for task solicitation: January 31, 2024
  • Task online release time: February 4, 2024
  • Overall evaluation end time: May 31, 2024
    • Task organizers are required to determine and publish the scores and rankings of participating teams before this deadline
  • Submission of technical reports in Chinese or English: May 31, 2024
    • Technical reports allow the task organizers to understand the methods of the participating teams and are one of the considerations for awards. Failure to submit a technical report will result in disqualification from the awards.
  • Feedback on Chinese or English technical reports: June 5, 2024
    • Task organizers will conduct initial evaluations of the technical reports and provide feedback.
  • Formal submission of Chinese and English evaluation papers: June 10, 2024
    • This includes the Overview paper written by the task organizers and the outstanding Chinese and English technical reports they recommend (please revise them according to the task organizers' suggestions before submission); submissions then enter double-blind review.
  • Announcement of winners: June 15, 2024
  • Notification of paper acceptance: June 25, 2024 (Technical reports are an important consideration for awards, but not necessarily accepted for publication)
  • Camera-ready paper submission: July 1, 2024
    • Accepted papers will be included in the ACL/CCL Anthology
  • CCL 2024 Evaluation Workshop: July 25-28, 2024
    • Task organizers will present an Overview report, present awards, and moderate sessions. Winning teams will present technical reports.

Please contact the task organizers or evaluation chairs if you have any questions.

CCL 2024 Evaluation Chairs:

Hongfei Lin, Dalian University of Technology

Bin Li, Nanjing Normal University

Hongye Tan, Shanxi University

February 5, 2024