While automatic summarization evaluation methods developed for English are routinely applied to other languages, this is the first attempt to systematically quantify their panlinguistic efficacy. We take a summarization corpus for eight different languages (EN, ID, FR, TR, ZH, RU, DE, ES), and manually annotate generated summaries for focus (precision) and coverage (recall).
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Evaluating the Efficacy of Summarization Evaluation across Languages. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
We examine content-based summarization evaluation from the perspectives of precision and recall, in the form of focus and coverage, comparing system-generated summaries against ground-truth summaries.
- We use a customized direct assessment method for annotation.
- We use Amazon Mechanical Turk for annotation. You can find the MTurk user interface in `mturk/html`.
- Jupyter notebooks 0-3 are used to pre- and post-process the MTurk annotations.
- In this repository, we only provide the annotation process for ID, FR, TR, ZH, RU, DE, ES. The annotation process for EN will be released separately because the data is from the FFCI paper.
- You can find all annotation results in the folder `resulting_data`.
- The provided scores are normalized z-scores (see the normalization sketch below the list).
- You can use Jupyter notebooks 5-6 to compute the traditional metrics and their Pearson and Spearman correlations (see the correlation example below the list).
- Please note that for ZH and RU you need to convert all characters/words to their Latin form using the Jupyter notebook `5. transform_RU_and_ZH.ipynb` (a transliteration sketch is given below the list).
- We already provide all of the BERTScore and MoverScore outputs in the folders `bert_score` and `mover_score`, respectively (a BERTScore usage sketch is given below the list).
- You can use Jupyter notebooks 7-8 to compute their Pearson and Spearman correlations.
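
As a reference for how such z-score normalization is typically done in direct-assessment-style evaluation, here is a minimal sketch that standardizes each worker's raw ratings and then averages them per summary. The column names (`worker_id`, `summary_id`, `score`) are hypothetical and do not necessarily match the schema of the files in `resulting_data`.

```python
import pandas as pd

# Hypothetical raw MTurk ratings: one row per (worker, summary) judgement.
# Column names are illustrative, not the exact schema of `resulting_data`.
ratings = pd.DataFrame({
    "worker_id":  ["w1", "w1", "w1", "w2", "w2", "w2"],
    "summary_id": ["s1", "s2", "s3", "s1", "s2", "s3"],
    "score":      [70, 85, 60, 40, 55, 65],
})

# Standardize each worker's raw scores to z-scores (mean 0, unit variance),
# then average the per-worker z-scores for each summary.
ratings["z"] = ratings.groupby("worker_id")["score"].transform(
    lambda s: (s - s.mean()) / s.std()
)
summary_z = ratings.groupby("summary_id")["z"].mean()
print(summary_z)
```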
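
The metric-human correlations reported by the notebooks amount to comparing a list of metric scores against the human z-scores with `scipy.stats`; the numbers below are made up purely for illustration.

```python
from scipy.stats import pearsonr, spearmanr

# Made-up per-summary scores for illustration: an automatic metric
# (e.g. ROUGE) against the normalized human z-scores.
metric_scores = [0.31, 0.42, 0.27, 0.55, 0.48]
human_z       = [-0.20, 0.50, -0.60, 1.10, 0.70]

r, r_p = pearsonr(metric_scores, human_z)
rho, rho_p = spearmanr(metric_scores, human_z)
print(f"Pearson  r   = {r:.3f} (p = {r_p:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
```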
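
For the ZH/RU Latin-form conversion, the actual logic lives in `5. transform_RU_and_ZH.ipynb`. The snippet below only illustrates the idea using the `unidecode` package, which is our assumption and not necessarily what the notebook uses.

```python
# `unidecode` is one possible way to map Cyrillic/Chinese text to Latin
# characters; it is an illustrative choice, not necessarily the notebook's.
from unidecode import unidecode  # pip install unidecode

print(unidecode("Пример русского текста"))  # -> "Primer russkogo teksta"
print(unidecode("这是一个中文摘要"))          # -> rough pinyin-style Latin text
```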
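
Scores like those provided in `bert_score` can be produced with the `bert-score` package roughly as follows; the language and default model settings shown are illustrative, not necessarily the exact configuration used for the provided files. MoverScore is analogous but uses its own package.

```python
from bert_score import score  # pip install bert-score

# Toy candidate/reference pair; the precomputed outputs in `bert_score` were
# produced over the full summary files, possibly with different settings.
cands = ["the generated summary text"]
refs  = ["the reference summary text"]

P, R, F1 = score(cands, refs, lang="en", verbose=False)
print(F1.mean().item())
```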