Code For "Dialog Summarization for Software Collaborative Platform via Tuning Pre-trained Models"

Ⅰ Preprocess

For GitHub data:

python ./preprocess/process_github_discussions.py

For Gitter data, use the preprocessing tools in:

/disentangle/disentanglement/tools/preprocessing/

Ⅱ Dialog Disentanglement

Gitter dialog disentanglement (method 1)

Train the model:

python disentangle.py gitter \
--train src/disentanglement/proposed_dataset/original_format/*/*.annotation.txt \
--hidden 512 \
--layers 2 \
--nonlin softsign \
--word-vectors src/disentanglement/data/glove-ubuntu.txt \
--epochs 5 \
--dynet-autobatch \
--drop 0 \
--learning-rate 0.018804 \
--learning-decay-rate 0.103 \
--seed 10 \
--clip 3.740 \
--weight-decay 1e-07 \
--opt sgd \
--max-dist 51

Run the trained model on a test file:

python disentangle.py gitter \
--model ./gitter.dy_9.model \
--test ./data/gitter/ethereum/welcome/content.annotation.txt \
--test-start 0 \
--hidden 512 \
--layers 2 \
--nonlin softsign \
--word-vectors src/disentanglement/data/glove-ubuntu.txt \
--max-dist 51

For GitterAnalysis (method 2), the scripts are:

csv_construct.py
data_cleaning.py
merge.tab.py
thread_identification.py
evaluation.py

Ⅲ Tuning GitHub

python github_finetune_t5_title.py \
--visible_gpu 1 \
--max_source_length 256 \
--max_target_length 20 \
--log_name log/log \
--do_train \
--do_eval \
--do_test \
--train_batch_size 10 \
--eval_batch_size 10

python github_prompt_t5_title.py \
--visible_gpu 1 \
--max_source_length 256 \
--max_target_length 20 \
--log_name log/log \
--do_train \
--do_eval \
--do_test \
--train_batch_size 10 \
--eval_batch_size 10

python github_finetune_gpt2_title.py \
--model_name_or_path gpt2 \
--model_name gpt2 \
--do_train \
--do_eval \
--do_predict \
--train_file data/github/train.csv \
--validation_file data/github/valid.csv \
--test_file data/github/test.csv \
--source_prefix "summarize: " \
--output_dir ./output_dir \
--overwrite_output_dir \
--per_device_train_batch_size=10 \
--per_device_eval_batch_size=10 \
--predict_with_generate \
--eval_steps=50 \
--logging_steps=50 \
--num_train_epochs=20 \
--learning_rate=1e-4 \
--max_source_length=256 \
--generation_max_length=276 \
--text_column src \
--summary_column tgt \
--evaluation_strategy epoch \
--save_strategy epoch \
--load_best_model_at_end True
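
The GPT-2 command above reads CSV files and selects the columns `src` and `tgt` through `--text_column` and `--summary_column`; these presumably hold the dialog text and its title. Below is a minimal sketch of writing a file in that shape, assuming one dialog per row. The helper name `write_pairs` and the sample rows are hypothetical and are not taken from the repository.

```python
import csv
import io


def write_pairs(pairs, handle):
    """Write (dialog, title) pairs as CSV rows with `src` and `tgt` columns."""
    writer = csv.DictWriter(handle, fieldnames=["src", "tgt"])
    writer.writeheader()
    for dialog, title in pairs:
        writer.writerow({"src": dialog, "tgt": title})


# Hypothetical sample rows, for illustration only.
sample = [
    ("How do I pin the tokenizer version? Use tokenizers==0.12.1.", "Pin tokenizer version"),
    ("The build fails on Windows, any idea?", "Windows build failure"),
]

# An in-memory buffer stands in for a real file such as data/github/train.csv.
buffer = io.StringIO()
write_pairs(sample, buffer)
print(buffer.getvalue())
```

To write an actual file, pass a handle opened with `open(path, "w", newline="", encoding="utf-8")` instead of the in-memory buffer.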

Ⅳ Tuning Gitter

python gitter_prompt_t5_title.py \
--visible_gpu 1 \
--max_source_length 256 \
--max_target_length 20 \
--log_name log/log \
--do_train \
--do_eval \
--do_test \
--train_batch_size 10 \
--eval_batch_size 10

python gitter_finetune_t5_title.py \
--visible_gpu 1 \
--max_source_length 256 \
--max_target_length 20 \
--log_name log/log \
--do_train \
--do_eval \
--do_test \
--train_batch_size 10 \
--eval_batch_size 10

Ⅴ Calculate metrics such as BLEU and ROUGE

python eval.py
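
For readers unfamiliar with ROUGE, the sketch below computes a balanced ROUGE-L F-score from the longest common subsequence of whitespace tokens. It only illustrates the metric and is not the code in `eval.py`; the function names and the lowercase whitespace tokenization are assumptions made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Balanced ROUGE-L F-score over lowercase whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("fix the login bug", "fix login bug"))
```

Published ROUGE implementations differ in tokenization, stemming, and the weighting between precision and recall, so scores from this sketch need not match those reported by a library.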

Ⅵ Annotation Tool

/tools/dialog_GUI/view.py

Ⅶ Requirements

pytorch 1.12.0
openprompt 1.0.1
tokenizers 0.12.1
nltk 3.7
numpy 1.22.3
nlgeval
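
A quick way to compare an environment against the pins above is sketched below using the standard-library `importlib.metadata`. It assumes PyTorch is installed under the distribution name `torch`; `nlgeval` is left out because no version is listed for it. The helper name `check_pins` is hypothetical.

```python
from importlib import metadata

# Pins copied from the list above; "torch" is assumed to be the
# distribution name under which PyTorch is installed.
PINS = {
    "torch": "1.12.0",
    "openprompt": "1.0.1",
    "tokenizers": "0.12.1",
    "nltk": "3.7",
    "numpy": "1.22.3",
}


def check_pins(pins, lookup=metadata.version):
    """Return one message per package that is missing or has another version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Local build tags such as "+cu113" are ignored when comparing.
        if found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


for message in check_pins(PINS):
    print(message)
```

An empty output means every pinned package was found at the listed version.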

Repository: GuodongFan/ADSum
