
PoT Evaluation code #1

Open
Coobiw opened this issue Jul 22, 2024 · 5 comments
Coobiw commented Jul 22, 2024

Hi, thanks for your great work! Will you release your PoT evaluation code, or share some details about it for the ChartQA test split? I would like to reproduce this result. Thanks for your advice and reply!

Coobiw (Author) commented Jul 22, 2024

With my own implementation below, ChartGemma reaches 74.64 on ChartQA (64.0 human + 85.28 aug). Is there any issue with my implementation? Thanks for your advice and reply!

```python
import io
import sys

def execute_python_code(code):
    # Redirect stdout so we can capture whatever the generated program prints.
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    sys.stdout = new_stdout

    status = True
    try:
        exec(code)
    except Exception:
        status = False
    finally:
        # Always restore the original stdout, even if exec() raised.
        sys.stdout = old_stdout

    output = new_stdout.getvalue() if status else None
    return output, status

response, status = execute_python_code(response)
if status:
    answer = response
    print(answer)
else:
    answer = ""
    print("error running...")
```

AhmedMasryKU (Collaborator) commented

Hi @Coobiw

I am cleaning up the remaining codebase and will try to release it when I get some time. In the meantime, here are some ideas we used to optimize performance on the validation set before running the model on the test set:

  1. Use the following prompt: "program of thought:" + question.
  2. After executing the code and getting the output text, change "True" and "False" to "Yes" and "No".
  3. Clean the output string from unnecessary characters (\n, ')
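Taken together, the post-processing steps above might look like the following sketch (`clean_output` is a hypothetical helper name, not from the released code, and the exact character set cleaned may differ):

```python
def clean_output(raw: str) -> str:
    """Normalize text produced by executing the generated program.

    Hypothetical helper illustrating steps 2-3 above; the official
    implementation may clean a different set of characters.
    """
    text = raw.strip()                  # drop surrounding whitespace/newlines
    text = text.replace("True", "Yes")  # step 2: map booleans to Yes/No
    text = text.replace("False", "No")
    text = text.strip("\n'")            # step 3: stray newlines and quotes
    return text

print(clean_output("'True'\n"))  # -> Yes
```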

Also, we used the following implementation of the evaluation metric: https://github.com/vis-nlp/UniChart/blob/bd6004bc8fe9ef8ce9a6cdfd88712f845d78b918/model/chartqa_model.py#L36
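For context, the linked UniChart metric implements ChartQA-style relaxed accuracy: numeric answers count as correct within a 5% relative tolerance, and all other answers require an exact (case-insensitive) string match. A minimal sketch under that assumption — the official code may differ in details such as percent-sign or comma handling:

```python
def relaxed_accuracy(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """ChartQA-style relaxed match (sketch, not the official code).

    Numeric answers are correct within `tolerance` relative error;
    everything else must match exactly, case-insensitively.
    """
    try:
        pred_num = float(prediction.strip().rstrip("%"))
        tgt_num = float(target.strip().rstrip("%"))
        if tgt_num == 0.0:
            return pred_num == tgt_num
        return abs(pred_num - tgt_num) / abs(tgt_num) <= tolerance
    except ValueError:
        # Non-numeric answers: exact case-insensitive match.
        return prediction.strip().lower() == target.strip().lower()
```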

Coobiw (Author) commented Jul 23, 2024

Using the following code, it reaches 76.56 (67.84 human + 85.28 aug):

```python
answer = answer.replace("True", "Yes").replace("False", "No")
answer = answer.strip()
```

> Clean the output string from unnecessary characters (\n, ')

Is `strip()` enough to do this?
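For reference, `str.strip()` with no argument removes only surrounding whitespace (spaces, tabs, newlines); it does not remove quote characters. Passing a character set to `strip()` handles both, as this small demo shows:

```python
raw = "\n'Yes'\n"
# No argument: only surrounding whitespace is removed.
print(repr(raw.strip()))       # "'Yes'" -- the quotes survive
# With a character set: newlines and quotes are both removed from the ends.
print(repr(raw.strip("\n'")))  # 'Yes'
```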

AhmedMasryKU (Collaborator) commented

I will clean up and share the code that you can use to reproduce the results by this weekend. Sorry, I am a bit busy today and tomorrow.

Coobiw (Author) commented Jul 24, 2024

OK! Thanks for your help!!
