PoT Evaluation code #1
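After writing my own implementation (the code below), ChartGemma reaches

```python
import io
import sys


def execute_python_code(code):
    """Run model-generated Python code and capture what it prints to stdout."""
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    sys.stdout = new_stdout
    status = True
    try:
        # Use a fresh globals dict: with a bare exec(code) inside a function,
        # top-level names defined by the program are not visible to functions it defines.
        exec(code, {})
    except Exception:
        status = False
    finally:
        sys.stdout = old_stdout
    output = new_stdout.getvalue() if status else None
    return output, status


# `response` holds the Python program generated by the model.
response, status = execute_python_code(response)
if status:
    answer = response
    print(answer)
else:
    answer = ""
    print("error running...")
```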
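The `exec`-based runner above executes the generated program inside the evaluation process, so a program that loops forever stalls the run, and one that calls `sys.exit()` raises `SystemExit`, which `except Exception` does not catch. A minimal alternative sketch, assuming the cost of one Python subprocess per question is acceptable, runs each program in a fresh interpreter with a timeout. The name `run_program` and the 10-second default are hypothetical choices for illustration, not part of the original code.

```python
import subprocess
import sys


def run_program(code, timeout=10.0):
    """Hypothetical helper: run generated code in a separate interpreter.

    Returns (stdout, True) on success, or (None, False) if the program
    raises, exits with a non-zero status, or exceeds `timeout` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],  # fresh interpreter per program
            capture_output=True,
            text=True,
            timeout=timeout,  # illustrative default; tune as needed
        )
    except subprocess.TimeoutExpired:
        return None, False
    if result.returncode != 0:
        return None, False
    return result.stdout, True


output, status = run_program("print(3 * 4)")
answer = output.strip() if status else ""
print(answer)  # 12
```

Like the original function, this returns `None` whenever the program fails, even if it printed something before failing.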
Hi @Coobiw, I am cleaning up the remaining codebase and will try to release it when I get some time. In the meantime, here are some ideas we used to optimize performance on the validation set before running the model on the test set:
Also, we used the following implementation of the evaluation metric: https://github.com/vis-nlp/UniChart/blob/bd6004bc8fe9ef8ce9a6cdfd88712f845d78b918/model/chartqa_model.py#L36
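ChartQA is commonly scored with relaxed accuracy: a numeric prediction counts as correct if it is within 5% of the gold value, and any other answer must match exactly (case-insensitive). The sketch below is written from that common definition and is not copied from the linked UniChart file, so it should be checked against that file before comparing numbers. The helper names `_to_float`, `relaxed_correctness`, and `relaxed_accuracy` are illustrative.

```python
from typing import Optional


def _to_float(text: str) -> Optional[float]:
    """Parse a number, treating a trailing '%' as a percentage."""
    try:
        if text.endswith("%"):
            return float(text.rstrip("%")) / 100.0
        return float(text)
    except ValueError:
        return None


def relaxed_correctness(target: str, prediction: str,
                        max_relative_change: float = 0.05) -> bool:
    """Sketch (assumed to match the linked metric; verify before use)."""
    target_float = _to_float(target)
    prediction_float = _to_float(prediction)
    # A zero target falls through to string comparison (no division by zero).
    if prediction_float is not None and target_float:
        relative_change = abs(prediction_float - target_float) / abs(target_float)
        return relative_change <= max_relative_change
    return prediction.lower() == target.lower()


def relaxed_accuracy(targets, predictions):
    """Fraction of (target, prediction) pairs judged correct."""
    pairs = list(zip(targets, predictions))
    return sum(relaxed_correctness(t, p) for t, p in pairs) / len(pairs)


print(relaxed_accuracy(["100", "Yes", "0.5"], ["103", "yes", "0.6"]))
# ~0.667: 103 is within 5% of 100, "yes" matches "Yes", 0.6 is 20% off 0.5
```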
Using the following post-processing, it can reach

```python
answer = answer.replace("True", "Yes").replace("False", "No")
answer = answer.strip()
```
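The replacements above map the Python booleans printed by a generated program onto ChartQA's Yes/No answers. Because `str.replace` also rewrites substrings (for example, `"TrueType"` would become `"YesType"`), a slightly stricter variant could map the output only when the whole stripped string is exactly `True` or `False`, and turn a failed run (`None`) into an empty answer. This is an assumption about the desired behavior, and `normalize_answer` is a hypothetical helper name.

```python
from typing import Optional

# Hypothetical mapping from printed Python booleans to ChartQA answers.
_BOOL_TO_YES_NO = {"True": "Yes", "False": "No"}


def normalize_answer(output: Optional[str]) -> str:
    """Turn raw program stdout into a ChartQA-style answer string."""
    if output is None:  # the program failed to run
        return ""
    answer = output.strip()
    # Map only exact boolean outputs; leave any other text untouched.
    return _BOOL_TO_YES_NO.get(answer, answer)


print(normalize_answer("True\n"))     # Yes
print(normalize_answer(" 42.5 \n"))   # 42.5
print(normalize_answer(None) == "")   # True
```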
I will clean up the code and share it by this weekend so that you can reproduce the results. Sorry, I am a bit busy today and tomorrow.
OK! Thanks for your help!
Hi, thanks for your great work! Will you release your PoT (Program-of-Thought) evaluation code for the ChartQA test split, or share some details about it? I want to reproduce this result. Thanks in advance for your advice and reply!