-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate JSON Schema to YAML regex #1022
base: main
Are you sure you want to change the base?
Conversation
Thank you for the PR! It looks like many tests related to these changes are failing… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on this! Thanks so much for implementing.
Could we harden this a bit with some test cases? I've written some test cases that could help
I have already copied some tests from Given that you're refactoring some of the code on your branch, what is the easiest way to go forward to minimize the amount of duplication / deprecation of code? |
I've refactored Please let me know if you have any other questions. |
@patricebechard is there anything I can do to help with this? |
sorry, was quite busy lately, but I can work on it this week, will let you know if I need help with anything |
No worries at all, thanks for your continued work! |
I was finally able to make some changes including support for indentation. Some caveats:
I am also making sure that we support both quoted and unquoted strings for YAML. Since one of the main advantages of using YAML for guided generation is that there are less tokens, if we add a double quote every time there is a string, we end up with a generation which is almost as big as the one obtained in JSON, so transitioning to YAML would not make sense. |
We may want to smoke test qualitative generation performance when using YAML. Out of scope for this PR, but disallowing quotes may, in some cases, confuse the model.
This is fine for now. Thanks for getting this working! Please let me know if this PR is ready for review. |
Not exactly sure how the coverage is computed here. It says the coverage for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting json_schema.py
types fix.
Good work integrating the additional yaml tests. I'd like to smoke test a bit before we merge as well to ensure there aren't some edge cases we're missing.
) | ||
DATE = r'("(?:\d{4})-(?:0[1-9]|1[0-2])-(?:0[1-9]|[1-2][0-9]|3[0-1])"|\'(?:\d{4})-(?:0[1-9]|1[0-2])-(?:0[1-9]|[1-2][0-9]|3[0-1])\'|(?:\d{4})-(?:0[1-9]|1[0-2])-(?:0[1-9]|[1-2][0-9]|3[0-1]))' | ||
TIME = r'("(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z)?"|\'(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z)?\'|(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z)?)' | ||
UUID = r'("[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"|\'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\'|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this only be in yaml_schema.py
? It allows for single-quote and no-quote datetime, date, time, and UUID.
This is a tentative implementation of a regex generator for arbitrary YAML given a JSON Schema. This PR relates to #923 .
There are still some issues:
@rlouf @lapp0