Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error messages #63

Merged
merged 34 commits into from
Sep 1, 2023
Merged

Better error messages #63

merged 34 commits into from
Sep 1, 2023

Conversation

brentyi
Copy link
Owner

@brentyi brentyi commented Aug 25, 2023

This PR takes an initial step in the direction of #62. I expect there's more work to be done; I'd also like to add tests before merging. @vwxyzjn if you have any thoughts or suggestions I'd also be interested!

(1)
We previously relied on argparse for most of the error messages reported by tyro. We continue to do this, but the formatting now better emphasize the actual error.

Before:
image

After:
image

(2)
When an invalid value is passed in, we show the argument's documentation + helptext.

Before:
image

After:
image

(3)
When an argument that doesn't exist is passed in, we now highlight similar arguments.

Before:
image

After:
image

(4)
Finally, when subcommands are used, similar arguments are coupled with the subcommands that they can be found in:

image

@codecov
Copy link

codecov bot commented Aug 25, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.16% 🎉

Comparison is base (5340498) 98.96% compared to head (4e71903) 99.12%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
+ Coverage   98.96%   99.12%   +0.16%     
==========================================
  Files          23       23              
  Lines        1829     1953     +124     
==========================================
+ Hits         1810     1936     +126     
+ Misses         19       17       -2     
Flag Coverage Δ
unittests 99.12% <100.00%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
tyro/_fields.py 99.66% <ø> (ø)
tyro/_parsers.py 98.66% <ø> (ø)
tyro/_argparse_formatter.py 97.02% <100.00%> (+2.78%) ⬆️
tyro/_arguments.py 100.00% <100.00%> (ø)
tyro/_calling.py 100.00% <100.00%> (ø)
tyro/_cli.py 100.00% <100.00%> (ø)
tyro/_instantiators.py 98.72% <100.00%> (+<0.01%) ⬆️
tyro/_strings.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@brentyi
Copy link
Owner Author

brentyi commented Aug 26, 2023

(5)

Improvements when subcommands are present.

For example, when we pass in an argument in the wrong location:

# Incorrect
python 03_multiple_subcommands.py dataset:mnist --dataset.binary True optimizer:sgd

# Correct
python 03_multiple_subcommands.py dataset:mnist optimizer:sgd --dataset.binary True

The old error would be completely irrelevant:
image

Now:
image


This has consequences for a lot of things. For example, in nerfstudio --data is a valid argument but --dataset is not.

Previously, if you ran ns-train nerfacto --dataset /some_path the error would be super cryptic:
image

Now, we get:
image

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Aug 26, 2023

Love it! Maybe in the error message, also give out the docs for the Similar arguments:, so the user would not necessarily need to do python my.py --help.

image

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Aug 27, 2023

Maybe another suggestion is that the Parsing error should appear before the usage section, since the Parsing error is more informative and directly helpful.

image

@brentyi
Copy link
Owner Author

brentyi commented Aug 27, 2023

Makes sense! I added the help messages, so as an example if you write --model.features-per-level instead of --pipeline.model.features-per-level you now get:

image

On the ordering, do you feel strongly about this? To me it's slightly more intuitive to have the most relevant (error) message on the bottom, since it reduces the likelihood that the user needs to scroll in order to see it. Kind of like a stack trace; it's also closer to the original argparse error format.

In any case I'm going to let this PR stew for a few days before merging + releasing; I'd like to spend more time thinking about edge cases + test cases.

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Aug 27, 2023

Nice! Did you push the related changes? I am still getting the same result.

ip install git+https://github.com/brentyi/tyro.git@brent/better_errors
Collecting git+https://github.com/brentyi/tyro.git@brent/better_errors
  Cloning https://github.com/brentyi/tyro.git (to revision brent/better_errors) to /tmp/pip-req-build-a_nma7k_
  Running command git clone --filter=blob:none --quiet https://github.com/brentyi/tyro.git /tmp/pip-req-build-a_nma7k_
  Running command git checkout -b brent/better_errors --track origin/brent/better_errors
  Switched to a new branch 'brent/better_errors'
  Branch 'brent/better_errors' set up to track remote branch 'brent/better_errors' from 'origin'.
  Resolved https://github.com/brentyi/tyro.git to commit ef6e733d301c8f8cd0609d986cab59e4012905c5
image

@brentyi
Copy link
Owner Author

brentyi commented Aug 28, 2023

Does it work with your original flag, --rewar.flag? The currently similarity metric is just based on difflib and fairly naive, so --reward.track and --track aren't considered similar enough. I can add some heuristics for catching cases like this before merging.

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Aug 30, 2023

Btw really good work! I am loving the changes.

On the ordering, do you feel strongly about this? To me it's slightly more intuitive to have the most relevant (error) message on the bottom, since it reduces the likelihood that the user needs to scroll in order to see it. Kind of like a stack trace; it's also closer to the original argparse error format.

I do like the fact that the error shows up in the bottom, but at the same time most of the messages do not feel useful. For example, I have the following with the latest PR. It's not until the end that I see the most useful message.

Maybe as an alternative you can say "for usage info please run python xx.py --help` and just show the parse error message?

 python lm_human_preference_details/train_both_accelerate.py --track
2023-08-30 17:10:35.136557: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
usage: train_both_accelerate.py [-h] [--exp-name STR] [--seed INT]
                                [--reward.exp-name STR]
                                [--reward.seed INT]
                                [--reward.track | --reward.no-track]
                                [--reward.wandb-project-name STR]
                                [--reward.wandb-entity {None}|STR]
                                [--reward.cuda | --reward.no-cuda]
                                [--reward.base-model STR]
                                [--reward.label-dataset STR]
                                [--reward.local-batch-size INT]
                                [--reward.gradient-accumulation-steps INT]
                                [--reward.lr FLOAT] [--reward.eps FLOAT]
                                [--reward.rollout-batch-size INT]
                                [--reward.local-normalize-samples INT]
                                [--reward.debug-normalize INT]
                                [--reward.normalize-before | 
--reward.no-normalize-before]
                                [--reward.normalize-after | 
--reward.no-normalize-after]
                                [--reward.print-sample-output-freq INT]
                                [--reward.save-path STR]
                                [--reward.use-tensorflow-adam | 
--reward.no-use-tensorflow-adam]
                                [--reward.task.query-length INT]
                                [--reward.task.query-dataset STR]
                                [--reward.task.query-prefix STR]
                                [--reward.task.query-suffix STR]
                                [--reward.task.start-text {None}|STR]
                                [--reward.task.end-text {None}|STR]
                                [--reward.task.response-length INT]
                                [--reward.task.temperature FLOAT]
                                [--reward.labels.type STR]
                                [--reward.labels.num-train INT]
                                [--reward.labels.num-labels INT]
                                [--reward.labels.source STR]
                                [--policy.exp-name STR]
                                [--policy.seed INT]
                                [--policy.track | --policy.no-track]
                                [--policy.wandb-project-name STR]
                                [--policy.wandb-entity {None}|STR]
                                [--policy.cuda | --policy.no-cuda]
                                [--policy.base-model STR]
                                [--policy.print-sample-output-freq INT]
                                [--policy.save-path STR]
                                [--policy.use-tensorflow-adam | 
--policy.no-use-tensorflow-adam]
                                [--policy.task.query-length INT]
                                [--policy.task.query-dataset STR]
                                [--policy.task.query-prefix STR]
                                [--policy.task.query-suffix STR]
                                [--policy.task.start-text {None}|STR]
                                [--policy.task.end-text {None}|STR]
                                [--policy.task.response-length INT]
                                [--policy.task.truncate-token INT]
                                [--policy.task.truncate-after INT]
                                [--policy.task.penalty-reward-value INT]
                                [--policy.task.temperature FLOAT]
                                [--policy.rewards.kl-coef FLOAT]
                                [--policy.rewards.trained-model 
{None}|STR]
                                [--policy.ppo.total-episodes INT]
                                [--policy.ppo.local-batch-size INT]
                                [--policy.ppo.gradient-accumulation-steps 
INT]
                                [--policy.ppo.nminibatches INT]
                                [--policy.ppo.noptepochs INT]
                                [--policy.ppo.lr FLOAT]
                                [--policy.ppo.eps FLOAT]
                                [--policy.ppo.vf-coef FLOAT]
                                [--policy.ppo.cliprange FLOAT]
                                [--policy.ppo.cliprange-value FLOAT]
                                [--policy.ppo.gamma FLOAT]
                                [--policy.ppo.lam FLOAT]
                                [--policy.ppo.whiten-rewards | 
--policy.ppo.no-whiten-rewards]
                                [{policy.rewards.adaptive-kl:adaptive-kl-p
arams,policy.rewards.adaptive-kl:None}]

╭─ Parsing error ──────────────────────────────────────────────────────────╮
│ Unrecognized arguments: --track                                          │
│ ──────────────────────────────────────────────────────────────────────── │
│ Similar arguments:                                                       │
│     --policy.track, --policy.no-track                                    │
│         if toggled, this experiment will be tracked with Weights and     │
│         Biases (default: False)                                          │
│             in train_both_accelerate.py --help                           │
│     --reward.track, --reward.no-track                                    │
│         if toggled, this experiment will be tracked with Weights and     │
│         Biases (default: False)                                          │
│             in train_both_accelerate.py --help                           │
╰──────────────────────────────────────────────────────────────────────────╯

@brentyi
Copy link
Owner Author

brentyi commented Aug 31, 2023

Ok! We now won't print usage messages that are 400 characters or longer:

image

@brentyi brentyi merged commit 83592d6 into main Sep 1, 2023
10 checks passed
@brentyi brentyi deleted the brent/better_errors branch September 1, 2023 07:43
@vwxyzjn
Copy link
Contributor

vwxyzjn commented Sep 3, 2023

Ok! We now won't print usage messages that are 400 characters or longer:

image

Nice! FYI absl does this:

poetry run python falcon.py 
FATAL Flags parsing error:
  flag --dataset_name=None: Flag --dataset_name must have a value other than None.
  flag --ckpt_path=None: Flag --ckpt_path must have a value other than None.
Pass --helpshort or --helpfull to see help on flags.

Maybe the Pass --helpshort or --helpfull to see help on flags message could be something to consider as well. Also possibly this can be made configurable as well (e.g., output_help_text_on_error=True).

@brentyi
Copy link
Owner Author

brentyi commented Sep 4, 2023

Interesting! To clarify, are you proposing both:

  • To introduce distinct --helpshort and --helpfull flags;
  • To include help messages like Flag --dataset_name must have a value other than None.?

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Sep 4, 2023

To include help messages like Flag --dataset_name must have a value other than None.?

This. Currently

from dataclasses import asdict, dataclass, field
import tyro
@dataclass
class Args:
    exp_name: str
    seed: int = 1
    track: bool = False

args = tyro.cli(Args)

gives

image

but when it's long, it's hardly readable. E.g.,

Screenshot 2023-09-04 at 9 57 26 AM

Ok! We now won't print usage messages that are 400 characters or longer:

I am also proposing to make this configurable so that the user can choose not print usage messages that are N characters or longer. Personally I would set N=0 because I could just type python teest.py --help if I want the help text — when the command fails, I am only interested in the relevant errors.

Thanks for being patient with my feature requests!

@brentyi brentyi mentioned this pull request Sep 4, 2023
@brentyi
Copy link
Owner Author

brentyi commented Sep 4, 2023

Just made a PR with these suggestions, thanks!

I am also proposing to make this configurable so that the user can choose not print usage messages that are N characters or longer. Personally I would set N=0 because I could just type python teest.py --help if I want the help text — when the command fails, I am only interested in the relevant errors.

For now I just (did the equivalent of) setting N=0, for things like this (especially while I still feel like we're iterating) I'd prefer to err on the side of being prescriptive. It's hard to remove configurable parameters once they've been added. 🙂

Thanks for being patient with my feature requests!

I appreciate your help with making the CLI experience better!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants