Update typecheck err msg #32880

Merged
merged 14 commits into from
Nov 4, 2024

Conversation

hjtran
Contributor

@hjtran hjtran commented Oct 19, 2024

The error message raised when a PTransform is applied to a PCollection of the wrong type is very long and difficult to parse. For example, with this pipeline:

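import math

import apache_beam as beam

with beam.Pipeline() as p:

  def square_root(x: float):
    return math.sqrt(x)

  # Create yields str elements, but square_root is annotated to take float.
  p | beam.Create(['foo', 'bar']) | beam.Map(square_root)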

The error message is:

output: Traceback (most recent call last):
File "/opt/playground/backend/executable_files/7322872d-f8c2-4c0a-8ae7-b8f19386bc87/7322872d-f8c2-4c0a-8ae7-b8f19386bc87.py", line 48, in <module>
  p | beam.Create(['foo', 'bar']) | beam.Map(square_root)   
File "/usr/local/lib/python3.10/site-packages/apache_beam/pvalue.py", line 138, in __or__
  return self.pipeline.apply(ptransform, self)
File "/usr/local/lib/python3.10/site-packages/apache_beam/pipeline.py", line 746, in apply
  transform.type_check_inputs(pvalueish)
File "/usr/local/lib/python3.10/site-packages/apache_beam/transforms/ptransform.py", line 949, in type_check_inputs
  raise TypeCheckError(
apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'Map(square_root)': requires <class 'float'> but got <class 'str'> for x
Full type hint:
IOTypeHints[inputs=((<class 'float'>,), {}), outputs=((Any,), {})]
strip_iterable()

based on:
IOTypeHints[inputs=((<class 'float'>,), {}), outputs=((Iterable[Any],), {})]
File "/opt/playground/backend/executable_files/7322872d-f8c2-4c0a-8ae7-b8f19386bc87/7322872d-f8c2-4c0a-8ae7-b8f19386bc87.py", line 48, in <module>
    p | beam.Create(['foo', 'bar']) | beam.Map(square_root)
File "/usr/local/lib/python3.10/site-packages/apache_beam/transforms/core.py", line 2083, in Map
    wrapper = with_output_types(
File "/usr/local/lib/python3.10/site-packages/apache_beam/typehints/decorators.py", line 861, in annotate_output_types
    f._type_hints = th.with_output_types(return_type_hint)  # pylint: disable=protected-access

based on:
  IOTypeHints[inputs=((<class 'float'>,), {}), outputs=None]
  File "/opt/playground/backend/executable_files/7322872d-f8c2-4c0a-8ae7-b8f19386bc87/7322872d-f8c2-4c0a-8ae7-b8f19386bc87.py", line 48, in <module>
      p | beam.Create(['foo', 'bar']) | beam.Map(square_root)
  File "/usr/local/lib/python3.10/site-packages/apache_beam/transforms/core.py", line 2078, in Map
      wrapper = with_input_types(
  File "/usr/local/lib/python3.10/site-packages/apache_beam/typehints/decorators.py", line 774, in annotate_input_types
      th = getattr(f, '_type_hints', IOTypeHints.empty()).with_input_types(

This error message is very long, and the extra "based on..." context hasn't helped in figuring out the issue, either in my own experience or in that of the users I've talked to.

When there's a type hint issue, I want to know:

  • Which transform is causing the issue
  • What input type it expects
  • What input type it's getting instead
  • Which transform is producing that input type

I've simplified the error message to answer those questions a bit more directly, so now the above example produces the following error message:

apache_beam.typehints.decorators.TypeCheckError: The transform 'Map(square_root)' requires PCollections of type '<class 'float'>' but was applied to a PCollection of type '<class 'str'>' (produced by the transform 'Create').
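
As a rough illustration of how the four questions above map onto the new message, here is a minimal sketch in plain Python. The helper name `format_type_mismatch` and its parameters are hypothetical and are not part of Beam; only the wording of the message is taken from the example above.

```python
def format_type_mismatch(transform_label, expected_type, actual_type,
                         producer_label):
  """Hypothetical helper that mirrors the wording of the new message."""
  return (
      f"The transform '{transform_label}' requires "      # which transform
      f"PCollections of type '{expected_type}' "          # what it expects
      f"but was applied to a PCollection of type "
      f"'{actual_type}' "                                 # what it received
      f"(produced by the transform '{producer_label}').")  # who produced it


message = format_type_mismatch('Map(square_root)', float, str, 'Create')
print(message)
```

Each field of the message answers exactly one of the four questions, which is why the "based on..." provenance chain can be dropped without losing the information needed to fix the pipeline.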

I've only applied this change to type hint violations for the main input. I've left side inputs alone, though I think the "based on..." messages probably aren't useful there either.

@hjtran hjtran marked this pull request as ready for review October 29, 2024 20:31
@hjtran
Contributor Author

hjtran commented Oct 29, 2024

There are still some failing tests that are just checking the error string.

Before spending more time fixing them, I'd like to see what people think of this error message first.

Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@hjtran
Contributor Author

hjtran commented Oct 29, 2024

assign set of reviewers

Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @shunping for label python.
R: @chamikaramj for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@hjtran
Contributor Author

hjtran commented Oct 31, 2024

Bump @shunping @chamikaramj, could I get your quick feedback on just the error message here (no need to look at the code yet)? If it looks good, I'll fix up the rest of the tests and wait for another round of approval.

@shunping
Contributor

shunping commented Oct 31, 2024

@hjtran, I totally agree with you. I myself encountered a similar situation recently, and I think it is a great idea to simplify the error message and give cx a better experience. Thanks for contributing!

+ @jrmccluskey, who is our expert on Python type hints, for any further advice.

Comment on lines +941 to +954
    arg_hints = iter(hints.items())
    element_arg, element_hint = next(arg_hints)
    if not typehints.is_consistent_with(
        bindings.get(element_arg, typehints.Any), element_hint):
      transform_nest_level = self.label.count("/")
      split_producer_label = pvalueish.producer.full_label.split("/")
      producer_label = "/".join(
          split_producer_label[:transform_nest_level + 1])
      raise TypeCheckError(
          f"The transform '{self.label}' requires "
          f"PCollections of type '{element_hint}' "
          f"but was applied to a PCollection of type"
          f" '{bindings[element_arg]}' "
          f"(produced by the transform '{producer_label}'). ")
Contributor

Is there a reason this type check is done up here rather than modifying the check nested in the for loop? It looks kind of redundant here.

Contributor Author

Yeah, this check is specifically for the main element, while the loop is presumably checking the side inputs.
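
The producer-label trimming in the snippet under review can be sketched on its own in plain Python. The function name `trim_producer_label` and the example labels below are hypothetical; only the string logic (count the "/" separators in the consumer's label, then keep that many plus one leading segments of the producer's full label) comes from the diff.

```python
def trim_producer_label(consumer_label: str, producer_full_label: str) -> str:
  """Trim the producer's full label to the consumer's nesting depth.

  Hypothetical standalone version of the logic in the diff: a transform
  applied at the top level should name a top-level producer rather than
  one of that producer's internal sub-transforms.
  """
  transform_nest_level = consumer_label.count("/")
  split_producer_label = producer_full_label.split("/")
  return "/".join(split_producer_label[:transform_nest_level + 1])


# Top-level consumer: only the first segment of the producer label is kept.
print(trim_producer_label("Map(square_root)", "Create/Impulse"))
# Consumer nested one level deep: two segments are kept.
print(trim_producer_label("Outer/Map(f)", "Outer/Create/Impulse"))
```

With these assumed labels, the first call yields `Create` and the second yields `Outer/Create`, so the message points at a transform the user actually wrote rather than at an internal step.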

@@ -26,7 +26,7 @@ requires = [
     # Avoid https://github.com/pypa/virtualenv/issues/2006
     "distlib==0.3.7",
     # Numpy headers
-    "numpy>=1.14.3,<2.2.0", # Update setup.py as well.
+    "numpy>=1.14.3,<1.27", # Update setup.py as well.
Contributor

Please revert this change (I'm assuming this was unintentional)

Contributor Author

Huh, yeah, I didn't change this intentionally. It's probably an artifact of some odd git operation. Reverting.

@jrmccluskey
Contributor

I definitely like the new form factor for the error, that's much cleaner. I've gotten decently good at parsing the current messages since I've been in and around the typehinting code a lot, but making it more clear for users is a big win (and may also make my life easier when I do more typehinting updates.)

Comment on lines -2131 to -2138

    expected_msg = \
      "Type hint violation for 'CombinePerKey': " \
      "requires Tuple[TypeVariable[K], Union[<class 'float'>, <class 'int'>, " \
      "<class 'numpy.float64'>, <class 'numpy.int64'>]] " \
      "but got Tuple[None, <class 'str'>] for element"

    self.assertStartswith(e.exception.args[0], expected_msg)
Contributor Author

This message and one of the others became too hard and frustrating to update as regex matches, so I just did some basic spot checking instead. I think there are very few cases that the old asserts would catch and the new ones would miss.
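
The trade-off described here can be sketched in plain Python. The message below is the `Map(square_root)` example from the PR description, not the `CombinePerKey` one, and `contains_all` is a hypothetical helper rather than anything in Beam's test suite.

```python
import re

message = (
    "The transform 'Map(square_root)' requires PCollections of type "
    "'<class 'float'>' but was applied to a PCollection of type "
    "'<class 'str'>' (produced by the transform 'Create').")


def contains_all(text, fragments):
  """Spot check: every expected fragment appears somewhere in the text."""
  return all(fragment in text for fragment in fragments)


# Spot-check style: tolerant of rewording, but blind to ordering.
spot_ok = contains_all(
    message, ["Map(square_root)", "<class 'float'>", "<class 'str'>", "Create"])

# Regex style: stricter, but the quoting and escaping must track the
# exact message layout, which is what makes it tedious to maintain.
pattern = re.compile(
    r"requires PCollections of type '(?P<expected>.+?)' "
    r"but was applied to a PCollection of type '(?P<actual>.+?)' "
    r"\(produced by the transform '(?P<producer>.+?)'\)")
match = pattern.search(message)

print(spot_ok, match.group("expected"), match.group("actual"))
```

The spot check would still pass if the expected and actual types were swapped in the message, which is the kind of case the stricter pattern catches and the simpler asserts do not.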

@hjtran
Contributor Author

hjtran commented Nov 4, 2024

Bump @jrmccluskey
(note, I'm away from computer for the month of November after today)

Contributor

@jrmccluskey jrmccluskey left a comment

LGTM, thank you!

@jrmccluskey jrmccluskey merged commit 76c5d56 into apache:master Nov 4, 2024
91 checks passed