Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49565][SQL] Improve auto-generated expression aliases with pipe SQL operators #49245

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dtenedor
Copy link
Contributor

@dtenedor dtenedor commented Dec 19, 2024

What changes were proposed in this pull request?

This RP improves auto-generated expression aliases with pipe SQL operators.

For example, consider the pipe SQL syntax query:

table t
|> extend 1

Previously, the analyzed plan was:

Project [x#x, y#x, pipeexpression(1, false, EXTEND) AS pipeexpression(1)#x]
+- SubqueryAlias spark_catalog.default.t
   +- Relation spark_catalog.default.t[x#x,y#x] csv

After this PR, it is:

Project [x#x, y#x, pipeexpression(1, false, EXTEND) AS 1#x]
+- SubqueryAlias spark_catalog.default.t
   +- Relation spark_catalog.default.t[x#x,y#x] csv

Note that the output aliases visible in the resulting DataFrame for the query derive from the AS <alias> part of the analyzed plans shown.

Why are the changes needed?

This improves the user experience with pipe SQL syntax.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

Existing golden file tests update to show the improved aliases.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Dec 19, 2024
@dtenedor dtenedor marked this pull request as ready for review December 19, 2024 18:42
@dtenedor
Copy link
Contributor Author

cc @cloud-fan @gengliangwang

@cloud-fan
Copy link
Contributor

Shall we make AliasResolution to properly support PipeExpression?

@dtenedor
Copy link
Contributor Author

dtenedor commented Dec 20, 2024

Shall we make AliasResolution to properly support PipeExpression?

@cloud-fan I tried this, but the PipeExpression needs to check whether its child expression tree contains any aggregate functions, and this needs its child tree to be fully resolved first (which may not have happened yet by the time we run the ResolveAliases rule).

@cloud-fan
Copy link
Contributor

@dtenedor after a few iterations, ResolveAliases should see PipeExpression with its child resolved, right? The thing is that we should have consistent auto alias generation behavior between normal and pipe SQL syntaxes, e.g.

      case ne: NamedExpression => ne
      case go @ GeneratorOuter(g: Generator) if g.resolved => MultiAlias(go, Nil)
      case e if !e.resolved => u
      case g: Generator => MultiAlias(g, Nil)
      case c @ Cast(ne: NamedExpression, _, _, _) => Alias(c, ne.name)()
      case e: ExtractValue if extractOnly(e) => Alias(e, toPrettySQL(e))()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants