Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add current_date and current_timestamp functions #548

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

richtia
Copy link
Member

@richtia richtia commented Aug 29, 2023

PR to add current_date and current_timestamp functions'

@richtia richtia marked this pull request as ready for review August 29, 2023 21:56
EpsilonPrime
EpsilonPrime previously approved these changes Aug 29, 2023
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions for clarification.

Comment on lines 696 to 698
The `mode` option determines how often the date is evaluated. STREAMING mode
evaluates the current date for each row. BATCH mode evaluates the current date
once when the query starts and uses the same result for every row.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the mode is BATCH, and the function is used multiple times within a single query, should it have the same value for both invocations? In other words, imagine the query is:

SELECT CURRENT_DATE():mode=BATCH AS a, CURRENT_DATE():mode=BATCH AS b FROM ...

From the description I think that every row of a should have the same value. Also, every row of b should have the same value. However, must the values in a be identical to the values in b?

Copy link
Member Author

@richtia richtia Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, a would not always be identical to b. Each CURRENT_DATE() will get evaluated once separately and then those will be used...so if there is a date/time change between each evaluation they'll be different.

Do you have a suggestion for how I should update the description?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my confusion is around the words "when the query starts". I interpret that to be the same point in time for all function calls within a query. So, if we don't want to guarantee that, then I'd suggest removing those words.

BATCH mode evaluates the current date one and uses the same result for every row.

Note that many current engines do guarantee that a is always equal to b and they specifically use the "when the query starts" wording to do so.

Postgres:

It is important to know that CURRENT_TIMESTAMP and related functions return the start time of the current transaction; their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the "current" time, so that multiple modifications within the same transaction bear the same time stamp.

Presto:

Returns the current timestamp as of the start of the query.

Spark:

Returns the current timestamp at the start of query evaluation as a TimestampType column. All calls of current_timestamp within the same query return the same value.

Note that, if the query is distributed, then this becomes a rather tricky thing to guarantee.

My preference would probably be to state that we do guarantee it to be the same:

BATCH mode evaluates the current date once and uses the same result for every call to this function in the query.

If an engine is distributed then either it has to solve this problem somehow (e.g. by evaluating once at the start of the query and sending that down to the workers) or reject BATCH mode.

Copy link
Member Author

@richtia richtia Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details from the other backends! I was mainly basing my thought around this (probably very outdated) example on sql server: https://stackoverflow.com/questions/4056355/selecting-getdate-function-twice-in-a-select-list-same-value-for-both

I'm good with guaranteeing the same results though. Just updated!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonpace let me know if the updated descriptions are good with you

extensions/functions_datetime.yaml Outdated Show resolved Hide resolved
-
name: "current_date"
description: >-
Return the current date.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it obvious that this intended to be in UTC?

Unlike timestamp, the date doesn't have enough information to do time zone conversion, so a parameter or separate function would be needed to be able to support both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options I see are using the locale the server was set up with and UTC. So perhaps we need an option. That could also apply to current_timestamp (either to return a timestamp with the timezone of current or UTC).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively if all of the engines just return UTC then we should require that.

@jacques-n
Copy link
Contributor

I made some comments in the last sync. Basically, I don't think the "batch" mode of these is actually a function. It would be what I would call a contextual function. The function needs to evaluate and is determinant given the same context values. Context values are things like the time when the query ran, the timezone of the user, the currently selected database, etc. These context functions could be a new unique expression type or could be an extended version of what we talked before for variables in prepared statements (since they are basically pointers to external variables).

For the "STREAM" case, I think this is fine. However, I think we should probably use a different name to avoid confusion from people coming from the SQL world, where there is always the "BATCH" behavior.

@jacques-n jacques-n added awaiting-user-input This issue is waiting on further input from users and removed awaiting SMC approval labels Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-input This issue is waiting on further input from users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants