-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Opaque context in expression evaluation #75
Open
TennyZhuang
wants to merge
2
commits into
main
Choose a base branch
from
expr-opaque-context
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+161
−0
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
--- | ||
feature: expr-opaque-context | ||
authors: | ||
- "TennyZhuang" | ||
start_date: "2023/09/06" | ||
--- | ||
|
||
# RFC: Opaque context in expression evaluation | ||
|
||
## Motivation | ||
|
||
### Why we want to introduce a context? | ||
|
||
When implementing some util functions in pg, we found that they heavily depends on some context: | ||
|
||
* `current_database`/`current_user`/`pg_postmaster_start_time`/...: depends on session variables | ||
* `pg_is_in_recovery`: depends on session states | ||
* `pg_get_userbyid`/`obj_description`/...: depends on catalog cache | ||
* `pg_cancel_backend`: depends on meta client | ||
|
||
Currently, we implement them all in binder, and only literals can be argument to these function calls. Some complicated queries can't be supported: | ||
|
||
```sql | ||
SELECT | ||
idx.schemaname, | ||
idx.tablename, | ||
idx.indexname, | ||
obj_description( | ||
format('%s.%s', | ||
quote_ident(idx.schemaname), | ||
quote_ident(idx.indexname) | ||
)::regclass | ||
) AS comment | ||
FROM pg_indexes AS idx; | ||
``` | ||
|
||
It is acceptable for this query to be executed only in local execution mode. | ||
|
||
### Why it should be an opaque context? | ||
|
||
In brief, we don't want to introduce the cyclic dependencies between `frontend` and `expr`. | ||
|
||
When we want to introduce an extra context argument to `Expression::eval` method, what should the signature be? | ||
|
||
```rust | ||
async fn eval(&self, ctx: ?, input: &DataChunk) -> Result<ArrayRef>; | ||
``` | ||
|
||
If we define a concrete FrontendContext here, it should be like this: | ||
|
||
```rust | ||
async fn eval(&self, ctx: Option<FrontendContext>, input: &DataChunk) -> Result<ArrayRef>; | ||
``` | ||
|
||
There are several drawbacks: | ||
|
||
1. **The `expr` crate depends on the `frontend` crate.** | ||
2. It's hard for us to provide more context from other crates. e.g. the backend may provide a part of context. | ||
3. Override some context variables is a little hard. We have to modify the variable, and don't forget to recover it when exiting context. | ||
|
||
The key point is that the call stack is in the ABA form: `frontend::run_query` -> `executor::exec` -> `expr::eval` -> `frontend::current_schema` | ||
|
||
And the `context` should be opaque for `executor` and `expr`, but concrate for `frontend`. | ||
|
||
## Design | ||
|
||
We'll utilize the [`provide_any`](https://github.com/nrc/provide-any) crate, although it's rejected by std. | ||
|
||
In `expr`: | ||
|
||
```rust | ||
pub trait Context: Provider {} | ||
// We can also provide some utilities such as `ChainedContext` here. | ||
pub trait Expression { | ||
fn eval(&self, ctx: &dyn Context>, chunk: &DataChunk); | ||
// ... | ||
} | ||
``` | ||
|
||
In `frontend`: | ||
|
||
```rust | ||
pub struct FrontendContext { | ||
meta_client: MetaClient, | ||
session_variables: SessionVariables, | ||
catalog: Catalog, | ||
tz: TimeZone, | ||
} | ||
impl Provider for FrontendContext { | ||
fn provide<'a>(&'a self, demand: &mut Demand<'a>) { | ||
demand | ||
.provide_ref::<MetaClient>(&self.meta_client) | ||
.provide_ref::<SessionVariables>(&self.session_variables) | ||
.provide_ref::<Catalog>(&self.catalog) | ||
.provide_value::<Tz>(self.tz); | ||
} | ||
} | ||
impl Context for FrontendContext {} | ||
``` | ||
|
||
In `frontend/expr`: | ||
|
||
```rust | ||
struct PgGetUserByIdExpression; | ||
// TODO: we can simplify the codes by another #[function] macro. | ||
impl expr::Expression for PgGetUserByIdExpression { | ||
fn eval(&self, ctx: &dyn Context>, chunk: &DataChunk) { | ||
let Ok(catalog) = request_ref::<Catalog>(ctx) else { | ||
return Err("Not supported in the current context"); | ||
}; | ||
// retrieve the ids from chunk. | ||
Ok(res) | ||
} | ||
} | ||
// TODO: We should also consider how to construct the expression without depending on frontend. | ||
``` | ||
|
||
In `executor`: | ||
|
||
```rust | ||
pub struct BackendContext { | ||
tz: TimeZone, | ||
} | ||
impl Provider for BackendContext { | ||
fn provide<'a>(&'a self, demand: &mut Demand<'a>) { | ||
// We can only provide a subset of context. | ||
demand | ||
.provide_value::<Tz>(self.tz); | ||
} | ||
} | ||
``` | ||
|
||
We can also dispatch some different logics based on the context: | ||
|
||
```rust | ||
pub trait Reporter {} | ||
// In frontend | ||
pub struct NoticeReporter; | ||
impl Reporter for NoticeReporter {} | ||
demand.provide_ref<dyn Reporter>(&self.notice_reporter); | ||
// In backend | ||
pub struct ErrorTableReporter; | ||
impl Reporter for ErrorTableReporter {} | ||
demand.provide_ref<dyn Reporter>(&self.error_table_reporter); | ||
``` | ||
|
||
## Related work | ||
|
||
The pattern was widely used in golang. `context` is a part of std lib. | ||
|
||
There are also a hook [`useContext`](https://react.dev/reference/react/useContext) in react, which is excatly the same pattern as the RFC proposed. | ||
|
||
## Alternatives | ||
|
||
### task_local | ||
|
||
tokio's [`task_local` macro](https://docs.rs/tokio/latest/tokio/macro.task_local.html) can achieve the same result even simpler, and the only cost is the extensive use of global variables. | ||
|
||
## Further questions | ||
|
||
Can the pattern be used for contexts in other module? |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
task_local
is not that "global" as the calling stack or the scope of evaluating an expression tree is quite clear. I would vote for this approach as it introduces minimal changes to the codebase.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this approach. Adding
context
to theeval
interface will make things more complicated and prevent easy use for most of the pure functions. Every time we calleval
, we have to find out a context to pass in...It's annoying to let developers pass context everywhere themselves. I once tried that in madsim, but soon gave it up and made the context global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can even pass a () as context here. I don’t think it’s a heavy problem.
The key concept is pass it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also add a eval_with_empty_context.
It’s really simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, all
eval
needs a context and I'd prefer pass it explicitly instead of the one from an unknown scope.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not good. "The lifetime of the context will be longer".
I think evaluating expression doesn't happen everywhere. For example, in backend, it should only happen in a few executors: Project and Join (for inner join condition). Passing an explicit context helps to make it less error pruning.
On the other hand, if a developer does need to
eval
in some strange place,get_context
will returnNone
and the evaluation should act in default behavior (such as withUTC+0
timezone). The missing context (likeContext::empty()
) always warns him about this behavior, as long as he takes the risk, he can do it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, from my understanding, @TennyZhuang's point to introduce
provide
instead of using a simplestruct ExprContext
is he wants to make it more flexible. For example, ifBackendContext
exists, then theeval()
function can get it; otherwise, it getsNone
. So if you are sure that no context is needed somewhere (I don't know what can be the case... but just saying), it's possible to use an empty context.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary? We can pass the opaque context into the constructor function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does happen everywhere in unit tests. And we have to provide an empty context for each of them. However, I think providing a
eval_with_default_context
method is acceptable.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well. Indeed. 😄
If it costs too many changes to replace
eval
toeval_with_default_context
, scoped task_local might look a little bit better. Either way LGTM.By the way, it's worth mentioning that using
provide
or not is orthogonal to whether to put context in function call arguments ortask_local
: you may also put theContext: Provider
as a task-local variable.