Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve (Expression|expr|Expr|ExprRef) UX & ergonomics #3616

Open
mattijn opened this issue Sep 27, 2024 · 10 comments
Open

Improve (Expression|expr|Expr|ExprRef) UX & ergonomics #3616

mattijn opened this issue Sep 27, 2024 · 10 comments

Comments

@mattijn
Copy link
Contributor

mattijn commented Sep 27, 2024

What is your suggestion?

As requested by @dangotbanned here: #3614 (review). #3614 uses an expression as a str with plain Vega Expression syntax. It would be nice to use the equivalent python syntax instead.

Have you considered any alternative solutions?

No response

@dangotbanned
Copy link
Member

Thanks for opening this @mattijn

The reason I wanted an issue was to investigate why the definitions marked as # BUG were not working.

Code block from review
import altair as alt
from vega_datasets import data

source = data.barley()

base = alt.Chart(source).encode(
    x=alt.X("sum(yield):Q").stack("zero"),
    y=alt.Y("site:O").sort("-x"),
    text=alt.Text("sum(yield):Q", format=".0f"),
)

# NOTE: Original `str` expressions
# tooltip = alt.expr("luminance(scale('color', datum.sum_yield))")
# color = alt.expr("luminance(scale('color', datum.sum_yield)) > 0.5 ? 'black' : 'white'")

luminance = alt.expr.luminance(alt.expr.scale("color", alt.datum.sum_yield))

# BUG: These definitions alone *should* work, but don't evaluate as expressions
tooltip = luminance
color = alt.expr.if_(luminance > 0.5, "black", "white")

# HACK: Actual additional steps required
tooltip = alt.expr(repr(luminance))
color = alt.expr(repr(color))

bars = base.mark_bar(tooltip=tooltip).encode(color="sum(yield):Q")
text = base.mark_text(align="right", dx=-3, color=color)

bars + text

I'm now seeing that this is the distinction between Expr (str) and ExprRef ({"expr": str}).

I think handling this implicitly could be error prone.


A simple alternative could be adding Expression.to_expr:

Diff altair/expr/core.py

diff --git a/altair/expr/core.py b/altair/expr/core.py
index e7872ada..095385b4 100644
--- a/altair/expr/core.py
+++ b/altair/expr/core.py
@@ -168,21 +168,24 @@ class OperatorMixin:
 class Expression(OperatorMixin, SchemaBase):
     """
     Expression.
 
     Base object for enabling build-up of Javascript expressions using
     a Python syntax. Calling ``repr(obj)`` will return a Javascript
     representation of the object and the operations it encodes.
     """

     _schema = {"type": "string"}

     def to_dict(self, *args, **kwargs):
         return repr(self)

+    def to_expr(self):
+        return {"expr": repr(self)}
+
     def __setattr__(self, attr, val) -> None:
         # We don't need the setattr magic defined in SchemaBase
         return object.__setattr__(self, attr, val)

     # item access
     def __getitem__(self, val):
         return GetItemExpression(self, val)

This would then allow the following syntax:

import altair as alt
from vega_datasets import data

source = data.barley()

base = alt.Chart(source).encode(
    x=alt.X("sum(yield):Q").stack("zero"),
    y=alt.Y("site:O").sort("-x"),
    text=alt.Text("sum(yield):Q", format=".0f"),
)

# NOTE: Original `str` expressions
# tooltip = alt.expr("luminance(scale('color', datum.sum_yield))")
# color = alt.expr("luminance(scale('color', datum.sum_yield)) > 0.5 ? 'black' : 'white'")

luminance = alt.expr.luminance(alt.expr.scale("color", alt.datum.sum_yield))

# NOTE: More ergonomic? # <-------------------------------------------------
tooltip = luminance.to_expr()
color = alt.expr.if_(luminance > 0.5, "black", "white").to_expr()

bars = base.mark_bar(tooltip=tooltip).encode(color="sum(yield):Q")
text = base.mark_text(align="right", dx=-3, color=color)

bars + text

It would only be needed as the last operation, so it makes sense to be appearing as the final method call.

We could instead use (expr|ExprRef), this was just the quick and easy solution I saw

@mattijn
Copy link
Contributor Author

mattijn commented Sep 27, 2024

We have had .ref(), but that was not necessary anymore at one moment, probably because we solved it implicitly. I think there is also .expr. I used it once in this example. I wish we are clever enough to know when it should be an expression as string or an expression as reference. It's near impossible to know what should be used as an Altair user without using debug tools such as the Vega editor.

@dangotbanned
Copy link
Member

We have had .ref(), but that was not necessary anymore at one moment, probably because we solved it implicitly. I think there is also .expr. I used it once in this example. I wish we are clever enough to know when it should be an expression as string or an expression as reference. It's near impossible to know what should be used as an Altair user without using debug tools such as the Vega editor.

Do you mean Parameter.ref() @mattijn?

class Parameter(_expr_core.OperatorMixin):
"""A Parameter object."""
_schema: t.ClassVar[_TypeMap[Literal["object"]]] = {"type": "object"}
_counter: int = 0
@classmethod
def _get_name(cls) -> str:
cls._counter += 1
return f"param_{cls._counter}"
def __init__(
self,
name: str | None = None,
empty: Optional[bool] = Undefined,
param: Optional[
VariableParameter | TopLevelSelectionParameter | SelectionParameter
] = Undefined,
param_type: Optional[Literal["variable", "selection"]] = Undefined,
) -> None:
if name is None:
name = self._get_name()
self.name = name
self.empty = empty
self.param = param
self.param_type = param_type
@utils.deprecated(
version="5.0.0",
alternative="to_dict",
message="No need to call '.ref()' anymore.",
)
def ref(self) -> dict[str, Any]:
"""'ref' is deprecated. No need to call '.ref()' anymore."""
return self.to_dict()

My suggestion was for Expression (and subclasses), but not OperatorMixin (which Parameter derives)

@mattijn
Copy link
Contributor Author

mattijn commented Sep 27, 2024

Ah yeah, that was for parameter. To speak for myself, I was happy that we could deprecate that one, since I was not able to explain when and when not to use it. The same feeling I have when we introduce an expression.to_expr()..🤔

@dangotbanned
Copy link
Member

Ah yeah, that was for parameter. To speak for myself, I was happy that we could deprecate that one, since I was not able to explain when and when not to use it. The same feeling I have when we introduce an expression.to_expr()..🤔

@mattijn Well that part would be solved by communicating this in annotations 😃

It would also be quite similar to the recent changes with conditions in #3567

api.py

class _ConditionExtra(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
# Likely a Field predicate
empty: Optional[bool]
param: Parameter | str
test: _TestPredicateType
value: Any
__extra_items__: _StatementType | OneOrSeq[_LiteralValue]
_Condition: TypeAlias = _ConditionExtra
"""
A singular, *possibly* non-chainable condition produced by ``.when()``.
The default **permissive** representation.
Allows arbitrary additional keys that *may* be present in a `Conditional Field`_
but not a `Conditional Value`_.
.. _Conditional Field:
https://vega.github.io/vega-lite/docs/condition.html#field
.. _Conditional Value:
https://vega.github.io/vega-lite/docs/condition.html#value
"""
class _ConditionClosed(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
# Parameter {"param", "value", "empty"}
# Predicate {"test", "value"}
empty: Optional[bool]
param: Parameter | str
test: _TestPredicateType
value: Any
_Conditions: TypeAlias = t.List[_ConditionClosed]
"""
Chainable conditions produced by ``.when()`` and ``Then.when()``.
All must be a `Conditional Value`_.
.. _Conditional Value:
https://vega.github.io/vega-lite/docs/condition.html#value
"""
_C = TypeVar("_C", _Conditions, _Condition)
class _Conditional(TypedDict, t.Generic[_C], total=False):
"""
A dictionary representation of a conditional encoding or property.
Parameters
----------
condition
One or more (predicate, statement) pairs which each form a condition.
value
An optional default value, used when no predicates were met.
"""
condition: Required[_C]
value: Any
IntoCondition: TypeAlias = Union[ConditionLike, _Conditional[Any]]
"""
Anything that can be converted into a conditional encoding or property.
Notes
-----
Represents all outputs from `when-then-otherwise` conditions, which are not ``SchemaBase`` types.
"""
class _Value(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
value: Required[Any]
__extra_items__: Any

A lot of this is about identifying either a dict with a key "condition" or an object with an attribute object.condition.

The same idea would apply for this case, but switching to "expr" or object.expr.

Since we know precisely what every schema will permit; I could look into how we can represent this scenario as part of SchemaInfo.to_type_repr()?

def to_type_repr( # noqa: C901
self,
*,
as_str: bool = True,
target: TargetType = "doc",
use_concrete: bool = False,
use_undefined: bool = False,
) -> str | list[str]:
"""
Return the python type representation of ``SchemaInfo``.
Includes `altair` classes, standard `python` types, etc.
Parameters
----------
as_str
Return as a string.
Should only be ``False`` during internal recursive calls.
target: {"annotation", "doc"}
Where the representation will be used.
use_concrete
Avoid base classes/wrappers that don't provide type info.
use_undefined
Wrap the result in ``altair.typing.Optional``.
"""

@mattijn
Copy link
Contributor Author

mattijn commented Sep 27, 2024

It would be quite an achievement if you manage, but you might be right that it is actually possible.

@dangotbanned
Copy link
Member

dangotbanned commented Sep 28, 2024

@mattijn I'll keep this comment updated with all the context I can dig up.

Related

tools.schemapi.utils.title_to_type_reprs

Had another look and I see this TODO (line 555) I added in (#3536) has some relation

tp_param: set[str] = {"ExprRef", "ParameterExtent"}
# In these cases, a `VariableParameter` is also always accepted.
# It could be difficult to differentiate `(Variable|Selection)Parameter`, with typing.
# TODO: A solution could be defining `Parameter` as generic over either `param` or `param_type`.
# - Rewriting the init logic to not use an `Undefined` default.
# - Any narrowing logic could be factored-out into `is_(selection|variable)_parameter` guards.
EXCLUDE_TITLE: set[str] = tp_param | {"RelativeBandSize"}
"""
`RelativeBandSize` excluded as it has a single property `band`,
but all instances also accept `float`.
"""
REMAP_TITLE = SchemaInfo._remap_title
title: str = self.title
tps: set[str] = set()
if not use_concrete:
tps.add("SchemaBase")
# NOTE: To keep type hints simple, we annotate with `SchemaBase` for all subclasses.
if title in tp_param:
tps.add("Parameter")

Relevant section of vega-lite-schema.json

https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/schema/vega-lite-schema.json#L8808-L8823

{
  "definitions": {
    "Expr": {
      "type": "string"
    },
    "ExprRef": {
      "additionalProperties": false,
      "properties": {
        "expr": {
          "description": "Vega expression (which can refer to Vega-Lite parameters).",
          "type": "string"
        }
      },
      "required": [
        "expr"
      ],
      "type": "object"
    }
  }
}

Most of altair.expr.core.py

altair/altair/expr/core.py

Lines 31 to 240 in cabf1e6

def _js_repr(val) -> str:
"""Return a javascript-safe string representation of val."""
if val is True:
return "true"
elif val is False:
return "false"
elif val is None:
return "null"
elif isinstance(val, OperatorMixin):
return val._to_expr()
else:
return repr(val)
# Designed to work with Expression and VariableParameter
class OperatorMixin:
def _to_expr(self) -> str:
return repr(self)
def _from_expr(self, expr) -> Any:
return expr
def __add__(self, other):
comp_value = BinaryExpression("+", self, other)
return self._from_expr(comp_value)
def __radd__(self, other):
comp_value = BinaryExpression("+", other, self)
return self._from_expr(comp_value)
def __sub__(self, other):
comp_value = BinaryExpression("-", self, other)
return self._from_expr(comp_value)
def __rsub__(self, other):
comp_value = BinaryExpression("-", other, self)
return self._from_expr(comp_value)
def __mul__(self, other):
comp_value = BinaryExpression("*", self, other)
return self._from_expr(comp_value)
def __rmul__(self, other):
comp_value = BinaryExpression("*", other, self)
return self._from_expr(comp_value)
def __truediv__(self, other):
comp_value = BinaryExpression("/", self, other)
return self._from_expr(comp_value)
def __rtruediv__(self, other):
comp_value = BinaryExpression("/", other, self)
return self._from_expr(comp_value)
__div__ = __truediv__
__rdiv__ = __rtruediv__
def __mod__(self, other):
comp_value = BinaryExpression("%", self, other)
return self._from_expr(comp_value)
def __rmod__(self, other):
comp_value = BinaryExpression("%", other, self)
return self._from_expr(comp_value)
def __pow__(self, other):
# "**" Javascript operator is not supported in all browsers
comp_value = FunctionExpression("pow", (self, other))
return self._from_expr(comp_value)
def __rpow__(self, other):
# "**" Javascript operator is not supported in all browsers
comp_value = FunctionExpression("pow", (other, self))
return self._from_expr(comp_value)
def __neg__(self):
comp_value = UnaryExpression("-", self)
return self._from_expr(comp_value)
def __pos__(self):
comp_value = UnaryExpression("+", self)
return self._from_expr(comp_value)
# comparison operators
def __eq__(self, other):
comp_value = BinaryExpression("===", self, other)
return self._from_expr(comp_value)
def __ne__(self, other):
comp_value = BinaryExpression("!==", self, other)
return self._from_expr(comp_value)
def __gt__(self, other):
comp_value = BinaryExpression(">", self, other)
return self._from_expr(comp_value)
def __lt__(self, other):
comp_value = BinaryExpression("<", self, other)
return self._from_expr(comp_value)
def __ge__(self, other):
comp_value = BinaryExpression(">=", self, other)
return self._from_expr(comp_value)
def __le__(self, other):
comp_value = BinaryExpression("<=", self, other)
return self._from_expr(comp_value)
def __abs__(self):
comp_value = FunctionExpression("abs", (self,))
return self._from_expr(comp_value)
# logical operators
def __and__(self, other):
comp_value = BinaryExpression("&&", self, other)
return self._from_expr(comp_value)
def __rand__(self, other):
comp_value = BinaryExpression("&&", other, self)
return self._from_expr(comp_value)
def __or__(self, other):
comp_value = BinaryExpression("||", self, other)
return self._from_expr(comp_value)
def __ror__(self, other):
comp_value = BinaryExpression("||", other, self)
return self._from_expr(comp_value)
def __invert__(self):
comp_value = UnaryExpression("!", self)
return self._from_expr(comp_value)
class Expression(OperatorMixin, SchemaBase):
"""
Expression.
Base object for enabling build-up of Javascript expressions using
a Python syntax. Calling ``repr(obj)`` will return a Javascript
representation of the object and the operations it encodes.
"""
_schema = {"type": "string"}
def to_dict(self, *args, **kwargs):
return repr(self)
def __setattr__(self, attr, val) -> None:
# We don't need the setattr magic defined in SchemaBase
return object.__setattr__(self, attr, val)
# item access
def __getitem__(self, val):
return GetItemExpression(self, val)
class UnaryExpression(Expression):
def __init__(self, op, val) -> None:
super().__init__(op=op, val=val)
def __repr__(self):
return f"({self.op}{_js_repr(self.val)})"
class BinaryExpression(Expression):
def __init__(self, op, lhs, rhs) -> None:
super().__init__(op=op, lhs=lhs, rhs=rhs)
def __repr__(self):
return f"({_js_repr(self.lhs)} {self.op} {_js_repr(self.rhs)})"
class FunctionExpression(Expression):
def __init__(self, name, args) -> None:
super().__init__(name=name, args=args)
def __repr__(self):
args = ",".join(_js_repr(arg) for arg in self.args)
return f"{self.name}({args})"
class ConstExpression(Expression):
def __init__(self, name) -> None:
super().__init__(name=name)
def __repr__(self) -> str:
return str(self.name)
class GetAttrExpression(Expression):
def __init__(self, group, name) -> None:
super().__init__(group=group, name=name)
def __repr__(self):
return f"{self.group}.{self.name}"
class GetItemExpression(Expression):
def __init__(self, group, name) -> None:
super().__init__(group=group, name=name)
def __repr__(self) -> str:
return f"{self.group}[{self.name!r}]"
IntoExpression: TypeAlias = Union[bool, None, str, float, OperatorMixin, Dict[str, Any]]

Some inconsistency in v5.api.param

Highlighting this since it is the last step for the preferred functional wrappers

def param(
name: str | None = None,
value: Optional[Any] = Undefined,
bind: Optional[Binding] = Undefined,
empty: Optional[bool] = Undefined,
expr: Optional[str | Expr | Expression] = Undefined,

core.VariableParameter.expr exists

parameter.param = core.VariableParameter(
name=parameter.name,
bind=bind,
value=value,
expr=expr,
**kwds,
)

core.TopLevelSelectionParameter.expr doesn't exist

parameter.param = core.TopLevelSelectionParameter(
name=parameter.name, bind=bind, value=value, expr=expr, **kwds
)

core.SelectionParameter.expr doesn't exist

parameter.param = core.SelectionParameter(
name=parameter.name, bind=bind, value=value, expr=expr, **kwds
)

Question

Does this mean a defining trait of a VariableParameter is that it wraps an expression?

Usage

Issues/PRs

Summary

  • We use Parameter as the annotation for ExprRef.
  • We do not annotate Expr in generated code
    • It is just an alias for str
  • Expression (and subclasses) represent Expr
  • expr return types (as of feat: Generate expr method signatures, docs #3600 (comment))
    • alt.expr(...) -> ExprRef
    • alt.expr.method(...) -> Expression
    • alt.expr.property -> Expression

@dangotbanned
Copy link
Member

@mattijn would you be okay if we renamed this issue?

Modifying the code used in #3614 example would be a benefit.
However, I'm thinking more along the lines of #3563 (comment) to provide a more ergonomic alternative.

My proposal of Expression.to_expr() is an alternative, but based on #3616 (comment) I'd like to explore if we could benefit from a larger change

@mattijn
Copy link
Contributor Author

mattijn commented Sep 28, 2024

Sure, issue can be renamed!

@dangotbanned dangotbanned changed the title adopt python syntax for expressions in the bar chart with label overlay example Improve (Expression|expr|Expr|ExprRef) UX & ergonomics Sep 28, 2024
@dangotbanned
Copy link
Member

dangotbanned commented Oct 1, 2024

I've started exploring some of #3616 (comment)

What I found really surprised me.

Testing

I made this small change, which simply adds Expr as an annotation when used in a schema.
This is to try to answer @mattijn's point in #3616 (comment) about knowing when to use a string

diff tools/schemapi/utils.py
diff --git a/tools/schemapi/utils.py b/tools/schemapi/utils.py
index 5c4a84f9..f3706ba9 100644
--- a/tools/schemapi/utils.py
+++ b/tools/schemapi/utils.py
@@ -568,6 +568,8 @@ class SchemaInfo:
             # NOTE: To keep type hints simple, we annotate with `SchemaBase` for all subclasses.
             if title in tp_param:
                 tps.add("Parameter")
+            if title == "Expr":
+                tps.add(title)
         elif self.is_value():
             value = self.properties["value"]
             t = value.to_type_repr(target="annotation", use_concrete=use_concrete)

Running schema generation resulted in only 3 changes?

diff altair/vegalite/v5/schema/core.py

diff --git a/altair/vegalite/v5/schema/core.py b/altair/vegalite/v5/schema/core.py
index 623b27ee..1ff7c660 100644
--- a/altair/vegalite/v5/schema/core.py
+++ b/altair/vegalite/v5/schema/core.py
@@ -21921,7 +21921,9 @@ class DerivedStream(Stream):
         between: Optional[Sequence[SchemaBase | Map]] = Undefined,
         consume: Optional[bool] = Undefined,
         debounce: Optional[float] = Undefined,
-        filter: Optional[str | SchemaBase | Sequence[str | SchemaBase]] = Undefined,
+        filter: Optional[
+            str | Expr | SchemaBase | Sequence[str | Expr | SchemaBase]
+        ] = Undefined,
         markname: Optional[str] = Undefined,
         marktype: Optional[SchemaBase | MarkType_T] = Undefined,
         throttle: Optional[float] = Undefined,
@@ -21981,7 +21983,9 @@ class MergedStream(Stream):
         between: Optional[Sequence[SchemaBase | Map]] = Undefined,
         consume: Optional[bool] = Undefined,
         debounce: Optional[float] = Undefined,
-        filter: Optional[str | SchemaBase | Sequence[str | SchemaBase]] = Undefined,
+        filter: Optional[
+            str | Expr | SchemaBase | Sequence[str | Expr | SchemaBase]
+        ] = Undefined,
         markname: Optional[str] = Undefined,
         marktype: Optional[SchemaBase | MarkType_T] = Undefined,
         throttle: Optional[float] = Undefined,
@@ -26715,7 +26719,7 @@ class VariableParameter(TopLevelParameter):
         self,
         name: Optional[str | SchemaBase] = Undefined,
         bind: Optional[SchemaBase | Map] = Undefined,
-        expr: Optional[str | SchemaBase] = Undefined,
+        expr: Optional[str | Expr | SchemaBase] = Undefined,
         react: Optional[bool] = Undefined,
         value: Optional[Any] = Undefined,
         **kwds,

Making the same change, but for ExprRef results in 5335 changes

Screenshot

image

Conclusion

I can't help but think that Expression should be abstracting ExprRef instead of Expr.

This would mean these methods no longer both produce str

import altair as alt
from altair.expr.core import FunctionExpression

example: FunctionExpression = alt.expr.random()

>>> example.to_dict() == example._to_expr()
True

Here OperatorMixin.to_dict() returns a str, but could return {"expr": "..."} if it subclassed ExprRef instead.

It would still make sense for OperatorMixin._to_expr() to return a str - and that maps more closely to Expr.


We'd then have the same type returned from all of:

  • expr(...)
  • expr.method(...)
  • expr.property

I think this is conceptually quite clean and easy to explain.
expr is a factory to produce expr objects (expressions).

Since that would account for 5335 annotations, we could use expr in them directly.
That would also solve minor typing issue in expr.__new__

@override
def __new__(cls: type[_ExprRef], expr: str) -> _ExprRef: # type: ignore[misc]
# NOTE: `mypy<=1.10.1` is not consistent with typing spec
# https://github.com/python/mypy/issues/1020
# https://docs.python.org/3/reference/datamodel.html#object.__new__
# https://typing.readthedocs.io/en/latest/spec/constructors.html#new-method
return _ExprRef(expr=expr)

dangotbanned added a commit that referenced this issue Oct 2, 2024
Spotted while investigating #3616 (comment)

When I originally wrote this test, I didn't realise I was comparing:
```py
GetAttrExpression(Parameter.name, "expr") == GetAttrExpression(Parameter.name, "expr")
```

The intention was to compare the contents were the same.

Due to `OperatorMixin.__eq__`, the previous assertion would always return `True`.
dangotbanned added a commit that referenced this issue Nov 1, 2024
- Initial use-case is `SelectionPredicateComposition` -> `PredicateComposition`
- Something similar might be a good idea for `Expression` -> `Expr` in #3616
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants