Merge branch 'main' into chelsealin_numeric
chelsea-lin authored Feb 5, 2024
2 parents 8cccf5b + a34af25 commit 436ef1b
Showing 38 changed files with 281 additions and 447 deletions.
1 change: 1 addition & 0 deletions .github/renovate.json
@@ -12,6 +12,7 @@
   ],
   "automerge": false,
   "labels": ["dependencies"],
+  "nix": { "enabled": true },
   "packageRules": [
     {
       "matchManagers": ["docker-compose", "dockerfile", "github-actions"],
115 changes: 0 additions & 115 deletions .github/workflows/update-deps.yml

This file was deleted.

4 changes: 2 additions & 2 deletions compose.yaml
@@ -1,6 +1,6 @@
 services:
   clickhouse:
-    image: clickhouse/clickhouse-server:23.12.2.59-alpine
+    image: clickhouse/clickhouse-server:23.12.3.40-alpine
     ports:
       - 8123:8123 # http port
       - 9000:9000 # native protocol port
@@ -94,7 +94,7 @@ services:
       - trino

   minio:
-    image: bitnami/minio:2024.1.29
+    image: bitnami/minio:2024.1.31
     environment:
       MINIO_ROOT_USER: accesskey
       MINIO_ROOT_PASSWORD: secretkey
15 changes: 0 additions & 15 deletions docs/_freeze/how-to/extending/builtin/execute-results/html.json

This file was deleted.

8 changes: 4 additions & 4 deletions docs/contribute/04_maintainers_guide.qmd
@@ -7,19 +7,19 @@ Ibis maintainers are expected to handle the following tasks as they arise:

## Dependencies

-A number of tasks that are typically associated with maintenance are partially or fully automated.
-
-- [WhiteSource Renovate](https://www.whitesourcesoftware.com/free-developer-tools/renovate/) (Python library dependencies and GitHub Actions)
-- [Custom GitHub Action](https://github.com/ibis-project/ibis/actions/workflows/update-deps.yml) (Nix dependencies)
+Dependency updates are automated using [Mend Renovate](https://www.mend.io/renovate/).

### poetry

Occasionally you may need to lock [`poetry`](https://python-poetry.org) dependencies. Edit `pyproject.toml` as needed, then run:

```sh
poetry lock --no-update
poetry export --extras all --with dev --with test --with docs --without-hashes --no-ansi > requirements-dev.txt
```

The second step updates `requirements-dev.txt` for developers using `pip`.

## Adding examples

If you're not a maintainer, please open an issue asking us to add your example.
101 changes: 25 additions & 76 deletions docs/how-to/extending/builtin.qmd
@@ -1,8 +1,3 @@
---
execute:
freeze: auto
---

# Reference built-in functions


@@ -27,9 +22,10 @@ functions](https://duckdb.org/docs/sql/functions/char.html#text-similarity-funct
Let's expose the `mismatches` API.

```{python}
from ibis import udf
import ibis
ibis.options.interactive = True
@udf.scalar.builtin
@ibis.udf.scalar.builtin
def mismatches(left: str, right: str) -> int:
...
```
@@ -47,8 +43,6 @@ write in the function body **will be ignored**.
We can now call this function on any ibis expression:

```{python}
import ibis
con = ibis.duckdb.connect() # <1>
```

@@ -62,98 +56,51 @@ con.execute(expr)
Like any other ibis expression you can inspect the SQL:

```{python}
import ibis
ibis.to_sql(expr, dialect="duckdb") # <1>
```

1. The `dialect` keyword argument must be passed, because we constructed
a literal expression which has no backend attached.

Because built-in UDFs are ultimately Ibis expressions, they compose with the
rest of the library:
Similarly, we can expose DuckDB's
[`jaro_winkler_similarity`](https://duckdb.org/docs/sql/functions/char.html#text-similarity-functions)
function. Let's alias it to `jw_sim` to illustrate some more of the Ibis `udf` API:

```{python}
ibis.options.interactive = True
@udf.scalar.builtin
def jaro_winkler_similarity(a: str, b: str) -> float:
@ibis.udf.scalar.builtin(name="jaro_winkler_similarity")
def jw_sim(a: str, b: str) -> float:
...
```

Because built-in UDFs are ultimately Ibis expressions, they compose with the
rest of the library:

```{python}
pkgs = ibis.read_parquet(
"https://storage.googleapis.com/ibis-tutorial-data/pypi/packages.parquet"
)
pandas_ish = pkgs[jaro_winkler_similarity(pkgs.name, "pandas") >= 0.9]
pandas_ish = pkgs[jw_sim(pkgs.name, "pandas") >= 0.9]
pandas_ish
```

Let's count the results:
### Defining Signatures

```{python}
pandas_ish.count()
```

There are a good number of packages that look similar to `pandas`!

### Snowflake

Similarly we can expose Snowflake's
[`jarowinkler_similarity`](https://docs.snowflake.com/en/sql-reference/functions/jarowinkler_similarity)
function.

Let's alias it to `jw_sim` to illustrate some more of the Ibis `udf` API:

```{python}
@udf.scalar.builtin(name="jarowinkler_similarity") # <1>
def jw_sim(left: str, right: str) -> float:
...
```

1. `name` is the name of the function in the backend. This argument is
required in this case because the function name in the backend differs from
the name of the function in ibis.


Now let's connect to Snowflake and call our `jw_sim` function:

```{python}
import os
con = ibis.connect(os.environ["SNOWFLAKE_URL"])
```

```{python}
expr = jw_sim("snow", "shoe")
con.execute(expr)
```

And let's take a look at the SQL

```{python}
ibis.to_sql(expr, dialect="snowflake")
```

### Input types

Sometimes the input types of builtin functions are difficult to spell.
Sometimes the signatures of builtin functions are difficult to spell.

Consider a function that computes the length of any array: the elements in the
array can be floats, integers, strings and even other arrays. Spelling that
type is difficult.

Fortunately the `udf.scalar.builtin` decorator doesn't require you to specify
input types in these cases:
Fortunately, the `udf.scalar.builtin` decorator **only** requires you to
specify the type of the **return value**. The types of the function **parameters**
are **not** required. Thus, this is adequate:

```{python}
@udf.scalar.builtin(name="array_size")
@ibis.udf.scalar.builtin(name="array_length")
def cardinality(arr) -> int:
...
```

::: {.callout-caution}
## The return type annotation **is always required**.
:::

We can pass arrays with different element types to our `cardinality` function:

```{python}
@@ -164,14 +111,16 @@ con.execute(cardinality([1, 2, 3]))
con.execute(cardinality(["a", "b"]))
```

When you bypass input types the errors you get back are backend dependent:
When you do not specify input types, Ibis isn't able to catch typing errors
early, and they are only caught during execution.
The errors you get back are backend dependent:

```{python}
#| error: true
con.execute(cardinality("foo"))
```

Here, Snowflake is informing us that the `ARRAY_SIZE` function does not accept
Here, DuckDB is informing us that the `ARRAY_LENGTH` function does not accept
strings as input.
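
The rule described in this section (return annotation required, parameter annotations optional, function body never run) can be illustrated with a plain-Python sketch. This is not how Ibis implements the decorator; `builtin_signature` is a hypothetical helper built only on the standard library's `inspect` module.

```python
import inspect


def builtin_signature(fn):
    """Hypothetical helper: read a stub's signature without calling it."""
    sig = inspect.signature(fn)
    if sig.return_annotation is inspect.Signature.empty:
        raise TypeError(f"{fn.__name__}: the return type annotation is required")
    # Parameters without an annotation are recorded as None ("any type").
    params = {
        name: (None if p.annotation is inspect.Parameter.empty else p.annotation)
        for name, p in sig.parameters.items()
    }
    return params, sig.return_annotation


def cardinality(arr) -> int:
    ...


def mismatches(left: str, right: str) -> int:
    ...


print(builtin_signature(cardinality))
print(builtin_signature(mismatches))
```

Because nothing is known about `arr` in the first stub, a sketch like this has no way to reject a string argument up front, which mirrors why such errors only surface when the backend executes the query.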


@@ -198,7 +147,7 @@ function that isn't exposed in ibis:
First, define the builtin aggregate function:

```{python}
@udf.agg.builtin
@ibis.udf.agg.builtin
def kurtosis(x: float) -> float: # <1>
...
```
6 changes: 3 additions & 3 deletions flake.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion ibis/backends/clickhouse/tests/test_operators.py
@@ -260,7 +260,7 @@ def test_array_index(con, arr, gen_idx):
 )
 def test_array_concat(con, arrays):
     expr = L([]).cast("!array<int8>")
-    expected = sum(arrays, [])
+    expected = sum(arrays, [])  # noqa: RUF017
     for arr in arrays:
         expr += L(arr, type="!array<int8>")

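The `# noqa: RUF017` comment silences Ruff's quadratic-list-summation rule, which flags `sum(lists, [])` because every addition copies the accumulated list. The sketch below uses made-up sample data (not taken from the test suite) to show that a linear-time alternative gives the same result; why the test keeps `sum` is not stated in the commit.

```python
from itertools import chain

arrays = [[1, 2], [], [3]]  # illustrative data, not from the test suite

flat_sum = sum(arrays, [])  # quadratic: re-copies the accumulator at each step
flat_chain = list(chain.from_iterable(arrays))  # linear-time equivalent

print(flat_sum == flat_chain)
```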
8 changes: 4 additions & 4 deletions ibis/backends/pandas/execution/temporal.py
@@ -130,7 +130,7 @@ def execute_timestamp_truncate(op, data, **kwargs):
 def execute_interval_from_integer_series(op, data, **kwargs):
     unit = op.unit.short
     resolution = op.unit.plural
-    cls = OFFSET_CLASS.get(unit, None)
+    cls = OFFSET_CLASS.get(unit)

     # fast path for timedelta conversion
     if cls is None:
@@ -142,7 +142,7 @@ def execute_interval_from_integer_series(op, data, **kwargs):
 def execute_interval_from_integer_integer_types(op, data, **kwargs):
     unit = op.unit.short
     resolution = op.unit.plural
-    cls = OFFSET_CLASS.get(unit, None)
+    cls = OFFSET_CLASS.get(unit)

Codecov / codecov/patch warning on line 145 of ibis/backends/pandas/execution/temporal.py: added line #L145 was not covered by tests.

     if cls is None:
         return pd.Timedelta(data, unit=unit)
@@ -154,7 +154,7 @@ def execute_cast_integer_to_interval_series(op, data, type, **kwargs):
     to = op.to
     unit = to.unit.short
     resolution = to.unit.plural
-    cls = OFFSET_CLASS.get(unit, None)
+    cls = OFFSET_CLASS.get(unit)

     if cls is None:
         return data.astype(f"timedelta64[{unit}]")
@@ -166,7 +166,7 @@ def execute_cast_integer_to_interval_integer_types(op, data, type, **kwargs):
     to = op.to
     unit = to.unit.short
     resolution = to.unit.plural
-    cls = OFFSET_CLASS.get(unit, None)
+    cls = OFFSET_CLASS.get(unit)

     if cls is None:
         return pd.Timedelta(data, unit=unit)
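
Dropping the explicit `None` default should not change behavior, since `dict.get(key)` already returns `None` for a missing key. The sketch below assumes `OFFSET_CLASS` is a plain dict-like mapping (its definition is not shown in this diff); `offset_class` and its contents are hypothetical stand-ins.

```python
# Hypothetical stand-in for OFFSET_CLASS; the real mapping is not shown here.
offset_class = {"Y": "year-offset", "M": "month-offset"}

for unit in ("Y", "M", "s"):
    # Both spellings agree for present and missing keys alike.
    assert offset_class.get(unit, None) == offset_class.get(unit)

print(offset_class.get("s"))
```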