Update audb.Dependencies methods benchmark #365

Merged
merged 8 commits into from
Feb 8, 2024
@hagenw hagenw commented Feb 7, 2024

This updates the benchmark of audb.Dependencies methods so that it depends less on the concrete dtypes used in the implementation, and adds results for several dtype implementations.

Unfortunately, the results show that storing strings with the object dtype is still the fastest option. The main cause seems to be slower row-based access for string and string[pyarrow]:

import pandas as pd

# Run in IPython or Jupyter: %timeit is an IPython magic, not plain Python.
points = 1_000_000
data = [f"data-{n}" for n in range(points)]
for dtype in ["object", "string", "string[pyarrow]"]:
    # Use the same dtype for the index and the column
    index = pd.Index([f"index-{n}" for n in range(points)], dtype=dtype)
    df = pd.DataFrame(data, index=index, dtype=dtype)
    print(dtype)
    # Time a single label-based row lookup
    %timeit df.loc['index-2000']

which returns

object
9.78 µs ± 18.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
string
15.7 µs ± 36.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
string[pyarrow]
17.6 µs ± 66.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
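
The snippet above relies on the IPython magic %timeit, so it only runs in IPython or Jupyter. As a rough sketch, the same row lookup can be timed in a plain Python script with the standard timeit module. The helper name time_row_lookup and its points and repeats parameters are hypothetical and not part of audb or of this pull request. The table size is reduced to keep the run short, so the numbers are not directly comparable to the ones reported above.

```python
import timeit

import pandas as pd


def time_row_lookup(dtype, points=10_000, repeats=1_000):
    """Return the mean time in seconds of one ``df.loc[label]`` lookup.

    Hypothetical helper for illustration only.
    """
    # Build index and column with the same string dtype
    index = pd.Index([f"index-{n}" for n in range(points)], dtype=dtype)
    data = [f"data-{n}" for n in range(points)]
    df = pd.DataFrame(data, index=index, dtype=dtype)
    label = f"index-{points // 2}"
    # timeit.timeit returns the total time for ``repeats`` calls
    total = timeit.timeit(lambda: df.loc[label], number=repeats)
    return total / repeats


if __name__ == "__main__":
    for dtype in ["object", "string", "string[pyarrow]"]:
        try:
            seconds = time_row_lookup(dtype)
        except ImportError:
            # string[pyarrow] needs the optional pyarrow package
            print(f"{dtype}: skipped, pyarrow is not installed")
            continue
        print(f"{dtype}: {seconds * 1e6:.2f} µs per lookup")
```

Unlike %timeit, this reports a single mean without a standard deviation; timeit.repeat could be used instead if the spread across runs matters.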


@hagenw hagenw marked this pull request as ready for review February 8, 2024 08:41
@hagenw hagenw merged commit 75d4f3c into main Feb 8, 2024
9 checks passed
@hagenw hagenw deleted the update-benchmark branch February 8, 2024 08:55