Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@df: Passing column name as string overflows stack #562

Open
ron-wolf opened this issue Oct 23, 2024 · 1 comment · May be fixed by #563
Open

@df: Passing column name as string overflows stack #562

ron-wolf opened this issue Oct 23, 2024 · 1 comment · May be fixed by #563

Comments

@ron-wolf
Copy link

I was trying to plot this CSV dataset using StatsPlots.@df. I wrote and ran the following code in a fresh Pluto notebook:

using Downloads, CSV, DataFrames, StatsPlots
lifesat = CSV.read(Downloads.download("https://github.com/ageron/data/raw/main/lifesat/lifesat.csv"), DataFrame)
@df lifesat plot(cols("GDP per capita (USD)"), cols("Life satisfaction"))

I expected this to work, as per the README, but I was instead greeted by a StackOverflowError. Running again line-by-line in the Julia REPL yielded this stack trace:

 [1] add_sym!(cols::Vector{Symbol}, s::Char, names::Tuple{Symbol, Symbol, Symbol}) (repeats 79984 times)
   @ StatsPlots ~/.julia/packages/StatsPlots/cStOe/src/df.jl:199

However, this isn't the whole story, as expanding the macro by itself with macroexpand yields the following code (cleaned up):

(d -> begin
  (var1, var2), names = StatsPlots.extract_columns_and_names(d, "GDP per capita (USD)", "Life satisfaction")
  StatsPlots.add_label([StatsPlots.compute_name(names, "GDP per capita (USD)"), StatsPlots.compute_name(names, "Life satisfaction")], plot, var1, var2)
end)(lifesat)

And sure enough, attempting to call that first function extract_columns_and_names with those arguments yields the same error. The culprit is the following line of code in that function:

selected_cols = add_sym!(Symbol[], syms, names)

Which delegates to add_sym!, which recurses indefinitely over the first character of the first string, as follows:

add_sym!(Symbol[], "GDP per capita (USD)", (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
add_sym!(Symbol[], 'G', (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
add_sym!(Symbol[], 'G', (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
add_sym!(Symbol[], 'G', (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
# ...

The solution is to special-case strings in the definition for add_sym!, making sure not to iterate over their characters individually, but instead to convert them to symbols first.

@ron-wolf ron-wolf changed the title StackOverflowError when @df: Passing column name as string overflows stack Oct 23, 2024
@ron-wolf
Copy link
Author

Also, as a side note, I had expected StatsPlots to show the label of the DataFrame column by default, but it did not:

A Plots dot J L scatter plot in the Pluto notebook interface. The data is from the life satisfaction database of the Organisation for Economic Cooperation and Development, or O E C D. Each point in the scatter plot represents a country. The only label in the plot that identifies the data series as "Y 1". The axes, which are unlabeled, represent GDP per capita in U S dollars on the X axis and life satisfaction on the Y axis. The code to produce the plot appears below it, specifying the columns to retrieve from the data frame, the type of plot (a scatter plot), the horizontal and vertical bounds of the plot, and that minor grid lines should be displayed.

ron-wolf added a commit to ron-wolf/StatsPlots.jl that referenced this issue Oct 24, 2024
Fixes JuliaPlots#562. Add a special case to `add_sym!` to convert an `AbstractString` column name to a `Symbol` before adding it.

The old behavior would iterate over the string and try to add each character as a column name. This produced a stack overflow, since `add_sym!` had no special case for `Char`, resulting in indefinite recursion.
@ron-wolf ron-wolf linked a pull request Oct 24, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant