`readdlm` dictionary parsing default behavior #13

jakebolewski · 2016-06-10T12:39:38Z

julia> english = readdlm(IOBuffer(readstring("/usr/share/dict/words")),  ' ')
235886×1 Array{Any,2}:
 "A"
 "a"
 "aa"
 "aal"
 "aalii"
 "aam"
 "Aani"
 "aardvark"
 "aardwolf"
 "Aaron"
 "Aaronic"
 "Aaronical"
 "Aaronite"
 "Aaronitic"
 "Aaru"
 "Ab"
 "aba"
 "Ababdeh"
 "Ababua"
 "abac"
 "abaca"
 "abacate"
 "abacay"
 ⋮
 "zymoscope"
 "zymosimeter"
 "zymosis"
 "zymosterol"
 "zymosthenic"
 "zymotechnic"
 "zymotechnical"
 "zymotechnics"
 "zymotechny"
 "zymotic"
 "zymotically"
 "zymotize"
 "zymotoxic"
 "zymurgy"
 "Zyrenian"
 "Zyrian"
 "Zyryan"
 "zythem"
 "Zythia"
 "zythum"
 "Zyzomys"
 "Zyzzogeton"

julia> english_lower = map(lowercase, english);
ERROR: MethodError: no method matching lowercase(::Bool)
 in collect_to!(::Array{String,2}, ::Base.Generator{Array{Any,2},Base.#lowercase}, ::Int64, ::Int64) at ./array.jl:267
 in _collect(::Array{Any,2}, ::Base.Generator{Array{Any,2},Base.#lowercase}, ::Base.EltypeUnknown, ::Base.HasShape) at ./array.jl:249
 in map(::Function, ::Array{Any,2}) at ./abstractarray.jl:1131
 in eval(::Module, ::Any) at ./boot.jl:225
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

julia> Set(typeof(x) for x in english)
Set(Any[Bool,SubString{String},Float64])

julia> english_str = map(strip, split(readstring("/usr/share/dict/words")));

julia> foreach(english_str) do x
       try
       parse(Float64, x)
       println(x)
       end
       end
infinity
Nan
nan

andreasnoack · 2016-06-10T12:50:13Z

What did you expect from can_parse_float? The words that can be parsed as Float64s are already Float64s

julia> filter(t -> isa(t, Float64), english)
3-element Array{Any,1}:
 Inf
 NaN
 NaN

and the second argument in parse should be a string.

julia> parse(Float64, NaN)
ERROR: MethodError: no method matching parse(::Type{Float64}, ::Float64)
Closest candidates are:
  parse{T<:AbstractFloat}(::Type{T<:AbstractFloat}, ::AbstractString)
 in eval(::Module, ::Any) at ./boot.jl:225
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

jakebolewski · 2016-06-10T12:59:06Z

Yes, I am not thinking this early. I updated the original comment. I feel that this should return a list of strings by default (would be what I would expect) but this is debatable because Inf / NaN / true / false are in the dictionary.

malmaud · 2016-06-10T13:04:02Z

I think every other DLM reader I've encountered will first check if a column is entirely boolean/numeric, and if not will just return strings for each cell. Returning a hybrid of both is weird.

StefanKarpinski · 2016-06-10T15:13:47Z

Solution: delete readdlm.

malmaud · 2016-06-10T15:17:58Z

ViralBShah · 2016-06-10T18:28:54Z

@tanmaykm said he is working on a better readdlm. And deleting it is not a solution, unless you have something better to offer.

StefanKarpinski · 2016-06-10T18:35:59Z

How is deleting it not a solution? Both DataFrames and CSV.jl already offer better solutions to this.

tkelman · 2016-06-10T18:51:41Z

Even if the code that's currently in base is going to be improved, there's not much reason for it to remain in this repository and coupled to the release schedule of the core language and compiler.

StefanKarpinski · 2016-06-10T18:56:05Z

💯 CSV parsing is complicated enough that it should be moved out of the base language.

ViralBShah · 2016-06-10T19:13:36Z

Yes, CSV parsing is complicated, but dlm parsing is not. The current Julia repo has a bunch of other things that make things convenient for users of such systems - and I don't see why reading simple delimited files is the thing that has to be booted. Why do I have to install a package to load a simple file?

StefanKarpinski · 2016-06-10T19:17:55Z

This implementation is a nightmare and it's awful to maintain when we change things in Base.

StefanKarpinski · 2016-06-10T19:21:05Z

The fact that we have readdlm in base leads to also having readcsv in base, which leads to users, naïvely (the fools!) actually trying to use it to read CSV files, only to be told, "don't do that, you should use DataFrames or CSV". That is a ludicrous situation.

JeffBezanson · 2016-06-10T19:24:08Z

I think we should have readdlm only, and have it only support simple delimited files, not csv, and remove the readcsv function from Base.

ViralBShah · 2016-06-10T19:30:16Z

readcsv is only 2 lines, and I am fine with removing them to avoid the confusion.

readcsv(io; opts...)          = readdlm(io, ','; opts...)
readcsv(io, T::Type; opts...) = readdlm(io, ',', T; opts...)

ViralBShah · 2016-06-10T19:32:07Z

The implementation is a nightmare for good reason. I think it will simplify over time as the compiler gets better, and by being strict about the files that it accepts.

nalimilan · 2016-06-10T21:03:45Z

+1 for removing. Apart from threads by people complaining about poor performance and lack of options of this function, Julia has been criticized for having a too big standard library. readdlm is a typical case as it's not even in a submodule which could eventually become a separate package.

JeffBezanson · 2016-06-10T21:06:02Z

Yes, I think we should deprecate readcsv right away, and plan to move readdlm into the "standard library" when that process happens, hopefully early in the 0.6 cycle.

jakebolewski · 2016-06-15T17:44:29Z

reading in simple homogeneous (typed) text files is really handy esp. for testing so delegating this job to CSV.jl or similar package seems like overkill. I think that readdlm should be similar to Numpy's loadtxt. It handles the simple cases reliably by specifying the input type up front and punting on any input that is hard. I would rather have something simple that works reliably than something super complicated and magical that does not.

StefanKarpinski · 2016-06-15T19:10:04Z

I would be fine with having something simple, fast and reliable in Base; that is not what we have atm.

jakebolewski changed the title ~~readdlm cannot parse the dictionary correctly by default~~ readdlm dictionary parsing default behavior Jun 10, 2016

kshyatt added collections I/O labels Jan 26, 2017

JeffBezanson added the bug Something isn't working label Aug 23, 2017

JeffBezanson mentioned this issue Aug 23, 2017

deprecate readcsv and writecsv? JuliaLang/julia#23414

Closed

ViralBShah transferred this issue from JuliaLang/julia Apr 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`readdlm` dictionary parsing default behavior #13

`readdlm` dictionary parsing default behavior #13

jakebolewski commented Jun 10, 2016 •

edited

Loading

andreasnoack commented Jun 10, 2016

jakebolewski commented Jun 10, 2016

malmaud commented Jun 10, 2016

StefanKarpinski commented Jun 10, 2016

malmaud commented Jun 10, 2016

ViralBShah commented Jun 10, 2016

StefanKarpinski commented Jun 10, 2016

tkelman commented Jun 10, 2016 •

edited

Loading

StefanKarpinski commented Jun 10, 2016

ViralBShah commented Jun 10, 2016 •

edited

Loading

StefanKarpinski commented Jun 10, 2016

StefanKarpinski commented Jun 10, 2016

JeffBezanson commented Jun 10, 2016 via email

ViralBShah commented Jun 10, 2016

ViralBShah commented Jun 10, 2016

nalimilan commented Jun 10, 2016

JeffBezanson commented Jun 10, 2016

jakebolewski commented Jun 15, 2016 •

edited

Loading

StefanKarpinski commented Jun 15, 2016

readdlm dictionary parsing default behavior #13

readdlm dictionary parsing default behavior #13

Comments

jakebolewski commented Jun 10, 2016 • edited Loading

andreasnoack commented Jun 10, 2016

jakebolewski commented Jun 10, 2016

malmaud commented Jun 10, 2016

StefanKarpinski commented Jun 10, 2016

malmaud commented Jun 10, 2016

ViralBShah commented Jun 10, 2016

StefanKarpinski commented Jun 10, 2016

tkelman commented Jun 10, 2016 • edited Loading

StefanKarpinski commented Jun 10, 2016

ViralBShah commented Jun 10, 2016 • edited Loading

StefanKarpinski commented Jun 10, 2016

StefanKarpinski commented Jun 10, 2016

JeffBezanson commented Jun 10, 2016 via email

ViralBShah commented Jun 10, 2016

ViralBShah commented Jun 10, 2016

nalimilan commented Jun 10, 2016

JeffBezanson commented Jun 10, 2016

jakebolewski commented Jun 15, 2016 • edited Loading

StefanKarpinski commented Jun 15, 2016

`readdlm` dictionary parsing default behavior #13

`readdlm` dictionary parsing default behavior #13

jakebolewski commented Jun 10, 2016 •

edited

Loading

tkelman commented Jun 10, 2016 •

edited

Loading

ViralBShah commented Jun 10, 2016 •

edited

Loading

jakebolewski commented Jun 15, 2016 •

edited

Loading