Samples no longer work #33

alunap · 2024-09-17T15:58:10Z

If I run the example code (any of them) I get a failure.

using MLJ, MLJText, TextAnalysis

docs = ["Hi my name is Sam.", "How are you today?"]
tfidf_transformer = TfidfTransformer()
mach = machine(tfidf_transformer, tokenize.(docs))
MLJ.fit!(mach)

tfidf_mat = transform(mach, tokenize.(docs))

will fail:

ERROR: MethodError: no method matching tokenize(::String)

Closest candidates are:
  tokenize(::Type{S}, ::T) where {S<:Languages.Language, T<:AbstractString}
   @ TextAnalysis ~/.julia/packages/TextAnalysis/bdhyq/src/deprecations.jl:4
  tokenize(::S, ::T) where {S<:Languages.Language, T<:AbstractString}
   @ TextAnalysis ~/.julia/packages/TextAnalysis/bdhyq/src/tokenizer.jl:19

Stacktrace:
 [1] _broadcast_getindex_evalf
   @ ./broadcast.jl:709 [inlined]
 [2] _broadcast_getindex
   @ ./broadcast.jl:682 [inlined]
 [3] getindex
   @ ./broadcast.jl:636 [inlined]
 [4] copy
   @ ./broadcast.jl:942 [inlined]
 [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(tokenize), Tuple{Vector{String}}})
   @ Base.Broadcast ./broadcast.jl:903
 [6] top-level scope
   @ REPL[11]:1

The text was updated successfully, but these errors were encountered:

ablaom · 2024-09-18T20:45:21Z

Thanks for posting.

The tokenize method is owned by TextAnalysis, and I see that TextAnalysis 0.8 no longer supports the version dispatching on strings. One workaround is to use TextAnalysis 0.7.5.

Posted a query here about what the correct method is now.

pazzo83 · 2024-09-18T20:49:49Z

It looks like it requires a "language" to be set, but it's not actually used:
https://github.com/JuliaText/TextAnalysis.jl/blame/master/src/tokenizer.jl

I think one thing we can do is just use the tokenization from WordTokenzers directly. I can look at making this change.

ablaom · 2024-09-18T21:18:58Z

Thanks @pazzo83 for looking into this. This doesn't appear to affect MLJText tests, which are still passing (the compat for TextAnaysis includes the latest 0.8).

So maybe all that's needed is to update the docstrings.

ablaom · 2024-10-08T19:06:30Z

@pazzo83 Any chance of a PR to fix this?

pazzo83 · 2024-10-26T14:23:46Z

Sorry for the delay! I realized that we only need to update the Readme (since this is the code that is being referred to). I will be doing that.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Samples no longer work #33

Samples no longer work #33

alunap commented Sep 17, 2024

ablaom commented Sep 18, 2024

pazzo83 commented Sep 18, 2024

ablaom commented Sep 18, 2024 •

edited

Loading

ablaom commented Oct 8, 2024

pazzo83 commented Oct 26, 2024

Samples no longer work #33

Samples no longer work #33

Comments

alunap commented Sep 17, 2024

ablaom commented Sep 18, 2024

pazzo83 commented Sep 18, 2024

ablaom commented Sep 18, 2024 • edited Loading

ablaom commented Oct 8, 2024

pazzo83 commented Oct 26, 2024

ablaom commented Sep 18, 2024 •

edited

Loading