Arbitrary column reported as number of classes if no default_target_attribute set #528
This one was fixed by @joaquinvanschoren |
thanks for checking. Is there a unit test? Generally it would be great if you could link to the PR or commit that closes an issue (which will also show the unit test). |
Actually can you maybe fix openml/openml-python#346 ? |
Actually it's not fixed: |
The point was that there shouldn't be some arbitrary number of classes if the default_target_attribute is not set. |
I get it. Cc @joaquinvanschoren |
That's an EvaluationEngine issue, right? The consensus was to not compute any supervised meta-features in that case? @janvanrijn: has the evaluationengine been exported in a new jar and pulled to master? Then it's a matter of rerunning all the meta-features? |
I am surprised. I checked the meta-feature engine with the latest code base, and it seems to handle this fine (obviously, as your code is correct). Are we running old meta-feature engines somewhere? The server's engine should be up to date (I would be highly surprised if not). |
This was only merged recently, right? Are we already running the new jar, and |
Yesterday night we merged some other PR. This is the one that (presumably) solved this issue: |
Correction, this one: |
there's no regression test?! |
Right now I get:
Shouldn't |
There is. The code in the repository produces the correct behavior. However, since we are working with a web system with multiple components, it is not impossible that an old version of one of the components is still running somewhere, which is my only explanation for why these results came online. I checked all instances of the evaluation engine I am running, and they are up to date. This query reveals that there are more datasets that were calculated wrongly. I will reset them all, and we should check whether, once they are reevaluated, they still have this problem.
|
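The query itself is not shown above; as an illustration only, a check along these lines could surface the affected datasets. The `dataset` and `data_quality` table and column names here are assumptions, not necessarily the actual OpenML schema:

```sql
-- Illustrative sketch only: table/column names (dataset, data_quality,
-- default_target_attribute) are assumptions about the OpenML schema.
-- Finds datasets that report a NumberOfClasses value even though no
-- default target attribute is set.
SELECT d.did, d.name, q.value AS NumberOfClasses
FROM dataset d
JOIN data_quality q
  ON q.data = d.did
 AND q.quality = 'NumberOfClasses'
WHERE d.default_target_attribute IS NULL
  AND q.value IS NOT NULL;
```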
There was no regression test in the PR though ;) |
I asked @joaquinvanschoren to make a unit test, and it was implemented in this PR:
I get a different result. I just executed the following query:
I get 207 results (where there is no number of classes defined); I am checking them now. |
The following query is a bit more accurate:
It only shows datasets that are already processed and that were processed without errors. I still get about 30 results (did, name; list omitted). |
This is the right query:
This shows 1 result. @amueller For all your other results: there is a difference between a quality not calculated yet and a quality set to NULL. The former is not optimal, but not a bug, I would say (the evaluation engine can be slow for certain qualities, making it lag behind when an initial try has crashed for whatever reason). The latter is a problem. I will investigate. |
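That distinction can be made explicit in SQL: a dataset with no NumberOfClasses row at all is "not calculated yet", while a row whose value is NULL was deliberately set. A sketch, again assuming hypothetical `dataset` and `data_quality` tables:

```sql
-- Illustrative sketch; schema names are assumptions.
-- Distinguishes "quality not calculated yet" (no row) from
-- "quality explicitly NULL" (row present, value NULL).
SELECT d.did,
       d.name,
       CASE
         WHEN q.data  IS NULL THEN 'not calculated yet'
         WHEN q.value IS NULL THEN 'explicitly NULL'
         ELSE q.value
       END AS number_of_classes_status
FROM dataset d
LEFT JOIN data_quality q
  ON q.data = d.did
 AND q.quality = 'NumberOfClasses';
```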
So the Analcat data problem hinges on the following: it does not have a default target attribute (the problem is actually bigger; I opened #834 for this). I think the NULL value is OK, as we now have the following distinction:
number of classes > 0: classification dataset
number of classes = 0: regression dataset
number of classes undefined: we don't know
I am actually confused why other datasets that have default_target_attribute IS NULL show other values. Will investigate this. |
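That three-way rule could be read back out of the stored quality with a query along these lines (a sketch under the same assumed schema):

```sql
-- Illustrative sketch; schema names are assumptions.
-- Maps the stored NumberOfClasses quality onto the three-way rule above.
-- (Note: a quality row that is simply missing also surfaces as NULL here,
-- i.e. as "unknown".)
SELECT d.did,
       d.name,
       CASE
         WHEN q.value IS NULL THEN 'unknown (no default target attribute)'
         WHEN q.value = 0     THEN 'regression dataset'
         ELSE                      'classification dataset'
       END AS dataset_type
FROM dataset d
LEFT JOIN data_quality q
  ON q.data = d.did
 AND q.quality = 'NumberOfClasses';
```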
Found an issue in the OpenML API: |
So, a little update. This is the query with the problematic datasets:
These are the datasets (did, name, visibility, NumberOfClasses; rows omitted). As argued, I expected these to have value NULL for NumberOfClasses, but they don't. I will investigate. |
I am rerunning the latest version of the evaluation engine on all these datasets, and they seem to be fine now. This leads me to believe that an old version of the evaluation engine was running for a while, or still is somewhere (let's hope not). We should probably start versioning evaluation engines and make the API check for the latest version before accepting results. |
I reapplied the query, everything seems fine. Please close if you agree. Note that NumberOfClasses can be null, according to this rule:
number of classes > 0: classification dataset
number of classes = 0: regression dataset
number of classes undefined: we don't know
Is there a good place to document this? Probably the meta-feature record? @joaquinvanschoren
Why were the results wrong in the first place? Although I am not 100% sure about this, my best guess would be that there was somewhere an old version of the evaluation engine calculating meta-features. I opened #835 for this reason. |
Best to document this in the description of the NumberOfClasses feature, yes.
The evaluation engine is already versioned, right?
Even with the version number, right now anyone can update the meta-features simply by downloading and running the evaluation engine, right?
|
By the way, thanks for hunting this down! It's a relief that this is consistent now!
|
Thanks! I forgot this query:
which gives me this result:
(501 was also in there before I reran it). I will reset them all. |
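"Resetting" here presumably means clearing the stored quality so the evaluation engine recomputes it. A sketch of what that could look like, with the same caveat that the table and column names are assumptions:

```sql
-- Illustrative sketch; schema names are assumptions.
-- Clears the faulty NumberOfClasses values so the evaluation engine
-- picks these datasets up again on its next pass.
DELETE q
FROM data_quality q
JOIN dataset d ON d.did = q.data
WHERE q.quality = 'NumberOfClasses'
  AND d.default_target_attribute IS NULL
  AND q.value IS NOT NULL;
```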
Yes, versioned, but this version is not communicated to the OpenML server, and the version with which something was run is not reported.
Nope, this is a (semi-)admin function. (Semi-)admins can do this, indeed. We need to find a way to check that (semi-)admins can only run evaluation engines that belong to their user account. |
As a gentle reminder: datasets without a default target feature will have NumberOfClasses NULL, and datasets with a numeric default target feature will have NumberOfClasses 0. @joaquinvanschoren was correct: the production server didn't run the latest version of the EvalEngine. I updated it. I will reset all faulty runs. FFR, the list with currently wrong datasets:
|
FFR, here is the other query that needs to be checked:
|
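For the record, the second consistency rule (a numeric default target implies NumberOfClasses = 0) could be checked along the same lines; the `data_feature` table and its columns are again assumptions, not the actual OpenML schema:

```sql
-- Illustrative sketch; table/column names are assumptions.
-- Datasets with a numeric default target should have NumberOfClasses = 0.
SELECT d.did, d.name, q.value AS NumberOfClasses
FROM dataset d
JOIN data_feature f
  ON f.data = d.did
 AND f.name = d.default_target_attribute
JOIN data_quality q
  ON q.data = d.did
 AND q.quality = 'NumberOfClasses'
WHERE f.data_type = 'numeric'
  AND q.value <> 0;
```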
Seems solved. Shall we close? |
I didn't have time to check but if you're certain we can close. |
Both my queries still do not give any results. Will close for now. |
As mentioned in #527, https://www.openml.org/d/40945 has no default_target_attribute but reports 370 classes. That doesn't make a lot of sense to me. I think we shouldn't report "number of classes" if there's no default target.