Evaluation Engine policy on datasets without a default target #13
I would do the first for now. Yes, ideally these should be on a task level, but honestly I feel like computing meta-features is something that can be done so easily locally that we shouldn't worry about it too much for now. |
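For reference, computing such meta-features locally can be as simple as the following sketch (illustrative, self-contained code; not part of the EvaluationEngine and not its API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal local sketch: two simple meta-features over a nominal target column.
public class LocalMetaFeaturesSketch {

    // ClassCount: number of distinct class labels.
    static int classCount(String[] target) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : target) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts.size();
    }

    // ClassEntropy: Shannon entropy of the class distribution, in bits.
    static double classEntropy(String[] target) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : target) {
            counts.merge(value, 1, Integer::sum);
        }
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / target.length;
            entropy -= p * Math.log(p) / Math.log(2);
        }
        return entropy;
    }

    public static void main(String[] args) {
        String[] target = {"a", "a", "b", "c"};
        System.out.println(classCount(target));   // 3
        System.out.println(classEntropy(target)); // 1.5
    }
}
```

Both functions only make sense when a target column exists, which is exactly the problem for datasets without a default target.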
Just some expectation management: for now I would do neither of the two options, as this seems like a nice issue for a hackathon, to be picked up by someone from the community, or to be accompanied by a research project. |
Just saying this so that, in this discussion, we converge on the 'ideal' situation rather than a quick-and-dirty hack that will do the trick for now. |
I don't understand how you can not do the first option, if the first option is to not compute them. |
Coding-wise that would be rather trivial, but getting this into production requires additional time investments:
- checking the exact set of meta-features that depend on a target feature
- updating unit tests
- code review
- deleting meta-features from the server
- restarting a set of evaluation engine instances
Yes, this can all be done, but in combination with all the other maintenance tasks that I perform(ed) on the various components of OpenML, I would like to avoid over-committing to maintenance work. |
As long as we don't do this first step, we'll have bad meta-features in the database. Shall I just look around for someone to do this now (or do it myself)?
On Thu, 4 Oct 2018 at 17:31, janvanrijn wrote:
Coding-wise that would be rather trivial, but getting this in production
requires additional time investments:
- checking the exact set of meta-features that are depending on a
target feature
- updating unit tests
- code review
- deleting meta-features from server
- restarting a set of evaluation engine instances
yes, this can all be done, but in combination with all the other
maintenance tasks that I perform(ed) on the various components of OpenML I
would like to prevent over committing to maintenance work.
--
Thank you,
Joaquin
|
It would be perfect if you can find someone; otherwise I will do it once I have time for this. Please make sure to review PR #11 first, as it introduces a proper unit testing framework that the extension can build upon. |
Joaquin and I have agreed that I will do the coding, and he will make sure that the meta-features are recalculated on the server. For anyone who thought this would be an easy issue, please check the diff on PR #14 (+405 −368). This does not even take into account the changes I made to the Java connector (a different, more low-level library). In order to properly unit test the functions that do the trick, I had to restructure some things. Furthermore, I found out that the "quick and fast" calculation of the first 10 meta-features involved a slight code duplication, and I fitted this into the general framework as well. This all required quite some changes, but altogether I think this update makes the repository more maintainable. @joaquinvanschoren I think this PR is ready for review. Given the size of the change, I would be surprised if there were no mistakes in it. Please have a thorough look at it, and feel free to run / extend the unit tests. |
My personal opinion is to remove the meta-features, except for a very small, well-defined set, until it is well defined how this would work for the remaining features. Apparently it is not, hence this thread. |
@berndbischl: you mean remove the meta-features for datasets without a target feature and for multi-target datasets? That is exactly what we have done here. |
I just spent two days refactoring and improving the part where meta-features get calculated. The PR is under review by @joaquinvanschoren and I think it will be merged soon. Removing them now seems like a bad idea :) Additionally, I think that meta-features on the task level would solve almost all of our problems. This could be a nice task for a moderately experienced contributor. |
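A rough sketch of what task-level meta-features could look like (hypothetical structure and names; not how the server currently stores qualities): key target-dependent qualities by dataset id plus target attribute, so multi-target datasets and datasets without a default target get one well-defined entry per candidate target, while target-independent qualities stay at the dataset level.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: qualities keyed by (dataset id, target attribute)
// instead of by dataset id alone. Requires Java 16+ for the record.
public class TaskLevelQualitiesSketch {

    record QualityKey(int datasetId, String targetAttribute) {}

    public static void main(String[] args) {
        Map<QualityKey, Map<String, Double>> qualities = new HashMap<>();

        // Target-dependent qualities live under an explicit target attribute
        // (values here are the well-known ones for dataset 61, iris).
        qualities.put(new QualityKey(61, "class"),
                Map.of("ClassCount", 3.0, "ClassEntropy", 1.585));

        // Target-independent qualities need no target at all.
        qualities.put(new QualityKey(61, null),
                Map.of("NumberOfInstances", 150.0, "NumberOfFeatures", 5.0));

        System.out.println(qualities.get(new QualityKey(61, "class")));
    }
}
```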
I think @berndbischl meant removing all meta-features, which is also the direction I tend towards. At the current stage of OpenML they seem mostly a hassle. |
I think two days of work is not an argument for design decisions.
This point I completely agree with.
This part I don't agree with. I do agree that the system currently contains meta-features that might not be as well-defined as we originally thought when we implemented them, so we can definitely change / improve that. But as the meta-features are currently present in:
For these reasons I am highly against dropping this functionality. |
Several datasets do not have a specific target. Also, multitask datasets do not have a single target, which complicates the calculation of meta-features such as class count, entropy, landmarkers, and mean mutual information. There are several things that we can do:
@mfeurer @amueller @joaquinvanschoren @berndbischl @giuseppec @ja-thomas
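To make the dependence on a target concrete, here is a small, purely illustrative policy sketch in the spirit of 'just don't compute them': target-independent qualities are computed as usual, and target-dependent ones are skipped whenever there is no single default target. Quality names and signatures are assumptions, not the evaluation engine's actual API.

```java
import java.util.Set;

// Illustrative policy sketch: skip target-dependent qualities when a dataset
// has no single default target. Names are assumptions, not the real API.
public class QualityPolicySketch {

    static final Set<String> TARGET_DEPENDENT = Set.of(
            "ClassCount", "ClassEntropy", "MeanMutualInformation",
            "DecisionStumpAUC" /* a landmarker, as an example */);

    // Compute a quality if it does not need a target, or if the dataset has
    // exactly one (non-null) default target.
    static boolean shouldCompute(String quality, String defaultTarget, boolean multiTarget) {
        if (!TARGET_DEPENDENT.contains(quality)) {
            return true; // e.g. NumberOfInstances, NumberOfFeatures
        }
        return defaultTarget != null && !multiTarget;
    }

    public static void main(String[] args) {
        System.out.println(shouldCompute("NumberOfInstances", null, false));    // true
        System.out.println(shouldCompute("ClassEntropy", null, false));         // false: no default target
        System.out.println(shouldCompute("ClassEntropy", "class", false));      // true
        System.out.println(shouldCompute("MeanMutualInformation", "y1", true)); // false: multi-target
    }
}
```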