Sample weights #120

Closed
wants to merge 7 commits into from

Conversation

@nikisix nikisix (Contributor) commented Jul 17, 2018

Commit corresponds to the discussion at #118.
Loss functions and implicit factorization are done; implicit sequence is coming.

@maciejkula maciejkula (Owner) left a comment

This looks great, I left a couple of comments. I think once we've settled on a design for the implicit factorization models we can extend to all the other models as well.

Thanks for your help, I really appreciate it!

mask = mask.float()
loss = loss * mask
return loss.sum() / mask.sum()
if sample_weights is not None or mask is not None:
maciejkula (Owner):
We don't need these checks here: we can always call the base loss function. I think that will make for less code repetition in all the loss functions.
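
For instance, a sketch of that suggestion (illustrative only; it assumes the base_loss helper from this diff is in scope in spotlight/losses.py):

import torch


def pointwise_loss(positive_predictions, negative_predictions,
                   sample_weights=None, mask=None):

    positives_loss = (1.0 - torch.sigmoid(positive_predictions))
    negatives_loss = torch.sigmoid(negative_predictions)

    # Always delegate: the helper handles the unweighted, unmasked case itself,
    # so the individual loss functions need no branching.
    return base_loss(positives_loss + negatives_loss,
                     sample_weights=sample_weights,
                     mask=mask)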

nikisix (Contributor, Author):

totally

@@ -15,7 +15,42 @@
from spotlight.torch_utils import assert_no_grad


def pointwise_loss(positive_predictions, negative_predictions, mask=None):
def base_loss(loss, sample_weights=None, mask=None):
maciejkula (Owner):

Could we call this _weighted_loss? This would reflect the fact that:

  1. This is an internal function.
  2. Its essence lies in applying weights.

nikisix (Contributor, Author):

I'm not opposed to this name, but it'd be a misnomer if callers pass a mask and no weights. Although, conceptually at least, a mask could be thought of as a weight of zero.

_base_loss
vs
_weighted_loss
vs
_modified_loss

¯\_(ツ)_/¯ ?

maciejkula (Owner):

If, as you suggest, we subsume masks under weights this will all be fine!
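
A minimal sketch of such a helper, with masks folded in as zero/one weights (not the merged implementation):

def _weighted_loss(loss, sample_weights=None, mask=None):
    """Reduce an elementwise loss, applying optional per-sample weights.

    A mask, if given, is folded in as a weight of 1.0 (keep) or 0.0 (drop),
    so masking becomes a special case of weighting.
    """

    if mask is not None:
        mask = mask.float()
        sample_weights = mask if sample_weights is None else sample_weights * mask

    if sample_weights is not None:
        return (loss * sample_weights).sum() / sample_weights.sum()

    return loss.mean()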

raise ValueError('Degenerate epoch loss: {}'
.format(epoch_loss))

def fit_weighted(self, interactions, verbose=False):
maciejkula (Owner):
I think we should keep a single fit method. If weights are present, we use them; if not, we don't.

This could probably be accomplished by changing the minibatch function to yield None on every iteration for those arguments that are None rather than tensors. That way, it always yields None for batch_sample_weights, which ties in nicely with how the loss functions are changed.

@nikisix nikisix (Contributor, Author) commented Jul 18, 2018

I definitely tried to keep it to a single fit function at first, but I want to avoid consuming RAM for an unnecessary tensor of Nones.
Would this lead to a tensor of Nones (not a single None) being passed to the loss functions? (Unless the tensor is never created because we only yield None?)
We would also have to augment torch_utils.shuffle and add a few if all(weights == None) checks, which may slow things down a bit.

What if we just checked whether weights are specified at the beginning of fit, and if they are, have fit call _fit_weighted? That way we maintain the API of having the user always call fit, but still keep the code relatively simple.

Sequence weight question for you: I was thinking about the sequence interactions yesterday, and my guess is to build a parallel sequence tensor of sample weights, with the same dimensions as the sequence tensor of item ids. Does this sound right?

maciejkula (Owner):

No, we won't need a tensor of Nones. I'll add a comment with a prototype of what I meant.

…ed_loss. rename fit_weighted to _fit_weighted and call it from fit to streamline the user-api. add cscope database files to gitignore.
@nikisix nikisix (Contributor, Author) commented Jul 18, 2018

I notice that the sequence losses already make pretty heavy use of the mask argument via PADDING_IDX.

                loss = self._loss_func(positive_prediction,
                                       negative_prediction,
                                       mask=(sequence_var != PADDING_IDX))

Is it correct for sample weights to override them in the _weighted_loss, or is there something more sophisticated we could be doing?

@maciejkula maciejkula (Owner):
This is a good point: we can definitely subsume masks under weights by just setting weights to zero where mask should be false.
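
Concretely, the sequence models could fold the padding mask into the weights before calling the loss; a sketch only, where sample_weights stands for a hypothetical per-step weight tensor aligned with the item-id sequence:

    # Sketch only: combine the padding mask and the (optional) per-step
    # sample weights into a single weight tensor, so the loss function
    # only ever sees `sample_weights`.
    mask = (sequence_var != PADDING_IDX).float()
    weights = mask if sample_weights is None else sample_weights * mask

    loss = self._loss_func(positive_prediction,
                           negative_prediction,
                           sample_weights=weights)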

raise ValueError('Degenerate epoch loss: {}'
.format(epoch_loss))

def _fit_weighted(self, interactions, verbose=False):
maciejkula (Owner):

I'm still not convinced about having a separate _fit_weighted function, even if it is internal. I think it introduces too much code duplication that will have to be kept in sync.

What about modifying minibatch to look roughly like this:

def minibatch(*tensors, **kwargs):

    batch_size = kwargs.get('batch_size', 128)

    if len(tensors) == 1:
        # Single argument: yield plain tensor slices, as before.
        tensor = tensors[0]
        for i in range(0, len(tensor), batch_size):
            yield tensor[i:i + batch_size]
    else:
        # Several arguments: yield a tuple of slices, passing None through
        # unchanged for any argument that is None rather than a tensor.
        for i in range(0, len(tensors[0]), batch_size):
            yield tuple(x[i:i + batch_size] if x is not None else None
                        for x in tensors)

This way, it emits tensor slices if an argument is a tensor (as before), but also emits None in the tuple if an argument is None.
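
A quick illustration (not part of the PR) of how that prototype behaves when one of the arguments is None:

import torch

user_ids = torch.arange(10)
item_ids = torch.arange(10)
weights = None  # no sample weights supplied by the caller

for batch_user, batch_item, batch_weights in minibatch(
        user_ids, item_ids, weights, batch_size=4):
    # batch_weights is simply None on every iteration, so the loss functions
    # fall back to their unweighted path; no tensor of Nones is ever created.
    print(batch_user.shape, batch_item.shape, batch_weights)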


@maciejkula maciejkula (Owner) left a comment

I like the idea of making masks simply zero weights.

But I still think we should try to keep only one fit method :)

@maciejkula maciejkula (Owner):
Closing in favour of #122

@maciejkula maciejkula closed this Aug 6, 2018