
loss stuck when using multi_gpu #4

Open
burgalon opened this issue Oct 20, 2017 · 1 comment

burgalon commented Oct 20, 2017

I'm trying to use make_parallel() with Keras' Xception model and a generator that yields two classes, with batch_size=2.
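
For reference, here's roughly what my setup looks like (a minimal sketch; the classifier head, optimizer, and the make_parallel import path are assumptions, and I'm assuming the make_parallel(model, gpu_count) signature from multi_gpu.py):

```python
# Minimal sketch of the setup described above. The classifier head, optimizer,
# and the import path of make_parallel are assumptions; make_parallel(model, gpu_count)
# is the signature I'm assuming.
from keras.applications.xception import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

from multi_gpu import make_parallel  # assumed import path for the helper

NUM_CLASSES = 2  # two classes, as described above
BATCH_SIZE = 2   # total batch size across both GPUs

base = Xception(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
predictions = Dense(NUM_CLASSES, activation='softmax')(x)
model = Model(inputs=base.input, outputs=predictions)

model = make_parallel(model, 2)  # replicate the model across gpus=2
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train_generator is a placeholder for the generator that yields
# (images, one-hot labels) batches of size BATCH_SIZE:
# model.fit_generator(train_generator, steps_per_epoch=100, epochs=2)
```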

When using one GPU without make_parallel, the model reaches loss=0 and acc=1 within 2 epochs.
However, when using multi_gpu with gpus=2, the model gets stuck at acc=0.5 with loss=8.0591.

I'm guessing this is somehow related to the loss being aggregated from only one GPU instead of both, but I'm not sure why.
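
For context, here's an illustrative sketch of the data-parallel pattern I understand make_parallel-style helpers to use (not necessarily this repo's exact code): the batch is sliced across GPUs and the per-replica predictions are concatenated on the CPU, so with batch_size=2 and gpus=2 each replica only sees a single sample per step:

```python
# Illustrative sketch only, not this repo's exact multi_gpu.py.
import tensorflow as tf
from keras.layers import Lambda, concatenate
from keras.models import Model


def get_slice(data, idx, parts):
    # Take the idx-th contiguous slice along the batch dimension.
    shape = tf.shape(data)
    size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
    start = tf.concat([shape[:1] // parts * idx, shape[1:] * 0], axis=0)
    return tf.slice(data, start, size)


def naive_make_parallel(model, gpu_count):
    # Assumes a single-input, single-output model (e.g. the Xception classifier).
    replica_outputs = []
    for g in range(gpu_count):
        with tf.device('/gpu:%d' % g):
            # Each replica gets 1/gpu_count of the incoming batch.
            sliced = Lambda(get_slice,
                            output_shape=lambda s: s,  # slicing only changes the batch dim
                            arguments={'idx': g, 'parts': gpu_count})(model.input)
            replica_outputs.append(model(sliced))
    with tf.device('/cpu:0'):
        # Concatenating here means the loss is computed over the full batch.
        merged = concatenate(replica_outputs, axis=0)
    return Model(inputs=model.input, outputs=merged)
```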

When training on 4 classes with batch_size=4, the multi-GPU run reaches acc=0.97 after 11 epochs, while a single GPU reaches acc=1 within 2 epochs.

Any idea?

burgalon (Author) commented

also posted here keras-team/keras#8200
