
loss stuck when using multi_gpu #4

Open
burgalon opened this issue Oct 20, 2017 · 1 comment

burgalon commented Oct 20, 2017

I'm trying to use make_parallel() with Keras' Xception model and a generator that yields two classes, with batch_size=2.
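
For reference, here's roughly what my setup looks like (a minimal sketch; the classifier head, optimizer, and the make_parallel import path are assumptions, and I'm assuming the make_parallel(model, gpu_count) signature from multi_gpu.py):

```python
# Minimal sketch of the setup described above. The classifier head, optimizer,
# and the import path of make_parallel are assumptions; make_parallel(model, gpu_count)
# is the signature I'm assuming.
from keras.applications.xception import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

from multi_gpu import make_parallel  # assumed import path for the helper

NUM_CLASSES = 2  # two classes, as described above
BATCH_SIZE = 2   # total batch size across both GPUs

base = Xception(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
predictions = Dense(NUM_CLASSES, activation='softmax')(x)
model = Model(inputs=base.input, outputs=predictions)

model = make_parallel(model, 2)  # replicate the model across gpus=2
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train_generator is a placeholder for the generator that yields
# (images, one-hot labels) batches of size BATCH_SIZE:
# model.fit_generator(train_generator, steps_per_epoch=100, epochs=2)
```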

When using one GPU without make_parallel, the model reaches loss=0 and acc=1 within 2 epochs.
However, when using multi_gpu with gpus=2, the model gets stuck at acc=0.5 with loss=8.0591.

I'm guessing this is somehow related to the loss being aggregated from only one GPU instead of both, but I'm not sure why.
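
For context, here's an illustrative sketch of the data-parallel pattern I understand make_parallel-style helpers to use (not necessarily this repo's exact code): the batch is sliced across GPUs and the per-replica predictions are concatenated on the CPU, so with batch_size=2 and gpus=2 each replica only sees a single sample per step:

```python
# Illustrative sketch only, not this repo's exact multi_gpu.py.
import tensorflow as tf
from keras.layers import Lambda, concatenate
from keras.models import Model


def get_slice(data, idx, parts):
    # Take the idx-th contiguous slice along the batch dimension.
    shape = tf.shape(data)
    size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
    start = tf.concat([shape[:1] // parts * idx, shape[1:] * 0], axis=0)
    return tf.slice(data, start, size)


def naive_make_parallel(model, gpu_count):
    # Assumes a single-input, single-output model (e.g. the Xception classifier).
    replica_outputs = []
    for g in range(gpu_count):
        with tf.device('/gpu:%d' % g):
            # Each replica gets 1/gpu_count of the incoming batch.
            sliced = Lambda(get_slice,
                            output_shape=lambda s: s,  # slicing only changes the batch dim
                            arguments={'idx': g, 'parts': gpu_count})(model.input)
            replica_outputs.append(model(sliced))
    with tf.device('/cpu:0'):
        # Concatenating here means the loss is computed over the full batch.
        merged = concatenate(replica_outputs, axis=0)
    return Model(inputs=model.input, outputs=merged)
```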

When training on 4 classes with batch_size=4, the multi-GPU run reaches acc=0.97 after 11 epochs, while a single GPU reaches acc=1 within 2 epochs.

Any idea?

burgalon (Author) commented

also posted here keras-team/keras#8200
