I'm trying to use make_parallel() with Keras Xception and a generator that yields two classes, with batch_size=2 (a rough repro sketch is below).
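For context, here is a minimal sketch of the setup I'm describing. The make_parallel import path, the generator, and the hyperparameters are placeholders, not my actual code:

```python
# Hypothetical repro sketch -- the make_parallel import path, generator,
# and hyperparameters are assumptions, not the actual training script.
import numpy as np
from keras.applications.xception import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from make_parallel import make_parallel  # assumed module name

def train_gen():
    # Dummy generator standing in for the real data pipeline.
    while True:
        images = np.random.rand(2, 299, 299, 3)          # batch_size=2
        labels = np.eye(2)[np.random.randint(2, size=2)]  # two classes
        yield images, labels

base = Xception(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base.output)
preds = Dense(2, activation='softmax')(x)
model = Model(inputs=base.input, outputs=preds)

model = make_parallel(model, 2)  # replicate across 2 GPUs
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit_generator(train_gen(), steps_per_epoch=1000, epochs=2)
```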
With a single GPU and no make_parallel, the model reaches loss=0, acc=1 within 2 epochs.
However, when using multi_gpu with gpus=2, the model gets stuck at acc=0.5 with loss=8.0591.
I'm guessing this is somehow related to the loss being aggregated from only one GPU instead of both, but I'm not sure why.
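If it helps, my understanding (an assumption; I haven't verified it against this repo's code) is that make_parallel follows the usual data-parallel pattern: each incoming batch is sliced across the towers and the tower outputs are concatenated back on the CPU before the loss is computed. If that's right, then with batch_size=2 and gpus=2 each tower trains on a single sample per step. A minimal sketch of that slicing, with illustrative names:

```python
import numpy as np
import tensorflow as tf

def slice_batch(x, n_gpus, part):
    """Return tower `part`'s share of the batch dimension (illustrative)."""
    size = tf.shape(x)[0] // n_gpus
    return x[part * size:(part + 1) * size]

# With batch_size=2 and 2 GPUs, each tower sees exactly one sample per step:
batch = tf.constant(np.arange(2 * 4, dtype=np.float32).reshape(2, 4))
with tf.Session() as sess:
    print(sess.run(slice_batch(batch, n_gpus=2, part=0)).shape)  # (1, 4)
    print(sess.run(slice_batch(batch, n_gpus=2, part=1)).shape)  # (1, 4)
```

If the slicing or the merge were off, some samples would never contribute to the loss, which is the kind of aggregation problem I'm suspecting above.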
When training 4 classes with batch_size=4, training reaches acc=0.97 after 11 epochs, while a single GPU reaches acc=1 within 2 epochs.
Any ideas?