
standalone mode issue #21

Open
kevinsummer219 opened this issue Mar 25, 2021 · 4 comments

Comments

@kevinsummer219

Hi,
When I run the following command, training always blocks at DEBUG:data loader ready:
sh FedVision/examples/paddle_mnist/run.sh 127.0.0.1:10002

If I then run sh FedVision/examples/paddle_mnist/run.sh 127.0.0.1:10003, master2 trains normally, but master1 still does not work. Why?

@kevinsummer219
Author

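Let me restate my problem (originally posted in Chinese):
Following the standalone workflow, I can currently get everything running normally.
However, if I only run sh FedVision/examples/paddle_mnist/run.sh 127.0.0.1:10002, that is, start master1 alone, the log shows it stays blocked at data loader ready: and training never starts. If I then run sh FedVision/examples/paddle_mnist/run.sh 127.0.0.1:10003, the log shows that master2 trains normally. If I then run sh FedVision/examples/paddle_mnist/run.sh 127.0.0.1:10004, master3 also fails to train.
My question is: shouldn't master1, master2, master3, and master4 all be able to start and run? Why is it that, when they are started together, only master2 runs normally and the other three do not? Why master2 in particular?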
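
One way to narrow this down is to check which of the endpoints passed to run.sh actually accept TCP connections while a run is blocked. The following is only a minimal diagnostic sketch using the Python standard library. It assumes that each address in the commands above is an endpoint a master listens on, which this thread does not confirm; the function names is_reachable and report are hypothetical helpers, not part of FedVision.

```python
import socket

# Endpoints copied from the run.sh commands in this thread.
# ASSUMPTION: each master listens on its endpoint; this is not confirmed here.
ENDPOINTS = ["127.0.0.1:10002", "127.0.0.1:10003", "127.0.0.1:10004"]


def is_reachable(endpoint, timeout=1.0):
    """Return True if a TCP connection to 'host:port' can be opened."""
    host, port = endpoint.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def report(endpoints):
    """Map each endpoint to whether it currently accepts connections."""
    return {ep: is_reachable(ep) for ep in endpoints}


if __name__ == "__main__":
    for ep, ok in report(ENDPOINTS).items():
        print(f"{ep}: {'reachable' if ok else 'not reachable'}")
```

A reachable endpoint only shows that something is listening there; it does not by itself explain why training blocks at data loader ready:.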

@sagewe
Collaborator

sagewe commented Mar 29, 2021

Could it be that the data was not downloaded in advance? Before running the experiment, try running the script in the data directory.

@kevinsummer219
Author

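The data has definitely been downloaded; otherwise, how could master2 run at all? When I start both master1 and master2, master2 runs, but starting any single master on its own does not work.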

@zhangfx123

Hello, I have run into the same problem. Have you solved it?
