Trained models lost in version 1.7.2 #906
Comments
It is the /data/projects/fate/model_local_cache/ directory. Do you have any scheduled cleanup logic? Could the directory have been cleaned up by it? That is worth checking.
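@dylan-fan We have no scheduled cleanup logic. The python containers on each node of the KubeFATE deployment have been running continuously without a restart, yet the contents of /data/projects/fate/fateflow/model_local_cache and /data/projects/fate/fateflow/jobs on every node were all wiped at the same point in time. Could some command in FATE trigger clearing of the model_local_cache and jobs folders?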
Could fangchi take a look at the KubeFATE side?
Are you using KubeFATE in docker-compose mode or K8s mode?
@wfangchi K8s mode
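Description:
The deployed version is KubeFATE 1.7.2. Training and prediction work normally, but after running for some time the /data/projects/fate/model_local_cache/ directory in the python container is emptied.
Questions:
1. Are models ultimately stored in /data/projects/fate/model_local_cache/ in the python container? (If they are stored in Eggroll, what are the container name and directory?)
2. What could cause KubeFATE's model_local_cache folder to be emptied?
Thanks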
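To narrow down when the directory is emptied, one option is to record periodic snapshots of the affected directories from inside the python container and compare consecutive snapshots. The following is only a minimal sketch using the Python standard library. The helper names `snapshot_dir` and `became_empty` are hypothetical and are not part of FATE or KubeFATE, and the paths to watch must be supplied by the caller.

```python
from pathlib import Path


def snapshot_dir(path):
    """Summarize a directory: whether it exists, how many top-level
    entries it has, and the newest modification time among them."""
    p = Path(path)
    if not p.is_dir():
        return {"path": str(p), "exists": False, "entries": 0, "newest_mtime": None}
    entries = list(p.iterdir())
    # lstat() avoids failing on broken symlinks
    newest = max((e.lstat().st_mtime for e in entries), default=None)
    return {
        "path": str(p),
        "exists": True,
        "entries": len(entries),
        "newest_mtime": newest,
    }


def became_empty(previous, current):
    """Return True if a directory that had entries in the previous
    snapshot is empty or missing in the current one."""
    return previous["entries"] > 0 and current["entries"] == 0
```

Calling `snapshot_dir` on a schedule for the model_local_cache and jobs paths reported in this thread, and logging a timestamp whenever `became_empty` returns True, would give a time window to correlate with pod events and FATE Flow logs. It does not by itself identify the cause.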