how to run pytorch mnist ddp #1040

Open
wangpf09 opened this issue May 31, 2023 · 0 comments

I have Kubeflow deployed, but I run into a problem when running the official MNIST example. How should I solve it? The YAML for the PyTorchJob is as follows:

apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-mnist-ddp-gpu
  namespace: kubeflow-user-example-com
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - image: gcr.io/kubeflow-examples/pytorch-mnist-ddp-gpu
              name: pytorch
              resources:
                limits:
                  cpu: '1'
                  memory: 4Gi
                  nvidia.com/gpu: 1
              volumeMounts:
                - mountPath: /mnt/kubeflow-gcfs
                  name: kubeflow-gcfs
          volumes:
            - name: kubeflow-gcfs
              persistentVolumeClaim:
                claimName: kubeflow-gcfs
                readOnly: false
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - image: gcr.io/kubeflow-examples/pytorch-mnist-ddp-gpu
              name: pytorch
              resources:
                limits:
                  cpu: '1'
                  memory: 4Gi
                  nvidia.com/gpu: 1
              volumeMounts:
                - mountPath: /mnt/kubeflow-gcfs
                  name: kubeflow-gcfs
          volumes:
            - name: kubeflow-gcfs
              persistentVolumeClaim:
                claimName: kubeflow-gcfs
                readOnly: false
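
One way to narrow down a failure with a spec like the one above is to check the distributed environment inside each pod before training starts. The sketch below assumes the training operator injects `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE` and `RANK` into every replica, and that each replica runs a single process; the helper names `expected_world_size` and `check_ddp_env` are hypothetical and not part of the example image.

```python
import os

# Replica counts copied from the PyTorchJob spec above.
REPLICAS = {"Master": 1, "Worker": 2}

# Variables read by torch.distributed's env:// initialization.
# Assumption: the operator sets all four in every pod.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


def expected_world_size(replicas):
    """Total process count, assuming one process per replica pod."""
    return sum(replicas.values())


def check_ddp_env(env, replicas):
    """Return a list of problems; an empty list means the env looks consistent."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if name not in env]
    if problems:
        return problems

    world_size = int(env["WORLD_SIZE"])
    rank = int(env["RANK"])
    expected = expected_world_size(replicas)

    if world_size != expected:
        problems.append(f"WORLD_SIZE is {world_size}, expected {expected}")
    if not 0 <= rank < world_size:
        problems.append(f"RANK {rank} is outside [0, {world_size})")
    return problems


if __name__ == "__main__":
    # Run inside a pod, e.g. before calling dist.init_process_group().
    for line in check_ddp_env(os.environ, REPLICAS) or ["DDP environment looks consistent"]:
        print(line)
```

With one Master and two Workers the check expects a world size of 3. If any variable is reported as missing, the pod was probably not started by the PyTorchJob controller, or the container named `pytorch` was not the one inspected.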

[Image: 8d731664134b224973a790c50a2885d]
