Start a Dask cluster with EMR cluster programmatically #121

Open
jennakwon06 opened this issue Jun 25, 2020 · 3 comments

jennakwon06 commented Jun 25, 2020

Hello,

We want to programmatically spin up an EMR cluster and then start a Dask cluster on it using the YarnCluster class.

Currently, we open an SSH tunnel to the master node of the EMR cluster (it's in a private subnet), log onto the master node, and create an .ipynb notebook containing the "YarnCluster(..)" code. We then execute that cell to spin up the Dask cluster.

It would be nice to automate this, e.g. run some commands to spin up an EMR cluster that also has a Dask cluster running on it.

Thanks!

@jennakwon06 changed the title from 'Start a Dask cluster with EMR cluster "step"?' to 'Start a Dask cluster with EMR cluster programmatically' on Jun 25, 2020
@jennakwon06
Author

Or something like setting up the Dask cluster as part of the EMR bootstrap; that would be useful too.
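One way to picture the "step" idea from the original title is as an EMR step definition that runs a start-up script on the master node. The sketch below only builds the step dictionary in the shape boto3's EMR client accepts. The script location, the local path, and the assumption that `python3` on the master node already has dask-yarn installed (for example via a bootstrap action) are all hypothetical. Because bootstrap actions run before EMR installs its applications, a step may be a better place than the bootstrap itself to actually start the cluster. Whether the Dask cluster stays alive after the script exits depends on how that script is written, which is not addressed here.

```python
def build_start_dask_step(script_s3_uri, local_path="/home/hadoop/start_dask.py"):
    """Return an EMR step definition that downloads and runs a script on the master node.

    script_s3_uri and local_path are hypothetical placeholders; the script itself
    would contain the YarnCluster(...) call that is currently run by hand.
    """
    # command-runner.jar executes an arbitrary command on the master node.
    command = f"aws s3 cp {script_s3_uri} {local_path} && python3 {local_path}"
    return {
        "Name": "Start Dask YarnCluster",
        "ActionOnFailure": "CONTINUE",  # keep the EMR cluster up if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["bash", "-c", command],
        },
    }
```

The resulting dictionary could be passed in the `Steps` list of `run_job_flow` when the cluster is created, or added to a running cluster with `add_job_flow_steps`.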

@quasiben
Member

This seems like a useful feature, though I'm not sure it belongs in dask-yarn. Glancing quickly at boto, it seems there is support for launching EMR. In fact, I found a blog post on it: https://medium.com/@kulasangar/create-an-emr-cluster-and-submit-a-job-using-boto3-c34134ef68a0. Perhaps someone has time to experiment with connecting boto3 and dask-yarn together?
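For reference, a minimal sketch of what launching EMR through boto3 might look like. It builds the keyword arguments for the EMR client's `run_job_flow` call and submits them. The release label, instance type, role names, subnet, and bootstrap script location are placeholder assumptions, not values from this thread, and the bootstrap script is assumed to be one that installs Dask and dask-yarn on each node.

```python
def build_emr_request(name, subnet_id, bootstrap_script_s3_uri,
                      release_label="emr-5.30.1", instance_type="m5.xlarge",
                      core_count=2):
    """Build keyword arguments for the EMR client's run_job_flow call.

    All default values here are illustrative placeholders.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],  # provides YARN
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "Ec2SubnetId": subnet_id,
            # Keep the cluster alive so a Dask cluster can run on it afterwards.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "BootstrapActions": [
            {"Name": "Install Dask",
             "ScriptBootstrapAction": {"Path": bootstrap_script_s3_uri, "Args": []}},
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_emr_cluster(emr_client, request):
    """Submit the request and return the new cluster (job flow) id."""
    response = emr_client.run_job_flow(**request)
    return response["JobFlowId"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   emr = boto3.client("emr")
#   request = build_emr_request("dask-on-emr", "subnet-...", "s3://my-bucket/bootstrap-dask.sh")
#   cluster_id = launch_emr_cluster(emr, request)
```

Passing the client in as an argument keeps the request-building logic testable without AWS access.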

@jennakwon06
Author

So yes, we are programmatically launching an EMR cluster with the boto EMR API.

But the manual step is this: when the EMR cluster is done launching (takes ~5 minutes), we log onto the master node of the EMR cluster and run a Jupyter notebook with the cell "cluster = YarnCluster(...)".

We then do "Client("ip-node-of-emr-master-node")" to connect to the YarnCluster from somewhere other than the EMR master node, such as a Jupyter notebook on a SageMaker notebook instance.

So the ideal is: from my SM notebook instance, I can make one call, "spin-up-dask-cluster-on-emr-cluster(dask_cluster_settings, emr_cluster_settings)".
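A rough sketch of what such a one-call helper could look like, under several assumptions: `emr_cluster_settings` is a complete set of `run_job_flow` keyword arguments that already includes a step or similar mechanism starting the YarnCluster on the master node, and `dask_cluster_settings["scheduler_port"]` is a hypothetical key naming a fixed port that the Dask scheduler was configured to listen on. Neither dask-yarn nor boto3 provides this function; it only chains existing EMR client calls.

```python
def spin_up_dask_cluster_on_emr_cluster(emr_client, dask_cluster_settings,
                                        emr_cluster_settings):
    """Launch an EMR cluster and return (cluster_id, scheduler_address).

    Hypothetical helper: assumes the EMR request itself arranges for a Dask
    scheduler to start on the master node at a known, fixed port.
    """
    # 1. Launch the EMR cluster.
    cluster_id = emr_client.run_job_flow(**emr_cluster_settings)["JobFlowId"]

    # 2. Block until EMR reports the cluster as running (the ~5 minute wait).
    emr_client.get_waiter("cluster_running").wait(ClusterId=cluster_id)

    # 3. Look up the master node's DNS name.
    description = emr_client.describe_cluster(ClusterId=cluster_id)
    master_dns = description["Cluster"]["MasterPublicDnsName"]

    # 4. Build the address a remote Client would connect to.
    port = dask_cluster_settings["scheduler_port"]  # hypothetical setting
    return cluster_id, f"tcp://{master_dns}:{port}"


# Usage from another machine, e.g. a SageMaker notebook (requires boto3 and distributed):
#   import boto3
#   from distributed import Client
#   emr = boto3.client("emr")
#   cluster_id, address = spin_up_dask_cluster_on_emr_cluster(emr, dask_settings, emr_settings)
#   client = Client(address)
```

Note that the `cluster_running` waiter only says EMR is up; the Dask scheduler may need more time to start, so a real implementation would probably retry the `Client` connection.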
