Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Sep 28, 2023
1 parent d121e09 commit 1599ac2
Showing 5 changed files with 177 additions and 154 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
e319981d
24b7edc0
54 changes: 30 additions & 24 deletions chapters/sec2/2-0-sec-intro.html
@@ -351,43 +351,49 @@ <h1 class="title"><span id="sec-2-intro" class="quarto-section-identifier">IT/Ad

</header>

<p>Welcome to the section of the book I wish I hadn’t written – the section where you’ll learn about the basics of doing IT/Admin tasks yourself.</p>
<p>Taking data science work to production involves getting that work hosted somewhere. As a data scientist, you want to share a development environment with other data professionals or publish a data science project to non-technical stakeholders. That sharing requires that your work live on a centralized server somewhere, and someone has to administer that server.</p>
<p>In my experience, data scientists are at their best when a professional IT/Admin takes responsibility for those administration tasks. But that partnership often isn’t achievable.</p>
<p>Some data professionals work at small organizations that lack dedicated IT/Admins. Others are students or hobbyists trying to DIY something cheaply. And others work at organizations with IT/Admin teams, but that team lacks the time, interest, or expertise necessary to be helpful.</p>
<p>Sometimes, a data scientist has to be their own IT/Admin to avoid being completely blocked from taking their work to production, which is the worst outcome. Many – if not most – data scientists find themselves responsible for administering the servers where their work runs at some point in their career.</p>
<p>That’s a scary place to be. Administering a server as a novice admin is a little like jumping from a Honda Civic to wrangling a tractor-trailer. You’re going from managing personal devices for your own use to managing a work machine that people make a whole career out of.</p>
<p>Even with the many online resources, the amount to learn can be completely overwhelming. And being a bad IT/Admin can mean security vulnerabilities, system instability, and general annoyance.</p>
<p>So in this section, you’re going to learn the basics of being your own IT/Admin. You’ll be introduced to the topics in IT/Administration that are relevant for a data science project. By the end, you’ll be pretty comfortable administering a simple data science workbench or server to host a data science project.</p>
<p>Welcome to the section of the book I wish I didn’t need to write – the section where you’ll learn about the basics of doing IT/Admin tasks yourself.</p>
<p>As a data scientist, you want to share a development environment with other data professionals or publish a data science project to non-technical stakeholders. That sharing requires a centralized server, and someone needs to administer that server.</p>
<p>In my experience, data scientists are at their best when paired with a professional IT/Admin who administers the servers. But that partnership often isn’t achievable.</p>
<p>You might work at a small organization that lacks dedicated IT/Admins. Or maybe you’re a student or hobbyist trying to cheaply DIY an environment. It’s possible you work at a sophisticated organization with professional IT/Admins, but they, unfortunately, lack the time, interest, or expertise necessary to be helpful.</p>
<p>Sometimes, you have to be your own IT/Admin to be able to take your work to production at all. It’s fair to say that many – if not most – data scientists will find themselves responsible for administering the servers where their work runs at some point in their career. And that’s a scary place to be.</p>
<p>Administering a server as a novice is like suddenly stepping into an 18-wheel tractor-trailer when you’ve never driven anything other than a cute Honda Civic.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> You’re leaping from managing a personal device to wrangling a professional-scale work machine without the training to match.</p>
<p>Even with the many online resources available as support, the number of topics and the depth of each can be completely overwhelming. And being a bad IT/Admin can lead to security vulnerabilities, system instability, and general annoyance.</p>
<p>In this section, you’re going to learn the basics of being your own IT/Admin. You’ll be introduced to the IT/Admin topics that are relevant for a data science environment. By the end, you’ll be comfortable administering a simple data science workbench or server to host a data science project.</p>
<p>If you don’t have to be your own IT/Admin, that’s even better. Reading this section will give you an appreciation for what an IT/Admin does and help you be a better partner to them.</p>
<section id="getting-and-running-a-server" class="level2">
<h2 class="anchored" data-anchor-id="getting-and-running-a-server">Getting and running a server</h2>
<p>The most common way to get a server for data science is to rent one from a cloud provider. In order to do data science tasks, many people combine their server with other services from the cloud provider. That’s why <a href="2-1-cloud.html">Chapter&nbsp;<span>7</span></a> is an introduction to what the cloud is and how you might want to use it for data science purposes.</p>
<p>Unlike your phone or personal computer, you’ll never touch this cloud server you’ve rented. Instead, you’ll administer the server via a virtual interface from your computer. Moreover, servers generally don’t even have the kind of point-and-click interface you’re used to on your personal devices.</p>
<p>Instead, you’ll access and manage your server from the text-only command line. That’s why <a href="2-2-cmd-line.html">Chapter&nbsp;<span>8</span></a> is all about how to set up the command line on your local machine to make it convenient and ergonomic and how to connect to your server for administration purposes using a technology called SSH.</p>
<p>Unlike your Apple, Windows, or Android operating systems you’re used to on your personal devices, most servers run the Linux operating system. <a href="2-3-linux.html">Chapter&nbsp;<span>9</span></a> will teach you a little about what Linux is and will introduce you to the basics of Linux administration including how to think about files and users on a multi-tenant server.</p>
<p>But you’re not interested in just running a Linux server. You want to use it to accomplish data science tasks. In particular, you’ll want to install data science tools like R, Python, RStudio, JupyterHub, and more. So you’ll need to learn how to install, run, and configure applications on your server. That’s why <a href="2-4-app-admin.html">Chapter&nbsp;<span>10</span></a> is about application administration.</p>
<p>When your phone or computer gets slow or you run out of storage, it’s probably time for a new one. But a server is a working machine that can be scaled up or down to accommodate more people or heavier workloads over time. That means that you may have to manage the server’s resources much more actively than your personal devices. That’s why <a href="2-5-scale.html">Chapter&nbsp;<span>11</span></a> is all about managing and scaling server resources.</p>
<p>Many data science tasks require a server and a variety of supporting tools like networking and storage. These days, the most common way to set up a data science environment is to rent a server from a cloud provider. That’s why <a href="2-1-cloud.html">Chapter&nbsp;<span>7</span></a> is an introduction to what the cloud is and how you might want to use it for data science purposes.</p>
<p>Unlike your phone or personal computer, you’ll never touch this cloud server you’ve rented. Instead, you’ll administer the server via a virtual interface from your computer. Moreover, servers generally don’t even have the kind of point-and-click interface you’re familiar with from your personal devices.</p>
<p>Instead, you’ll access and manage your server from the text-only command line. That’s why <a href="2-2-cmd-line.html">Chapter&nbsp;<span>8</span></a> is all about how to set up the command line on your local machine to make it convenient and ergonomic, and how to connect to your server for administration purposes using a technology called SSH.</p>
<p>Unlike the Apple, Windows, or Android operating systems you have on your personal devices, most servers run the Linux operating system. <a href="2-3-linux.html">Chapter&nbsp;<span>9</span></a> will teach you a little about what Linux is and will introduce you to the basics of Linux administration, including how to think about files and users on a multi-tenant server.</p>
<p>But you’re not just interested in running a Linux server. You want to use it to accomplish data science tasks. In particular, you want to use data science tools like R, Python, RStudio, JupyterHub, and more. You’ll need to learn how to install, run, and configure applications on your server. That’s why <a href="2-4-app-admin.html">Chapter&nbsp;<span>10</span></a> is about application administration.</p>
<p>When your phone or computer gets slow or you run out of storage, it’s probably time for a new one. But a server is a working machine that can be scaled up or down to accommodate more people or heavier workloads over time. That means that you may have to manage the server’s resources more actively than your personal devices. That’s why <a href="2-5-scale.html">Chapter&nbsp;<span>11</span></a> is all about managing and scaling server resources.</p>
</section>
<section id="making-it-safely-accessible" class="level2">
<h2 class="anchored" data-anchor-id="making-it-safely-accessible">Making it (safely) accessible</h2>
<p>Unless you’re doing something really silly, your personal devices aren’t accessible to anyone who isn’t physically touching the device. In contrast, most servers are only useful <strong>because</strong> they’re addressable on a computer network, perhaps even the open internet.</p>
<p>Making a server accessible to people over the internet makes it useful, but it also introduces risk. Many dastardly plans for your personal devices are thwarted because a villain would have to physically steal it to get access. For a server, allowing digital access means there are many more potential threats looking to steal data or hijack your computational resources for nefarious ends. You’ve got to be careful about how you’re providing access to the machine.</p>
<p>Moreover, risk aside, computer networking is a complicated topic, and making it work right can be somewhat difficult. Following random tutorials on the internet is a great way to eventually get your server working, but have no idea what happened or why it suddenly works.</p>
<p>The good news is that it’s not magic. <a href="2-6-networking.html">Chapter&nbsp;<span>12</span></a> is all about how computers find each other across a network. Once you understand the basic structure and operations of a computer network, making only the things you intend to be public on your server will be much easier.</p>
<p>Aside from a basic introduction to computer networking, there are two other things you’ll want to configure to make your server safe and accessible. The first is to host your server at a human-friendly URL, which you’ll learn how to configure in <a href="2-7-dns.html">Chapter&nbsp;<span>13</span></a>. The second is to add SSL/TLS to your server to secure the traffic going to and from your server. You’ll learn how to do that in <a href="2-8-ssl.html">Chapter&nbsp;<span>14</span></a>.</p>
<p>Once you’ve finished these chapters, you’ll have a basic understanding of all the main topics in IT/Admin that are likely to come up as you try to administer a simple data science workbench or project hosting platform.</p>
<p>Unless you’re doing something very silly, your personal devices aren’t accessible to anyone who isn’t physically touching the device. In contrast, most servers are only useful <strong>because</strong> they’re addressable on a computer network, perhaps even the open internet.</p>
<p>Making a server accessible to people over the internet makes it useful, but it also introduces risk. Many dastardly plans for your personal devices are thwarted because a villain would have to physically steal it to get access. For a server, allowing digital access means there are many more potential threats looking to steal data or hijack your computational resources for nefarious ends. Therefore, you’ve got to be careful about how you’re providing access to the machine.</p>
<p>Risk aside, there’s a lot of depth to computer networking and just getting it working isn’t trivial. You can probably muddle through by following tutorials on the internet, but that’s a great way to end up with connections that suddenly work and no idea what you did right or how you could break it in the future.</p>
<p>The good news is that it’s not magic. <a href="2-6-networking.html">Chapter&nbsp;<span>12</span></a> is all about how computers find each other across a network. Once you understand the basic structure and operations of a computer network, you’ll be able to configure your server’s networking and feel confident that you’ve done it right.</p>
<p>But you’re not done once you’ve configured basic connectivity for your server. You will want to take two more steps to make it safe and easy to access. The first is to host your server at a human-friendly URL, which you’ll learn how to configure in <a href="2-7-dns.html">Chapter&nbsp;<span>13</span></a>. The second is to add SSL/TLS to your server to secure the traffic going to and from your server. You’ll learn how to do that in <a href="2-8-ssl.html">Chapter&nbsp;<span>14</span></a>.</p>
<p>By the end of these chapters, you will have solid mental models for all the basic tasks you or any other IT/Admin are going to take on in administering a data science workbench or hosting platform.</p>
</section>
<section id="labs-in-this-section" class="level2">
<h2 class="anchored" data-anchor-id="labs-in-this-section">Labs in this Section</h2>
<p>In the first section of the book, the labs involved creating a DevOps-friendly data science project. In this section, the labs will revolve around actually putting that project into production.</p>
<p>You’ll start by standing up an AWS EC2 instance, configuring your local command line, and connecting to the server via SSH. Once you’ve done that, you’ll learn how to create users on the server and access the server as a particular user.</p>
<p>At that point, you’ll be ready to transition into data science work. You’ll add R, Python, RStudio Server, and JupyterHub to your server and get them configured for work. Additionally, you’ll deploy the Shiny App and API you created in the book’s first section onto the server.</p>
<p>Once the server itself is configured, you’ll need to configure the server’s networking to make it accessible and secure. You’ll learn how to open the proper ports and configure a proxy to access multiple services on the same server, and you’ll learn to configure DNS records so your server is available at a real URL and SSL so it can all be done securely.</p>
<p>In the first section of the book, you created a DevOps-friendly data science project. In this section, the labs will focus on actually putting that project into production.</p>
<p>You’ll start by standing up a server from a cloud provider, configuring your local command line, and connecting to the server via SSH. Once you’ve done that, you’ll learn how to create users on the server and access the server as a particular user.</p>
<p>At that point, you’ll be ready to transition into data science work. You’ll add R, Python, RStudio Server, and JupyterHub to your server and get them configured. Additionally, you’ll deploy the Shiny App and API you created in the book’s first section onto the server.</p>
<p>Once the server itself is ready, you’ll need to configure the server’s networking to make it accessible and secure. You’ll learn how to open the proper ports, set up a proxy to access multiple services on the same server, configure DNS records so your server is available at a real URL, and activate SSL so it can all be done securely.</p>
<p>By the time you’ve finished the labs in this section, you’ll be able to use your EC2 instance as a data science workbench and add your penguin mass prediction Shiny App to the Quarto website you created in the book’s first section.</p>
<p>For more details on what you’ll do in each chapter, see <a href="../append/lab-map.html">Appendix&nbsp;<span>C</span></a>.</p>


</section>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr>
<ol>
<li id="fn1"><p>The first car I ever bought was a Honda Civic Hybrid. Great car.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>

</main> <!-- /main -->