Skip to content

Commit

Permalink
Added S3 placeholder notebook (#115)
Browse files Browse the repository at this point in the history
* Added S3 placeholde notebook

* Added variables

---------

Co-authored-by: chetan thote <[email protected]>
  • Loading branch information
chetanthote and chetan thote authored Sep 16, 2024
1 parent ae48bdb commit 64e60fb
Show file tree
Hide file tree
Showing 3 changed files with 382 additions and 42 deletions.
11 changes: 11 additions & 0 deletions notebooks/load-csv-data-s3-placeholder/meta.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
[meta]
authors=["chetan-thote"]
title="Importing Data from S3 into SingleStore using Pipelines"
description="""\
This notebook demonstrates how to create a sample table in SingleStore, set up a pipeline to import data from an Amazon S3 bucket."""
difficulty="beginner"
tags=["starter", "loaddata", "s3"]
lesson_areas=["Ingest"]
icon="database"
destinations=["spaces"]
minimum_tier="free-shared"
323 changes: 323 additions & 0 deletions notebooks/load-csv-data-s3-placeholder/notebook.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,323 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"<div id=\"singlestore-header\" style=\"display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;\">\n",
" <div id=\"icon-image\" style=\"width: 90px; height: 90px;\">\n",
" <img width=\"100%\" height=\"100%\" src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/database.png\" />\n",
" </div>\n",
" <div id=\"text\" style=\"padding: 5px; margin-left: 10px;\">\n",
" <div id=\"badge\" style=\"display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%\">SingleStore Notebooks</div>\n",
" <h1 style=\"font-weight: 500; margin: 8px 0 0 4px;\">Importing Data from S3 into SingleStore using Pipelines</h1>\n",
" </div>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Note</b></p>\n",
" <p>This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to <tt>Start</tt> using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.</p>\n",
" </div>\n",
"</div>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"This notebook demonstrates how to create a sample table in SingleStore, set up a pipeline to import data from an Amazon S3 bucket, and run queries on the imported data. It is designed for users who want to integrate S3 data with SingleStore and explore the capabilities of pipelines for efficient data ingestion."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"<h3>Demo Flow</h3>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"<img src=https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataCSV.png width=\"100%\" hight=\"50%\"/>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"## Sample Table in SingleStore\n",
"\n",
"Start by creating a table that will hold the data imported from S3."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE TABLE IF NOT EXISTS sample_table (\n",
" id INT,\n",
" name VARCHAR(255),\n",
" age INT,\n",
" address TEXT,\n",
" created_at TIMESTAMP\n",
");"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"## Create a Pipeline to Import Data from S3\n",
"\n",
"You'll need to create a pipeline that pulls data from an S3 bucket into this table. This example assumes you have a CSV file in your S3 bucket.\n",
"\n",
"<i>Ensure that:\n",
"You have access to the S3 bucket.\n",
"Proper IAM roles or access keys are configured in SingleStore.\n",
"The CSV file has a structure that matches the table schema.</i>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a9c60e86-a548-4257-9130-5120e850aad0",
"metadata": {},
"source": [
"## Set Up Variables\n",
"\n",
"Define the <b>URL</b>, <b>REGION</b>, <b>ACCESS_KEY</b>, and <b>SECRET_ACCESS_KEY</b> variables for integration, replacing the placeholder values with your own."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "69c573a1-316c-49c2-9ac7-16327a302199",
"metadata": {},
"outputs": [],
"source": [
"URL = 's3://your-bucket-name/your-data-file.csv'\n",
"REGION = 'your-region'\n",
"ACCESS_KEY = 'access_key_id'\n",
"SECRET_ACCESS_KEY = 'access_secret_key'"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d8927379-38ca-427f-9de3-dcc76b5ba05e",
"metadata": {},
"source": [
"Using these identifiers and keys, execute the following statement."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"\n",
"CREATE PIPELINE s3_import_pipeline\n",
"AS LOAD DATA S3 '{{URL}}'\n",
"CONFIG '{\\\"REGION\\\":\\\"{{REGION}}\\\"}'\n",
"CREDENTIALS '{\\\"AWS_ACCESS_KEY_ID\\\": \\\"{{ACCESS_KEY}}\\\",\n",
" \\\"AWS_SECRET_ACCESS_KEY\\\": \\\"{{SECRET_ACCESS_KEY}}\\\"}'\n",
"INTO TABLE sample_table\n",
"FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\\\"'\n",
"LINES TERMINATED BY '\\n'\n",
"IGNORE 1 lines;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"## Start the Pipeline\n",
"\n",
"To start the pipeline and begin importing the data from the S3 bucket:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "24aba272-a594-4971-8d7c-640b31dcf216",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"START PIPELINE s3_import_pipeline;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bea9d53a-45cc-4dd9-87d6-bdabd2ae6370",
"metadata": {},
"source": [
"## Select Data from the Table\n",
"\n",
"Once the data has been imported, you can run a query to select it:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3924e9fb-094a-467c-bdac-8f7826e63501",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM sample_table LIMIT 10;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"### Check if all data of the data is loaded"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT count(*) FROM sample_table"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"We have shown how to insert data from a Amazon S3 using `Pipelines` to SingleStoreDB. These techniques should enable you to\n",
"integrate your Amazon S3 with SingleStoreDB."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"## Clean up\n",
"\n",
"Remove the '#' to uncomment and execute the queries below to clean up the pipeline and table created."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"#### Drop Pipeline"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"#STOP PIPELINE s3_import_pipeline;\n",
"\n",
"#DROP PIPELINE s3_import_pipeline;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"#### Drop Data"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"#DROP TABLE sample_table;"
]
},
{
"cell_type": "markdown",
"id": "",
"metadata": {},
"source": [
"<div id=\"singlestore-footer\" style=\"background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px\"></div>\n",
"<div><img src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png\" style=\"padding: 0px; margin: 0px; height: 24px\"/></div>"
]
}
],
"metadata": {
"jupyterlab": {
"notebooks": {
"version_major": 6,
"version_minor": 4
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading

0 comments on commit 64e60fb

Please sign in to comment.