refactor: Refactor with SQLConnector class v2 (#12)
Closes meltano/hub#995
#9
#5

Very similar to #4, but I started again from scratch in case anything
had changed since then, because that PR is pretty old.
pnadolny13 authored May 10, 2023
1 parent b68be55 commit 4e9cf88
Showing 14 changed files with 1,841 additions and 783 deletions.
29 changes: 29 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,29 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates

version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
      time: "13:00"
      day: "monday"
      timezone: "US/Central"
    reviewers:
      - "pnadolny13"
    labels:
      - "dependencies"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
      time: "13:00"
      day: "monday"
      timezone: "US/Central"
    reviewers:
      - "pnadolny13"
    labels:
      - "dependencies"
55 changes: 55 additions & 0 deletions .github/workflows/ci_workflow.yml
@@ -0,0 +1,55 @@
### A CI workflow template that runs linting and python testing
### TODO: Modify as needed or as desired.

name: Test tap-athena

on: [push]

jobs:
  linting:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Only lint using the primary version used for dev
        python-version: ["3.9"]
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install pipx and Poetry
      run: |
        pip install pipx poetry
    - name: Run lint command from tox.ini
      run: |
        pipx run tox -e lint
  pytest:
    runs-on: ubuntu-latest
    env:
      GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
    strategy:
      matrix:
        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install Poetry
      run: |
        pip install poetry
    - name: Install dependencies
      run: |
        poetry install
    - name: create-json
      id: create-json
      uses: jsdaniell/[email protected]
      with:
        name: ".secrets/config.json"
        json: ${{ secrets.CONFIG_JSON }}
    - name: Test with pytest
      run: |
        poetry run pytest
98 changes: 72 additions & 26 deletions README.md
@@ -1,36 +1,42 @@
# tap-athena
# `tap-athena`

`tap-athena` is a Singer tap for Athena.
Athena tap class.

Built with the [Meltano Tap SDK](https://sdk.meltano.com) for Singer Taps.
Built with the [Meltano Singer SDK](https://sdk.meltano.com).

## Installation
## Capabilities

```bash
pipx install git+https://github.com/MeltanoLabs/tap-athena.git
```
* `catalog`
* `state`
* `discover`
* `about`
* `stream-maps`
* `schema-flattening`

## Configuration
## Settings

### Accepted Config Options
| Setting | Required | Default | Description |
|:---------------------|:--------:|:-------:|:------------|
| aws_access_key_id | True | None | |
| aws_secret_access_key| True | None | |
| aws_region | True | None | |
| s3_staging_dir | True | None | |
| schema_name | True | None | |
| stream_maps | False | None | Config object for stream maps capability. For more information check out [Stream Maps](https://sdk.meltano.com/en/latest/stream_maps.html). |
| stream_map_config | False | None | User-defined config values to be used within map expressions. |
| flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
| flattening_max_depth | False | None | The max depth to flatten schemas. |

- `aws_access_key_id`
- `aws_secret_access_key`
- `s3_staging_dir`
- `schema_name`
- `aws_region`
A full list of supported settings and capabilities is available by running: `tap-athena --about`

A full list of supported settings and capabilities for this
tap is available by running:
### Configure using environment variables

```bash
tap-athena --about
```
This Singer tap automatically imports environment variables from the `.env` file in the working
directory when `--config=ENV` is provided, so a config value is picked up if a matching
environment variable is set either in the terminal context or in the `.env` file.
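
As an illustration of what a "matching environment variable" looks like, the sketch below assumes the SDK's usual convention of prefixing each setting name with the upper-cased plugin name (so `aws_region` becomes `TAP_ATHENA_AWS_REGION`). The helpers `env_var_name` and `config_from_env` are hypothetical names used only for this example; they are not part of the SDK or of this tap.

```python
import os

# Required settings, as listed in the settings table above.
SETTINGS = [
    "aws_access_key_id",
    "aws_secret_access_key",
    "aws_region",
    "s3_staging_dir",
    "schema_name",
]


def env_var_name(setting: str, plugin_name: str = "tap-athena") -> str:
    """Return the assumed environment variable name for a setting.

    Assumption: the variable is the upper-cased plugin name (dashes
    replaced by underscores), an underscore, then the upper-cased setting.
    """
    prefix = plugin_name.upper().replace("-", "_")
    return f"{prefix}_{setting.upper()}"


def config_from_env(environ=None) -> dict:
    """Collect the settings whose assumed environment variable is set."""
    environ = os.environ if environ is None else environ
    return {
        setting: environ[env_var_name(setting)]
        for setting in SETTINGS
        if env_var_name(setting) in environ
    }


if __name__ == "__main__":
    # Print the variable name expected for each setting.
    for setting in SETTINGS:
        print(f"{setting} -> {env_var_name(setting)}")
```

Running the script prints one mapping per setting, which can be used as a checklist when filling in a `.env` file.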

### Source Authentication and Authorization

Authentication is performed using AWS credentials, as provided by the config settings described above.

## Usage

You can easily run `tap-athena` by itself or in a pipeline using [Meltano](https://meltano.com/).
@@ -45,20 +51,60 @@ tap-athena --config CONFIG --discover > ./catalog.json

## Developer Resources

Follow these instructions to contribute to this project.

### Initialize your Development Environment

```bash
# Install pipx if you haven't already
pip install pipx
pipx ensurepath

# Restart your terminal here, if needed, to get the updated PATH
pipx install poetry
poetry install

# Optional: Install Tox if you want to use it to run auto-formatters, linters, tests, etc.
pipx install tox
```

### Create and Run Tests

Create tests within the `tap_athena/tests` subfolder and
To run the automated tests, create the following test table in Athena. Adjust the database name and S3 path to match your environment.

```sql
CREATE EXTERNAL TABLE `my_sample_data`.`test_data` (
`complex-1` decimal(1),
`complex_2` int,
`Complex3` array < string >,
`complex_4_date` date,
`complex_5_bool` boolean,
`complex_6_float` float,
`complex_7_timestamp` timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://[YOUR_BUCKET_NAME]/complex_data/'
TBLPROPERTIES (
'classification' = 'csv',
'skip.header.line.count' = '0',
'write.compression' = 'GZIP'
);


INSERT INTO "test_data" values(cast(2.0 as decimal(1,0)),2,ARRAY['d','e','f'], cast('2023-05-11' as date),false,cast(2.001 as real), CAST('2023-05-02 02:02:02.02' as TIMESTAMP));

select * from "test_data";
```

Add your config.json to the `.secrets` directory:
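
A minimal sketch of that step, assuming the file only needs the five required settings from the settings table. The helper `write_config` is a hypothetical name, and every value shown is a placeholder to be replaced with your own credentials, region, bucket, and database.

```python
import json
from pathlib import Path

# Required settings, as listed in the settings table of this README.
REQUIRED_SETTINGS = (
    "aws_access_key_id",
    "aws_secret_access_key",
    "aws_region",
    "s3_staging_dir",
    "schema_name",
)


def write_config(values: dict, path=".secrets/config.json") -> Path:
    """Write the tap config as JSON, failing early if a required key is missing."""
    missing = [key for key in REQUIRED_SETTINGS if not values.get(key)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    path = Path(path)
    # Create the .secrets directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(values, indent=2) + "\n")
    return path


if __name__ == "__main__":
    # Placeholder values only; none of these are real credentials or paths.
    write_config(
        {
            "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
            "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "aws_region": "YOUR_AWS_REGION",
            "s3_staging_dir": "s3://YOUR_BUCKET_NAME/YOUR_STAGING_PREFIX/",
            "schema_name": "my_sample_data",
        }
    )
```

Keep the resulting file out of version control, since it contains AWS credentials.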

Create tests within the `tests` subfolder and
then run:

```bash
poetry run pytest
pipx run tox -e pytest
pipx run tox -e pytest -- tests/test_core.py
```

You can also test the `tap-athena` CLI interface directly using `poetry run`:
@@ -90,8 +136,8 @@ Now you can test and orchestrate using Meltano:
```bash
# Test invocation:
meltano invoke tap-athena --version
# OR run a test `elt` pipeline:
meltano elt tap-athena target-jsonl
# OR run a test `run` pipeline:
meltano run tap-athena target-jsonl
```

### SDK Dev Guide
12 changes: 12 additions & 0 deletions mypy.ini
@@ -0,0 +1,12 @@
[mypy]
python_version = 3.9
warn_unused_configs = True

[mypy-boto3.*]
ignore_missing_imports = True

[mypy-botocore.exceptions.*]
ignore_missing_imports = True

[mypy-genson.*]
ignore_missing_imports = True