Various documentation updates #102

Merged
merged 2 commits into from
Jun 5, 2024
61 changes: 61 additions & 0 deletions .github/workflows/doc2go.yaml
@@ -0,0 +1,61 @@
name: Publish Go Docs

on:
  # Publish documentation when a new release is tagged.
  push:
    tags: [ 'v*' ]

  # Allow manually publishing documentation from a specific hash.
  workflow_dispatch:
    inputs:
      head:
        description: "Git commit to publish documentation for."
        required: true
        type: string

# If two concurrent runs are started, prefer the latest one.
concurrency:
  group: "pages"
  cancel-in-progress: true

jobs:
  build:
    name: Build godoc website
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          # Check out head specified by workflow_dispatch,
          # or the tag if this fired from the push event.
          ref: ${{ inputs.head || github.ref }}
      - name: Setup Go
        uses: actions/setup-go@v3
        with:
          go-version: stable
          cache: true
      - name: Install doc2go
        run: go install go.abhg.dev/doc2go@latest
      - name: Generate API reference
        run: doc2go -home github.com/${{ github.repository }} ./...
      - name: Upload pages
        uses: actions/upload-pages-artifact@v1

  publish:
    name: Publish godoc website
    # Don't run until the build has finished running.
    needs: build
    # Grants the GITHUB_TOKEN used by this job permissions needed to publish
    # the doc website.
    permissions:
      pages: write
      id-token: write
    # Deploy to the github-pages environment
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v1
56 changes: 54 additions & 2 deletions README.md
@@ -92,6 +92,14 @@ $ dmap --help
$ dmap repo-scan --help
```

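It is recommended to pass sensitive values such as passwords via shell or
environment variables rather than typing them directly on the command line,
e.g.:

```bash
# Read password from stdin without echoing it
$ read -rs PASSWORD
$ dmap repo-scan --password "$PASSWORD" # ... other flags ...
```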

### Installation

The Dmap CLI can be installed as a native binary, a Docker image, or directly
@@ -154,6 +162,8 @@ go install github.com/cyralinc/dmap/cmd/dmap@<version>

## Go Library

[API Reference Docs](https://cyralinc.github.io/dmap/)

The Dmap Go library provides APIs to scan cloud environments to discover data
repositories in those environments, as well as scan individual data repositories
for sensitive data.
@@ -337,8 +347,50 @@ Additional repository types can be added by implementing the [`sql.Repository`](
interface and registering it in a [`sql.Registry`](sql/registry.go). See the
[`sql`](sql) package for more details.

See the [`classification`](classification) package for more details on how to
define and use data labels for classifying sensitive data.
#### Custom Data Labels

The Dmap library allows you to define custom data labels for classifying
sensitive data in data repositories. Each data label has a name, description,
and a set of tags that can be used to group labels, e.g. "PII", "PCI", "HIPAA",
etc.

Labels are defined as OPA Rego policies and are loaded at runtime by the
repository scanner. The metadata for the labels is defined in a [`labels.yaml`](classification/labels/labels.yaml)
file. This can be passed to the scanner via the `LabelsYamlFilename` field in
the `ScannerConfig` struct, e.g.:

```Go
cfg := sql.ScannerConfig{
	LabelsYamlFilename: "/path/to/labels.yaml",
	// Other fields...
}
scanner, err := sql.NewScanner(context.Background(), cfg)
```

If using the Dmap CLI, the `--label-yaml-file` flag can be used to specify the
path to the labels YAML file, e.g.:

```bash
$ dmap repo-scan \
    --label-yaml-file "/path/to/labels.yaml" \
    # Other flags...
```

See the [`labels`](classification/labels) package for more details on how to
define and use data labels for classifying sensitive data. Additionally, see the
[`labels.yaml`](classification/labels/labels.yaml) file for an example of the
file format and how to define custom data labels.

#### Connection String Parameters

The database connection string is currently hardcoded for each repository type
(see https://github.com/cyralinc/dmap/issues/101 for discussion of possible
future improvements). For Postgres repositories, the connection string is
configurable using [environment variables](https://pkg.go.dev/github.com/lib/pq#hdr-Connection_String_Parameters).
To set additional connection parameters for other repository types, you will
need to modify the code or provide a new `Repository` implementation. Please
open an issue and/or pull request if you have any suggestions or
contributions.
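
As a rough illustration of the Postgres case, the sketch below sets connection parameters as process environment variables before a scanner would be created. It assumes the variables are read when the connection is opened, as described in the linked `lib/pq` documentation. `PGSSLMODE` and `PGCONNECT_TIMEOUT` are standard libpq variable names; the `setPostgresEnv` helper and its `PG` prefix check are hypothetical and not part of Dmap.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// setPostgresEnv is a hypothetical helper (not part of Dmap). It exports each
// parameter as a process environment variable and rejects names that do not
// start with "PG", since those would not be libpq-style variables.
func setPostgresEnv(params map[string]string) error {
	for key, value := range params {
		if !strings.HasPrefix(key, "PG") {
			return fmt.Errorf("unexpected variable %q: expected a PG* name", key)
		}
		if err := os.Setenv(key, value); err != nil {
			return fmt.Errorf("setting %s: %w", key, err)
		}
	}
	return nil
}

func main() {
	err := setPostgresEnv(map[string]string{
		"PGSSLMODE":         "require",
		"PGCONNECT_TIMEOUT": "10",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("PGSSLMODE =", os.Getenv("PGSSLMODE"))
	// Create the scanner after this point so the connection picks up
	// the variables set above.
}
```

The same variables can also be exported in the shell before running the CLI; setting them in-process is only one option.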

## Resources

66 changes: 66 additions & 0 deletions classification/labels/README.md
@@ -6,3 +6,69 @@ individual Rego files for each label.
To add a new predefined label, add its metadata to [`labels.yaml`](labels.yaml)
(following the file's instructions), as well as a corresponding classification
rule Rego file.

## Classification Rule Rego Files

Each label has a corresponding Rego file that defines the classification rule
for that label. The Rego file should be named after the label, with the `.rego`
extension. For example, the classification rule for the label `first_name`
should be defined in a file named `first_name.rego`.

The package for the rule should be named `classifier_<label>`, where `<label>`
is the name of the label in lowercase. For example, the package for the
classification rule for the label `first_name` should be named
`classifier_first_name`.

Rules should also have tests defined in a file named `<label>_test.rego`.
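
The naming conventions above can be summarized in a small Go sketch. The `namesForLabel` helper and the `ruleNames` type are hypothetical and exist only for illustration. The sketch also assumes that file names use the lowercase label, which the text states explicitly only for the package name.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleNames holds the names expected for one label's classification rule.
type ruleNames struct {
	RuleFile string // Rego file containing the rule
	TestFile string // Rego file containing the rule's tests
	Package  string // Rego package name declared in the rule file
}

// namesForLabel derives the expected names from a label name.
// Assumption: the label is lowercased for file names as well as the package.
func namesForLabel(label string) ruleNames {
	name := strings.ToLower(label)
	return ruleNames{
		RuleFile: name + ".rego",
		TestFile: name + "_test.rego",
		Package:  "classifier_" + name,
	}
}

func main() {
	n := namesForLabel("first_name")
	fmt.Println(n.RuleFile) // first_name.rego
	fmt.Println(n.TestFile) // first_name_test.rego
	fmt.Println(n.Package)  // classifier_first_name
}
```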

All Rego files (including tests) should be linted using [`regal`](https://www.openpolicyagent.org/integrations/regal/)
to ensure they are formatted correctly, e.g.

```bash
$ regal lint /path/to/label.rego
```

### Input and Output

The input data for a classification rule is a JSON object containing the data
to be classified. This often represents a database table sample, for example.
The key names in the input data object correspond to the column names in the
database table, and the values are the sampled data in the table. For example,
input data representing a data sample from a database table called `users`
might look like this:

```json
{
  "first_name": "John",
  "last_name": "Doe",
  "email": "[email protected]"
}
```

Each rule must define an output variable named `output`, which must be an
[object](https://www.openpolicyagent.org/docs/latest/policy-language/#objects)
of the form:

```json
{
  "key": boolean
}
```

where `key` is each key from the input data, and `boolean` is a boolean value
indicating whether the key is classified as the label or not. For example, the
output object for the `first_name` label using the example input data above
would look like this:

```json
{
  "first_name": true,
  "last_name": false,
  "email": false
}
```

See this example on the [Rego Playground](https://play.openpolicyagent.org/p/niTDt5JwN8).
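
For readers more familiar with Go than Rego, the input/output contract can be sketched as follows. This is not how Dmap evaluates rules, which are written in Rego. The `classify` function, its key-name matching logic, and the sample email value are all hypothetical. The sketch only shows the shape of the data: one boolean per input key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// classify mirrors the output contract of a classification rule: it returns
// one boolean for every key in the input. The matching logic here (key name
// equals the label) is a placeholder; real rules may also inspect the values.
func classify(label string, input map[string]any) map[string]bool {
	output := make(map[string]bool, len(input))
	for key := range input {
		output[key] = strings.EqualFold(key, label)
	}
	return output
}

func main() {
	input := map[string]any{
		"first_name": "John",
		"last_name":  "Doe",
		"email":      "user@example.com", // hypothetical sample value
	}
	out, err := json.Marshal(classify("first_name", input))
	if err != nil {
		log.Fatal(err)
	}
	// json.Marshal sorts map keys, so this prints:
	// {"email":false,"first_name":true,"last_name":false}
	fmt.Println(string(out))
}
```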

Please see the existing classification rules and their tests for examples of how
to write classification rules.