Various documentation updates
ccampo133 committed Jun 4, 2024
1 parent d287b24 commit d475e3f
Showing 3 changed files with 181 additions and 2 deletions.
61 changes: 61 additions & 0 deletions .github/workflows/doc2go.yaml
@@ -0,0 +1,61 @@
name: Publish Go Docs

on:
  # Publish documentation when a new release is tagged.
  push:
    tags: [ 'v*' ]

  # Allow manually publishing documentation from a specific hash.
  workflow_dispatch:
    inputs:
      head:
        description: "Git commit to publish documentation for."
        required: true
        type: string

# If two concurrent runs are started, prefer the latest one.
concurrency:
  group: "pages"
  cancel-in-progress: true

jobs:
  build:
    name: Build godoc website
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          # Check out head specified by workflow_dispatch,
          # or the tag if this fired from the push event.
          ref: ${{ inputs.head || github.ref }}
      - name: Setup Go
        uses: actions/setup-go@v3
        with:
          go-version: stable
          cache: true
      - name: Install doc2go
        run: go install go.abhg.dev/doc2go@latest
      - name: Generate API reference
        run: doc2go -home github.com/${{ github.repository }} ./...
      - name: Upload pages
        uses: actions/upload-pages-artifact@v1

  publish:
    name: Publish godoc website
    # Don't run until the build has finished running.
    needs: build
    # Grants the GITHUB_TOKEN used by this job permissions needed to publish
    # the doc website.
    permissions:
      pages: write
      id-token: write
    # Deploy to the github-pages environment
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v1
56 changes: 54 additions & 2 deletions README.md
@@ -92,6 +92,14 @@ $ dmap --help
$ dmap repo-scan --help
```

It is recommended to pass sensitive values, such as passwords, via environment variables, e.g.:

```bash
# Read password from stdin and export as environment variable.
$ read -rs PASSWORD && export PASSWORD
$ dmap repo-scan --password "$PASSWORD" # ... other flags ...
```

### Installation

The Dmap CLI can be installed as a native binary, a Docker image, or directly
@@ -154,6 +162,8 @@ go install github.com/cyralinc/dmap/cmd/dmap@<version>

## Go Library

[API Reference Docs](https://cyralinc.github.io/dmap/)

The Dmap Go library provides APIs to scan cloud environments to discover data
repositories in those environments, as well as scan individual data repositories
for sensitive data.
@@ -337,8 +347,50 @@ Additional repository types can be added by implementing the [`sql.Repository`](
interface and registering it in a [`sql.Registry`](sql/registry.go). See the
[`sql`](sql) package for more details.

See the [`classification`](classification) package for more details on how to
define and use data labels for classifying sensitive data.
#### Custom Data Labels

The Dmap library allows you to define custom data labels for classifying
sensitive data in data repositories. Each data label has a name, description,
and a set of tags that can be used to group labels, e.g. "PII", "PCI", "HIPAA",
etc.

Labels are defined as OPA Rego policies and are loaded at runtime by the
repository scanner. The metadata for the labels is defined in a [`labels.yaml`](classification/labels/labels.yaml)
file. This can be passed to the scanner via the `LabelsYamlFilename` field in
the `ScannerConfig` struct, e.g.:

```Go
cfg := sql.ScannerConfig{
	LabelsYamlFilename: "/path/to/labels.yaml",
	// Other fields...
}
scanner, err := sql.NewScanner(context.Background(), cfg)
if err != nil {
	// Handle the error.
}
```

If using the Dmap CLI, the `--label-yaml-file` flag can be used to specify the
path to the labels YAML file, e.g.:

```bash
$ dmap repo-scan \
  --label-yaml-file "/path/to/labels.yaml" \
  # Other flags...
```

See the [`labels`](classification/labels) package for more details on how to
define and use data labels for classifying sensitive data. Additionally, see the
[`labels.yaml`](classification/labels/labels.yaml) file for an example of the
file format and how to define custom data labels.

#### Connection String Parameters

The database connection string is currently hardcoded for each repository type
(see https://github.com/cyralinc/dmap/issues/101 for discussion about possible
future improvements). For Postgres repositories, the connection string is
configurable using [environment variables](https://pkg.go.dev/github.com/lib/pq#hdr-Connection_String_Parameters).
If you need to set additional connection parameters for other repository types,
you will need to modify the code or provide a new `Repository` implementation.
Please open an issue and/or pull request if you have any suggestions or
contributions.

## Resources

66 changes: 66 additions & 0 deletions classification/labels/README.md
@@ -6,3 +6,69 @@ individual Rego files for each label.
To add a new predefined label, add its metadata to [`labels.yaml`](labels.yaml)
(following the file's instructions), as well as a corresponding classification
rule Rego file.

## Classification Rule Rego Files

Each label has a corresponding Rego file that defines the classification rule
for that label. The Rego file should be named after the label, with the `.rego`
extension. For example, the classification rule for the label `first_name`
should be defined in a file named `first_name.rego`.

The package for the rule should be named `classifier_<label>`, where `<label>`
is the name of the label in lowercase. For example, the package for the
classification rule for the label `first_name` should be named
`classifier_first_name`.

Rules should also have tests defined in a file named `<label>_test.rego`.
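
The naming conventions above can be summarized with a small sketch. The `ruleNames` type and `ruleNamesFor` function are hypothetical and not part of the Dmap library; the sketch uses the label verbatim for the file names and lowercases it only for the package name, as described above.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleNames holds the names expected for a label's classification rule.
// This type is illustrative only and does not exist in Dmap.
type ruleNames struct {
	RuleFile string // Rego file containing the classification rule.
	TestFile string // Rego file containing the rule's tests.
	Package  string // Rego package name declared by the rule.
}

// ruleNamesFor derives the expected names from a label name.
func ruleNamesFor(label string) ruleNames {
	return ruleNames{
		RuleFile: label + ".rego",
		TestFile: label + "_test.rego",
		Package:  "classifier_" + strings.ToLower(label),
	}
}

func main() {
	names := ruleNamesFor("first_name")
	fmt.Println(names.RuleFile, names.TestFile, names.Package)
}
```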

All Rego files (including tests) should be linted using [`regal`](https://www.openpolicyagent.org/integrations/regal/)
to ensure they are formatted correctly, e.g.

```bash
$ regal lint /path/to/label.rego
```

### Input and Output

The input data for a classification rule is a JSON object containing the data
to be classified, typically a sample from a database table.
The key names in the input data object correspond to the column names in the
database table, and the values are the sampled data in the table. For example,
input data representing a data sample from a database table called `users`
might look like this:

```json
{
  "first_name": "John",
  "last_name": "Doe",
  "email": "john.doe@example.com"
}
```

Each rule must define an output variable named `output`, which must be an
[object](https://www.openpolicyagent.org/docs/latest/policy-language/#objects)
of the form:

```json
{
  "key": boolean
}
```

where `key` is each key from the input data, and `boolean` is a boolean value
indicating whether the key is classified as the label or not. For example, the
output object for the `first_name` label using the example input data above
would look like this:

```json
{
  "first_name": true,
  "last_name": false,
  "email": false
}
```

See this example on the [Rego Playground](https://play.openpolicyagent.org/p/niTDt5JwN8).
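
For readers more familiar with Go than Rego, the input/output contract can be mirrored in a short sketch. This is not how Dmap evaluates rules (the real rules are Rego policies); `classifyKeys` and the toy predicate are hypothetical and only show the shape of the data: one boolean per input key.

```go
package main

import "fmt"

// classifyKeys mimics the shape of a rule's `output` object: every key of
// the input appears in the result, mapped to whether it matches the label.
// It is a hypothetical stand-in for a Rego classification rule.
func classifyKeys(input map[string]any, matches func(key string, value any) bool) map[string]bool {
	output := make(map[string]bool, len(input))
	for key, value := range input {
		output[key] = matches(key, value)
	}
	return output
}

func main() {
	// Sample row from a hypothetical `users` table; values are placeholders.
	input := map[string]any{
		"first_name": "John",
		"last_name":  "Doe",
		"email":      "jdoe@example.com",
	}

	// Toy predicate: classify purely by column name. Real rules may also
	// inspect the sampled values.
	isFirstName := func(key string, _ any) bool { return key == "first_name" }

	fmt.Println(classifyKeys(input, isFirstName))
}
```

Note that keys that do not match the label are still present in the output with a value of `false`, rather than being omitted.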

Please see the existing classification rules and their tests for examples of how
to write classification rules.
