Various documentation updates

cyralinc · Jun 4, 2024 · commit d475e3f
1 parent d287b24

Showing 3 changed files with 181 additions and 2 deletions.
diff --git a/.github/workflows/doc2go.yaml b/.github/workflows/doc2go.yaml
@@ -0,0 +1,61 @@
+name: Publish Go Docs
+
+on:
+  # Publish documentation when a new release is tagged.
+  push:
+    tags: [ 'v*' ]
+
+  # Allow manually publishing documentation from a specific hash.
+  workflow_dispatch:
+    inputs:
+      head:
+        description: "Git commit to publish documentation for."
+        required: true
+        type: string
+
+# If two concurrent runs are started, prefer the latest one.
+concurrency:
+  group: "pages"
+  cancel-in-progress: true
+
+jobs:
+  build:
+    name: Build godoc website
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          # Check out head specified by workflow_dispatch,
+          # or the tag if this fired from the push event.
+          ref: ${{ inputs.head || github.ref }}
+      - name: Setup Go
+        uses: actions/setup-go@v3
+        with:
+          go-version: stable
+          cache: true
+      - name: Install doc2go
+        run: go install go.abhg.dev/doc2go@latest
+      - name: Generate API reference
+        run: doc2go -home github.com/${{ github.repository }} ./...
+      - name: Upload pages
+        uses: actions/upload-pages-artifact@v1
+
+  publish:
+    name: Publish godoc website
+    # Don't run until the build has finished running.
+    needs: build
+    # Grants the GITHUB_TOKEN used by this job permissions needed to publish
+    # the doc website.
+    permissions:
+      pages: write
+      id-token: write
+    # Deploy to the github-pages environment
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v1
diff --git a/README.md b/README.md
@@ -92,6 +92,14 @@ $ dmap --help
 $ dmap repo-scan --help
 ```
 
+Pass sensitive values such as passwords via environment variables rather than
+typing them inline, e.g.:
+
+```bash
+# Read password from stdin and export as environment variable.
+$ read -rs PASSWORD && export PASSWORD
+$ dmap repo-scan --password "$PASSWORD" # ... other flags ...
+```
+
 ### Installation
 
 The Dmap CLI can be installed as a native binary, a Docker image, or directly 
@@ -154,6 +162,8 @@ go install github.com/cyralinc/dmap/cmd/dmap@<version>
 
 ## Go Library
 
+[API Reference Docs](https://cyralinc.github.io/dmap/)
+
 The Dmap Go library provides APIs to scan cloud environments to discover data
 repositories in those environments, as well as scan individual data repositories
 for sensitive data.
@@ -337,8 +347,50 @@ Additional repository types can be added by implementing the [`sql.Repository`](
 interface and registering it in a [`sql.Registry`](sql/registry.go). See the
 [`sql`](sql) package for more details.
 
-See the [`classification`](classification) package for more details on how to
-define and use data labels for classifying sensitive data.
+#### Custom Data Labels
+
+The Dmap library allows you to define custom data labels for classifying
+sensitive data in data repositories. Each data label has a name, description,
+and a set of tags that can be used to group labels, e.g. "PII", "PCI", "HIPAA", 
+etc.
+
+Labels are defined as OPA Rego policies and are loaded at runtime by the 
+repository scanner. The metadata for the labels is defined in a [`labels.yaml`](classification/labels/labels.yaml)
+file. Pass the path to this file to the scanner via the `LabelsYamlFilename`
+field in the `ScannerConfig` struct, e.g.:
+
+```Go
+cfg := sql.ScannerConfig{
+	LabelsYamlFilename: "/path/to/labels.yaml", 
+	// Other fields...
+}
+scanner, err := sql.NewScanner(context.Background(), cfg)
+```
+
+If using the Dmap CLI, the `--label-yaml-file` flag can be used to specify the
+path to the labels YAML file, e.g.:
+
+```bash
+$ dmap repo-scan \
+  --label-yaml-file "/path/to/labels.yaml" \
+  # Other flags...
+```
+
+See the [`labels`](classification/labels) package for more details on how to
+define and use data labels for classifying sensitive data. Additionally, see the
+[`labels.yaml`](classification/labels/labels.yaml) file for an example of the
+file format and how to define custom data labels.
+
+#### Connection String Parameters
+
+The database connection string is currently hardcoded for each repository type
+(see https://github.com/cyralinc/dmap/issues/101 for discussion about possible
+future improvements). For Postgres repositories, the connection string is
+configurable using [environment variables](https://pkg.go.dev/github.com/lib/pq#hdr-Connection_String_Parameters).
+If you need to set additional connection parameters for other repository types,
+you will need to modify the code or provide a new `Repository` implementation.
+Please open an issue and/or pull request if you have any suggestions or
+contributions.
 
 ## Resources
 

diff --git a/classification/labels/README.md b/classification/labels/README.md
@@ -6,3 +6,69 @@ individual Rego files for each label.
 To add a new predefined label, add its metadata to [`labels.yaml`](labels.yaml)
 (following the file's instructions), as well as a corresponding classification
 rule Rego file.
+
+## Classification Rule Rego Files
+
+Each label has a corresponding Rego file that defines the classification rule
+for that label. The Rego file should be named after the label, with the `.rego`
+extension. For example, the classification rule for the label `first_name`
+should be defined in a file named `first_name.rego`.
+
+The package for the rule should be named `classifier_<label>`, where `<label>`
+is the name of the label in lowercase. For example, the package for the
+classification rule for the label `first_name` should be named
+`classifier_first_name`.
+
+Rules should also have tests defined in a file named `<label>_test.rego`.
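The naming conventions above can be summarized in a few lines of Go. This is only a sketch for checking names when adding a label. `ruleNames` and `namesForLabel` are hypothetical and do not exist in the Dmap codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleNames holds the names derived from a label by the conventions above.
// Hypothetical type, for illustration only.
type ruleNames struct {
	RuleFile string // Rego file that defines the classification rule
	TestFile string // Rego file that holds the rule's tests
	Package  string // Rego package name for the rule
}

// namesForLabel derives the expected file and package names for a label.
func namesForLabel(label string) ruleNames {
	return ruleNames{
		RuleFile: label + ".rego",
		TestFile: label + "_test.rego",
		Package:  "classifier_" + strings.ToLower(label),
	}
}

func main() {
	n := namesForLabel("first_name")
	fmt.Println(n.RuleFile, n.TestFile, n.Package)
	// Prints: first_name.rego first_name_test.rego classifier_first_name
}
```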
+
+All Rego files (including tests) should be linted using [`regal`](https://www.openpolicyagent.org/integrations/regal/)
+to ensure they are formatted correctly, e.g.:
+
+```bash
+$ regal lint /path/to/label.rego
+```
+
+### Input and Output
+
+The input data for a classification rule is a JSON object containing the data
+to be classified. This often represents a sample from a database table.
+The key names in the input data object correspond to the column names in the
+database table, and the values are the sampled data in the table. For example,
+input data representing a data sample from a database table called `users`
+might look like this:
+
+```json
+{
+  "first_name": "John",
+  "last_name": "Doe",
+  "email": "john.doe@example.com"
+}
+```
+
+Each rule must define an output variable named `output`, which must be an
+[object](https://www.openpolicyagent.org/docs/latest/policy-language/#objects)
+of the form:
+
+```json
+{
+  "key": boolean
+}
+```
+
+where `key` is each key from the input data, and `boolean` is a boolean value
+indicating whether the key is classified as the label or not. For example, the
+output object for the `first_name` label using the example input data above
+would look like this:
+
+```json
+{
+  "first_name": true,
+  "last_name": false,
+  "email": false
+}
+```
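To make the input/output contract concrete, here is a minimal Go sketch that mirrors it outside of Rego: one boolean per input key. It is not how Dmap evaluates rules (those are Rego policies). `classify` and `isFirstName` are hypothetical names, and the predicate is deliberately naive, matching on the column name only.

```go
package main

import (
	"fmt"
	"strings"
)

// classify returns one boolean per input key, mirroring the shape that a
// rule's `output` object must have. Hypothetical helper, for illustration.
func classify(input map[string]any, matches func(key string, value any) bool) map[string]bool {
	output := make(map[string]bool, len(input))
	for key, value := range input {
		output[key] = matches(key, value)
	}
	return output
}

// isFirstName is a naive stand-in for a first_name rule: it looks only at
// the column name and ignores the sampled value.
func isFirstName(key string, _ any) bool {
	k := strings.ToLower(key)
	return k == "first_name" || k == "firstname"
}

func main() {
	input := map[string]any{
		"first_name": "John",
		"last_name":  "Doe",
		"email":      "john.doe@example.com",
	}
	fmt.Println(classify(input, isFirstName))
	// Prints: map[email:false first_name:true last_name:false]
}
```

Every key from the input appears in the result, including keys that do not match, which is the property the `output` object is described as having.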
+
+See this example on the [Rego Playground](https://play.openpolicyagent.org/p/niTDt5JwN8).
+
+Please see the existing classification rules and their tests for examples of how
+to write classification rules.