markdownlint pre-commit check #36

Open: wants to merge 11 commits into base: main
7 changes: 7 additions & 0 deletions .mdl_style.rb
@@ -0,0 +1,7 @@
all
exclude_rule 'MD014' # Dollar signs used before commands without showing output
exclude_rule 'MD029' # Ordered list item prefix
exclude_rule 'MD033' # Inline HTML
exclude_rule 'MD034' # Bare URL used
exclude_rule 'MD036' # Emphasis used instead of a header
rule 'MD013', :line_length => 120, :code_blocks => false, :tables => false
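As a rough illustration of what the `MD013` override above asks for (lines longer than 120 characters are flagged, while code blocks and tables are exempt), here is a minimal Python sketch. It is not how markdownlint implements the rule; `long_lines`, the simple fence toggle, and the leading-`|` table test are simplifying assumptions made only for this example.

```python
FENCE = "`" * 3  # fenced code block delimiter, built indirectly to keep this example readable


def long_lines(markdown: str, limit: int = 120) -> list[int]:
    """Return 1-based numbers of lines longer than `limit`.

    Hypothetical helper: lines inside fenced code blocks and lines that look
    like table rows (starting with '|') are skipped, loosely mirroring
    `:code_blocks => false, :tables => false`.
    """
    offenders = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
            continue
        if in_fence or stripped.startswith("|"):
            continue  # exempt: code block content or table row
        if len(line) > limit:
            offenders.append(number)
    return offenders


sample = "\n".join([
    "# Title",
    "x" * 121,                  # too long, should be reported
    FENCE + "bash",
    "y" * 200,                  # long, but inside a code block
    FENCE,
    "| " + "z" * 200 + " |",    # long, but a table row
    "w" * 120,                  # exactly at the limit, allowed
])
print(long_lines(sample))
```

Only the second line of `sample` is reported; the long code and table lines are ignored, which is the behavior the style file is aiming for.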
1 change: 1 addition & 0 deletions .mdlrc
@@ -0,0 +1 @@
style "#{File.dirname(__FILE__)}/.mdl_style.rb"
7 changes: 6 additions & 1 deletion .pre-commit-config.yaml
@@ -18,4 +18,9 @@ repos:
     rev: v2.2.6
     hooks:
       - id: codespell
-        files: ^.*\.(md|rst|yml)$
+        files: ^.*\.(md|rst|yml)$
+  - repo: https://github.com/markdownlint/markdownlint
+    rev: v0.11.0
+    hooks:
+      - id: markdownlint
+        files: ^.*\.(md)$
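To see which paths the new hook's `files` pattern would select, here is a small Python sketch. pre-commit treats `files` as a Python regular expression applied with `re.search`; the helper name `is_selected` and the candidate paths are hypothetical and used only for illustration.

```python
import re

# The pattern from the markdownlint hook above.
FILES_PATTERN = re.compile(r"^.*\.(md)$")


def is_selected(path: str) -> bool:
    """Return True if the hook's `files` pattern matches this path."""
    return FILES_PATTERN.search(path) is not None


# Hypothetical paths, chosen to show matches and non-matches.
candidates = [
    "README.md",
    "docs/source/licensing.md",
    "docs/conf.py",
    "notes.md.bak",
    ".pre-commit-config.yaml",
]
selected = [path for path in candidates if is_selected(path)]
print(selected)
```

Because the pattern is anchored with `$`, only paths that end in `.md` are passed to the hook; `notes.md.bak` is skipped.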
1 change: 1 addition & 0 deletions README.md
@@ -35,6 +35,7 @@ git remote add upstream https://github.com/lasp/developer-guide.git # For HTTPS
10. Iterate with the reviewer over any needed changes until the reviewer approves of the pull request. This may require
additional commits to the pull request. Once all changes are approved, merge the pull request.

<!-- markdownlint-disable-next-line MD026 -->
## Questions?

Any questions about this effort may be directed to the ``#ds-best-practices-documentation`` Slack channel.
2 changes: 1 addition & 1 deletion docs/README.md
@@ -1,4 +1,4 @@
-## How to build the documentation
+# How to build the documentation

```bash
# Make the html documentation
30 changes: 21 additions & 9 deletions docs/source/data_management/file_formats/netcdf.md
@@ -1,4 +1,5 @@
# NetCDF

>**Warning**
> This guide needs additional information

@@ -9,7 +10,9 @@ directly, without knowing how the data are stored, and metadata information may
* Self-describing, includes metadata
* Multi-dimensional array data model

-The [netCDF data model](https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html) consists of the following:
+The [netCDF data model](https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html) consists of the
+following:

* variable
* Multi-dimensional array
* Column-oriented: each variable as a separate entity
@@ -22,13 +25,14 @@ The [netCDF data model](https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_da
* Akin to directories
* Avoid unless you really need the complex structure


## Why use NetCDF

NetCDF is a file format commonly used at LASP as it is the "highly preferred" format for NASA Earth Observing System
Data and Information System data products, per their Data Product Development Guide for Data Producers.
This affects all NASA Earth Science missions.

NetCDF features:

* Self-describing
* structure captures coordinate system (functional relationship)
* includes metadata
@@ -42,14 +46,17 @@ NetCDF features:
* Open specification (unlike IDL save files)

## Options available

There are two netCDF data models:

* NetCDF-3 classic
* NetCDF-4 built on HDF5
* recommended but prefer classic constructs

## How to use this data format

-#### NetCDF Files
+### NetCDF Files

* Binary format with open specification
* Requires software libraries to read and write; available for C, Fortran, Java, Python, IDL, ...
* Internal compression is available, so don't bother compressing NetCDF files externally
@@ -58,7 +65,8 @@ There are two netCDF data models:
* `.nc` file extension
* Don't be afraid of big files

-#### Coordinate System
+### Coordinate System

* Dimensions should be used to define a coordinate system
* e.g. temporal, spatial, spectral
* Avoid using dimensions to group data
@@ -72,14 +80,16 @@ There are two netCDF data models:
* shared dimensions
* Each variable should reuse dimensions to indicate that they share the same coordinates (domain set)

-#### Time as Coordinate Variable
+### Time as Coordinate Variable

* If the data are a function of a single time dimension then there should be a single time variable
* avoid breaking time up by date and time of day
* Prefer numeric time units
* time unit since an epoch
* e.g. "seconds since 1970-01-01", "microseconds since 1980-01-06"
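As a minimal sketch of the "numeric time since an epoch" recommendation above, the following Python snippet converts between timezone-aware datetimes and a single numeric value in "seconds since 1970-01-01". The function names are hypothetical, and treating the epoch as UTC is an assumption of this example; only the standard library is used.

```python
from datetime import datetime, timedelta, timezone

# Epoch matching the units string "seconds since 1970-01-01" (assumed to be UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_seconds_since_epoch(moment: datetime) -> float:
    """Encode a timezone-aware datetime as one numeric time value."""
    return (moment - EPOCH).total_seconds()


def from_seconds_since_epoch(seconds: float) -> datetime:
    """Decode a numeric time value back into a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


sample = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
seconds = to_seconds_since_epoch(sample)
round_trip = from_seconds_since_epoch(seconds)
print(seconds, round_trip.isoformat())
```

Storing one numeric value per sample keeps date and time of day together in a single time variable, as the guideline suggests.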

-#### Metadata
+### Metadata

* Optional but useful for making a NetCDF file self-describing
* attribute
* global (dataset level)
@@ -93,20 +103,22 @@ There are two netCDF data models:
* [Attribute Convention for Data Discovery (ACDD)](https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3)
* [udunits](https://www.unidata.ucar.edu/software/udunits/): standard units

-#### Other useful variable attributes
+### Other useful variable attributes

* _FillValue
* missing_value is considered deprecated by the NetCDF User's Guide (NUG) and is not recommended.
* NaN is another option. However, NaNs in files are handled differently in every language, so it may
be better to pick an explicit fill value for official data products that many users will be using
* valid_range, valid_min, valid_max
* scale_factor, add_offset (packed values)
-* [cell_methods](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#_data_representative_of_cells): standards for representing data cells (bins)
+* [cell_methods](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#_data_representative_of_cells)
+: standards for representing data cells (bins)
* e.g. daily average, wavelength bins
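To make the packed-value attributes above concrete, here is a minimal Python sketch of how a reader might unpack stored integers. It follows the usual convention `unpacked = packed * scale_factor + add_offset`, with `_FillValue` compared against the stored (packed) value. The `unpack` helper and the sample numbers are hypothetical; NetCDF libraries typically offer to do this for you.

```python
def unpack(packed, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Convert packed values to physical values; fill values become None.

    Hypothetical helper for illustration. `fill_value` plays the role of
    the variable's _FillValue attribute.
    """
    return [
        None if value == fill_value else value * scale_factor + add_offset
        for value in packed
    ]


# Made-up packed integers, with -9999 marking missing data.
stored = [0, 100, -9999, 250]
physical = unpack(stored, scale_factor=0.5, add_offset=10.0, fill_value=-9999)
print(physical)
```

Choosing an explicit fill value such as `-9999` rather than NaN keeps the missing-data test a plain equality check in any language.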

## Useful Links

* [NetCDF User's Guide](https://docs.unidata.ucar.edu/nug/current/)
* [NetCDF ToolsUI](https://docs.unidata.ucar.edu/netcdf-java/current/userguide/toolsui_ref.html)
* [NetCDF Workshop Materials](https://www.unidata.ucar.edu/software/netcdf/workshops/2011/index.html)


Credit: Content taken from a Confluence guide written by Doug Lindholm
12 changes: 6 additions & 6 deletions docs/source/licensing.md
@@ -4,15 +4,16 @@ Licenses provide legally binding guidelines for the use and distribution of soft
proprietary or free and open source.

## Purpose for this guideline

In Data Systems, many of our code repositories are open source. An open source license gives others explicit permission
to use any part of the code legally. This guide provides options for choosing the right license for your project.


## Options for this guideline

### Software

To avoid copyright concerns, it is recommended that:

1. Any software created by LASP is properly licensed to provide sufficient guidance on its usage
2. Any software used by LASP is licensed and used accordingly to protect against potential legal action from the owner
of that intellectual property
@@ -36,28 +37,27 @@ Some fairly common options:
* [MIT](https://opensource.org/license/MIT) - Short and sweet, very similar to BSD-3
* [Apache-2](https://opensource.org/license/apache-2-0) - Commonly used in the Java/Scala communities


Some examples from groups at LASP:

* MIT: <https://github.com/SWxTREC/enlilviz>
* Apache-2: <https://github.com/latis-data/latis3/>


*NOTE: There is a NASA Open Source License: <https://opensource.gsfc.nasa.gov/nosa.php>; however, it DOES NOT satisfy
the Free Software Foundation’s definition of free software.*

### Data

A Creative Commons license can be used to define how others may use, share, and adapt data.

Creative Commons: <https://creativecommons.org/choose/>

Data rights qualifiers

* BY – Credit must be given to you, the creator.
* NC – Only noncommercial use of your work is permitted.
* ND – No derivatives or adaptations of your work are permitted.
* SA – Adaptations must be shared under the same terms.


You can mix and match the qualifiers on the data rights depending on what limitations you want to enact on the data
you’re distributing/producing.

@@ -69,18 +69,18 @@ you’re distributing/producing.
* CC BY-NC-SA 4.0 – Anyone can use the data/work, but NOT for commercial purposes and the work must be shared alike
(SA), meaning it must have the same terms of use.


## How to apply this guideline

<!-- markdownlint-disable-next-line MD024 -->
### Software

1. CU Venture Partners (CU lawyers) recommend using the BSD-3 license.
2. Make sure that you put the license file in the root directory and call it `LICENSE` or `LICENSE.md` so that the code
repository (GitHub, GitLab, Bitbucket) can immediately identify what license your code is released under and let
contributors know.
3. Fill out the copyright notice, noting that the copyright holder is NOT LASP but the Regents of CU: *Copyright (c)
YYYY, Regents of the University of Colorado*


## Useful Links

* [Public license selector](https://ufal.github.io/public-license-selector/)
@@ -1,33 +1,35 @@
> **Warning:** More information is needed to complete this guideline.

# Python Packaging and Distribution

> > **Warning:** More information is needed to complete this guideline.

Examples of Python packaging and distribution options and how to use them.

## Purpose

> **Warning** Need to add an explanation of how this guideline supports DS workflows, meets internal and external
> policies, and aids in collaboration and our overall success

## Options

The options for Python packaging and distribution that we often see used at LASP are:

- [PyPI](#packaging-for-pypi--pip-install-)
- [Conda](#packaging-for-conda--conda-install-)

## Packaging for PyPI (`pip install`)

-### PyPI resources:
+### PyPI resources

- [PyPI Help Page](https://pypi.org/help/)

- [Setting up a PyPI account](https://pypi.org/account/register/)

- [Getting a PyPI access token](https://pypi.org/help/#apitoken)


### Built-In (`build` + `twine`)

> **Warning**: Need to add introductory paragraph that summarizes Built-In

#### How to use Built-In

Python Packaging User Guide: https://packaging.python.org/en/latest/
The link below is a fairly complete tutorial. There are also instructions there for using various other build tools:
https://packaging.python.org/en/latest/tutorials/packaging-projects/
@@ -37,10 +39,11 @@ https://packaging.python.org/en/latest/tutorials/packaging-projects/
- [Python Packaging User Guide](https://packaging.python.org/en/latest/)

#### Setuptools Example – Library Package

<details>
<summary>setup.py</summary>

```
```python
"""
Setup file for the science data processing pipeline.

@@ -86,6 +89,7 @@ setup(
}
)
```

</details>

### Poetry
@@ -95,7 +99,8 @@ setup(
[Poetry Build and Publish Docs](https://python-poetry.org/docs/cli/#build)

How to Publish to PyPI from Poetry
```

```bash
poetry lock
poetry install
poetry version
@@ -110,7 +115,7 @@ poetry publish # You will be prompted for your PyPI credentials if you don't pr
<details>
<summary>pyproject.toml</summary>

```
```toml
# pyproject.toml
# See: https://python-poetry.org/docs/pyproject/

@@ -158,12 +163,15 @@ poetry publish # You will be prompted for your PyPI credentials if you don't pr
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```

</details>

## Packaging for Conda (`conda install`)

> **Warning**: Need a volunteer to expand on Conda

### How to install and use Conda

https://conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html

> Conda Develop:
@@ -172,6 +180,7 @@ conda recommend using `pip install` to install an editable package in developmen
> See: https://github.com/conda/conda-build/issues/1992

## Useful Links

Here are some helpful resources:

- [Python Packaging User's Guide](https://packaging.python.org/en/latest/)
2 changes: 2 additions & 0 deletions docs/source/programming_languages/python/terminology.md
@@ -3,6 +3,7 @@
Some Python terminology that a user might encounter, particularly when working through this Python guide.

## Purpose

Like all programming languages, Python has some terminology that is unique to it and it is helpful to have that language
explained. This page may be updated over time so that it holds the most useful terminology to those that use this
developer's guide.
@@ -35,6 +36,7 @@ boasts a similar dependency resolver to Conda. One major drawback to Conda in pa
and over instead of simply making code changes in place.

## Useful Links

Helpful links to additional resources on the topic

Credit: Content taken from a Confluence guide written by Gavin Medley
6 changes: 3 additions & 3 deletions docs/source/workflows/docker/beginner_guide_to_docker.md
@@ -6,7 +6,7 @@ This guide is intended to provide an overview of what Docker is, how it's used,
containers. It will not go in depth on creating a Docker image, or on the more nuanced aspects of using Docker. For a
more in-depth introduction, you can read through the official Docker docs.

-## A Beginner's Guide to Docker
+## Overview

Docker is a tool for containerizing code. You can basically think of it as a lightweight virtual machine. Docker works
by defining an image which includes whatever you need to run your code. You start with a base image, which is a pre-made
@@ -48,7 +48,6 @@ So, you define a Docker *image* using a *Dockerfile* and/or a *Docker Compose* f
Docker *container*, which runs your code and environment. An image can be pushed up to a *registry*, where anyone with
access can pull the image and run the container themselves without needing access to the Dockerfile.


## Getting Started

This section will outline some basic commands and use cases for Docker. First, you need to
@@ -94,7 +93,7 @@ rebuild, and you can find a full list of flags [here](https://docs.docker.com/re
Now that we have built the image, we can see all the Docker images that are built on our system by running the
`docker images` command:

```
```plaintext
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker_tutorial latest 71736be7c555 5 minutes ago 91.9MB
@@ -176,6 +175,7 @@ docker image prune
```

## Useful Links

* [Official Docker documentation](https://docs.docker.com/)
* [Installing Docker engine](https://docs.docker.com/engine/install/)
* [Installing Docker Desktop for Mac](https://docs.docker.com/desktop/install/mac-install/)