This repository has been archived by the owner on Jun 22, 2021. It is now read-only.

Merge pull request #23 from alan-turing-institute/dev
For a 0.2.5 release
ablaom authored Apr 21, 2020
2 parents b65844b + d290f1a commit 0e63c49
Showing 9 changed files with 270 additions and 175 deletions.
5 changes: 3 additions & 2 deletions Project.toml
@@ -1,11 +1,12 @@
name = "MLJScientificTypes"
uuid = "2e2323e0-db8b-457b-ae0d-bdfb3bc63afd"
authors = ["Anthony D. Blaom <[email protected]>"]
version = "0.2.4"
version = "0.2.5"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
ColorTypes = "3da002f7-5984-5a60-b8a6-cbb66c0b333f"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
ScientificTypes = "321657f4-b219-11e9-178b-2701a2544e81"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
@@ -14,7 +15,7 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
CategoricalArrays = "^0.7"
ColorTypes = "^0.9,^0.10"
PrettyTables = "^0.8,^0.9"
ScientificTypes = "^0.7"
ScientificTypes = "^0.8"
Tables = "^1.0"
julia = "1"

96 changes: 77 additions & 19 deletions README.md
@@ -4,18 +4,83 @@
| :-----------: | :------: | :-----------: |
| [![Build Status](https://travis-ci.org/alan-turing-institute/MLJScientificTypes.jl.svg?branch=master)](https://travis-ci.org/alan-turing-institute/MLJScientificTypes.jl) | [![codecov.io](http://codecov.io/github/alan-turing-institute/MLJScientificTypes.jl/coverage.svg?branch=master)](http://codecov.io/github/alan-turing-institute/MLJScientificTypes.jl?branch=master) | [![](https://img.shields.io/badge/docs-dev-blue.svg)](https://alan-turing-institute.github.io/MLJScientificTypes.jl/dev)

Implementation of the MLJ convention for [Scientific Types](https://github.com/alan-turing-institute/ScientificTypes.jl).
Scientific Types allow the distinction between **machine type** and
Implementation of a convention for [scientific
types](https://github.com/alan-turing-institute/ScientificTypes.jl),
as used in the [MLJ
universe](https://github.com/alan-turing-institute/MLJ.jl).

**Important note.** While this document refers to the *MLJ convention*,
this convention could (and, hopefully, will) be adopted in
statistical/scientific software outside of the MLJ project. Of its
dependencies, only the tiny package
[ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl)
has any direct connection to MLJ.

This package makes a distinction between **machine type** and
**scientific type**:

* the _machine type_ is a Julia type the data is currently encoded as (for instance: `Float64`)
* the _scientific type_ is a type defined by this package which
encapsulates how the data should be _interpreted_ (for instance:
`Continuous` or `Multiclass`)
* The _machine type_ refers to the Julia type being used to represent
the data (for instance, `Float64`).

* The _scientific type_ is one of the types defined in
[ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl)
reflecting how the data should be _interpreted_ (for instance,
`Continuous` or `Multiclass`).
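
The distinction can be seen on single values. The snippet below is a minimal sketch, assuming the package is installed and that `scitype` and the type names `Continuous` and `Count` are available after `using MLJScientificTypes` (the README states that the module re-exports them). The variable names are illustrative only.

```julia
using MLJScientificTypes

x = 3.0
typeof(x)    # machine type: Float64
scitype(x)   # scientific type under the MLJ convention: Continuous

n = 3
typeof(n)    # machine type: Int64 on a 64-bit system
scitype(n)   # scientific type under the MLJ convention: Count
```

The same number can therefore carry different interpretations depending on how it is encoded, which is why an explicit coercion step is sometimes needed.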


#### Contents

- [Installation](#installation)
- [Who is this repository for?](#who-is-this-repository-for)
- [What's provided here?](#whats-provided-here)
- [Very quick start](#very-quick-start)

## Installation

```julia
using Pkg
Pkg.add("MLJScientificTypes")
```

## Who is this repository for?

This repository has two kinds of users in mind:

- users of software in the [MLJ
universe](https://github.com/alan-turing-institute/MLJ.jl) seeking a
deeper understanding of the use of scientific types and associated
tools; *these users do not need to directly install this package*
but may find its documentation helpful

- developers of statistical and scientific software who want to
articulate their data type requirements in a generic,
purpose-oriented way, and who are furthermore happy to adopt an
existing convention about what data types should be used for
what purpose (a convention that has been successfully adopted in an
existing large scale Julia project)

Developers interested in implementing a different convention will
instead import
[ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl),
following the documentation there, possibly using this repo as a
template.

## What's provided here?

The module `MLJScientificTypes` defined in this repo re-exports the
scientific types and associated methods defined in
[ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl)
and provides:

- a collection of `ScientificTypes.scitype` definitions that
articulate the MLJ convention; importing the module automatically
activates the convention

- a `coerce` function, for changing machine types to reflect a specified
scientific interpretation (scientific type)

- an `autotype` function for "guessing" the intended scientific type of data
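
As a rough illustration of the last two items, the sketch below coerces a vector of integers to `Continuous` and asks `autotype` for suggestions on a small column table. It assumes that `coerce` accepts a vector together with a scientific type, and that `autotype` accepts any Tables.jl-compatible table and returns a dictionary of suggestions; the data and variable names are made up for the example, and the suggestions returned will depend on the rules `autotype` applies.

```julia
using MLJScientificTypes

# A vector of integers is interpreted as Count by default.
v = [1, 2, 3, 4]
scitype(v)                  # AbstractVector{Count}

# Coercing changes the machine type so that the data is
# interpreted as Continuous instead.
vc = coerce(v, Continuous)
eltype(vc)                  # Float64
scitype(vc)                 # AbstractVector{Continuous}

# A named tuple of equal-length vectors is a valid column table.
X = (height = [1.85, 1.67, 1.50], grade = ["a", "b", "a"])

# Ask for suggested scientific types; inspect the result before
# passing any of the suggestions on to `coerce`.
suggestions = autotype(X)
```

Here `coerce` changes the representation of the data, whereas `autotype` only proposes an interpretation and leaves `X` untouched.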

Determining what scientific type should be given to what data is determined
by a convention such as the one this package implements which is the one
in use in the [MLJ](https://github.com/alan-turing-institute/MLJ.jl) universe.

## Very quick start

@@ -57,7 +122,7 @@ julia> sch.names
(:a, :b, :c, :d, :e)
```

Now you could want to specify that `b` is actually a `Count`, and that `d` and `e` are `Multiclass`; this is done with the `coerce` function:
To specify instead that `b` should be regarded as `Count`, and that both `d` and `e` are `Multiclass`, we use the `coerce` function:

```julia
Xc = coerce(X, :b=>Count, :d=>Multiclass, :e=>Multiclass)
@@ -74,17 +139,10 @@ _.table =
│ a │ Float64 │ Continuous │
│ b │ Union{Missing, Int64} │ Union{Missing, Count} │
│ c │ Int64 │ Count │
│ d │ CategoricalValue{Int64,UInt32} │ Multiclass{2} │
│ e │ Union{Missing, CategoricalValue{Char,UInt32}} │ Union{Missing, Multiclass{2}} │
│ d │ CategoricalValue{Int64,UInt32} │ Multiclass{2} │
│ e │ Union{Missing, CategoricalValue{Char,UInt32}}│ Union{Missing, Multiclass{2}} │
└─────────┴──────────────────────────────────────────────┴───────────────────────────────┘
_.nrows = 5
```

Note that a warning is shown as you ask to convert a `Union{Missing,T}` to a
`S` which ultimately results in a `Union{Missing,S}`. See the docs for more
details. Compare with the following call which leads to the same result but
shows no warning:

```
Xc = coerce(X, :b=>Union{Missing,Count}, :d=>Multiclass, :e=>Union{Missing,Multiclass})
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -2,6 +2,7 @@
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
ScientificTypes = "321657f4-b219-11e9-178b-2701a2544e81"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
4 changes: 2 additions & 2 deletions docs/make.jl
@@ -1,7 +1,7 @@
using Documenter, MLJScientificTypes
using Documenter, MLJScientificTypes, ScientificTypes

makedocs(
modules = [MLJScientificTypes],
modules = [MLJScientificTypes, ScientificTypes],
format = Documenter.HTML(
prettyurls = !("local" in ARGS),
),