An unofficial R port of the Python package to download data off of the UCI ML repository

coatless-rpkg/ucimlrepo

ucimlrepo

The goal of ucimlrepo is to download and import data sets directly into R from the UCI Machine Learning Repository.

Important

This package is an unofficial port of the Python ucimlrepo package.

Note

Want datasets that come with a help documentation entry?

Check out the {ucidata} R package! It provides a small selection of data sets from the UC Irvine Machine Learning Repository, each with its own help entry.

Installation

You can install the development version of ucimlrepo from GitHub with:

# install.packages("remotes")
remotes::install_github("coatless-rpkg/ucimlrepo")

Usage

To use ucimlrepo, load the package using:

library(ucimlrepo)

With the package now loaded, we can download a dataset using the fetch_ucirepo() function or use the list_available_datasets() function to view a list of available datasets.

Download data

For example, to download the iris dataset, we can use:

# Fetch a dataset by name
iris_by_name <- fetch_ucirepo(name = "iris")
names(iris_by_name)
#> [1] "data"      "metadata"  "variables"

The returned object has several nested levels. For example, we can extract the original data frame containing the iris dataset using:

iris_uci <- iris_by_name$data$original
head(iris_uci)
#>   sepal length sepal width petal length petal width       class
#> 1          5.1         3.5          1.4         0.2 Iris-setosa
#> 2          4.9         3.0          1.4         0.2 Iris-setosa
#> 3          4.7         3.2          1.3         0.2 Iris-setosa
#> 4          4.6         3.1          1.5         0.2 Iris-setosa
#> 5          5.0         3.6          1.4         0.2 Iris-setosa
#> 6          5.4         3.9          1.7         0.4 Iris-setosa
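
The column names in the output above contain spaces, so they are not syntactic R names and need backticks or `[[` to access. The following is a minimal sketch of how that might look. It uses a small hand-built data frame that mimics the first rows shown above instead of the downloaded object, and the renaming step with base R's `make.names()` is only a suggestion, not something the package does for you.

```r
# Hand-built stand-in for the first rows of iris_by_name$data$original
# (an assumption for illustration; no download happens here)
iris_uci <- data.frame(
  `sepal length` = c(5.1, 4.9, 4.7),
  `sepal width`  = c(3.5, 3.0, 3.2),
  class          = c("Iris-setosa", "Iris-setosa", "Iris-setosa"),
  check.names    = FALSE  # keep the spaces in the column names
)

# Columns with spaces need backticks with `$`, or a string with `[[`
mean_len <- mean(iris_uci$`sepal length`)
widths   <- iris_uci[["sepal width"]]

# Optionally convert to syntactic names (spaces become dots)
iris_clean <- iris_uci
names(iris_clean) <- make.names(names(iris_clean))
names(iris_clean)
```

After renaming, the columns can be used with plain `$` access and in model formulas without backticks.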

Alternatively, we could retrieve two data frames, one for the features and one for the targets:

iris_features <- iris_by_name$data$features
iris_targets <- iris_by_name$data$targets

We can then view the first few rows of each data frame:

head(iris_features)
#>   sepal length sepal width petal length petal width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
head(iris_targets)
#>         class
#> 1 Iris-setosa
#> 2 Iris-setosa
#> 3 Iris-setosa
#> 4 Iris-setosa
#> 5 Iris-setosa
#> 6 Iris-setosa
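
If a single data frame is more convenient for modelling, the features and targets can be bound back together column-wise. This is a sketch under the assumption that both data frames list the observations in the same row order, as the output above suggests. The small data frames below are hand-built stand-ins for `iris_by_name$data$features` and `iris_by_name$data$targets`, and `iris_combined` is a name introduced here for illustration.

```r
# Hand-built stand-ins for the features and targets data frames
# (an assumption for illustration; no download happens here)
iris_features <- data.frame(
  `sepal length` = c(5.1, 4.9),
  `petal length` = c(1.4, 1.4),
  check.names    = FALSE  # keep the spaces in the column names
)
iris_targets <- data.frame(class = c("Iris-setosa", "Iris-setosa"))

# Both data frames should have one row per observation
stopifnot(nrow(iris_features) == nrow(iris_targets))

# Bind column-wise into one data frame with features first, target last
iris_combined <- cbind(iris_features, iris_targets)
str(iris_combined)
```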

Alternatively, you can fetch a dataset directly by its ID, which you can find with list_available_datasets() or by looking up the dataset on the UCI ML Repo website:

# Fetch a dataset by id
iris_by_id <- fetch_ucirepo(id = 53)

View list of data sets

To see which datasets are available for download, use the list_available_datasets() function:

# List available datasets
list_available_datasets()

Note

Not all 600+ datasets on UCI ML Repo are available for download using the package. The current list of available datasets can be viewed here.

If you would like to see a specific dataset added, please submit a comment on an issue ticket in the upstream repository.