remayn
is an open-source Python toolkit focused on results management for machine learning experiments.
It includes the required functionalities to save the complete results of an experiment, load them, and generate reports.
Overview | |
---|---|
CI/CD | |
Code |
remayn
is supported by Python >=3.8.
The easiest way to install remayn
is via pip
:
pip install remayn
A new Result
object can be created using the make_result
function. Then, the Result
can be saved to disk by simply calling the save()
method.
import numpy as np
from remayn.result import make_result
targets = np.array([1, 2, 3])
predictions = np.array([1.1, 2.2, 3.3])
config = {"model": "linear_regression", "dataset": "iris", "learning_rate": 1e-3}
result = make_result("./results",
config=config,
targets=targets,
predictions=predictions
)
result.save()
This will generate an unique identifier for this Result
and it will be saved in a subdirectory of the ./results
directory.
After saving the results of all the experiments, the set of results can be loaded using the ResultFolder
class, as shown in the following snippet:
from remayn.result_set import ResultFolder
rs = ResultFolder('./results')
Note that the same path used to save the results is employed here to load the ResultFolder
. The ResultFolder
object is a special type of ResultSet
and represents a set of results which have been loaded from disk.
After loading the results, the create_dataframe
method of the ResultSet
class can be used to generate a pandas.DataFrame
containing all the results. This method receives a callable which is used to compute the metrics from the targets and predictions stored in each Result
. Therefore, first we can define a function that computes the metrics:
def mse(y_true, y_pred):
return ((y_true - y_pred)**2).mean()
def _compute_metrics(targets, predictions):
return {
"mse": mse(targets, predictions),
}
Then, the create_dataframe
method of the ResultSet
is used:
from remayn.result_set import ResultFolder
rs = ResultFolder('./results')
df = rs.create_dataframe(
config_columns=[
"model",
"dataset",
"learning_rate",
],
metrics_fn=_compute_metrics,
)
Finally, the DataFrame can be saved to a file by using the existing pandas
methods:
df.to_excel('results.xlsx', index=False)
This will generate an Excel file that contains the column given in the config_columns
parameter along with the columns associated with the metrics computed in the function provided.
Code contributions to the remayn
project are welcomed via pull requests.
Please, contact the maintainers (maybe opening an issue) before doing any work to make sure that your contributions align with the project.
- You can clone the repository and then install the library from the local repository folder:
git clone [email protected]:ayrna/remayn.git
pip install ./remayn
- In order to set up the environment for development, install the project in editable mode and include the optional dev requirements:
pip install -e '.[dev]'
- Install the pre-commit hooks before starting to make any modifications:
pre-commit install
- Write code that is compatible with all supported versions of Python listed in the
pyproject.toml
file. - Create tests that cover the common cases and the corner cases of the code.
- Preserve backwards-compatibility whenever possible, and make clear if something must change.
- Document any portions of the code that might be less clear to others, especially to new developers.
- Write API documentation as docstrings.