
Development #9

Merged
merged 5 commits into from
Jan 20, 2024
51 changes: 51 additions & 0 deletions .github/workflows/deploymemt.yml
@@ -0,0 +1,51 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python application

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: ['main']
  workflow_run:
    workflows: ['ci']
    types:
      - completed

permissions:
  contents: read

jobs:
  cd:
    # Only run this job if new work is pushed to "main"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    # Set up operating system
    runs-on: ubuntu-latest
    permissions:
      id-token: write
    environment:
      name: pypi
      url: https://pypi.org/p/ordinalgbt
    # Define job steps
    steps:
      - name: Set up Python 3.9
        uses: actions/setup-python@v3
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - uses: actions/checkout@v3
      # Here we run build to create a wheel and a
      # .tar.gz source distribution.
      - name: Build package
        run: python -m build --sdist --wheel
      # Finally, we use a pre-defined action to publish
      # our package in place of twine.
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
      - name: Test install from PyPI
        run: |
          pip install ordinalgbt
35 changes: 1 addition & 34 deletions .github/workflows/python-app.yml
@@ -37,37 +37,4 @@ jobs:
      - uses: chartboost/ruff-action@v1
      - name: Test with pytest
        run: |
          pytest
  cd:
    needs: ci
    # Only run this job if new work is pushed to "main"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    # Set up operating system
    runs-on: ubuntu-latest
    permissions:
      id-token: write
    environment:
      name: pypi
      url: https://pypi.org/p/ordinalgbt
    # Define job steps
    steps:
      - name: Set up Python 3.9
        uses: actions/setup-python@v3
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - uses: actions/checkout@v3
      # Here we run build to create a wheel and a
      # .tar.gz source distribution.
      - name: Build package
        run: python -m build --sdist --wheel
      # Finally, we use a pre-defined action to publish
      # our package in place of twine.
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
      - name: Test install from PyPi
        run: |
          pip install ordinalgbt
          pytest
11 changes: 10 additions & 1 deletion README.md
@@ -56,4 +56,13 @@ The `predict_proba` method can be used to get the probabilities of each class:
y_proba = model.predict_proba(X_new)

print(y_proba)
```
```
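For readers skimming the diff, a self-contained version of the usage above might look like the sketch below. The `LGBMOrdinal` import path, its scikit-learn-style `fit`/`predict_proba` interface, and the synthetic data are assumptions made for illustration, not the package's verified API.

```python
# Hypothetical end-to-end example; the import path and estimator interface are assumed.
import numpy as np
from ordinalgbt.lgb import LGBMOrdinal

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))                 # 500 samples, 5 features
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])    # ordinal labels 0 < 1 < 2

model = LGBMOrdinal()
model.fit(X, y)

X_new = rng.normal(size=(10, 5))
y_proba = model.predict_proba(X_new)          # assumed to return one probability column per ordered class
print(y_proba)
```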

## TODOs
* Create XGBoost and Catboost implementations
* Bring test coverage to 100%
* Implement the all-thresholds loss function
* Implement the ordistic loss function
* Create more stable sigmoid calculation (see the sketch after this list)
* Experiment with bounded and unbounded optimisation for the thresholds
* Identify way to reduce jumps due to large gradient
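On the "more stable sigmoid calculation" item: the usual concern is that a naive `1 / (1 + exp(-z))` overflows for large negative `z`. A minimal sketch of one common piecewise formulation follows; it is not the library's current implementation, just an illustration of the idea.

```python
import numpy as np

def stable_sigmoid(z):
    """Numerically stable logistic function, evaluated piecewise to avoid overflow."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, rewrite as exp(z) / (1 + exp(z)) so exp() only sees negative inputs.
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0. 0.5 1.] with no overflow warnings
```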
10 changes: 9 additions & 1 deletion docs/motivation.ipynb
@@ -9,7 +9,7 @@
"\n",
"Usually when faced with prediction problems involving ordered labels (i.e. low, medium, high) and tabular data, data scientists turn to regular multinomial classifiers from the gradient boosted tree family of models, because of their ease of use, speed of fitting, and good performance. Parametric ordinal models have been around for a while, but they have not been popular because of their poor performance compared to the gradient boosted models, especially for larger datasets.\n",
"\n",
"Although classifiers can predict ordinal labels adequately, they require building as many classifiers as there are labels to predict. This approach, however, leads to slower training times, and confusing feature interpretations. For example, a feature which is positively associated with the increasing order of the label set (i.e. as the feature's value grows, so do the probabilities of the higher ordered labels), will va a positive association with the highest ordered label, negative with the lowest ordered, and a \"concave\" association with the middle ones."
"Although classifiers can predict ordinal labels adequately, they require building as many classifiers as there are labels to predict. This approach, however, leads to slower training times, and confusing feature interpretations. For example, a feature which is positively associated with the increasing order of the label set (i.e. as the feature's value grows, so do the probabilities of the higher ordered labels), will va a positive association with the highest ordered label, negative with the lowest ordered, and a \"concave\" association with the middle ones.\n"
]
},
{
@@ -33,6 +33,14 @@
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"There's been recurring requests from the community for an ordinal loss implementation in all of the major gradient boosting model frameworks ([LightGBM](https://github.com/microsoft/LightGBM/issues/5882), [XGBoost](https://github.com/dmlc/xgboost/issues/5243), [XGBoost](https://github.com/dmlc/xgboost/issues/695), [CatBoost](https://github.com/catboost/catboost/issues/1994))."
]
},
{
"cell_type": "markdown",
"metadata": {