From d634ed8ec0fe3176e79ffefd376d74d6d4bbdb04 Mon Sep 17 00:00:00 2001
From: adamingas
Date: Thu, 18 Jan 2024 12:03:35 +0000
Subject: [PATCH 1/5] Add todo in readme

---
 README.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c1bb560..ca2bd35 100644
--- a/README.md
+++ b/README.md
@@ -56,4 +56,13 @@ The `predict_proba` method can be used to get the probabilities of each class:
 y_proba = model.predict_proba(X_new)
 print(y_proba)
-```
\ No newline at end of file
+```
+
+## TODOs
+* Create XGBoost and Catboost implementations
+* Bring test coverage to 100%
+* Implement the all-thresholds loss function
+* Implement the ordistic loss function
+* Create more stable sigmoid calculation
+* Experiment with bounded and unbounded optimisation for the thresholds
+* Identify way to reduce jumps due to large gradient
\ No newline at end of file

From 9712b801bbc36279f1be0b01e2683482f124ebcd Mon Sep 17 00:00:00 2001
From: adamingas
Date: Thu, 18 Jan 2024 12:04:08 +0000
Subject: [PATCH 2/5] doc: add request links for ordinal loss implementation in gradient boosting frameworks

---
 docs/motivation.ipynb | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/motivation.ipynb b/docs/motivation.ipynb
index 620d45b..0021c59 100644
--- a/docs/motivation.ipynb
+++ b/docs/motivation.ipynb
@@ -9,7 +9,7 @@
     "\n",
     "Usually when faced with prediction problems involving ordered labels (i.e. low, medium, high) and tabular data, data scientists turn to regular multinomial classifiers from the gradient boosted tree family of models, because of their ease of use, speed of fitting, and good performance. 
Parametric ordinal models have been around for a while, but they have not been popular because of their poor performance compared to the gradient boosted models, especially for larger datasets.\n",
     "\n",
-    "Although classifiers can predict ordinal labels adequately, they require building as many classifiers as there are labels to predict. This approach, however, leads to slower training times, and confusing feature interpretations. For example, a feature which is positively associated with the increasing order of the label set (i.e. as the feature's value grows, so do the probabilities of the higher ordered labels), will va a positive association with the highest ordered label, negative with the lowest ordered, and a \"concave\" association with the middle ones."
+    "Although classifiers can predict ordinal labels adequately, they require building as many classifiers as there are labels to predict. This approach, however, leads to slower training times, and confusing feature interpretations. For example, a feature which is positively associated with the increasing order of the label set (i.e. as the feature's value grows, so do the probabilities of the higher ordered labels), will va a positive association with the highest ordered label, negative with the lowest ordered, and a \"concave\" association with the middle ones.\n"
    ]
   },
   {
@@ -33,6 +33,14 @@
     ""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "There's been recurring requests from the community for an ordinal loss implementation in all of the major gradient boosting model frameworks ([LightGBM](https://github.com/microsoft/LightGBM/issues/5882), [XGBoost](https://github.com/dmlc/xgboost/issues/5243), [XGBoost](https://github.com/dmlc/xgboost/issues/695), [CatBoost](https://github.com/catboost/catboost/issues/1994))."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {

From 5b49b72073dea73e2be1ec47e0abe932e4ceca75 Mon Sep 17 00:00:00 2001
From: adamingas
Date: Sat, 20 Jan 2024 11:10:55 +0000
Subject: [PATCH 3/5] new deployment file

---
 .github/workflows/deploymemt.yml | 49 ++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 .github/workflows/deploymemt.yml

diff --git a/.github/workflows/deploymemt.yml b/.github/workflows/deploymemt.yml
new file mode 100644
index 0000000..7f1cd27
--- /dev/null
+++ b/.github/workflows/deploymemt.yml
@@ -0,0 +1,49 @@
+# This workflow will install Python dependencies, run tests and lint with a single version of Python
+# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
+
+name: Python application
+
+on:
+  push:
+    branches: [ "main" ]
+  workflow_run:
+    workflows: ['ci']
+    types:
+      - completed
+
+permissions:
+  contents: read
+
+jobs:
+  cd:
+    # Only run this job if new work is pushed to "main"
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    # Set up operating system
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write
+    environment:
+      name: pypi
+      url: https://pypi.org/p/ordinalgbt
+    # Define job steps
+    steps:
+    - name: Set up Python 3.9
+      uses: actions/setup-python@v3
+      with:
+        python-version: 3.9
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install build
+    - uses: actions/checkout@v3
+    # Here we run build to create a wheel and a
+    # .tar.gz source distribution.
+    - name: Build package
+      run: python -m build --sdist --wheel
+    # Finally, we use a pre-defined action to publish
+    # our package in place of twine. 
+    - name: Publish to PyPI
+      uses: pypa/gh-action-pypi-publish@release/v1
+    - name: Test install from PyPi
+      run: |
+        pip install ordinalgbt
\ No newline at end of file

From 59065f5d2dee69b92d0594913957cd17a47f473a Mon Sep 17 00:00:00 2001
From: adamingas
Date: Sat, 20 Jan 2024 11:11:14 +0000
Subject: [PATCH 4/5] removes cd from python application

---
 .github/workflows/python-app.yml | 35 +----------------------------------
 1 file changed, 1 insertion(+), 34 deletions(-)

diff --git a/.github/workflows/python-app.yml b/.github/workflows/python-app.yml
index 9e85463..d1bf6a3 100644
--- a/.github/workflows/python-app.yml
+++ b/.github/workflows/python-app.yml
@@ -37,37 +37,4 @@ jobs:
     - uses: chartboost/ruff-action@v1
     - name: Test with pytest
       run: |
-        pytest
-  cd:
-    needs: ci
-    # Only run this job if new work is pushed to "main"
-    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
-    # Set up operating system
-    runs-on: ubuntu-latest
-    permissions:
-      id-token: write
-    environment:
-      name: pypi
-      url: https://pypi.org/p/ordinalgbt
-    # Define job steps
-    steps:
-    - name: Set up Python 3.9
-      uses: actions/setup-python@v3
-      with:
-        python-version: 3.9
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install build
-    - uses: actions/checkout@v3
-    # Here we run build to create a wheel and a
-    # .tar.gz source distribution.
-    - name: Build package
-      run: python -m build --sdist --wheel
-    # Finally, we use a pre-defined action to publish
-    # our package in place of twine. 
-    - name: Publish to PyPI
-      uses: pypa/gh-action-pypi-publish@release/v1
-    - name: Test install from PyPi
-      run: |
-        pip install ordinalgbt
\ No newline at end of file
+        pytest
\ No newline at end of file

From 685656175b1147567a06283193548a428e62e1b3 Mon Sep 17 00:00:00 2001
From: adamingas
Date: Sat, 20 Jan 2024 11:12:54 +0000
Subject: [PATCH 5/5] adds on pull request value

---
 .github/workflows/deploymemt.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/deploymemt.yml b/.github/workflows/deploymemt.yml
index 7f1cd27..d45d71c 100644
--- a/.github/workflows/deploymemt.yml
+++ b/.github/workflows/deploymemt.yml
@@ -6,6 +6,8 @@ name: Python application
 on:
   push:
     branches: [ "main" ]
+  pull_request:
+    branches: ['main']
   workflow_run:
     workflows: ['ci']
     types: