Release Snowflake-ml-python 1.1.0 (#72)
Co-authored-by: Snowflake Authors <[email protected]>
sfc-gh-kdama and Snowflake Authors authored Dec 1, 2023
1 parent b938743 commit abe5b67
Showing 183 changed files with 8,267 additions and 3,100 deletions.
5 changes: 4 additions & 1 deletion .flake8
@@ -22,6 +22,9 @@ max_line_length=120
; E731: Do not assign a lambda expression, use a def (E731) https://www.flake8rules.com/rules/E731.html
; F821: Undefined name name (F821) https://www.flake8rules.com/rules/F821.html
; W504: Line break occurred after a binary operator (W504) https://www.flake8rules.com/rules/W504.html
; T2xx: Use print https://github.com/jbkahn/flake8-print

extend-ignore=E203
exclude=build,setup,tool,.tox,connector_python3,parameters.py"
exclude=build,setup,tool,.tox,connector_python3,parameters.py
per-file-ignores =
tests/*: T2
5 changes: 3 additions & 2 deletions .pre-commit-config.yaml
@@ -45,12 +45,13 @@ repos:
- jupyter
exclude: (?x)^(\.vscode\-bootstrap/.*\.json)$
- repo: https://github.com/pycqa/flake8 # config: .flake8
rev: 3.9.2
rev: 6.1.0
hooks:
- id: flake8
additional_dependencies:
- flake8-bugbear == 20.11.1
- flake8-bugbear == 23.9.16
- flake8-init-return == 1.0.0
- flake8-print == 5.0.0
- repo: https://github.com/terrencepreilly/darglint
rev: v1.7.0
hooks:
19 changes: 18 additions & 1 deletion CHANGELOG.md
@@ -1,12 +1,29 @@
# Release History

## 1.1.0

### Bug Fixes

- Model Registry: Fix pandas DataFrame input not handling the first row properly.
- Model Development: OrdinalEncoder and LabelEncoder output_columns do not need to be valid Snowflake identifiers. They
would previously be excluded if the normalized name did not match the name specified in output_columns.

### Behavior Changes

### New Features

- Model Registry: Add support for invoking a public endpoint on an SPCS service by providing an "enable_ingress" SPCS
deployment option (a hedged sketch follows below).
- Model Development: Add support for distributed HPO - GridSearchCV and RandomizedSearchCV execution will be
distributed on multi-node warehouses (a usage sketch follows below).
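
For the SPCS ingress feature above, a hedged sketch of how a deployment might opt in is shown below; the shape of the `deploy()` call and every option name other than `"enable_ingress"` are assumptions rather than the documented 1.1.0 API.

```python
# Illustrative only: option names besides "enable_ingress" and the deploy()
# call shape are assumptions about the model registry API, not documentation.
deployment_options = {
    "compute_pool": "MY_COMPUTE_POOL",  # assumed SPCS option name
    "enable_ingress": True,             # new in 1.1.0: expose a public endpoint on the SPCS service
}

registry.deploy(  # `registry` is assumed to be an existing model registry handle
    model_name="my_model",
    model_version="1",
    deployment_name="my_spcs_deployment",
    options=deployment_options,
)
```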

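For the distributed HPO feature, a minimal usage sketch follows, assuming the sklearn-style wrapper classes under `snowflake.ml.modeling`; the column names and the Snowpark DataFrame `train_df` are placeholders, so check the exact parameter names against the 1.1.0 documentation.

```python
# Minimal sketch of a distributed hyperparameter search pushed down to the
# warehouse; `train_df` is a Snowpark DataFrame and column names are placeholders.
from snowflake.ml.modeling.model_selection import GridSearchCV
from snowflake.ml.modeling.xgboost import XGBClassifier

grid = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    input_cols=["FEATURE_1", "FEATURE_2"],
    label_cols=["LABEL"],
    output_cols=["PREDICTION"],
)
grid.fit(train_df)  # on a multi-node warehouse the candidate fits are distributed across nodes
```
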
## 1.0.12

### Bug Fixes

- Model Registry: Fix a regression where container logs were not shown during model deployment to SPCS.
- Model Development: Enhance the column capacity of OrdinalEncoder.
- Model Registry: Fix unbound `batch_size`` error when deploying a model other than Hugging Face Pipeline
- Model Registry: Fix unbound `batch_size` error when deploying a model other than Hugging Face Pipeline
and LLM with GPU on SPCS.

### Behavior Changes
19 changes: 8 additions & 11 deletions CONTRIBUTING.md
@@ -246,6 +246,8 @@ available in `conda` only. You can also set this along with `dev_version_pypi` i

(At least one of these three fields should be set.)

`require_gpu`: Set this to true if the package is required only in environments with GPUs.
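
As a hedged illustration (the package name is hypothetical), such an entry would surface in the `RequirementInfo` mapping parsed by `bazel/requirements/parse_and_generate_requirements.py` roughly as:

```python
# Hypothetical GPU-only requirement entry, expressed with the RequirementInfo
# fields described in this guide; it is skipped unless the target environment
# declares GPU support.
req_info = {
    "name": "some-gpu-only-runtime",  # hypothetical package name
    "dev_version": "1.2.3",
    "require_gpu": True,
}
```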

#### Snowflake Anaconda Channel

`from_channel`: Set this if the package is not available in the Snowflake Anaconda Channel
@@ -357,17 +359,15 @@ To test if your code works in a stored procedure, you could simply work

To write such a test, you need to:

1. Your test cannot have a parameter called `_sproc_test_mode`.
1. Let your test case inherit from `common_test_base.CommonTestBase`.
1. Remove all Snowpark Session creation in your test, and use `self.session` to access the session if needed.
1. If you write your own `setUp` and `tearDown` method, remember to call `super().setUp()` or `super().tearDown().`
1. If you write your own `setUp` and `tearDown` method, remember to call `super().setUp()` or
`super().tearDown()`.
1. Decorate your test method with `common_test_base.CommonTestBase.sproc_test()`. If you want your test to run only in
a stored procedure rather than both locally and in a stored procedure, set `local=False`. If you don't want to test
with caller's rights, set `test_callers_rights=False`. (Owner's rights stored procedures are always tested.) A
minimal sketch is shown below.

**Attention**: Depending on your configuration, 1-3 sub-tests will be run in your test method. Sub-test means that
`setUp` and `tearDown` won't run for every sub-test; they run only once, before and after the whole test method. So
it is important to make your test case self-contained.
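
A minimal sketch of such a test follows; the import path of `common_test_base` and the exact decorator keywords are assumptions based on the description above.

```python
# Minimal sketch of a stored procedure test; the import path is an assumption.
from tests.integ.snowflake.ml.test_utils import common_test_base


class MyFeatureTest(common_test_base.CommonTestBase):
    def setUp(self) -> None:
        super().setUp()  # required so the base class can provide self.session

    def tearDown(self) -> None:
        super().tearDown()

    @common_test_base.CommonTestBase.sproc_test(local=True, test_callers_rights=True)
    def test_my_feature(self) -> None:
        # Reuse the session managed by the base class instead of creating one.
        df = self.session.create_dataframe([[1, 2]], schema=["a", "b"])
        self.assertEqual(df.count(), 1)
```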

### Compatibility Test

To test if your code is compatible with a previous version, you could simply work based on `CommonTestBase` in
@@ -376,9 +376,11 @@ To write such a test, you need to

To write such a test, you need to:

1. Your test cannot have a parameter called `_snowml_pkg_ver`.
1. Let your test case inherit from `common_test_base.CommonTestBase`.
1. Remove all Snowpark Session creation in your test, and use `self.session` to access the session if needed.
1. If you write your own `setUp` and `tearDown` method, remember to call `super().setUp()` or `super().tearDown().`
1. If you write your own `setUp` and `tearDown` method, remember to call `super().setUp()` or
`super().tearDown()`.
1. Write a factory method in your test class that returns a tuple of a function and its parameters. The function will
be run as a stored procedure in an environment with the previous version of the library.

@@ -393,11 +395,6 @@ function will be run as a store procedure in the environment with previous versi
1. Decorate your test method with `common_test_base.CommonTestBase.compatibility_test`, providing the factory method
you created in the step above, an optional version range to test against, and any additional package requirements.
A minimal sketch is shown below.

**Attention**: For every version available in the server and within the version range, a sub-test will be run that
contains a run of the prepare function in the stored procedure and a run of the test method. Sub-test means that
`setUp` and `tearDown` won't run for every sub-test; they run only once, before and after the whole test method. So
it is important to make your test case self-contained.
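
A minimal sketch of a compatibility test follows, with the same caveats: the import path and the decorator's keyword names are assumptions.

```python
# Minimal sketch of a compatibility test; import path and decorator keyword
# names are assumptions, and the prepare function is purely illustrative.
from typing import Any, Callable, Tuple

import snowflake.snowpark

from tests.integ.snowflake.ml.test_utils import common_test_base


class MyCompatTest(common_test_base.CommonTestBase):
    def _prepare_fn_factory(self) -> Tuple[Callable[..., None], Tuple[Any, ...]]:
        # The returned function is run as a stored procedure against the previous
        # library version; the second element holds its arguments.
        def prepare(session: snowflake.snowpark.Session, table_name: str) -> None:
            session.sql(f"CREATE TEMPORARY TABLE {table_name} (A INT)").collect()

        return prepare, ("OLD_VERSION_ARTIFACTS",)

    @common_test_base.CommonTestBase.compatibility_test(
        prepare_fn_factory=_prepare_fn_factory,  # assumed keyword name
        version_range=">=1.0.8,<=1.0.12",        # assumed format of the optional version range
    )
    def test_against_previous_versions(self) -> None:
        ...
```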

## `pre-commit`

Pull requests against the main branch are subject to `pre-commit` checks. Those checks enforce the code style.
2 changes: 1 addition & 1 deletion bazel/environments/conda-env-snowflake.yml
@@ -43,7 +43,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.2.0
- snowflake-snowpark-python==1.6.1
- snowflake-snowpark-python==1.8.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.10.0
3 changes: 1 addition & 2 deletions bazel/environments/conda-env.yml
@@ -48,7 +48,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.2.0
- snowflake-snowpark-python==1.6.1
- snowflake-snowpark-python==1.8.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.10.0
@@ -63,4 +63,3 @@ dependencies:
- pip:
- --extra-index-url https://pypi.org/simple
- peft==0.5.0
- vllm==0.2.1.post1
2 changes: 1 addition & 1 deletion bazel/environments/conda-gpu-env.yml
@@ -50,7 +50,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.2.0
- snowflake-snowpark-python==1.6.1
- snowflake-snowpark-python==1.8.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.10.0
37 changes: 0 additions & 37 deletions bazel/filter_affected_targets.py

This file was deleted.

39 changes: 33 additions & 6 deletions bazel/requirements/parse_and_generate_requirements.py
@@ -49,6 +49,7 @@ class RequirementInfo(TypedDict, total=False):
version_requirements: str
version_requirements_pypi: str
version_requirements_conda: str
require_gpu: bool
requirements_extra_tags: Sequence[str]
tags: Sequence[str]

@@ -67,7 +68,7 @@ def filter_by_tag(
tag_filter: tag to filter the requirement. Defaults to None.
Returns:
True if tag_filter is None, or in the array of given field if presented.
True if tag_filter is None, or in the array of given field in presented.
"""
return tag_filter is None or tag_filter in req_info.get(field, [])

@@ -100,6 +101,7 @@ def get_req_name(req_info: RequirementInfo, env: Literal["conda", "pip", "conda-
req_info: requirement information.
env: environment indicator, choose from conda and pip.
Raises:
ValueError: Illegal env argument.
@@ -123,14 +125,15 @@ def get_req_name(req_info: RequirementInfo, env: Literal["conda", "pip", "conda-


def generate_dev_pinned_string(
req_info: RequirementInfo, env: Literal["conda", "pip", "conda-only", "pip-only"]
req_info: RequirementInfo, env: Literal["conda", "pip", "conda-only", "pip-only"], has_gpu: bool = False
) -> Optional[str]:
"""Get the pinned version for dev environment of the requirement in the given env.
For each env, env specific pinned version will be chosen, if not presented, common pinned version will be chosen.
Args:
req_info: requirement information.
env: environment indicator, choose from conda and pip.
has_gpu: If the environment has a GPU, present to filter packages that require a GPU.
Raises:
ValueError: Illegal env argument.
@@ -143,6 +146,8 @@ def generate_dev_pinned_string(
name = get_req_name(req_info, env)
if name is None:
return None
if not has_gpu and req_info.get("require_gpu", False):
return None
if env.startswith("conda"):
version = req_info.get("dev_version_conda", req_info.get("dev_version", None))
if version is None:
@@ -348,7 +353,7 @@ def generate_requirements(
filter(
None,
map(
lambda req_info: generate_dev_pinned_string(req_info, "conda"),
lambda req_info: generate_dev_pinned_string(req_info, "conda", has_gpu=(mode == "dev_gpu_version")),
filter(
lambda req_info: req_info.get("from_channel", SNOWFLAKE_CONDA_CHANNEL)
== SNOWFLAKE_CONDA_CHANNEL,
@@ -359,7 +364,15 @@
)
)
extended_env_conda = list(
sorted(filter(None, map(lambda req_info: generate_dev_pinned_string(req_info, "conda"), requirements)))
sorted(
filter(
None,
map(
lambda req_info: generate_dev_pinned_string(req_info, "conda", has_gpu=(mode == "dev_gpu_version")),
requirements,
),
)
)
)

extended_env: List[Union[str, MutableMapping[str, Sequence[str]]]] = copy.deepcopy(
@@ -370,7 +383,13 @@
# while for internal pip-only packages, nexus is the only viable index.
# Relative order is here to prevent nexus index overriding public index.
pip_only_reqs = list(
filter(None, map(lambda req_info: generate_dev_pinned_string(req_info, "pip-only"), requirements))
filter(
None,
map(
lambda req_info: generate_dev_pinned_string(req_info, "pip-only", has_gpu=(mode == "dev_gpu_version")),
requirements,
),
)
)
if pip_only_reqs:
extended_env.extend(["pip", {"pip": pip_only_reqs}])
@@ -383,7 +402,15 @@
sorted(
map(
lambda s: s + "\n",
filter(None, map(lambda req_info: generate_dev_pinned_string(req_info, "pip"), requirements)),
filter(
None,
map(
lambda req_info: generate_dev_pinned_string(
req_info, "pip", has_gpu=(mode == "dev_gpu_version")
),
requirements,
),
),
)
)
)
8 changes: 8 additions & 0 deletions bazel/requirements/requirements.schema.json
@@ -64,6 +64,11 @@
"description": "The channel where the package come from, set if not from Snowflake Anaconda Channel.",
"type": "string"
},
"gpu_only": {
"default": false,
"description": "The package is required when running in an environment where GPU is available.",
"type": "boolean"
},
"name": {
"description": "The name of the required packages.",
"type": "string"
@@ -90,6 +95,9 @@
{
"enum": [
"deployment_core",
"udf_inference",
"spcs_inference",
"model_packaging",
"build_essential"
],
"type": "string"