Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pants plugins: pack_metadata - python imports rules #6260

Merged
merged 19 commits into from
Oct 24, 2024

Conversation

cognifloyd
Copy link
Member

@cognifloyd cognifloyd commented Oct 5, 2024

This PR has commits extracted from #6202 where I'm working on getting all of our unit tests to run under pants+pytest.

It is best to review each commit of this PR to make the review manageable (I rewrote history to simplify reviews). If this PR is still too large to review, I can try to split it into multiple PRs.

Overview

This has pants+pytest run pack tests directly instead of using st2-run-pack-tests, which has some issues detailed below.

This enhances pants-plugins/pack_metadata (introduced in #5868) so that it can inject the custom PYTHONPATH entries that StackStorm adds when running python in packs like actions and sensors. We have several special lib directories to support:

  • <pack>/lib is the shared lib dir that gets added for both actions and sensors when [packs].enable_common_libs = True in st2.conf.
  • <pack>/actions/lib for actions (allowing import foo to import <pack>/actions/lib/foo.py in an action)
  • The entry_point parent directory for sensors and actions

That last point is actually not handled correctly in st2-run-pack-tests which only adds <pack>/actions/ and <pack>/sensors/ to PYTHONPATH, which assumes that the python file is directly in those directories. The python files can, however be in a sub directory, and that sub directory gets added to PYTHONPATH. So, this pants-plugins/pack_metadata enhancement allows pants to accurately reflect the logic st2 actually uses.

Another reason that this does not use st2-run-pack-tests, is that it obscures the PYTHONPATH logic that pants needs to carefully manage. Pants uses PEX to make sure that tests are run in a hermetic sandbox that cannot be influenced by things like packages installed with pip install --user. So, the python dependencies detected by this plugin actually get passed as PEX_EXTRA_SYS_PATH not PYTHONPATH which is ignored when running python hermetically.

Eventually, I would like to publish pants-plugins/pack_metadata so it can be used in pack repos to run tests with pants+pytest instead of st2-run-pack-tests.

Plugin components

Developing pants plugins

Pants has extensive documentation on the plugin API, including how to create targets, and how to write rules. Some of the plugin APIs I'm using in this PR are not documented, but these still give the general overview of the rule+target mechanisms used.

Targets and Target Fields

pack_metadata and pack_content_resource Targets and their Fields

The pack_metadata target was introduced in #5868, and it was added to BUILD files in #5871. For example, this is the pack_metadata target for the linux pack:

pack_metadata(
name="metadata",
)

Creating this doesn't require manual intervention. If a new pack is added, CI will prompt you to run pants tailor :: which generates this target (see #5871). One of my goals with this PR is to preserve this low-maintenance aspect while introducing the features we need.

So, what does the pack_metadata target do? It is basically a resources target that generates a resource target for each of the yaml files in the pack: pack.yaml, action metadata, sensor metadata, rules, etc. This is similar to the python_sources target which generates a python_source target for each python file.

In order to calculate the PYTHONPATH entries for python actions and sensors, the plugin has to extract the entry_point from the individual metadata files. But, the pack_metadata target generated generic resource targets that didn't make it easy to distinguish action metadata from pack.yaml and other files. So, this PR changes pack_metadata so it generates a pack_content_resource target for each file instead of just using the generic resource target. Note that the pack_content_resource target should not actually show up in any BUILD files. It should only be generated by pack_metadata.

This is where the pack_metadata target says which target should be generated for each file:

generated_target_cls = PackContentResourceTarget

This PR adds the pack_content_resource target here:

class PackContentResourceTarget(ResourceTarget):
alias = "pack_content_resource"
core_fields = (
*COMMON_TARGET_FIELDS,
ResourceDependenciesField,
PackContentResourceSourceField,
PackContentResourceTypeField,
)
help = "A single pack content resource file (mostly for metadata files)."

It has the same fields as resource target with a couple additions. It extends ResourceTarget so that anything that needs a generic resource can still get the file. Part of that is extending the ResourceSourceField because targets are often identified by the their fields not by the target itself.

class PackContentResourceSourceField(ResourceSourceField):
pass

The other new field is type, defined here. The possible values are defined in PackContentResourceTypes which is an enum. In nearly all cases, the BUILD files will not have an explicit type for any of these targets, so the compute_value method will get raw_value=None. It then uses a map to calculate the actual value for type based on the file's path (in address).

class PackContentResourceTypeField(StringField):
alias = "type"
help = (
"The content type of the resource."
"\nDo not use this field in BUILD files. It is calculated automatically"
"based on the conventional location of files in the st2 pack."
)
valid_choices = PackContentResourceTypes
value: PackContentResourceTypes
@classmethod
def compute_value(
cls, raw_value: Optional[str], address: Address
) -> PackContentResourceTypes:

If a pack has files that do not follow conventions, this PR adds an escape hatch for overriding the calculated type. The pack_metadata target now has the ResourcesOverridesField to provide this escape hatch. It would be used like this:

pack_metadata(
    name="metadata",
    overrides={
        "actions/workflows/some-action-chain.yaml": dict(
            type="action_chain_workflow",  # default is orquesta_workflow
        ),
    },
)

The conventions for types of metadata files in subdirectories are defined here:

_content_type_by_path_parts: dict[Tuple[str, ...], PackContentResourceTypes] = {
("actions",): PackContentResourceTypes.action_metadata,
("actions", "chains"): PackContentResourceTypes.action_chain_workflow,
("actions", "workflows"): PackContentResourceTypes.orquesta_workflow,
("aliases",): PackContentResourceTypes.alias_metadata,
("policies",): PackContentResourceTypes.policy_metadata,
("rules",): PackContentResourceTypes.rule_metadata,
("sensors",): PackContentResourceTypes.sensor_metadata,
("triggers",): PackContentResourceTypes.trigger_metadata,
}

If a file doesn't follow any of the conventions, it gets type="unknown", which can be overridden with the overrides field (shown above).

return PackContentResourceTypes.unknown

None of the packs in the st2 repo break with convention, except for maybe some insignificant things in fixture packs. So, I didn't have to add any overrides to get pack and unit tests passing.

New Field: python_tests(inject_pack_python_path=...)

This PR also has to provide a way for python_tests targets to opt-in to the PYTHONPATH manipulation. For example, the linux pack's BUILD file uses __defaults__ to enable this for all tests in the pack:

__defaults__(all=dict(inject_pack_python_path=True))

I added this by hand, instead of making pants tailor :: add it, because it's not needed for fixture packs. So, I added it only for the main packs that might actually get installed.

This new field is added to python_tests and python_test targets here:

PythonTestsGeneratorTarget.register_plugin_field(
InjectPackPythonPathField, as_moved_field=True
),
PythonTestTarget.register_plugin_field(InjectPackPythonPathField),

And the field is defined as a simple Boolean field here:

class InjectPackPythonPathField(BoolField):
alias = "inject_pack_python_path"
help = (
"For pack tests, set this to true to make sure <pack>/lib or actions/ dirs get "
"added to PYTHONPATH (actually PEX_EXTRA_SYS_PATH). Use `__defaults__` to enable "
"this in the BUILD file where you define pack_metadata, like this: "
"`__defaults__(all=dict(inject_pack_python_path=True))`"
)
default = False

Rules

This PR adds the actual PYTHONPATH manipulation logic in 3 sets of rules, with each set in a different file:

  • pack_metadata/python_rules/python_pack_content.py
  • pack_metadata/python_rules/python_module_mapper.py
  • pack_metadata/python_rules/python_path_rules.py

Note: This PR also adds tests for each of these rules.

I added a long comment that provides an overview of the implementation and how these rules work together:

# Implementation Notes:
#
# With pants, we can rely on dependency inference for all the
# st2 components, runners, and other venv bits (st2 venv and pack venv).
# In StackStorm, all of that goes at the end of PYTHONPATH.
# Pants runs things hermetically via pex, so PYTHPNPATH
# changes happen via PEX_EXTRA_SYS_PATH instead.
#
# Actions:
# At runtime, the python_runner creates a PYTHONPATH that includes:
# [pack/lib:]pack_venv/lib/python3.x:pack_venv/lib/python3.x/site-packages:pack/actions/lib:st2_pythonpath
# python_runner runs python_action_wrapper which:
# - injects the action's entry_point's directory in sys.path
# - and then imports the action module and runs it.
#
# Sensors:
# At runtime, ProcessSensorContainer creates PYTHONPATH that includes:
# [pack/lib:]st2_pythonpath
# Then the process_container runs the sensor via sensor_wrapper which:
# - injects the sensor's entry_point's directory in sys.path
# (effectively always "sensors/" as a split("/") assumes only one dir)
# - and then imports the class_name from sensor module and runs it.
#
# For actions, this pants plugin should add this to PEX_EXTRA_SYS_PATH:
# pack/actions/path_to_entry_point:[pack/lib:]pack/actions/lib
# For sensors, this pants plugin should add this to PEX_EXTRA_SYS_PATH:
# pack/sensors:[pack/lib:]
#
# The rules in this file are used by:
# python_module_mapper.py:
# Dependency inference uses pack_metadata's module_mapper to detect any
# python imports that require one of these PYTHONPATH modifications,
# resolving those imports to modules in lib/, actions/, or sensors/.
# python_path_rules.py:
# Then get the relevant python imports from dependencies and
# add their parent directory to a generated PEX_EXTRA_SYS_PATH.

python_pack_content

The rules in this file are responsible for finding pack content using StackStorm's conventions. Any rules that need to extract something pack metadata files should be in this file (like the rule that extracts entry_point from action and sensor metadata files).

@rule(
desc=f"Find all `{PackMetadata.alias}` targets in project filtered by content type",
level=LogLevel.DEBUG,
)
async def find_pack_metadata_targets_of_types(
request: PackContentResourceTargetsOfTypeRequest, targets: AllTargets
) -> PackContentResourceTargetsOfType:

@rule(desc="Find all Pack Content entry_points that are python", level=LogLevel.DEBUG)
async def find_pack_content_python_entry_points(
python_setup: PythonSetup, _: PackContentPythonEntryPointsRequest
) -> PackContentPythonEntryPoints:

@rule(desc="Find all Pack lib directory python targets", level=LogLevel.DEBUG)
async def find_python_in_pack_lib_directories(
python_setup: PythonSetup,
all_unexpanded_targets: AllUnexpandedTargets,
_: PackPythonLibsRequest,
) -> PackPythonLibs:

python_module_mapper

This file has a rule to inject StackStorm's custom python module lookup logic into the pants dependency inference, since the standard python import rules cannot identify pack python files. The dependency inference logic influences which files are considered dependencies of other files, and therefore which files need to be in the sandbox when running pytest and other tools on that file.

@rule(
desc=f"Creating map of `{PackMetadata.alias}` targets to Python modules in pack content",
level=LogLevel.DEBUG,
)
async def map_pack_content_to_python_modules(
_: St2PythonPackContentMappingMarker,
) -> FirstPartyPythonMappingImpl:

python_path_rules

This file has rules to calculate StackStorm's custom pack PYTHONPATH and then inject that (as PEX_EXTRA_SYS_PATH) when pants runs pytest.

@rule(
desc="Get pack paths that should be added to PYTHONPATH/PEX_EXTRA_SYS_PATH for a target.",
level=LogLevel.DEBUG,
)
async def get_extra_sys_path_for_pack_dependencies(
request: PackPythonPathRequest,
) -> PackPythonPath:

@rule(
desc="Inject pack paths in PYTHONPATH/PEX_EXTRA_SYS_PATH for python tests.",
level=LogLevel.DEBUG,
)
async def inject_extra_sys_path_for_pack_tests(
request: PytestPackTestRequest,
) -> PytestPluginSetup:

Eventually, I hope to add this PYTHONPATH manipulation for pylint as well (and possibly other utils python runs), but that will require changing pants itself to support that. Since this only affected one file (a file we don't even lint in the soon-to-be-legacy Makefile), I just skipped the pylint error:

# TODO: extend pants and pants-plugins/pack_metadata to add lib dirs extra_sys_path for pylint
from environ import get_environ # pylint: disable=E0401

@pull-request-size pull-request-size bot added the size/XXL PR that changes 1000+ lines. You should absolutely split your PR into several. label Oct 5, 2024
@cognifloyd cognifloyd added this to the pants milestone Oct 5, 2024
@cognifloyd cognifloyd self-assigned this Oct 5, 2024
@cognifloyd cognifloyd force-pushed the pants-plugins-pack_metadata-python-imports branch 5 times, most recently from db829b3 to fc65ba4 Compare October 11, 2024 16:41
@@ -28,7 +28,9 @@

def rules():
return [
PythonTestsGeneratorTarget.register_plugin_field(UsesServicesField),
PythonTestsGeneratorTarget.register_plugin_field(
UsesServicesField, as_moved_field=True
Copy link
Member Author

@cognifloyd cognifloyd Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor fix. This is unrelated to the rest of this PR, other than, I noticed as_moved_field was missing here when I added InjectPythonPathField. Luckily, it still worked, but this is more correct.

I also corrected the dependencies field on pack_metadata to be a moved field. Being a "moved" field means you can define it on a target generator, but the actual metadata for the field ends up on the generated target, not on the target generator (so, on pack_content_resource instead of on pack_metadata).

@cognifloyd cognifloyd marked this pull request as ready for review October 11, 2024 17:35
Now pack_metadata targets will generate pack_content_resource instead of just resource.
pack_content_resource is still a resource, but this setup allows us to find
the generated resource targets more simply.

This also harmonizes the implementation of pack_metadata to follow the fields
definition of resources (esp moving dependencies into moved_fields instead of core_fields).
…esource targets

This will allow rules to look up just action and sensor metadata (for example).
… module mapping

Only handles the actual action/sensor python files. It does not yet handle:
- <pack>/lib
- <pack>/actions/lib
…thon module mapping

This makes dependency inference aware of these which may be on the PYTHONPATH.
- <pack>/lib
- <pack>/actions/lib
The pack_metadata plugin now handles identifying these imports for dep inference.
Next step, modify the PYTHONPATH as well.
…TRA_SYS_PATH for tests

This won't work until pants gets support for injecting path entries.
Copy link
Contributor

@nzlosh nzlosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice job!

@cognifloyd cognifloyd requested review from a team and rush-skills October 17, 2024 13:44
@cognifloyd cognifloyd changed the title Pants plugins: pack_metadata/python_imports Pants plugins: pack_metadata - python imports rules Oct 17, 2024
Copy link
Contributor

@guzzijones guzzijones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will trust you on this one. great explanation. I admit this is a bit beyond my expertise.

@guzzijones guzzijones merged commit 80f0547 into master Oct 24, 2024
31 checks passed
@guzzijones guzzijones deleted the pants-plugins-pack_metadata-python-imports branch October 24, 2024 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement packs pantsbuild python3 size/XXL PR that changes 1000+ lines. You should absolutely split your PR into several.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants