Improved mono repo support #21204

Open
luabud opened this issue May 9, 2023 · 32 comments
Assignees
Labels
area-editor-* User-facing catch-all feature-request Request for new features or functionality meta Issue that is tracking an overall project needs spike Label for issues that need investigation before they can be worked on.

Comments

@luabud
Member

luabud commented May 9, 2023

We talked to a lot of folks at PyCon that were complaining about our support for mono repos. This issue is to track feedback about this experience. Current hypothesis is that our multiroot support isn't great, so solving that might solve the issue for mono repos.

@luabud luabud self-assigned this May 9, 2023
@luabud luabud added needs spike Label for issues that need investigation before they can be worked on. area-editor-* User-facing catch-all labels May 9, 2023
@luabud luabud added the feature-request Request for new features or functionality label May 9, 2023
@luabud luabud added this to the May 2023 milestone May 9, 2023
@luabud
Member Author

luabud commented May 10, 2023

#21256

@rdbisme

rdbisme commented May 16, 2023

Just as feedback, I personally have been using multi-root satisfactorily. A couple of bugs here and there in 1.5y of usage (one was about Python extension doing crazy stuff and hogging the machine for no apparent reason. Recreating the multi-root workspace fixed the problems) and now this problem with the Black extension that you are citing here above (but I believe it's not a problem with Python extension, that works correctly and provides autocompletion and everything correctly).

Happy to provide feedback, I use this at work on a daily basis :)

@yegorski

yegorski commented May 18, 2023

@rdbisme For our use case, multi-root workspaces don't work, since we have tooling in our monorepo (Pants) which requires top-level files to be present in the root folder to define "metadata" about how each project in the monorepo should function (e.g. specifying Python virtual envs, building and running tests selectively). So our monorepo is not just a bunch of folders opened in a single VSCode window. If it were, there would be no issue and we'd gladly use multi-root workspaces.

See my post from which this issue is linked microsoft/vscode#181845

@lucasvieirasilva

Recently, I've published an Nx (monorepo tool) plugin @nxlv/python to enable python poetry projects in Nx workspaces, and the issue is very similar as @yegorski described in his post microsoft/vscode#181845.

Each project has a virtual environment. The ideal scenario would be for the microsoft/vscode-python extension to switch the virtual environment based on the sub-folder's virtual environment, without manually selecting the interpreter or changing the settings.json file.

Unlike @yegorski's use case, I can use project.code-workspace and add a .vscode/settings.json for each Python project, but that's not the ideal scenario, and it might not be a good configuration if you have an Nx workspace with both Python and Node.js projects in it.

I agree with @yegorski's post: it would be awesome if the extension supported a .vscode/settings.json in each sub-folder.

@Moortiii

We are experiencing issues similar to those of @lucasvieirasilva and @yegorski. It boils down to the following in our case:

  1. I need to programmatically set up virtual environments and assign workspace interpreters for users to simplify onboarding. As far as I know, there is no API exposed to update the Workspace Python Interpreter setting. Manually updating the SQLite database in which these settings are stored does not seem like a good solution, given that everything is stored under the ms-python key as a JSON blob. If it were more cleanly separated I would feel more comfortable about doing this without fear of corrupting the rest of the user's state. However, an optimal solution would not require altering this database at all.

  2. If we create a separate .vscode/settings.json in each workspace folder, we experience issues with IntelliSense not picking up the correct interpreter without closing and re-opening files. The button used to select an interpreter will say that no interpreter is currently selected until I close and re-open the file in question. Compared to setting the Workspace interpreter(s), we also get terrible performance with this solution. On a medium-to-large codebase we have to wait almost a full minute before we can start following references etc., and I have to repeat this wait for each new project I open files in. It would be cleaner and more maintainable if the Python extension supported assigning interpreters for sub-folders via a single settings.json file at the root of the project.

  3. We, too, have additional meta-files and tools at the top-level and need these to be visible in the monorepo. As a workaround I have created the following structure:

º base-folder
º project-folder-1
º project-folder-n
º libs-folder-1
º libs-folder-n

This works but the structure of base-folder is similar to this:

base_folder/
    meta_file_1
    meta_file_n
 
    libs/
       lib-1/
       lib-n/
    projects/
       project-1/
       project-n/

Meaning that all the project and library folders have to be repeated in the Workspace below the base-folder. Opening files via Ctrl + P will open the actual folder, not the directory in the base-folder. Therefore I can't add these and just keep them closed at all times to prevent clutter. There are a few solutions that could help alleviate these issues:

  1. Make it possible to add additional folders to workspaces, set their interpreter and then hide them from the workspace. From my own testing it appears that the interpreter changes are still picked up correctly when navigating to individual projects inside the base-folder in the workspace. However, I have not been able to hide the folders added via the "Add Folder to Workspace..." to remove clutter, even using third-party extensions.

  2. Make it possible to assign a folder as a workspace, giving us the benefit of selecting a separate interpreter etc. while keeping the directory structure otherwise intact.

  3. Allow users to create meta-structures inside a workspace. If the solution above is not feasible, our experience would also be improved if we could create a workspace similar to:

º base-folder
º projects/ (this is a meta-folder, for structure only)
    º project-1
    º project-n
º libs/ (this is a meta-folder, for structure only)
    º lib-1
    º lib-n
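
To make the workaround structure above reproducible for every contributor, the workspace file could be generated rather than assembled by hand. The following is only a sketch: it assumes the `projects/` and `libs/` layout shown earlier, the function name is hypothetical, and it produces the standard `folders`/`settings` shape of a `.code-workspace` file without assigning any interpreters.

```python
import json
from pathlib import Path


def build_workspace(base: Path, group_dirs=("projects", "libs")) -> dict:
    """Return a .code-workspace structure: the base folder plus each sub-project.

    The group directory names are assumptions taken from the example layout.
    """
    # The base folder comes first so the top-level meta files stay visible.
    folders = [{"path": "."}]
    for group in group_dirs:
        group_path = base / group
        if not group_path.is_dir():
            continue
        for child in sorted(group_path.iterdir()):
            if child.is_dir():
                # Paths in a workspace file are relative to the file's location.
                folders.append({"path": child.relative_to(base).as_posix()})
    return {"folders": folders, "settings": {}}


def render_workspace(base: Path) -> str:
    """Serialize the workspace structure as JSON text."""
    return json.dumps(build_workspace(base), indent=2)
```

Writing the result of `render_workspace(Path.cwd())` to a file such as `monorepo.code-workspace` at the repository root would recreate the duplicated-folder layout described above. It does not address the clutter problem.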

@luabud
Member Author

luabud commented Jun 22, 2023

I just wanted to start by thanking you all so much for taking the time to provide more context and feedback around your experiences with mono repos, it's super appreciated!

I don't often work with mono repos so I'm trying to get a better understanding of the typical workflow involved. Please correct me if I'm wrong, but from what I'm understanding there are two common approaches to how folks have been managing mono repos:

  1. Using one shared virtual environment
  2. Using individual virtual environments per project/subfolder

If a mono repo is set up in a way that one can use a shared virtual environment (i.e. there's no dependency conflicts between the projects in the mono repo), my understanding is that the current experience is good enough as one can simply open the root/base folder in VS Code, create a virtual environment on the project root and install all the dependencies on that same venv. The extension can automatically activate that venv and all actions can be performed inside it.

When there's a mono repo with individual virtual environments per project (because e.g. their dependencies can conflict with each other), but there's nothing on the root directory that is relevant/needed for all projects in the mono repo, then multi-root workspaces work well enough (except for the occasional bugs as called out by @rdbisme 😅). The extension will switch the selected interpreter to match the corresponding virtual environment inside the workspace where the file you open lives.

The problem is really when the root folder has metadata or files that are needed for the entire project, and each of the projects has its own virtual environment. Ideally, you want to stay at the project root level and perform any/all actions from there, so opening the projects as a workspace is not ideal. One could do as @Moortiii suggested and open the root folder as a workspace and then add the remaining project folders as their own workspaces. There are a couple of issues with that:

  1. Files and folders end up duplicated in the file explorer
    • One workaround could be to add a .vscode/settings.json file under the root folder workspace setting "files.exclude" to hide the project folders (which would be the opposite of what @Moortiii wants, I guess)
  2. There's a significant perf issue once you have more than a couple of workspaces open (and I think it might be because Pylance launches a server per workspace)
  3. The Python extension is not automatically picking up the virtual environment in each project workspace
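
The `files.exclude` workaround mentioned under the first issue could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the project paths are whatever the caller passes in, and the writer overwrites any existing root `settings.json` instead of merging into it.

```python
import json
from pathlib import Path


def root_settings_hiding(project_dirs: list[str]) -> dict:
    """Build root-folder settings whose files.exclude hides the given folders."""
    # files.exclude maps a glob pattern to a boolean; True hides matching entries.
    return {"files.exclude": {path: True for path in project_dirs}}


def write_root_settings(root: Path, project_dirs: list[str]) -> Path:
    """Write the settings to <root>/.vscode/settings.json and return that path.

    Note: this replaces the file wholesale; merging with existing settings
    is left out to keep the sketch short.
    """
    target = root / ".vscode" / "settings.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(root_settings_hiding(project_dirs), indent=2))
    return target
```

For example, `write_root_settings(Path("."), ["projects/project-1", "libs/lib-1"])` would hide those two folders under the root entry while they remain available as their own workspace folders.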

Are there any other issues that I'm missing?

@karthiknadig karthiknadig modified the milestones: June 2023, July 2023 Jun 26, 2023
@luabud
Member Author

luabud commented Jun 26, 2023

One more question that came up as we discussed this issue today: IntelliSense seems to be a key feature that is missing when opening a mono repo as a single folder in VS Code. Are there any other features that people would also like to use but can't because the Python extension doesn't switch to the right environment? Or is it really just IntelliSense?

@yegorski

I think it's just IntelliSense for us. There might be other Python features. I guess ultimately the success criterion is that we can run the apps and debug tests in the correct virtual env.

@Moortiii

Moortiii commented Jun 27, 2023

I think IntelliSense support alone would suffice for us as well. However, I wanted to clarify that opening each project as its own workspace folder is far from an ideal solution. In a perfect world I would be able to open the root folder of my monorepo (not as a workspace) and assign separate interpreters to each subfolder as I please.

I also think it's important to ensure that this is easy to configure and can be done programmatically. Otherwise it is difficult to guarantee that all contributors to a monorepo are using the same setup and can start working right away.

Regarding dependency management and single interpreter vs. multiple interpreters I can chime in with my own experience. In a large project I find that it's easy to end up in a situation where transitive dependencies conflict with one another. Sometimes this is workable, but it can also lead to a situation that cannot be resolved without removing or downgrading certain dependencies. In the worst case this could prevent us from installing important security patches.

It also conflicts with the general strategy I've chosen where the monorepo will still build and version individual images which are composed to a larger platform. Service A will never depend on Service B, and as such I find it weird to introduce a strict limitation that their individual dependencies must be compatible at all times when this is not the case at runtime.

@nkronenfeld

luabud writes:

1. Using one shared virtual environment

2. Using individual virtual environments per project/subfolder

If a mono repo is set up in a way that one can use a shared virtual environment (i.e. there's no dependency conflicts between the projects in the mono repo), my understanding is that the current experience is good enough as one can simply open the root/base folder in VS Code, create a virtual environment on the project root and install all the dependencies on that same venv. The extension can automatically activate that venv and all actions can be performed inside it.

We are in situation (1) above, and everything works fine with regards to the venv... but that doesn't end up being sufficient.

We have a common module (called, simply enough, "common") with code used by all the other modules (call them "A", "B", "C", etc).

We have installed "common" to the venv - for local building in all the other modules - but in the editor, we find that chasing down function definitions and the like takes us to the code installed in venv/lib/site-packages instead of to the common module in our code.

Maybe we have set it up wrong somehow, but I've been chasing documentation for a while now, and haven't found a better way. If I've missed something, I'd love to hear about it; if not, then this is another thing that one should be able to get working in a mono-repo.

@PeterJCLaw

@nkronenfeld have you tried installing the common module as an editable package?
That would be like pip install -e ./common rather than pip install ./common. Doing that will create a link in your site-packages to the local source rather than (essentially) copying the files over. Once the files are copied over there's not really any way for an editor to know that the local source and the installed source are the same thing (and indeed they may in fact not be the same).
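
One way to check which copy of a module the environment would actually import is to ask the import system directly. The sketch below uses only the standard library; the helper names are hypothetical, and it only reports where a module resolves, not how it was installed.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def resolves_inside(name: str, source_root: Path) -> bool:
    """True if the module resolves to a file somewhere below source_root."""
    origin = module_origin(name)
    return origin is not None and source_root.resolve() in origin.parents
```

Run with the monorepo's interpreter, `resolves_inside("common", Path("./common"))` returning False would suggest the import still resolves to a copied install rather than the local source.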

@erictraut

Pyright (the type checker upon which pylance is built) includes support for an alternative way to work with monorepos. The feature is called "execution environments", and it's documented here. You can configure it using a pyrightconfig.json or pyproject.toml file within your monorepo's root directory. When using execution environments, you can work with a single-root VS Code workspace but specify different subdirectories within your project that represent different "executables". Each execution environment can have a different pythonVersion, pythonPlatform and extraPaths. The feature assumes that there is a shared venv for the entire monorepo, which is an assumption that may not hold for some teams.

I'm curious whether this feature addresses some or all of the issues you have with the multi-root workspace approach. If so, it's perhaps something that pylance could further build upon as a solution for monorepos.

Both of these approaches (execution environments and multi-root workspaces) have pros and cons, and neither is a perfect solution for all use cases.
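
As a rough illustration of the execution-environments approach, a `pyrightconfig.json` for the monorepo root could be generated like this. The folder names are hypothetical (borrowed from the layout earlier in the thread), and only the `root` and `extraPaths` keys are shown; `pythonVersion` and `pythonPlatform` could be added per entry in the same way.

```python
import json


def build_pyright_config(env_roots, shared_paths=()):
    """Return a pyrightconfig.json structure with one execution environment per root."""
    return {
        "executionEnvironments": [
            # Each entry scopes import resolution for files under "root".
            {"root": root, "extraPaths": list(shared_paths)}
            for root in env_roots
        ]
    }


# Example with made-up folder names; prints the JSON that would go in pyrightconfig.json.
config = build_pyright_config(
    ["projects/project-1", "projects/project-n"],
    shared_paths=["libs/lib-1"],
)
print(json.dumps(config, indent=2))
```

As noted above, this still assumes a single shared venv, so it would not help teams whose projects have conflicting dependencies.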

@nkronenfeld

Sorry, I've been waiting to answer until I had time to sort everything out (which I still haven't).

Installing common as an editable package seems to have worked... but has messed up our build in other ways that I still have to sort out. I think they aren't major problems, I just haven't had time to deal with them lately, because it's been good enough. I have been able to follow code back to the correct source in VSCode, and I have been able to run without separately rebuilding common every time - both of which are big wins. So @PeterJCLaw thank you very much for that.

Execution environments also look like a good match - I will take a look at them too. Thank you too for that, @erictraut

@luabud luabud modified the milestones: July 2023, August 2023 Jul 24, 2023
@luabud luabud modified the milestones: August 2023, September 2023 Aug 28, 2023
@luabud luabud modified the milestones: September 2023, October 2023 Sep 25, 2023
@luabud luabud removed this from the October 2023 milestone Oct 24, 2023
@luabud luabud added the meta Issue that is tracking an overall project label Oct 26, 2023
@tibbe

tibbe commented Dec 6, 2023

We'd love to see improvements in this area as well. We're using a monorepo with editable package installs. We use one project folder and venv per package (i.e. we don't add the root itself, just each package, as a project folder).

I would describe the IntelliSense behavior we see as erratic/nondeterministic. Sometimes it works and then stops working, without us changing the venvs, the "selected interpreter" settings in VSCode, or anything else I could reasonably imagine should affect IntelliSense.

Restarting VSCode often fixes the problem. Sometimes re-opening the file fixes the problem. Sometimes "Clear all interpreters" followed by resetting them again for each folder works.

It's really hard to pinpoint the cause because, as I said above, we get changing behavior without changing any configuration. I suspect it might be related to opening a file in one project directory followed by opening a file in another (thus "switching" interpreters).

@rob-steele-active

Junior dev here. Just came across this thread and thought I would throw in my limited experience with monorepos and VSCode.

TLDR: I think vscode should add some way to modify/influence venv discovery.

I started a sample monorepo just testing stuff out to see how it would work. I haven't set up Pants or any other monorepo specific tools. I'm using Makefiles for managing builds, and setting up venvs.

Seems some people in this thread ran into issues with managing virtual environments in subdirectories. I ran into issues with that too but found some solid workarounds that seem pretty flexible to me. The only tool I found I really needed was the Python venv manager. https://marketplace.visualstudio.com/items?itemName=donjayamanne.python-environment-manager

The extension makes switching the environments effortless. Getting the MS Python extension to detect your environments in subdirectories requires that you enter the path to the interpreter in the venv's bin as the MS Python extension won't detect environments in subdirectories AFAIK. Once the environment is detected though it can be easily switched to and intellisense immediately works with the selected environment.

Environment discovery is definitely one of the places where I really feel there could be some easy improvements on VSCode's side. A workspace/user scoped configuration option to direct the python extension to look in specific directories would make the experience much better IMO.

One other thing I noticed while writing this is that some of the venvs I pointed to previously have disappeared from the Environment Manager's list. They seem to disappear after restarting VSCode unless you have the virtual environment active in some way, i.e. in a terminal. My guess is that the extension doesn't keep track of virtual environments across restarts and only displays what the MS Python extension is aware of. Oh well, it's definitely not perfect.

@thomelane

Cheers for the pointer to this issue @luabud. We've got a similar setup to a few people above: separate virtual environments per subfolder, using poetry and with editable installs.

Using multi-root didn't work out great for us because of the file duplication issue (when opening root and also subfolders), but also the need to manually add each subfolder to the workspace isn't ideal when you have lots of subfolders that change often and lots of people on the team.

I'm using virtualenvs.in-project, so scanning for closest venv up the directory tree would work in my case, to find which venv should be activated for each active file. A subfolder specific settings.json would work too.
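
The "closest venv up the directory tree" lookup could be sketched as below. It assumes in-project environments live in a folder named `.venv` and identifies a venv by the presence of a `pyvenv.cfg` file; the function name is hypothetical, and this is not how any particular extension implements it.

```python
from pathlib import Path
from typing import Optional


def find_nearest_venv(active_file: Path, venv_name: str = ".venv") -> Optional[Path]:
    """Walk up from the active file and return the first venv folder found."""
    start = active_file.resolve()
    # parents yields the file's own folder first, then each ancestor up to the root.
    for directory in start.parents:
        candidate = directory / venv_name
        # venv and virtualenv both place a pyvenv.cfg at the environment root.
        if (candidate / "pyvenv.cfg").is_file():
            return candidate
    return None
```

Given `pkg/src/mod.py` and `pkg/.venv`, the function returns `pkg/.venv`; a file with no enclosing environment yields `None`.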

@teticio

teticio commented Jan 14, 2024

I've put this simple extension together to do exactly what @thomelane suggests, so please let me know if this solves the problem and feel free to contribute to it (https://github.com/teticio/python-envy). https://marketplace.visualstudio.com/items?itemName=teticio.python-envy

@DanielRosenwasser
Member

On the Pylance/Pyright side, I filed something related around walking upwards to find the nearest configuration for the analyses: microsoft/pylance-release#5564

Otherwise, I think the scenarios described above are similar issues which I ran into where it actually pushed me towards using one Python project/dependency manager over another given its ability to produce a top-level workspace .venv - something I mind less than having a top-level pyproject.toml.

Still, I think that because the .venv directory is not co-located with the pyproject.toml, the Python extension might not immediately suggest using it upon creation.

@alita-moore

It would be nice to be able to explore the tests on the sub-module I'm working on, similar to how @teticio's python-envy works, by updating "python.testing.pytestArgs": ["..."] to the root of the current module.

@teticio

teticio commented Mar 15, 2024

@alita-moore That's an interesting idea. Would it make sense to add (as an option) to python-envy? Would it just set python.testing.pytestArgs to the same directory where it activates the .venv? So the structure of your monorepo would be having a tests directory and a .venv directory in every sub-module directory? Let me know and I can make the changes.

@alita-moore

alita-moore commented Mar 16, 2024

@teticio yeah i think it makes sense to add to python-envy (awesome package, btw, thank you very much). Yeah it would set the python.testing.pytestArgs to the root directory of where .venv is so for example if you had a project structure of

-- folder
--- .venv
--- src
---- something.py

Then when you focus something.py it would update python.testing.pytestArgs to reflect the path to the folder (in this case just folder), where perhaps it initially started out as . or the root directory of VSCode.

Note that I actually use individual test directories in my code / at the same level of the code it's testing, so I think having it not rely on a top-level test folder would be great. Thank you!
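
The behaviour described above could be approximated by a helper that computes the `python.testing.pytestArgs` value for the focused file. This is a sketch only: the function name is hypothetical, it assumes the venv folder is called `.venv`, and it falls back to `.` when no enclosing venv exists inside the workspace.

```python
from pathlib import Path


def pytest_args_for(active_file: Path, workspace_root: Path, venv_name: str = ".venv") -> list[str]:
    """Return a pytestArgs value pointing at the nearest ancestor folder holding a venv."""
    root = workspace_root.resolve()
    for directory in active_file.resolve().parents:
        if directory != root and root not in directory.parents:
            break  # walked above the workspace root, so stop searching
        if (directory / venv_name).is_dir():
            # Express the folder relative to the workspace root ("." for the root itself).
            return [directory.relative_to(root).as_posix()]
    return ["."]
```

With the example structure, focusing `folder/src/something.py` gives `["folder"]`, and it does not depend on a top-level tests directory.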

@luabud
Member Author

luabud commented May 6, 2024

hey all, just wanted to give an update that we are working on a prototype that would allow a better experience for mono repos, such as allowing interpreters to be associated with folders and individual files, without relying on multi-root workspaces. No ETAs but hopefully this will be available soon :) But I also wanted to link a somewhat related issue from upstream VS Code here: microsoft/vscode#32693

@leddy231

image

Just noticed that the notebook env selection now shows the paths so they can be distinguished per submodule. Huge thanks!

@alita-moore

When using python-envy, if you switch between multiple interpreters quickly it sometimes causes the CPU to throttle.

@abingham

abingham commented Aug 1, 2024

We've recently started trying a monorepo approach in vscode with mixed results. We're using a rye-style approach with a single venv that the packages share, each installed into it as editable.

If we use a non-multiroot workspace, i.e. just opening the monorepo at the root directory and letting the tooling work from there, most things work very well. Refactoring, import suggestions, and all of that work as well as we could hope. The primary problem we run into in this configuration is the test explorer. Some of our test modules have the same name, e.g. test_foo.py shows up in the test suites for multiple packages, and this causes pytest to complain. There's a tension, then, between a) treating the entire body of code as one for some operations like refactoring, and b) treating the packages independently for others, like test discovery and running. Ideally, the test explorer would be able to handle several independent test suites, but I haven't found a way to do that yet. (And while we could rename the conflicting test modules, this seems like putting the onus on the wrong party to fix the problem.)

The other approach we've tried is using multi-root workspaces, with each of our packages added as a folder to the workspace. In this configuration, the testing problem is fixed; the test explorer treats each package's tests separately, so naming conflicts go away. Unfortunately, we lose all of the benefits of a single language server handling all of the code. Things like refactoring and import suggestions simply don't work across packages. My understanding is that much of this is due to the fact that there's a separate language server for each "folder", and they don't cooperate in any way.

Anyhow, hopefully that's another useful data point as you work on the tooling.
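
To see which test modules share a file name across packages, a small scan like the following could help. It is a sketch with assumptions: test files are taken to match `test_*.py`, anything under a `.venv` folder is skipped, and the function name is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def colliding_test_modules(repo_root: Path) -> dict[str, list[str]]:
    """Map each test file name that occurs more than once to its relative paths."""
    by_name = defaultdict(list)
    for path in sorted(repo_root.rglob("test_*.py")):
        relative = path.relative_to(repo_root)
        if ".venv" in relative.parts:
            continue  # ignore tests shipped inside installed packages
        by_name[path.name].append(relative.as_posix())
    # Keep only the names that appear in more than one place.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

An empty result means no two test modules share a base name, which is the condition that avoids the complaint under pytest's default import mode.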

@tibbe

tibbe commented Aug 13, 2024

If we use a non-multiroot workspace, i.e. just opening the monorepo at the root directory and letting the tooling work from there, most things work very well. Refactoring, import suggestions, and all of that work as well as we could hope. The primary problem we run into in this configuration is the test explorer.

Could you elaborate a bit on how you make this setup work? I tried the same, i.e. having a root folder that I opened in VSCode, and inside that folder I have a few folders containing different Python packages, using a single venv and editable dependencies. In my case no test cases show up whatsoever.

@abingham

abingham commented Aug 13, 2024

Could you elaborate a bit on how you make this setup work?

The main trick is to tell pytest where to find the tests. Following the rye approach, have a top-level pyproject.toml with something like this:

[tool.pytest.ini_options]
addopts = ["--import-mode=importlib"]
testpaths = [
    "packages/app/tests",
    "packages/domain/tests",
    "packages/infrastructure/tests",
    "packages/web/tests",
]

Each of 'app', 'domain', 'infrastructure', and 'web' are Python packages with their own pyproject.toml and their tests in the associated 'tests' directories. This seems to tell vscode where everything is.

Incidentally, I mentioned that my main problem with this approach was test name collision. The --import-mode=importlib bit makes this problem go away, and is the recommended way of importing tests in any event. So this setup is working very well for us.
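
If packages come and go often, the `testpaths` list could be regenerated rather than maintained by hand. The sketch below assumes the `packages/<name>/tests` layout from the snippet above; the function names are hypothetical and the output is plain text to paste into the top-level pyproject.toml.

```python
from pathlib import Path


def discover_testpaths(repo_root: Path, packages_dir: str = "packages") -> list[str]:
    """List every <packages_dir>/<name>/tests directory, relative to repo_root."""
    base = repo_root / packages_dir
    if not base.is_dir():
        return []
    return [
        tests.relative_to(repo_root).as_posix()
        for tests in sorted(base.glob("*/tests"))
        if tests.is_dir()
    ]


def render_pytest_section(testpaths: list[str]) -> str:
    """Render the [tool.pytest.ini_options] table as TOML text."""
    lines = [
        "[tool.pytest.ini_options]",
        'addopts = ["--import-mode=importlib"]',
        "testpaths = [",
    ]
    lines += [f'    "{path}",' for path in testpaths]
    lines.append("]")
    return "\n".join(lines)
```

Calling `render_pytest_section(discover_testpaths(Path(".")))` at the repository root would reproduce a section shaped like the one shown above.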

@alita-moore

@abingham I've found that a multi-root monorepo using poetry's relative packages + pylance retains refactoring and type checking across projects, have you tried this? see microsoft/pylance-release#5995 for a link to a codesandbox that demonstrates this workflow.

The biggest drawback I've found is that pylance starts to bug out once you get more than about 3-5k files in the monorepo, and switching between project interpreters can be slow and sometimes bug-prone, as with the Jupyter notebook case discussed in the linked issue.

@tibbe

tibbe commented Aug 13, 2024

@abingham I've found that a multi-root monorepo using poetry's relative packages + pylance retains refactoring and type checking across projects, have you tried this? see microsoft/pylance-release#5995 for a link to a codesandbox that demonstrates this workflow.

Could you elaborate on this? We also have a multi-root mono repo and e.g. refactoring doesn't work across folders. Did you mean this codesandbox link?

Our setup looks as follows (with each top-level directory as a VSCode multi-root folder):

pkg_a/
  pyproject.toml
pkg_b/
  pyproject.toml

We use a "relative" dependency e.g. {path = "../pkg_b", develop = true} in pkg_a/pyproject.toml. I've tried to have a virtualenv at the top-level or inside pkg_a. What's your setup? Do you have some kind of top-level pyproject.toml?

@alita-moore

Yes, that's the correct CodeSandbox link.

For each project, we use a separate pyproject.toml file, following the same relative dependency structure you mentioned. We do not have a single top-level pyproject.toml. Are you using Pylance?

In the CodeSandbox you linked, refactors will propagate across projects if you open it in VSCode, except for Jupyter notebooks.

Please note that we use Python Envy to switch the virtual environment to the one associated with the currently open file.

@tibbe

tibbe commented Aug 14, 2024

For each project, we use a separate pyproject.toml file, following the same relative dependency structure you mentioned. We do not have a single top-level pyproject.toml. Are you using Pylance?

I am using Pylance. My issue was using a venv in each package directory. I guess that made VSCode switch venv when I opened e.g. pkg_b, and that venv didn't include pkg_a, so cross-package stuff didn't work (e.g. I couldn't find all references of identifiers in pkg_b).

It seems that the best approach for a multi-root monorepo is to have as few virtual envs as possible (e.g. only for the "top-level" stuff, which depends on other packages). We can't have just one, because of dependency conflicts (that's why we have multiple packages to begin with), but fewer definitely seems better.
