Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permit FFI files in subdirectories #3601

Open
wants to merge 35 commits into
base: main
Choose a base branch
from

Conversation

PgBiel
Copy link
Contributor

@PgBiel PgBiel commented Sep 8, 2024

Closes #1562

Still WIP (finishing the details regarding the duplicate module check)

Changes:

  • FFI files in subdirectories are now copied. For JavaScript specifically, they can be used as @external(javascript, "./file.mjs", "function"). For Erlang, you'd use the module's name as usual.
  • An additional check was included to ensure no .erl files with the same name exist within the project, as Erlang cannot have duplicate modules.
  • Added support for directories to the in-memory filesystem. (Otherwise, the is_directory function would return true for files and read_dir would also list the directory itself, causing tests to enter infinite recursion and crash.)

Potential improvements:

  • I'd like to add explicit integration tests for using @external with relative paths in JS (I tested locally with .mjs with node.js, Deno and Bun and it worked, but could be worth it to test other potential edge cases). Let me know where would be the best spot in the codebase to do so.
  • Consider having an error when a native file would overlap with a .gleam file's generated code (.mjs or .erl).

Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

You'll see there's integration tests in the test directory which can be updated with nested FFI files

compiler-core/src/build/native_file_copier.rs Outdated Show resolved Hide resolved
compiler-core/src/io/memory.rs Outdated Show resolved Hide resolved
compiler-core/src/io/memory.rs Outdated Show resolved Hide resolved
compiler-core/src/io/memory.rs Outdated Show resolved Hide resolved
@lpil lpil marked this pull request as draft September 10, 2024 15:22
@PgBiel PgBiel changed the title Permit JS FFI files in subdirectories Permit FFI files in subdirectories Sep 13, 2024
@PgBiel
Copy link
Contributor Author

PgBiel commented Sep 15, 2024

Problem: My handmade recursive directory walker is too naive and can enter infinite symlink loops. I'll probably add a new method to FileSystemReader where we can address this at the IO boundary, similarly to how it's done for gleam_source_files.

@PgBiel PgBiel marked this pull request as ready for review September 16, 2024 02:22
@PgBiel
Copy link
Contributor Author

PgBiel commented Sep 16, 2024

Alright, I've made the implementation more robust (it also now supports copying nested Erlang modules), added detection of conflicting .gleam and .mjs files (but not of conflicting .gleam and .erl files yet due to the issue outlined here: #1562 (comment) - we could leave this to a future PR), added detection of conflicting .erl files in separate subpaths, added more unit tests and added integration tests. 😄

@inoas
Copy link
Contributor

inoas commented Sep 16, 2024

Hello,

thanks for taking this on, this is lovely <3 for FFI interop!

In case this is desired and-or not yet considered:

Can we enforce some namespacing on Erlang, Elixir and JavaScript modules that maps to the directory structure and else throw a compiler error?

Edit:

... if possible, this might also make this check obsolete?

An additional check was included to ensure no .erl files with the same name exist within the project, as Erlang cannot have duplicate modules.

@PgBiel
Copy link
Contributor Author

PgBiel commented Sep 16, 2024

I'm not sure that would be desirable since the main point of FFI is to have full control over what the final transpiled code does. Besides, we would have to parse all Erlang files, though I guess we could do with some regex in this case. But I think it's more a matter of flexibility, and the error just tells you if you ever mess it up somehow.

The error is also not perfect since I guess it's theoretically possible for you to shadow some module from some other package. But it's an attempt.

@inoas
Copy link
Contributor

inoas commented Sep 16, 2024

I'm not sure that would be desirable since the main point of FFI is to have full control over what the final transpiled code does. Besides, we would have to parse all Erlang files, though I guess we could do with some regex in this case. But I think it's more a matter of flexibility, and the error just tells you if you ever mess it up somehow.

The error is also not perfect since I guess it's theoretically possible for you to shadow some module from some other package. But it's an attempt.

maybe it could warn.
I don't know... I think these FFI files will all be new and setting up strictness is nice there. You can still build any erlang/elixir library as a deprendency.

@PgBiel
Copy link
Contributor Author

PgBiel commented Sep 16, 2024

I think this is fine to be honest... Forcing people to name their modules like folder@folder2@something in every folder doesn't sound good or useful realistically. And it can clash either way.

Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you.

It seems now that there's a concept of directories in the in memory file system and also the OS file system, and each is responsible for implementing traversal. I think it would make sense to remove this duplication and instead to have the trait know nothing about directory walking and then the walking logic depends on the trait and is shared between all implementations.

@lpil lpil marked this pull request as draft September 20, 2024 13:53
@PgBiel
Copy link
Contributor Author

PgBiel commented Sep 20, 2024

I think it would make sense to remove this duplication and instead to have the trait know nothing about directory walking and then the walking logic depends on the trait and is shared between all implementations.

If I understood correctly, you are proposing making a dir walking function which uses only the trait's read_dir function (and other trait functions if needed), right? Though I'm not sure if we would want to reinvent the wheel here, given we already have a package which implements graph traversal logic to avoid loops etc. What if we compromise and simply add a walk_dir function to the trait and use that where applicable?

Although I guess a re-implementation will be needed if we add symlinks to the in-memory FS down the road, so we could alternatively consider searching for existing platform agnostic implementations and using that, or even just re-implement while attempting to avoid reinventing the wheel as much as possible, e.g. by using special data structure crates if needed. Still, I'm not fully aware of all edge cases to know whether we would be able to match existing OS-specific implementations in terms of not only performance but also correctness.

@lpil
Copy link
Member

lpil commented Sep 21, 2024

Yes, that's right. We should not have two implementations of the same thing, and removing a dependency is beneficial for maintenance. Having one less dependency to verify for the EU Cyber Resilience Act is nice too!

I'm not worried about correctness seeing as this is a trivial algorithm, will be fine.

@PgBiel
Copy link
Contributor Author

PgBiel commented Oct 7, 2024

By the way, I believe this is very close to completion (the dir walker algorithm was already pushed, but I haven't tested it well yet) - I'll be picking this up again very soon, once I have a bit more time :)

@PgBiel PgBiel force-pushed the nested-js-ffi branch 5 times, most recently from ea4fbd3 to 7410d22 Compare October 24, 2024 04:39
@PgBiel PgBiel force-pushed the nested-js-ffi branch 2 times, most recently from 3808565 to 977041b Compare November 5, 2024 15:02
@PgBiel PgBiel marked this pull request as ready for review November 6, 2024 01:34
@PgBiel
Copy link
Contributor Author

PgBiel commented Nov 6, 2024

Ready for another look 😄

The dir walker has been implemented with symlink loop detection. I've switched other places using recursive search to use the dir walker. Try it out :)

I've also made a few fixes and explicit design decisions in the implementation of directories in IMFS. Of note, I now ensure the root path always exists, for consistency. (We don't really check for this anywhere at the moment though, so I can remove that, but I felt like not leaving a time bomb for the future here 😄.)

Ideally, `src/abc.gleam` and `src/abc.erl` would cause an error. However, this can actually occur legitimately when downloading a Hex package with precompiled `.erl` source files, which would trigger false positives. Wonder what would be the best way to detect this situation, but doesn't seem trivial to solve.
- Test header file in separate subdir
- Test parent folder ffi in Erlang
- Prevent problems with infinite symlink loops
add mjs bug fix to changelog

update changelog to include erl ffi files

update changelog with js ffi change
We were previously including all paths in the filesystem, which
included, in particular, the root. Therefore, the "initial files"
included the root directory, causing everything to be deleted, not only
the initial files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Permit FFI modules in subdirectories
3 participants