
Glide-core UDS Socket Handling Rework #2482

Open · wants to merge 1 commit into base: release-1.2
Conversation

Collaborator

@ikolomi ikolomi commented Oct 21, 2024

1. Introduced a user-land mechanism for ensuring singleton behavior of the socket, rather than relying on OS-specific semantics. This addresses the issue where macOS and Linux report different errors when the socket path already exists.

2. Simplified the implementation by removing unnecessary abstractions, including redundant connection retry logic.
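The user-land singleton mechanism in point 1 can be sketched roughly as below. This is a std-only illustration, not the actual glide-core code: `try_register` and the path literals are hypothetical names, and std's `OnceLock` stands in for the `once_cell::sync::Lazy` used in the change itself.

```rust
use std::collections::HashSet;
use std::sync::{OnceLock, RwLock};

// Process-wide registry of socket paths that already have a listener.
static INITIALIZED_SOCKETS: OnceLock<RwLock<HashSet<String>>> = OnceLock::new();

fn sockets() -> &'static RwLock<HashSet<String>> {
    INITIALIZED_SOCKETS.get_or_init(|| RwLock::new(HashSet::new()))
}

/// Returns true if this call won the right to create the listener for `path`,
/// false if a listener is already registered.
fn try_register(path: &str) -> bool {
    {
        // Fast path: check under a shared read lock.
        let readers = sockets().read().unwrap();
        if readers.contains(path) {
            return false;
        }
    } // read guard dropped here, before taking the write lock
    // Slow path: insert under the write lock; insert() returns false
    // if another thread registered the path in the meantime.
    sockets().write().unwrap().insert(path.to_string())
}

fn main() {
    assert!(try_register("/tmp/glide-1234.sock")); // first caller wins
    assert!(!try_register("/tmp/glide-1234.sock")); // duplicates rejected
}
```

Keeping the check in user land sidesteps the macOS/Linux discrepancy entirely: the OS never sees a second bind attempt on the same path.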

Issue link

This Pull Request is linked to issue (URL): [https://github.com//issues/2433]

Checklist

Before submitting the PR, make sure the following are checked:

  • This Pull Request is related to one issue.
  • Commit message has a detailed description of what changed and why.
  • Tests are added or updated.
  • CHANGELOG.md and documentation files are updated.
  • Destination branch is correct - main or release
  • Commits will be squashed upon merging.

@ikolomi ikolomi added labels bug ("Something isn't working") and Core changes ("Used to label a PR as PR with significant changes that should trigger a full matrix tests.") Oct 21, 2024
@ikolomi ikolomi added this to the 1.2 milestone Oct 21, 2024
@ikolomi ikolomi requested a review from a team as a code owner October 21, 2024 09:42
1. Introduced a user-land mechanism for ensuring singleton behavior of the socket, rather than relying on OS-specific semantics. This addresses the issue where macOS and Linux report different errors when the socket path already exists.

2. Simplified the implementation by removing unnecessary abstractions, including redundant connection retry logic.

Signed-off-by: ikolomi <[email protected]>
{
static INITIALIZED_SOCKETS: Lazy<Arc<RwLock<HashSet<String>>>> =
Contributor

why the Arc?

Collaborator Author

I thought it was needed because the lock is shared between threads. Will check.


{
// Optimize for already initialized
let initialized_sockets = INITIALIZED_SOCKETS
Contributor

Question:

Can we have more than one socket? Looking at the socket path, it is hard-coded to:

let socket_name = format!("{}-{}", SOCKET_FILE_NAME, std::process::id());

So it seems to be one per process (unless more are needed for testing purposes).

Collaborator Author

It's odd, but multiple calls with different socket_path values are allowed, leading to multiple UDS sockets:

let socket_path = socket_path.unwrap_or_else(get_socket_path);
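Why multiple sockets can coexist: the default path embeds the pid, but callers may pass explicit paths. A minimal sketch of that resolution logic — the `/tmp` prefix, the `"glide-socket"` literal, and the function names here are illustrative, not the exact glide-core code:

```rust
fn get_socket_path() -> String {
    // Default: one socket per process, derived from the pid.
    format!("/tmp/{}-{}", "glide-socket", std::process::id())
}

fn resolve_path(socket_path: Option<String>) -> String {
    // Callers may pass explicit paths, so several UDS sockets can coexist
    // in one process; None falls back to the pid-derived default.
    socket_path.unwrap_or_else(get_socket_path)
}

fn main() {
    let default_path = resolve_path(None);
    assert!(default_path.contains(&std::process::id().to_string()));
    assert_eq!(resolve_path(Some("/tmp/custom.sock".into())), "/tmp/custom.sock");
}
```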

{
// Optimize for already initialized
Contributor

I don't think it's needed, since this method is mostly called once in the lifetime of a client creation.

Collaborator Author

This is not true: multiple sockets are allowed (I don't want to change that).

Comment on lines +844 to 851
let runtime = Builder::new_current_thread().enable_all().build();
if let Err(err) = runtime {
log_error(
"listen_on_socket",
format!("Error failed to create a new tokio thread: {err}"),
);
return Err(err);
}
Contributor

Suggested change
let runtime = Builder::new_current_thread().enable_all().build();
if let Err(err) = runtime {
log_error(
"listen_on_socket",
format!("Error failed to create a new tokio thread: {err}"),
);
return Err(err);
}
let runtime = match Builder::new_current_thread().enable_all().build() {
Err(err) => {
log_error(
"listen_on_socket",
format!("Error failed to create a new tokio thread: {err}"),
);
return Err(err);
}
Ok(runtime) => runtime,
};

Collaborator Author

I don't see why it's better.

}
Err(err) => init_callback(Err(err.to_string())),

runtime.unwrap().block_on(async move {
Contributor

Should you accept my suggestion above, we can drop the unwrap() here

Collaborator Author

I think your suggestion is less elegant. unwrap() is legitimate; otherwise it would not be in the language, or Clippy would complain.

Comment on lines +854 to +862
let listener_socket = UnixListener::bind(socket_path_cloned.clone());
if let Err(err) = listener_socket {
log_error(
"listen_on_socket",
format!("Error failed to bind listening socket: {err}"),
);
return Err(err);
}
let listener_socket = listener_socket.unwrap();
Contributor

Suggested change
let listener_socket = UnixListener::bind(socket_path_cloned.clone());
if let Err(err) = listener_socket {
log_error(
"listen_on_socket",
format!("Error failed to bind listening socket: {err}"),
);
return Err(err);
}
let listener_socket = listener_socket.unwrap();
let listener_socket = match UnixListener::bind(socket_path_cloned.clone()) {
Err(err) => {
log_error(
"listen_on_socket",
format!("Error failed to bind listening socket: {err}"),
);
return Err(err);
}
Ok(listener_socket) => listener_socket,
};

Collaborator Author

Same as above.


// signal initialization success
init_callback(Ok(socket_path_cloned.clone()));
let _ = tx.send(true);
Contributor

The only reason tx.send can fail is that the receiving end has been dropped - we should handle this.

Collaborator Author

What do you mean by "handle"? We could log it, maybe, but it's so unimportant I would not bother.
Otherwise, it is safer to proceed to the accept loop, since it might serve other clients.
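One low-cost way to "handle" it is exactly that: log and carry on, since the accept loop may still serve other clients. A std-only sketch of that policy — the real code uses a tokio channel, and `signal_ready` is an illustrative name, not the actual glide-core function:

```rust
use std::sync::mpsc;

fn log_error(ctx: &str, msg: String) {
    eprintln!("[{ctx}] {msg}");
}

/// Signals readiness; returns false (after logging) if the receiver is gone.
fn signal_ready(tx: &mpsc::Sender<bool>) -> bool {
    // A send error only means the receiving side was dropped; the accept
    // loop can still serve other clients, so log and continue.
    match tx.send(true) {
        Ok(()) => true,
        Err(_) => {
            log_error(
                "listen_on_socket",
                "init receiver dropped before readiness signal".to_string(),
            );
            false
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    assert!(signal_ready(&tx)); // receiver alive: send succeeds
    drop(rx); // simulate the receiving end going away
    assert!(!signal_ready(&tx)); // receiver gone: logged, not panicked
}
```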

@@ -924,23 +806,109 @@ pub fn start_socket_listener_internal<InitCallback>(
init_callback: InitCallback,
socket_path: Option<String>,
) where
InitCallback: FnOnce(Result<String, String>) + Send + 'static,
Contributor

This function should return a Result. That would reduce the code below by half.

Comment on lines +870 to +883
match listener_socket.accept().await {
Ok((stream, _addr)) => {
local_set_pool
.spawn_pinned(move || listen_on_client_stream(stream));
}
Err(err) => {
log_error(
"listen_on_socket",
format!("Error accepting connection: {err}"),
);
break;
}
}
}
Contributor

You should consider making this method return a Result.

The highlighted lines above can then become:

let stream = listener_socket.accept().await?.0;
local_set_pool.spawn_pinned(move || listen_on_client_stream(stream));

Collaborator Author

Wouldn't having an await? make it impossible to log "Error accepting connection"?
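For context on this exchange: logging can coexist with `?` by mapping the error (and logging it) before propagation. A minimal std-only sketch — `accept_once` and its `io::Result<u32>` signature are illustrative stand-ins, not the actual glide-core types:

```rust
use std::io;

fn log_error(ctx: &str, msg: String) {
    eprintln!("[{ctx}] {msg}");
}

// Hypothetical accept step: map_err logs the error, then `?` propagates it,
// so the caller still gets the Err while the log line is preserved.
fn accept_once(result: io::Result<u32>) -> io::Result<u32> {
    let stream = result.map_err(|err| {
        log_error(
            "listen_on_socket",
            format!("Error accepting connection: {err}"),
        );
        err
    })?;
    Ok(stream)
}

fn main() {
    assert_eq!(accept_once(Ok(7)).unwrap(), 7); // Ok passes through untouched
    let failed = accept_once(Err(io::Error::new(io::ErrorKind::Other, "boom")));
    assert!(failed.is_err()); // Err is logged, then propagated
}
```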

Comment on lines +886 to +894
drop(listener_socket);
let _ = std::fs::remove_file(socket_path_cloned.clone());

// no more listening on socket - update the sockets db
let mut sockets_write_guard = INITIALIZED_SOCKETS
.write()
.expect("Failed to acquire sockets db write guard");
sockets_write_guard.remove(&socket_path_cloned);
Ok(())
Collaborator

If a new client is created between the time we drop the listener_socket / break from the accept loop and the time we remove the path from sockets_write_guard, the new client would see that there is an existing socket listener and get its path back, although it isn't available anymore. How do we handle that? It would probably make the wrapper fail when trying to connect to a bad socket path.

Collaborator Author
@ikolomi ikolomi Oct 21, 2024

I think it does not differ from the previous behavior, which I find odd, but I do not want to change the design at this time.
The weak point of the design is that the init callback is called upon socket creation. Thus, if the loop terminates by failing to accept a connection, the callback might be called with success even though the connection will not be accepted.

With the previous implementation, what you are describing might happen like this:

  1. Client A is created; thread_a starts, binding and accepting on the socket.
  2. Client B creation starts, creating thread_b, which detects an existing socket it could connect to.
  3. thread_a fails to accept the connection and terminates.
  4. Client B's init callback is called with success; thread_b does not continue (since it detected a connectable socket), but the creation eventually fails since the connection cannot be established.

To summarize: the new implementation does not degrade the situation, and even improves it, by having the explicit remove_file and the user-land socket-db update upon accept-loop termination (in the original implementation the socket file would remain, causing subsequent creations to fail).

@Yury-Fridlyand
Collaborator

Please add a changelog.

4 participants