Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

niri_ipc::NiriSocket; niri msg version; version checking on IPC #278

Merged
merged 3 commits into from
Apr 19, 2024

Conversation

sodiboo
Copy link
Contributor

@sodiboo sodiboo commented Apr 3, 2024

This does not implement any sort of event stream, but rather implements basic support for the IPC client and server to handle multiple messages on each stream.

Previously, the server would ignore any additional events sent to it. This means every IPC request must start a new socket connection.

Now, with my changes, single IPC connection can handle arbitrarily many requests, separated by newlines.

The client can additionally handle them with arbitrary separation, as it relies on serde_json's StreamDeserializer. It cannot be used on the server-side, because it does not support async readers.

(no longer part of this PR)

A new API is made available in niri_ipc: NiriSocket. It handles the underlying UnixStream and allows a consumer to send multiple requests without closing it inbetween.

  • sidenote: wouldn't a more fitting API be that Request is a trait, with an associated type Response? That way, the request method could statically return the correct type (rather than an enum).

  • New command niri msg version to check the running compositor's version

    image

  • All niri msg commands will now check the running compositor's version.

  • If it matches, output is identical to before.

    image

  • If it does not match, additional output is printed upon each niri msg invocation, informing the user of the discrepancy so they can understand any potential problem more easily.

    image

  • This does not break existing scripts.

    image

  • The "Did you forget?" hint is also available for compositors prior to this commit

    image

  • The human-readable form of niri msg version does not duplicate the version information.

    image

  • "Did you forget?" can be filtered out just as with json output

    image

  • JSON output of niri msg version does duplicate the information, with rationale going that you may be piping its output into some other command that eventually discards the version numbers or puts it somewhere other than the terminal, but stderr should still be visible to the human

    image

  • The exact wording of human-readable output in the screenshots above may vary from the real product.


Additionally, no attempt is made to do anything more than tell the user that the version string mismatches. This is because packagers can1 and will2 and do3 overwrite the version() function with their own implementation, under the assumption that the result will only be shown to the user. This PR does not try to fundamentally change this assumption.

1: COPR at yalter/niri/niri
2: Nix at github:sodiboo/niri-flake
3: AUR at niri-bin depends on 1

@YaLTeR
Copy link
Owner

YaLTeR commented Apr 3, 2024

single IPC connection can handle arbitrarily many requests, separated by newlines

I wonder if we actually want this, at least currently? I'm not sure how much it brings over connecting twice, and it does add all this code to keep the socket open and be able to parse many messages at once, all with extra error conditions that are rare (like, it's easy to get an error when opening a socket and sending one message, but it's harder to get an error in-between messages).

Plus, this cannot be used with the actual event streaming, because then there's no way to tell whether a response from a server is the response to the request you just did, or just another event that got streamed in. LSPs pass explicit request ID to get around this problem, but this is even more complexity, all for seemingly little benefit in our case, when the client can just connect an extra time.

New command niri msg version to check the running compositor's version

This is good.

All niri msg commands will now check the running compositor's version.

I am a bit on the fence on this one, because it's adding a whole extra roundtrip to every single invocation of niri msg just to be able to show an error on a very rare case right after a niri update that made JSON incompatible. When is this even a problem currently? I believe adding new fields to response JSON will work correctly (I think serde already ignores unknown fields by default), so the only place this can happen is unknown action or unknown niri msg request, which means that niri msg can do the version request only after one of those returned an error.

@sodiboo
Copy link
Contributor Author

sodiboo commented Apr 3, 2024

I am a bit on the fence on this one, because it's adding a whole extra roundtrip to every single invocation of niri msg just to be able to show an error on a very rare case right after a niri update that made JSON incompatible. When is this even a problem currently?

In my use, this edge case is not infrequent. I usually do not restart niri often, even though the binary does update on my system. I find it nice to have a heads up, i suppose.

I didn't think too much of the impact of adding another request round-trip. One motivation for that specific implementation is to be "backwards compatible" with the previous versions of niri, without radically changing the protocol of those messages. This shouldn't be an issue, really, as we don't actually guarantee any ipc compatibility across commits (to my knowledge), but i personally dislike it quite a lot when a niri_msg ipc command fails for no apparent reason; and the error emitted does not suggest an intentional change.

Another way to do this, which would break the current protocol (but who cares? it'd work in the future), is to move the actual response/error into a field of a real "reply" object, where the compositor sends back {"version": "...", "reply": <what was previously the whole message>}. Then, the version subcommand is really a "noop" request that just requests the version. This eliminates the overhead of sending two requests; but i am not convinced this overhead is significant. When we do have "event streams" of sorts, we obviously would not be rechecking the version every time; only once when that client starts listening to the server.

It is also important to note that the niri_ipc crate that other programs could use does not enforce the version checking. It is only part of niri msg, so a consumer that wants to do anything complex with this data and who cares a lot about performance might instead prefer interfacing with the niri_ipc crate directly, which eliminates an extra roundtrip of json compared to parsing the output of niri msg.

Personally, i find value in having niri msg double-check version all the time. Though i just presented some reasons why i think it is reasonable to do unconditionally, i would also be fine with it being gated behind an environment variable so i can set it in niri config and forget about it.

Plus, this cannot be used with the actual event streaming, because then there's no way to tell whether a response from a server is the response to the request you just did, or just another event that got streamed in

Sure it can! First of all, we might not want to send events by default, we might have some request to receive them. That makes it backwards compatible by default.

A refactor of the API might be needed to make it "nicer" (i.e. separating events from responses. see above suggestion of response trait), but even without radically changing the interface, the "event stream" should simply correspond to a Response for which there is no Request that creates it. NiriSocket could then handle these kinds of responses, and buffer them internally, skipping forward to the actual response that corresponds to the request we made. In other words, the client should ignore "event" objects sent over the socket.

While implementing some request ID could be necessary if we want to make all requests asynchronous; (i.e. handled in parallel and can be fulfilled in any order), currently we do not do that. they are sequential; one is read at a time, and fully handled, and responded to, before we even try reading another one. I believe this is how HTTP works.

If we consider a client that does not know every event and response it may receive, the "buffer events until we find a response" approach is less trivial. of course, we should just ignore unknown variants because an unknown response will never be received (because we can't possibly have requested it), but this is slightly incompatible with the current behaviour of "panic on incorrect response".

If we distinguish events and request responses (i.e. Reply will be an enum with variants Event(Event) and Response(Result<Response, String>), the filtering described in the previous paragraph will depend a lot less on "assume it's an event"; since it'we do have that info already. This is better than adding new Response variants; because an event can not "fail" in the same way a request might. We didn't send a request, there's nothing to "fail". if there is no event, it should simply not be sent.

all with extra error conditions that are rare (like, it's easy to get an error when opening a socket and sending one message, but it's harder to get an error in-between messages).

I suppose. But these error conditions do not occur if you use it as previously(?). They are also io errors; all io can fail, duh. doing "more" will obviously incur new failure states. io is inherently unreliable, but i don't see these changes creating any "unexpected" failures. any error between messages also applies to errors during messages. previously, a client can open a socket and never write a newline or close it, for example. then, the socket would stay open in compositor forever. NiriSocket should handle any possible io errors and provide a hard-to-misuse API, so i don't see this ultimately being a real problem? i don't get this point.

yes, a client can connect twice. but this too adds extra error conditions: the socket may, in an extreme case, actually be connected to a different instance of niri between consecutive calls.

@YaLTeR
Copy link
Owner

YaLTeR commented Apr 3, 2024

In my use, this edge case is not infrequent. I usually do not restart niri often, even though the binary does update on my system. I find it nice to have a heads up, i suppose.

But what was the last case that using niri msg without restarting after an update didn't work for you?

First of all, we might not want to send events by default, we might have some request to receive them. That makes it backwards compatible by default.

I was planning to have a new niri msg request for streaming, rather than adding it to the existing requests. So this is not a concern.

A refactor of the API might be needed to make it "nicer" (i.e. separating events from responses. see above suggestion of response trait), but even without radically changing the interface, the "event stream" should simply correspond to a Response for which there is no Request that creates it. NiriSocket could then handle these kinds of responses, and buffer them internally, skipping forward to the actual response that corresponds to the request we made. In other words, the client should ignore "event" objects sent over the socket.

If we distinguish events and request responses (i.e. Reply will be an enum with variants Event(Event) and Response(Result<Response, String>), the filtering described in the previous paragraph will depend a lot less on "assume it's an event"; since it'we do have that info already. This is better than adding new Response variants; because an event can not "fail" in the same way a request might. We didn't send a request, there's nothing to "fail". if there is no event, it should simply not be sent.

What I was thinking is something like a niri msg event-stream request, that gets 1 reply (error or Ok(Handled)), and then the events stream in just as a different enum Event until the connection closes. This way there's no need to break API by introducing a different Reply variant or anything. There's also no problem of multiplexing replies and events since if there's 1 request per connection, then there are no more requests you can make so you can only receive events, with no ambiguity.

They are also io errors; all io can fail, duh. doing "more" will obviously incur new failure states. io is inherently unreliable, but i don't see these changes creating any "unexpected" failures.

It's one case if you have short-lived one-off requests where every request creates the connection from scratch, and another case when you have a long-lived connection that can fail, because now you need to potentially handle re-establishing the connection in case of a (comparatively rare) error.

@sodiboo
Copy link
Contributor Author

sodiboo commented Apr 3, 2024

But what was the last case that using niri msg without restarting after an update didn't work for you?

Adding Msg::FocusedWindow. Before that, i believe when some error handling was added (i suppose some commit that changes the protocol to send a Reply = Result<Response, String> instead of just Response). Like yeah, this is not quite infrequently but it's frequently enough that i not only wish for this behaviour to exist but i also took the time to implement it.

I was planning to have a new niri msg request for streaming, rather than adding it to the existing requests. So this is not a concern.

I guess? niri msg subcommands aren't necessarily the same as the underlying ipc protocol, and when discussing event streams above, i did not consider the command line interface that niri msg should expose, only the API it should interact with

What I was thinking is something like a niri msg event-stream request, that gets 1 reply (error or Ok(Handled)), and then the events stream in just as a different enum Event until the connection closes. This way there's no need to break API by introducing a different Reply variant or anything. There's also no problem of multiplexing replies and events since if there's 1 request per connection, then there are no more requests you can make so you can only receive events, with no ambiguity.

Yeah, so for the niri msg implementation we wouldn't have to deal with multiplexing events, because no responses (besides the first one) will be sent to a client that behaves like this. This is compatible with the model i suggested above. You can just... not make additional requests and treat it as an event stream. That's okay and would work fine. I don't see a need to enforce such a pattern when the compositor can fairly easily handle answering requests to a client that it is also sending events to.

It's one case if you have short-lived one-off requests where every request creates the connection from scratch, and another case when you have a long-lived connection that can fail, because now you need to potentially handle re-establishing the connection in case of a (comparatively rare) error.

I suppose? But how is it better to force you to re-establish the connection every time? If you want to make a new socket for each connection, sending just one request then closing it, the compositor will handle that identically to before.

@YaLTeR
Copy link
Owner

YaLTeR commented Apr 3, 2024

Adding Msg::FocusedWindow.

That should not have broken any other functionality though, right? So if you do a version check only on error, then that will work.

Before that, i believe when some error handling was added

Yeah, that was a breaking change, that I intend to try not to do going forward.

I guess? niri msg subcommands aren't necessarily the same as the underlying ipc protocol, and when discussing event streams above, i did not consider the command line interface that niri msg should expose, only the API it should interact with

I consider both niri msg interface and the IPC API at the same time when thinking about design, because it is a real and the primary consumer of the API, so it helps shape the API structure.

But how is it better to force you to re-establish the connection every time? If you want to make a new socket for each connection, sending just one request then closing it, the compositor will handle that identically to before.

It's better because it preserves the error surface, whereas multiplexing introduces additional error surface that will also not be very well tested just because the conditions are rarer.

@sodiboo
Copy link
Contributor Author

sodiboo commented Apr 7, 2024

I've pushed some more changes:

  • A new niri msg nonsense / Request::Nonsense IPC request is added, that always returns an error. Akin to niri panic, it is useful for testing how the error handling will work
  • The struct Versions defined before its singular usage is replaced with the json!() macro, which is much more fitting
  • Error messages in niri msg are generally cleaned up to be more consistent. They will be shown to a user not digging through journalctl, after all. Likewise, the function signature of niri::ipc::server::process() has been changed to take a Request and return a Reply, more closely resembling what its return value should mean (rather than an anyhow::Result, which may contain additional error information that we strip from it anyways)
  • The version request is now only double-checked if the intiial request fails. This, moreover, means that nothing is printed in niri msg --json because if the json command succeeds, the version is never checked to begin with. For non-json, it is also not printed nor checked.
  • I also seem to have broken MSRV 1.72.0. (fixed)

niri-ipc/src/lib.rs Outdated Show resolved Hide resolved

let mut stream =
UnixStream::connect(socket_path).context("error connecting to {socket_path}")?;
// Default SIGPIPE so that our prints don't panic on stdout closing.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally put this after all IPC had been done and before any prints because I'm not sure how this will affect the IPC writes and reads. I suspect it might prevent read/write errors from going through.

Copy link
Contributor Author

@sodiboo sodiboo Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. How should best to do it then? Handle all IPC logic first, encapsulate the results (including errors) in some variable, then set the signal handler, then do all error printing?

Why do we need it? Is it problematic if the niri msg process panics upon stdout hangup? If yes, wouldn't it be better to lock stdio and actually handle the err case in our code (rather than relying on an unsafe call to libc)?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it because otherwise things like niri msg -j focused-window | jq aaaa (malformed jq command) will spew a Rust backtrace from niri msg in addition to the jq error.

I am not sure what happens to socket IO when it is set though.

Sidenote, there are discussions in Rust to make this more easily settable (e.g. rust-lang/rust#120832), but they are considering CLI apps that don't do socket IO, from my understanding.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be possible in this case to do all the requests before any of the printing, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible, but somewhat annoying, particularly because of the edge case with version checking. In #294, where i've mainly focused on a more "correct" API (which is easier for e.g. scripting use cases that don't need to handle all possible requests and responses like niri msg does), i've also more clearly separated functionality for sending request, handling version checking, and "outputting" the results of these operations. and then, it's quite trivial to insert a call to ignore SIGPIPE at the appropriate point.

niri/src/ipc/client.rs

Lines 56 to 63 in f1bfef9

let reply = request.send().context("a communication error occurred")?;
// Default SIGPIPE so that our prints don't panic on stdout closing.
unsafe {
libc::signal(libc::SIGPIPE, libc::SIG_DFL);
}
match reply {


let buf = serde_json::to_vec(&reply).context("error formatting reply")?;
write.write_all(&buf).await.context("error writing reply")?;
while let Some(line) = lines.next().await {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind splitting out the "multiple requests per connection" into a separate PR? I'm not convinced that we need it right now, but I would like the rest of this PR in (version, return-error, version checks).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made it no longer handle multiple requests on a connection; most of the inner workings to do so are still present, they're just unused.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think the NiriSocket struct is good, but I'd further roll back some logic complexity that will only be necessary for multiple requests per connection. I can do it myself if you'd prefer to save on the comments back and forth.

@sodiboo sodiboo changed the title streamed IPC; niri_ipc::NiriSocket; niri msg version; version checking on IPC ~~streamed IPC~~; niri_ipc::NiriSocket; niri msg version; version checking on IPC Apr 15, 2024
@sodiboo sodiboo changed the title ~~streamed IPC~~; niri_ipc::NiriSocket; niri msg version; version checking on IPC niri_ipc::NiriSocket; niri msg version; version checking on IPC Apr 15, 2024
@YaLTeR
Copy link
Owner

YaLTeR commented Apr 19, 2024

Adjusted this PR to my liking.

///
/// This struct is used to communicate with the niri IPC server. It handles the socket connection
/// and serialization/deserialization of messages.
pub struct NiriSocket {
pub struct Socket {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like this generic name as much. When used in a file, it's easier to confuse with more general I/O operations when skimming (compare to naming of UnixStream, TcpStream, UdpSocket in the std library). It's an unimportant point though; this was an exhaustive list of all the downsides that such a name brings.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that, but I decided to do this because in other niri code I'm already using fully-qualified names for some types like niri_config::Animation and niri_ipc::Output (leaving non-qualified names for more frequently occurring types of the same name). So it's consistent this way.

}
}

return Ok(());
Some(_) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block now handles the Reply::Ok case for non-Response::Version. Is that desirable? This wouldn't be caused by an outdated compositor (it would cause a Reply::Err("error parsing request")), and as such i made it not print any kind of error response. In reality, it should be unreachable since the compositor server Does Not Behave Like This

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is intentional for the reason that you mentioned. If it ever happens then something weird is going on; doesn't hurt to show the warning.

Comment on lines +45 to +48
eprintln!("Running niri compositor has a different version from the niri CLI:");
eprintln!("Compositor version: {compositor_version}");
eprintln!("CLI version: {cli_version}");
eprintln!("Did you forget to restart niri after an update?");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer your phrasing to my original version. "Running niri compositor" is better than "the compositor".


if let Err(err) = &reply {
warn!("error processing IPC request: {err:?}");
if !requested_error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to handle this error flow differently? It's not really problematic to print it; nobody's gonna spam niri msg request-error and then complain that the log output contains the same message; conversely, it's reasonable that someone may want to see what it looks like on the compositor side handling an error in which case this check somewhat obscures it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, no hard opinions here, but in my head this request is aimed much more at the client side, so as far as the server is concerned it shouldn't print a warning.

stream
.read_to_end(&mut buf)
.context("error reading IPC response")?;
let socket = Socket::connect().context("error connecting to the niri socket")?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: naming of niri_ipc::Socket (what? is there a better mechanism to add suggestions as replies to existing review comments?)

in that case, given that we use qualified niri_ipc::Window (among others), it should be more consistent across all niri code.

Suggested change
let socket = Socket::connect().context("error connecting to the niri socket")?;
let socket = niri_ipc::Socket::connect().context("error connecting to the niri socket")?;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file specifically, since it's under src/ipc/ and doesn't import any of the conflicting types, I am importing and using all niri_ipc types directly. The only exception is Transform which seems like an oversight; I'll also import it. (In src/ipc/server.rs it's clearer to fully qualify, because it's filling info from a Smithay Window).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although in general, for other types, i agree with that justification; Qualify them to indicate it's the niri_ipc version of that type when not in the context of IPC. But, see point about preferring NiriSocket over Socket (and i stand by that as it does avoid the need to qualify the name at all): In this context, while it's obvious that we're working with niri_ipc, it's not obvious what Socket means without viewing the imports or having seen the definition. Qualifying the name here is to disambiguate (for the reader) the concept of a "niri IPC socket" from "a generic type of socket with arbitrary data flowing in each direction" (e.g. they are different kinds of objects in the semantic sense); not to disambiguate the specific variant of "a transform but with serde attributes" from "a transform but the one Smithay likes" (i.e. they are the same thing semantically but distinct only in type systems and specific trait impls that both could just as well do)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can always change it later if we need

Copy link
Contributor Author

@sodiboo sodiboo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly i cannot mark my approval for others' changes on my own pull request? (focused on your commit; not the whole PR overview)

image

but yeah, looks good to me. concerns are pretty much exclusively with naming of some items or exact behaviour of error printing flow. they don't matter too much and i don't see a problem with merging as-is

sodiboo and others added 3 commits April 19, 2024 16:50
implement version checking; streamed IPC

streamed IPC will allow multiple requests per connection

add nonsense request

change inline struct to json macro

only check version if request actually fails

fix usage of inspect_err (MSRV 1.72.0; stabilized 1.76.0)

"nonsense request" -> "return error"

oneshot connections
@YaLTeR
Copy link
Owner

YaLTeR commented Apr 19, 2024

Thanks!

@YaLTeR YaLTeR merged commit b5f7e4b into YaLTeR:main Apr 19, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants