diff --git a/_posts/2024-06-27-program-explorer-planning.md b/_posts/2024-06-27-program-explorer-planning.md
index c709d3d..a02eb07 100644
--- a/_posts/2024-06-27-program-explorer-planning.md
+++ b/_posts/2024-06-27-program-explorer-planning.md
@@ -160,6 +160,10 @@ crun -v /run/output/dir:/run/pe/output ... \
pearchive pack /run/output outgoing.ar
```

I discovered that cloud-hypervisor currently does a VMM shutdown on ACPI guest shutdown, so we can't reuse the VMM process when the guest shuts down. This is changeable in the future, but for now it eliminates any benefit of using the API, so I'm going to stick with exec'ing 1 VMM process per guest. And the init doesn't have to inotify for the pmem devices, since they will definitely be there from boot time.

I have moved very slowly through writing the web/API server. I first tried a low-level, epoll-only, sans-async version in Rust, and then one using tokio with async; I didn't really like either of them. I haven't yet tried any of the Rust frameworks, but the thing I can't figure out for sure is whether/how I can use `splice` for the request body. And I can't figure that out because they are so generic over the request body type that I just nope out, for better or worse. So my next draft will be in Go, which I don't love, but I think it has the things I want, like setting timeouts per phase of the request lifecycle. I really don't want to rewrite the server logic in Go, so I think I'm going to have Go spin up N worker processes written in Rust and send them work items on a unix socket. On the Go side I'll put the sockets in a channel, and goroutines can grab one from that channel and use it exclusively.

This is going fine, but now I'm already at the point of "may as well have that be a true load balancer and support remote workers". I evaluated Cloudflare's pingora and it seems promising: `ServeHttp` is a single-function trait for a basic non-streaming HTTP server, so that is at least nice. And then I could put the routing info for arch in a proxy. One thing I'm probably getting way too focused on is `splice` to save the copies, but TLS really screws that up anyways. I'm also torn between having the workers make ready requests to the LB and doing the match-making thing, or going with a more standard backend server. I kinda liked the idea of having all the workers only contact the LB over WireGuard and use HTTP/1.1.

One thing that plays into this is that we need to decide how many connections to use between the LB and a worker for an N-guest machine. We could have 1 connection serve all N guests, N connections, or more than N connections. Remember that the ideal is to have 3 tasks happening for each guest: receiving our next request(s), running our current request, and sending our results. We don't want to tie the running of a guest to the network of a single client, and we'd ideally avoid head-of-line blocking for either sending or receiving. Whether we use 1 or N connections with HTTP/1.1, we will have head-of-line blocking. This makes me think kN connections is better, with a single task per guest that reads complete requests, puts them in a file (TODO: with a custom frontend to cloud-hypervisor and some patches we should be able to mount a pmem from a memory range and skip the file, though again the file is almost better if we use splice to receive), and then enters the run queue. Rough sketches of the splice receive and the queue handoff are below.
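To make the `splice` idea concrete: on a plaintext (non-TLS) connection the request body can move socket -> pipe -> file entirely inside the kernel, since splice(2) requires a pipe on one side of each call. This is only a rough sketch of what I have in mind, using the `libc` crate directly; the function name, the blocking socket, and the known-content-length assumption are all just for illustration:

```rust
use std::fs::File;
use std::io;
use std::net::TcpStream;
use std::os::fd::{AsRawFd, FromRawFd, OwnedFd};
use std::ptr;

/// Move `len` body bytes from a plaintext, blocking socket into a file
/// without copying through userspace: socket -> pipe -> file.
fn splice_body_to_file(sock: &TcpStream, file: &File, mut len: usize) -> io::Result<()> {
    // anonymous pipe used purely as the kernel-side buffer between the two splices
    let mut fds = [0i32; 2];
    if unsafe { libc::pipe(fds.as_mut_ptr()) } < 0 {
        return Err(io::Error::last_os_error());
    }
    let (rd, wr) = unsafe { (OwnedFd::from_raw_fd(fds[0]), OwnedFd::from_raw_fd(fds[1])) };

    while len > 0 {
        // socket -> pipe
        let n = unsafe {
            libc::splice(sock.as_raw_fd(), ptr::null_mut(),
                         wr.as_raw_fd(), ptr::null_mut(),
                         len, libc::SPLICE_F_MOVE)
        };
        if n < 0 { return Err(io::Error::last_os_error()); }
        if n == 0 { break; } // peer closed before sending the full body

        // pipe -> file: drain everything we just buffered
        let mut left = n as usize;
        while left > 0 {
            let m = unsafe {
                libc::splice(rd.as_raw_fd(), ptr::null_mut(),
                             file.as_raw_fd(), ptr::null_mut(),
                             left, libc::SPLICE_F_MOVE)
            };
            if m <= 0 { return Err(io::Error::last_os_error()); }
            left -= m as usize;
        }
        len -= n as usize;
    }
    Ok(())
}
```

As noted above, TLS undermines this: the bytes have to come up into userspace to be decrypted anyway, so the zero-copy receive only helps on plaintext hops.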
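And a minimal sketch of the queue handoff between a request handler and the per-guest runners, assuming tokio and entirely made-up types: the handler pushes the job plus a oneshot sender onto a shared queue, and whichever guest slot picks it up sends the result back on that channel.

```rust
use std::path::PathBuf;
use std::sync::Arc;
use tokio::sync::{mpsc, oneshot, Mutex};

// Hypothetical types: a job points at the packed input archive on disk,
// a result at the packed output archive.
struct Job { input: PathBuf }
struct RunResult { output: PathBuf }
struct Queued { job: Job, reply: oneshot::Sender<RunResult> }

// Request-handler side: enqueue the job with a oneshot for the answer, then await it.
async fn handle_run(queue: mpsc::Sender<Queued>, job: Job) -> Option<RunResult> {
    let (tx, rx) = oneshot::channel();
    queue.send(Queued { job, reply: tx }).await.ok()?;
    rx.await.ok()
}

// Guest-runner side: one task per guest slot pulls jobs off a shared queue.
// mpsc receivers are single-consumer, hence the Mutex around it; the lock is
// only held while waiting for the next job, not while running one.
async fn run_guest_slot(queue: Arc<Mutex<mpsc::Receiver<Queued>>>) {
    loop {
        let next = {
            let mut rx = queue.lock().await;
            rx.recv().await
        };
        let Some(Queued { job, reply }) = next else { return };
        if reply.is_closed() {
            continue; // request was canceled while sitting in the queue: skip it
        }
        // ... exec one vmm for this job, wait for it, pack the output dir ...
        let output = job.input.with_extension("out"); // placeholder
        let _ = reply.send(RunResult { output });
    }
}
```

One side effect of this shape is that cancellation has an obvious hook: if the client goes away, the handler drops its end of the oneshot, and the runner can check `is_closed()` before booting a guest, which is the "mark it as skip" case that comes up below.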
Ideally, we'd have a mechanism to notify the worker of a canceled request (the client went away, or hit the cancel button in the UI), but that now has to get propagated to the worker, and if the file is already in the queue, we have to mark it as skip. I'm not finding whether/how a closed connection is propagated in e.g. pingora.

Anyways, for the number of connections, it seems like h2 (HTTP/2) is actually the better fit. We should punt on the WireGuard thing; it can be supported later. Everyone deals with TLS and they seem to be fine. So: h2 mTLS between the LB and the worker. The worker pool needs a slight adjustment to fit into the async request handler of pingora; I think we just need to pass a oneshot channel onto the queue so the worker knows where to send the result. Big TODO: how to best track and bound the total request size in transit between the LB and the worker.

Oh, and the other major API besides `run` is `get-images`, which returns all the supported images so far. I guess in pingora this would be a background service that asks the workers what images they have. Okay, let's try pingora, fourth time's a charm.

# some random benchmarking

fedora 39, 5950x, ddr4 2666 MT/s (dmidecode)