No input mask available even though MPC servers are generating them #457
Comments
Here's my stab at this -- feel free to take whichever parts you like and fix anything I got wrong :)

Problem definition (as I understand it):
The MPC system uses Ethereum as a coordination layer for managing certain cryptographic primitive resources, including pre-generated Beaver triples and random input masks for secret-shared data submitted to the system. One element of the "resource management" smart contract deployed to the Ethereum chain to help allocate these resources is a count of the input masks that have been generated in the MPC system's offline phase and are currently available for reservation by clients who wish to use them in an MPC computation.

The problem encountered in the Docker Compose orchestration of the overall system (MPC + Ethereum + client) was that, while the MPC system appeared to be generating these cryptographic primitives, including input masks, the chain would occasionally report that no input masks were available when the client queried Ethereum for them. While we were working on this code collaboratively over Zoom, the problem became worse on the presenting machine, where the orchestration reported no input masks available in almost all cases. Collaborators running their own copies of the orchestration locally did not encounter the issue as frequently.

Troubleshooting phase:
While tracing the source of the issue, we noted the discrepancy between the client's report of no input masks per the Ethereum chain and the MPC node's apparent generation of basic resources. Additional logging was added on the MPC node side to trace not only the generation of input masks, but also the posting of their availability to the Ethereum chain and the receipt returned by the chain when the transaction updating resource availability was sent.
When these additional logs showed no obvious error in the transaction receipts coming from Ethereum, it was proposed that the issue might lie in the client querying the wrong contract. Digging into the logs of the bootstrapping service in the Docker Compose orchestration, which deploys the resource management contract, we discovered that the contract (meant to be a singleton in the orchestration) had in fact been deployed twice. The duplicate deployment, together with the lack of errors in the MPC system's receipts from Ethereum, strongly indicated that the MPC system was using one copy of the contract to update resource availability, while the client was querying the second copy, which received no information from the MPC system and therefore reported zero resource availability.

The fix:
[Need further input from Sylvain on this one since I'm not sure I understand it all correctly]
The fix was implemented by changing the way that the Docker Compose configuration file was specified. In particular, the

Important learnings:
Without an extremely thorough understanding of what Docker Compose is doing, it cannot be relied upon to execute an orchestration deterministically in the way that a single Docker container can. Logs are extremely valuable, and for a system relying on a blockchain backend such as Ethereum, the transaction receipts can be a helpful form of logging.
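To make the failure mode from the troubleshooting phase concrete, here is a minimal, self-contained sketch of what a duplicate deployment looks like from each side. It is a toy in-memory model, not the project's code: `ToyChain`, `add_input_masks`, `available_input_masks` and `same_contract` are hypothetical names invented for illustration, and the only assumption carried over from Ethereum is that a transaction receipt exposes `status` and `to` fields.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChain:
    """In-memory stand-in for the chain: contract address -> input mask count."""
    contracts: dict = field(default_factory=dict)

    def deploy(self) -> str:
        # Each deployment creates a fresh contract with its own address and state.
        address = f"0x{len(self.contracts) + 1:040x}"
        self.contracts[address] = 0
        return address

    def add_input_masks(self, address: str, count: int) -> dict:
        # The update succeeds on whichever copy it is sent to, so the receipt
        # shows no error even when the client is watching a different copy.
        self.contracts[address] += count
        return {"status": 1, "to": address}

    def available_input_masks(self, address: str) -> int:
        return self.contracts[address]


def same_contract(receipt: dict, client_address: str) -> bool:
    """Return True if the receipt targets the contract the client queries."""
    return receipt["to"].lower() == client_address.lower()


chain = ToyChain()
mpc_address = chain.deploy()     # first deployment, used by the MPC nodes
client_address = chain.deploy()  # accidental second deployment, used by the client

receipt = chain.add_input_masks(mpc_address, 100)

print(receipt["status"])                            # 1: the update succeeded
print(chain.available_input_masks(client_address))  # 0: the client sees no masks
print(same_contract(receipt, client_address))       # False: two different contracts
```

In this sketch, the successful receipt and the empty count are both "correct"; only comparing the address in the receipt with the address the client is configured to query exposes the mismatch, which is roughly the check that the bootstrapping logs let us do by hand.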
NOTE: This issue was fixed during the recent IC3 Blockchain Camp 2020 hackathon, thanks to the help of @cs79 and @jiaochangyang. The purpose of this issue is to document what the problem was, how we were able to troubleshoot it, and what the fix was. The issue is particularly interesting because it was a known issue that only occurred occasionally: after re-running the application (e.g. asynchromix2) once or a few times, the error would somehow "disappear". It was never investigated thoroughly, since higher-priority issues were being tackled and it did not occur often enough to be much of a problem. However, things changed drastically in the depths of a hackathon night, when the issue became more or less deterministic and occurred on every run. We were then forced to look into it, as we could not make progress until it was fixed. This issue aims to highlight what we think the cause of the problem was, and a hypothesis as to why it started to occur on every run.
Problem definition
todo
Troubleshooting phase
todo
The fix
todo
Important learnings