-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug (regression): updating a register can fail on local network with stable-2024.08.2.3 but still works using stable.2024-07-25 #2077
Comments
2024.08.2.3 contains a breaking change for client to get Register from network, which is not supported by the current PROD-01. currently close this issue as it is not relevant. |
Thanks Qi, I understand but it is still an issue until it's fixed. Keeping it open allows others to find it. Right now I understand very few people will be hitting this, but I think the point is important and it's not a good idea to close an issue unless there's another place where someone else having the same problem will be able to find out why. It's also relevant in that it highlights another problem, that a few weeks before launch release are going out with issues if this kind. I also expect you are under pressure to close issues as soon as possible because of the desire to see the end according to the plan. If so that's a mistake imo. Thanks for your work @maqi, it is reassuring to that you are involved in these very tricky areas. 🙏 |
@happybeing , do you have this problem, when client and local nodes are built from the same version? because as I understand, @maqi 's answer suggests, that the issue is from incompatible versions. |
It's not incompatible versions, it is a breaking change in that the register crate is ahead of the node crate since the recent update. |
Hi, @happybeing so first, I was in a rush this morning, so judged the issue and comment with I now get the issue re-opened, as it might does show some new issue. Meanwhile, it does help to confirm the issue by launch a new local testnet with all nodes upgraded to 2024.08.2.3, and run awe app with the same 2024.08.2.3, to see if the problem is reproducable. |
also @happybeing , is that possible when you hit that error of The Thank you very much |
I'm confused Qi. If this is just a matter of waiting until the node catches up with the register crate, what's the purpose of your requests? |
sry, @happybeing , my original judgement of the issue was incorrect, which might gave you wrong impression that this is an issue of mis-matched version between nodes' and clients'. As you mentioned, you used Hence I suggested you to restart you local testnet to make sure client & nodes using same version of 2024.08.2.3 |
Thanks Qi. I don't need to redo anything because I know the client and testnet were both built to the same version. I don't know if I will be able to assist you further as I'm stepping back for a while at least. It was great to work with you briefly. |
Hi, @happybeing, Thx for the clarification info. Really appreciate your contributions, and I also feel great to work with you as well. :) |
I generate the crate versions from the safe_network crate using a script that takes the relevant tag, checks it out and generates the deps for my app's Cargo.toml from the safe_network Cargo.lock. The output of that is included in the OP. Good luck. |
Here's a note to confirm that I have been running my tests successfully against I have just tried today's new release Below is the extract from my testnet-full script log. testnet-full starts a local testnet and then uses As described in the OP, the error does not occur every time a website is updated, but appears to happen at the same point in the test script every time (which I find surprising and may be a useful clue).
|
Hi, @happybeing, thx for the info supplied, really helpful. I think here is why you are hitting this error:
This explains why the It will be much helpful and appreciated, if you can tweak the line at https://github.com/maidsafe/safe_network/blob/main/sn_client/src/register.rs#L845 to be let verification_cfg = GetRecordCfg {
get_quorum: Quorum::One,
retry_strategy: Some(RetryStrategy::Quick),
target_record: None,
expected_holders,
}; you only need to rebuild your awe with this tweaked code. the local testnet can retain there untouched. |
Thanks @maqi. That solved the issue using local testnet so I'm building a new release of |
I shall thank you for helping us verify/pin this issue. |
Here is PR #2103 trying to address this as a formal fix. |
@maqi I'm still seeing two problems related to updating registers, similar to the issue in the OP (problem 1), and another previous issue where I see different versions of a register at different times (problem 2). The behaviour has changed though in both cases. I am still using the 'manual' patch which you suggested above to build my client. That patch appeared to fix this issue on a local network, but I am now testing against the public network with my client built using Problem 1. Registers still not updating. The first issue is that registers are still failing to reflect changes although I am not getting the error described in the OP. I've seen this happen twice with different registers, each time created to store the awe-some-sites website. What happens is that the register is created, two entries are written and merged immediately, leaving a total of two entries. Not long after I wrote a third entry but it was not reflected when I accessed the register which continued to show only two entries for at least ten minutes. I left this and came back hours later to find that the register was now showing 3 entries and this remained the case each time I accessed it. I wasn't sure how long it took to reflect the change so I set up a query command to check the register size every five minutes and wrote a fourth entry. After 24 hours the register is still showing only 3 entries. I tried adding another entry later the same day - again without error - and today it is still showing only 3 entries. The API indicates that the entries are written successfully every time, unlike the situation in the OP where an error is reported and the update does not happen. All the above operations involved running my client on a VPS (creation, writing entries and then querying to see the number of entries every 5 minutes). Problem 2. Register returns different numbers of entries. I've only seen this happen once so it is much less frequent than previously. While testing problem 1, I occasionally tried accessing the same register from my laptop (over mobile broadband) and once it returned the register but with only two entries. You can see the status of the register live on the network yourself using
Here is the output of awe when it successfully updates that register writing a new value of e823ee142c6dbb216c58e6d3c66847fead81a6381e6cc732d26e1bdf41e91047 but which is not present in the 'audit' output immediately above. You can see that it is the correct register (a223f580ce058a3334028fbd3f2497502aae85e9ea703a61bd39d96f772d7b599759117f6e621ed5273acb1e2920ff56f812446a3bda96a2bcfd3eba9900) queried above, and the the update is successful. That update was done at 5pm Tuesday but the register output above shows it is still not being reflected by the network at 12:40 Wednesday.
|
^^ @loziniak Have you used registers on the latest public network? |
No, I'm struggling with local still... :-P |
Hi, @happybeing , thx for the further update and the detailed info provided.
I'd suggest you use the above mentioned PR 2103 to replace the
that |
That may be the case for some uses but not in all cases. If you want a version history (e.g. versioned web, versioned file-system) then you need to access the history, not just the final merged value.
I don't think that is the explanation in my use case but can't be sure. However, perhaps the issue is related to your first point - my code still using the patch. I was awaiting the merged PR before doing any more so we can see if this error goes away once that is in stable and the network is once again reset/updated. The use cases for Registers, their API and the network implementation are still undefined 'launch' is supposedly one month away. 🤷♂️ |
On a local testnet built against safe_network
I attach the full log of my local test output which includes setting up the local testnet and then running |
Hey @happybeing, does this PR #2270 fix the issue for you? |
Thanks for the head's up. I don't have much time for testing and am not sure if my app will have been broken by recent EVM/API changes so may take a while to test but I hope to do so once it is in stable. |
I have a bunch of scripts which I use to test my application (awe) on a local network. In brief, these create a local network and then upload a series of websites, some just a single version, but two involve uploading a series of about four versions.
When uploading multiple versions the following sequence is repeated to load and update the register:
(The above all happens in awe_website_versions.rs).
The scripts and code to do the above have been run tens if not >100 times without ever seeing the following error, which is happening when I try to update the second multi-version website, but not the first!
When the error occurs (and it occurs at the same point in repeated runs of these scripts with a new local network every time), the
write_merging_branches_online()
function fails with:If I use stable-2024.08.2.3 the above happens every time. If I use stable.2024-07-25 this has never happened.
Below are the safe_network crate versions I'm building against in each case:
What is strange to me is:
For all five Registers, including the three other Registers, I create the register and immediately write two values. This always succeeds. There's only an error in one of the two Registers I subsequently try to write a third value too, and it is always the same one.
The text was updated successfully, but these errors were encountered: