Failed to delete the blocks due to the data couldn't flushed in time #204
Comments
It seems glfs_fsync and glfs_fdatasync do not work here either. The issue is:

    pthread_create(&tid, glusterBlockDeleteRemote, );
    // glusterBlockDeleteRemote() updates the metadata file block-meta/metafile
    pthread_join(tid);
    blockGetMetaInfo(); // reads block-meta/metafile here, but sometimes gets the old data

I have tried calling glfs_fsync/glfs_fdatasync in glusterBlockDeleteRemote(), but it still didn't work. Any ideas?
For some reason, when the metafile is read immediately after being updated, the update is sometimes not yet visible. So even though all the deletions succeeded, the delete is still reported as a failure.

Currently, checking the ->exit status is enough. Consider this case:

* When gluster-block delete was executed for the first time, deletion was successful on this node. But before it could send the response, gluster-blockd died (the addr meta status would be CLEANUPFAIL).

This case can also be detected from the ->exit status, so there is no need to check it from the metafile.

The deletion failure has also been seen in customer cases.

Fixes: gluster#204
Signed-off-by: Xiubo Li <[email protected]>
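The idea in the commit message above (trust each worker's exit status after joining it, rather than re-reading a metafile that may still be stale) can be sketched roughly as follows. This is only an illustration, not the gluster-block code: `struct remote_args`, `delete_remote`, `delete_on_nodes`, `MAX_NODES`, and the `simulate_failure` field are all hypothetical names, and only the `->exit` field name comes from the discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_NODES 16 /* hypothetical upper bound, for illustration only */

/* Hypothetical per-node argument block; the issue only mentions an ->exit field. */
struct remote_args {
    int simulate_failure; /* illustration only: forces the worker to fail */
    int exit;             /* 0 on success, nonzero on failure; set by the worker */
};

/* Stand-in for the per-node delete worker. Real code would send the
 * delete request to the node and update the metafile here. */
static void *delete_remote(void *arg)
{
    struct remote_args *args = arg;

    args->exit = args->simulate_failure ? -1 : 0;
    return NULL;
}

/* Run one worker per node, join them all, and decide success purely
 * from each worker's exit status (no metafile read). */
static int delete_on_nodes(struct remote_args *args, size_t count)
{
    pthread_t tids[MAX_NODES];
    int started[MAX_NODES] = {0};
    int failed = 0;

    if (count > MAX_NODES)
        return -1;

    for (size_t i = 0; i < count; i++) {
        args[i].exit = -1; /* pessimistic default until the worker reports */
        if (pthread_create(&tids[i], NULL, delete_remote, &args[i]) == 0)
            started[i] = 1;
    }

    for (size_t i = 0; i < count; i++) {
        if (started[i])
            pthread_join(tids[i], NULL); /* join makes the worker's write visible */
        if (args[i].exit != 0)
            failed = 1;
    }

    return failed ? -1 : 0;
}
```

Because `pthread_join` synchronizes with the end of the worker, the status written by the thread is guaranteed to be visible to the caller, which is not the case for data that has to round-trip through the volume.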
For modify I can also hit the same issue on the same client with multiple processes, here gluster-blockd and tcmu-runner. We call glfs_ftruncate in gluster-blockd and then check the size in tcmu-runner using glfs_lstat, which returns the old size. But from the mountpoint's ls command, we can see that the size has already been updated to the new one.
For 'gluster-block modify' we can hit an issue on the same client with multiple processes, here gluster-blockd and tcmu-runner.

We call glfs_ftruncate in gluster-blockd to resize the volume file, and then check the size in tcmu-runner using glfs_lstat, but get the old size, which hasn't been updated yet. But from the mountpoint's ls command, we can see that the size has already been updated to the new one.

Here we wait for at most 5 seconds, retrying 5 times, to make sure the cache has been flushed successfully to the volume.

Fixes: gluster/gluster-block#204
Signed-off-by: Xiubo Li <[email protected]>
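A minimal sketch of the wait-and-retry approach described in the commit message above, assuming the size lookup is hidden behind a callback. It is not the actual tcmu-runner patch: `size_probe_fn`, `wait_for_size`, `struct fake_file`, and `fake_probe` are hypothetical names. In real code the probe would wrap a glfs_lstat call, and calling `wait_for_size(probe, ctx, new_size, 5, 1000)` would give 5 tries spread over at most roughly 5 seconds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical probe: returns 0 and stores the currently visible size,
 * or -1 on error. A real probe would wrap a glfs_lstat()-based lookup. */
typedef int (*size_probe_fn)(void *ctx, int64_t *size_out);

/* Poll until the visible size matches `expected`.
 * Returns 0 on success, -1 if the probe fails, -2 if all tries saw a stale size. */
static int wait_for_size(size_probe_fn probe, void *ctx, int64_t expected,
                         int max_tries, long interval_ms)
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        int64_t seen = 0;

        if (probe(ctx, &seen) < 0)
            return -1;
        if (seen == expected)
            return 0;

        /* Sleep between tries, but not after the last one. */
        if (attempt + 1 < max_tries) {
            struct timespec ts = {
                .tv_sec = interval_ms / 1000,
                .tv_nsec = (interval_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return -2;
}

/* Simulated file that reports the old size for a few reads, standing in
 * for a client whose cached metadata has not caught up yet. */
struct fake_file {
    int stale_reads_left;
    int64_t old_size;
    int64_t new_size;
};

static int fake_probe(void *ctx, int64_t *size_out)
{
    struct fake_file *f = ctx;

    if (f->stale_reads_left > 0) {
        f->stale_reads_left--;
        *size_out = f->old_size;
    } else {
        *size_out = f->new_size;
    }
    return 0;
}
```

Bounding the number of tries keeps a resize that really did fail from blocking the caller indefinitely, while still tolerating a short delay before the new size becomes visible.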
Fixed this in tcmu-runner: open-iscsi/tcmu-runner#546.
Kind of issue
Bug
Observed behavior
Deleting the block volume failed, with logs like:
Expected/desired behavior
Delete successfully.
Details on how to reproduce (minimal and precise)
Create and then delete the blocks in the bash script.
Logs and Information about the environment:
After manually adding more logging, we can see that, according to the metafile, the delete did not succeed:
Other useful information