Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcmur_device: add priv lock support #667

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lxbsz
Copy link
Collaborator

@lxbsz lxbsz commented Aug 26, 2021

When the tcmu-runner detect that the lock is lost, it will try to
queue a work event to reopen the image and at the same time queue
a work event to update the service status. While the reopen is not
atomic, and there has a gap between image close and image open,
during which the rbd image's state resource will be released and if
the update status event is fired, we will hit the crash bug.

This commit will add one rdev->priv_lock to protect the private data
in rdev struct. For the service status updating code just skip it
if it's in the reopen gap. And for all the other IOs just return
EBUSY to let the client try it again.

Signed-off-by: Xiubo Li [email protected]

Copy link

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the scope of the issue. The description says that the problem is that the status update handler could encounter a NULL state but the PR goes on to change all handlers, including I/O handers such as read and write. Would tcmu-runner core really initiate I/O on a closed image?

Could we instead look at something like TCMUR_DEV_FLAG_IS_OPEN? Wrapping all handlers with rdev->priv_lock seems too heavyweight to me.

rbd.c Outdated
@@ -115,11 +115,12 @@ static darray(char *) blacklist_caches;
#ifdef LIBRADOS_SUPPORTS_SERVICES

#ifdef RBD_LOCK_ACQUIRE_SUPPORT
/* rdev->priv_lock is held_*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* rdev->priv_lock is held_*/
/* rdev->priv_lock is held */

here and everywhere else

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix it.

rbd.c Outdated
struct tcmur_device *rdev = tcmu_dev_get_private(dev);
struct tcmu_rbd_state *state = tcmur_dev_get_private(dev);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a purely cosmetic change, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe introduced by my previous changes and there should be some other changes in this func but removed again, this could be removed and makes no sense.

rbd.c Outdated
bool has_lock)
{
return 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squash "rbd: remove possible warning" commit into an earlier commit that made the void -> int change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have split this into a new PR in #673.

And also have split some other commits into separate PRs to be more easy to review: #672, #671 and #670.

Please take a look, thanks.

@idryomov
Copy link

idryomov commented Sep 6, 2021

I would suggest splitting unrelated bug fixes ("rbd: fix use-after-free of addr", "rbd: fix memory leak when fails to get the address" and "rbd: fix and add more debug logs") into a separate PR.

@lxbsz
Copy link
Collaborator Author

lxbsz commented Sep 6, 2021

I'm confused about the scope of the issue. The description says that the problem is that the status update handler could encounter a NULL state but the PR goes on to change all handlers, including I/O handers such as read and write. Would tcmu-runner core really initiate I/O on a closed image?

Yeah, it's possible.

Could we instead look at something like TCMUR_DEV_FLAG_IS_OPEN? Wrapping all handlers with rdev->priv_lock seems too heavyweight to me.

Let me check it more about this and have a try.

I would suggest splitting unrelated bug fixes ("rbd: fix use-after-free of addr", "rbd: fix memory leak when fails to get the address" and "rbd: fix and add more debug logs") into a separate PR.

Done.

@lxbsz lxbsz force-pushed the priv_lock branch 2 times, most recently from 96199cb to 1d948c0 Compare September 9, 2021 09:37
@lxbsz
Copy link
Collaborator Author

lxbsz commented Sep 10, 2021

@idryomov Please take a look, thanks.

@idryomov idryomov mentioned this pull request Sep 13, 2021
tcmur_device.c Outdated
ret = rhandler->report_event(dev);
if (ret)
tcmu_dev_err(dev, "Could not report events. Error %d.\n", ret);
pthread_mutex_unlock(&rdev->state_lock);

pthread_mutex_unlock(&rdev->rdev_lock);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this code path tested?

Suggested change
pthread_mutex_unlock(&rdev->rdev_lock);
pthread_mutex_lock(&rdev->rdev_lock);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this code path tested?

Yeah, tested, checked it this has been fixed but wasn't amended to it.

I found in my setups one node has fixed this, another didn't.

For local test I didn't hit any issue.

}

rdev->flags |= TCMUR_DEV_FLAG_REPORTING_EVENT;
pthread_mutex_unlock(&rdev->rdev_lock);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is rdev_lock released here? ->report_event() used to be called with it held.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rdev_lock should always be released when calling the handler's hooks. I think we need to pass a has_lock boolean parameter.

tcmur_device.h Outdated

pthread_cond_t report_event_cond;

pthread_spinlock_t cmds_list_lock; /* protects cmds_list */
struct list_head cmds_list;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to have checked in a kmod-devel-25-16.el8.x86_64.rpm binary by mistake.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have removed it.

@idryomov
Copy link

I think I'm still missing something. I'm going to ignore the renames for the moment and speak in terms of what is currently in master.

From my reading of the description, the problem is that ->report_event() and __tcmu_reopen_dev() can race and ->report_event() might crash on a momentarily closed device. Now ->report_event() is currently called under state_lock and __tcmu_reopen_dev() clears TCMUR_DEV_FLAG_IS_OPEN under state_lock before proceeding with closing. Wouldn't it be sufficient to just add a TCMUR_DEV_FLAG_IS_OPEN check to __tcmu_report_event()? Why TCMUR_DEV_FLAG_REPORTING_EVENT and a new condition variable are necessary?

@lxbsz
Copy link
Collaborator Author

lxbsz commented Sep 14, 2021

I think I'm still missing something. I'm going to ignore the renames for the moment and speak in terms of what is currently in master.

From my reading of the description, the problem is that ->report_event() and __tcmu_reopen_dev() can race and ->report_event() might crash on a momentarily closed device. Now ->report_event() is currently called under state_lock and __tcmu_reopen_dev() clears TCMUR_DEV_FLAG_IS_OPEN under state_lock before proceeding with closing. Wouldn't it be sufficient to just add a TCMUR_DEV_FLAG_IS_OPEN check to __tcmu_report_event()? Why TCMUR_DEV_FLAG_REPORTING_EVENT and a new condition variable are necessary?

The reopen and event report will be run in two different threads.

The reopen will be split into handler->close() and handler->open() without holding the state_lock, that means the reopen is none atomic.

In case that just after the __tcmu_report_event() checked that the TCMUR_DEV_FLAG_IS_OPEN flag is set and then it will try to access to the rbd_state private data to report the events to ceph cluster. While at the same time the reopen thread could be fired, which will clear the TCMUR_DEV_FLAG_IS_OPEN flag and release the rbd_state private data in handler->close().

The use-after-free bug still exists...

We need to let the reopen thread wait a bit to be sure that the event report thread has finished.

If the device is in recovery, we can defer reporting the event in
the recovery when reopening the device. And if the device is stopped
or stopping we can just skip it.

Just wait for the report event to finish when recoverying the device,
because the recovery will close and then open the device during which
the private data maybe released. And it may cause use-after-free crash
in report event routine.

Signed-off-by: Xiubo Li <[email protected]>
@lxbsz
Copy link
Collaborator Author

lxbsz commented Sep 14, 2021

Run the following test for 2 hours, worked fine for me.

[root@client ~]# lsblk 
NAME        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda           8:0    0   25G  0 disk  
├─sda1        8:1    0    1G  0 part  /boot
└─sda2        8:2    0   24G  0 part  
  ├─cl-root 253:0    0 21.9G  0 lvm   /
  └─cl-swap 253:1    0  2.1G  0 lvm   [SWAP]
sdb           8:16   0    1M  0 disk  
└─mpathd    253:2    0    1M  0 mpath 
sdc           8:32   0    1M  0 disk  
└─mpathd    253:2    0    1M  0 mpath 
sr0          11:0    1  8.6G  0 rom   

# while [ 1 ]; do dd if=/dev/zero of=/dev/sdb bs=1K count=1024; sleep 1; dd if=/dev/zero of=/dev/sdc bs=1K count=1024; done

@idryomov
Copy link

The reopen and event report will be run in two different threads.

The reopen will be split into handler->close() and handler->open() without holding the state_lock, that means the reopen is none atomic.

In case that just after the __tcmu_report_event() checked that the TCMUR_DEV_FLAG_IS_OPEN flag is set and then it will try to access to the rbd_state private data to report the events to ceph cluster. While at the same time the reopen thread could be fired, which will clear the TCMUR_DEV_FLAG_IS_OPEN flag and release the rbd_state private data in handler->close().

In master, state_lock is held across ->report_event() so if while reopen is indeed not atomic, event report is. __tcmu_reopen_dev() would not be able to clear TCMUR_DEV_FLAG_IS_OPEN and close the image while ->report_event() is executing.

You are removing state_lock protection from ->report_event() though, saying that it (now renamed to rdev_lock) "should always be released when calling the handler's hooks". Can you elaborate on why it is not OK to hold it across ->report_event() as it is currently done?

@lxbsz
Copy link
Collaborator Author

lxbsz commented Sep 15, 2021

The reopen and event report will be run in two different threads.
The reopen will be split into handler->close() and handler->open() without holding the state_lock, that means the reopen is none atomic.
In case that just after the __tcmu_report_event() checked that the TCMUR_DEV_FLAG_IS_OPEN flag is set and then it will try to access to the rbd_state private data to report the events to ceph cluster. While at the same time the reopen thread could be fired, which will clear the TCMUR_DEV_FLAG_IS_OPEN flag and release the rbd_state private data in handler->close().

In master, state_lock is held across ->report_event() so if while reopen is indeed not atomic, event report is. __tcmu_reopen_dev() would not be able to clear TCMUR_DEV_FLAG_IS_OPEN and close the image while ->report_event() is executing.

You are removing state_lock protection from ->report_event() though, saying that it (now renamed to rdev_lock) "should always be released when calling the handler's hooks". Can you elaborate on why it is not OK to hold it across ->report_event() as it is currently done?

As I remembered long time ago as discussed, the rule is that the state_lock should be only used in the libtcmu, and shouldn't be used in the handlers. And when calling any handler hook we should release the state_lock, because there may have third part handlers could call libtcmu's helpers, which will be possibly acquire the state_lock again, potentially will introduce dead lock bug.

Or possibly in the handler's hooks it will sleep, so holding the state_lock is not a good idea.

The current code in master is buggy when begin to support event report feature.

@lxbsz lxbsz changed the base branch from master to main August 10, 2022 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants