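I've encountered an issue where `losetup -d` hangs when detaching a loop device. Steps to reproduce:

1. Set up a snapshot: `dbdctl setup-snapshot $LOOP_DEVICE /mnt/tests/.cow 0`
2. Destroy the snapshot: `dbdctl destroy 0`
3. Unmount the device: `umount /mnt/tests`
4. Detach the loop device (hangs here): `losetup -d $LOOP_DEVICE`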
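For convenience, the steps above can be collected into a small script. This is only a sketch: it assumes the loop device is already attached and mounted at `/mnt/tests`, and the `LOOP_DEVICE`, `MOUNT_POINT`, `MINOR` and `RUN` variables (and the `/dev/loop0` default) are placeholders I introduced, not something the module provides. It prints the commands by default and only executes them when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch of the reproduction sequence from this report.
# Assumptions: the loop device is already attached and mounted at MOUNT_POINT.
# Dry-run by default; set RUN=1 to really execute (needs root, will hang the
# final losetup on affected kernels).
LOOP_DEVICE="${LOOP_DEVICE:-/dev/loop0}"   # placeholder default
MOUNT_POINT="${MOUNT_POINT:-/mnt/tests}"
MINOR="${MINOR:-0}"

step() {
    # Print the command, and run it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

reproduce() {
    step dbdctl setup-snapshot "$LOOP_DEVICE" "$MOUNT_POINT/.cow" "$MINOR"
    step dbdctl destroy "$MINOR"
    step umount "$MOUNT_POINT"
    step losetup -d "$LOOP_DEVICE"   # the hang is observed here
}

reproduce
```

Running it without `RUN=1` just lists the four commands in order, which is a safe way to check the variables before trying it on a test machine.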
I used gdb to debug the kernel and traced the hang to the detach path. If no one else is using the loop device, loop_clr_fd (in loop.c) calls __loop_clr_fd internally, which in turn calls blk_mq_freeze_queue, and that is where the hang occurs.
The hang is caused by abnormal reference count changes in the request queue of the loop device.
Here is the image
In the second red box, the value of lo->lo_queue->q_usage_counter->data inexplicably increases from 1 to 22. This is very strange. Across several runs I found that it sometimes rises above 100. As a result, lo->lo_queue cannot be frozen.
I suspect this issue might be related to changes in the kernel loop driver. Two commits seem particularly relevant, but I am not sure whether the root cause is related to them: Commit 1, Commit 2
Additionally, the hang only occurs when setup, destroy and umount are all performed before detaching. If we follow the sequence setup -> destroy -> detach -> umount, or setup -> umount -> detach -> destroy, the losetup -d command does not hang. This is because the loop device is still in use at detach time (by the mount or by our module), so loop_clr_fd does not call __loop_clr_fd.
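To make the ordering rule explicit, here is a small sketch that encodes my observations as a shell function. The function name `hangs_per_report` and the step names are made up for illustration. It only restates what I saw (hang when both destroy and umount come before detach); any ordering other than the three listed above is an untested extrapolation.

```shell
#!/bin/sh
# Sketch: encodes the ordering observations from this report.
# Succeeds (exit status 0) if, per the report, losetup -d is expected to hang
# for the given order of steps, i.e. when both "destroy" and "umount" come
# before "detach".
hangs_per_report() {
    destroyed=0
    unmounted=0
    for s in "$@"; do
        case "$s" in
            destroy) destroyed=1 ;;
            umount)  unmounted=1 ;;
            detach)
                # At detach time, nothing is using the device only if the
                # snapshot was destroyed and the filesystem was unmounted.
                if [ "$destroyed" = 1 ] && [ "$unmounted" = 1 ]; then
                    return 0
                fi
                return 1
                ;;
        esac
    done
    return 1   # no detach step in the sequence
}
```

For example, `hangs_per_report setup destroy umount detach` succeeds, while `hangs_per_report setup destroy detach umount` does not.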
This may affect kernel versions 5.16 and above. I confirmed it on Fedora 34 (5.16.19 / 5.17.12) and Fedora 38 (6.2).
The hang does not seem to occur with physical disks, but I am not sure whether the reference count of a physical disk's request queue is affected in the same way.
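If it helps with triage, the version range mentioned above can be checked with a few lines of shell. This is a rough sketch: the function name `maybe_affected` is hypothetical, and the 5.16 threshold is only my guess from the kernels I tested, not a confirmed boundary.

```shell
#!/bin/sh
# Sketch: succeeds if the kernel version is 5.16 or newer, the range this
# report suspects. The threshold is an assumption based on the tested kernels.
maybe_affected() {
    ver="${1:-$(uname -r)}"      # e.g. 5.17.12-300.fc36.x86_64
    major="${ver%%.*}"           # text before the first dot
    rest="${ver#*.}"             # text after the first dot
    minor="${rest%%[!0-9]*}"     # leading digits of the remainder
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 16 ]; }
}
```

Calling `maybe_affected` with no argument checks the running kernel; passing a version string such as `maybe_affected 5.15.0` checks that version instead.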
hongyuntw changed the title from "loop device detach (losetup -d) hung for kernel 5.17+" to "loop device detach (losetup -d) hung" on Feb 15, 2024
Hi, thanks for raising this issue. We will look into it. Sorry, the whole team currently has other priorities, but that should change soon, and then we will come back to this issue and to your PR.