loop device detach (losetup -d) hung #356

hongyuntw · 2024-02-15T02:09:48Z

Hi, I've hit an issue where detaching a loop device with losetup -d hangs. Steps to reproduce:

Create a loop device, format it, and mount it:

dd if=/dev/zero of=./x.img count=400 bs=1M
LOOP_DEVICE=$(losetup --find --show --partscan ./x.img) && echo $LOOP_DEVICE
mkfs.ext4 -F $LOOP_DEVICE
mkdir -p /mnt/tests/ && mount $LOOP_DEVICE /mnt/tests/

Set up a snapshot: dbdctl setup-snapshot $LOOP_DEVICE /mnt/tests/.cow 0
Destroy the snapshot: dbdctl destroy 0
Unmount the device: umount /mnt/tests
Detach the loop device (hangs here): losetup -d $LOOP_DEVICE
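For convenience, the steps above can be collected into one script. This is only a sketch: `run`, `run_with_deadline`, `reproduce` and the `DRY_RUN` variable are illustrative names I made up, not part of losetup or dbdctl, and the 30 second deadline is an arbitrary choice. The detach runs in the background and is polled, on the assumption that a process blocked inside the kernel may not react to signals, so a plain `timeout` could hang as well.

```shell
# Sketch of the reproduction steps. run, run_with_deadline, reproduce and
# DRY_RUN are illustrative names, not part of losetup or dbdctl.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run a command in the background and stop waiting after $1 seconds.
# Returns 124 if the command has not finished by then; the command itself
# is left running, because a task blocked in the kernel may ignore signals.
run_with_deadline() {
    local deadline status_file waited status
    deadline="$1"
    shift
    status_file="$(mktemp)" || return 1
    ( rc=0; "$@" || rc=$?; echo "$rc" > "$status_file" ) &
    waited=0
    while [ ! -s "$status_file" ] && [ "$waited" -lt "$deadline" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if [ ! -s "$status_file" ]; then
        echo "still running after ${deadline}s: $*" >&2
        return 124
    fi
    status="$(cat "$status_file")"
    rm -f "$status_file"
    return "$status"
}

# The failing order from this report: setup, destroy, umount, detach.
reproduce() {
    local img mnt dev
    img="./x.img"
    mnt="/mnt/tests"
    run dd if=/dev/zero of="$img" count=400 bs=1M || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ losetup --find --show --partscan $img"
        dev="/dev/loopN"  # placeholder, no device is attached in a dry run
    else
        dev="$(losetup --find --show --partscan "$img")" || return 1
    fi
    run mkfs.ext4 -F "$dev" &&
    run mkdir -p "$mnt" &&
    run mount "$dev" "$mnt" &&
    run dbdctl setup-snapshot "$dev" "$mnt/.cow" 0 &&
    run dbdctl destroy 0 &&
    run umount "$mnt" &&
    run run_with_deadline 30 losetup -d "$dev"
}
```

`DRY_RUN=1 reproduce` only prints the commands. Calling `reproduce` as root on a disposable VM runs them; a return status of 124 means losetup -d was still running after the deadline.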

I debugged the kernel with gdb and traced the hang to the detach path. When nothing else is using the loop device, loop_clr_fd (in loop.c) calls __loop_clr_fd, which in turn calls blk_mq_freeze_queue, and that is where the hang occurs.
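For anyone who wants to check this without attaching gdb, a lighter option is to read the kernel stack of the stuck losetup process from procfs. This is a sketch, not what I actually ran: `proc_state` and `show_blocked` are made-up helper names, and reading /proc/<pid>/stack needs root and a kernel built with stack trace support. If the analysis above is right, I would expect the process to be in state D with blk_mq_freeze_queue somewhere in its stack.

```shell
# proc_state and show_blocked are illustrative helper names.

# Print the one-letter scheduler state (R, S, D, ...) of a process.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Print state and kernel stack for every process with exactly this name.
show_blocked() {
    local name pid
    name="$1"
    for pid in $(pgrep -x "$name"); do
        echo "pid $pid state $(proc_state "$pid")"
        # Requires root; fails quietly if the file cannot be read.
        cat "/proc/$pid/stack" 2>/dev/null || echo "  (stack not readable)"
    done
}
```

Run `show_blocked losetup` from a second terminal while losetup -d is stuck.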

The hang is caused by abnormal reference count changes in the request queue of the loop device.
[Image: screenshot of the gdb session]

In the second red box of the screenshot, the value of lo->lo_queue->q_usage_counter->data inexplicably jumps from 1 to 22. Across several runs I have also seen it rise above 100. Because blk_mq_freeze_queue waits for q_usage_counter to drain to zero, lo->lo_queue can then not be frozen.

I suspect this issue is related to changes in the kernel loop driver. Two commits look particularly relevant, though I am not sure either of them is the root cause:
Commit 1
Commit 2

Additionally, the hang only occurs with the sequence setup -> destroy -> umount -> detach. With setup -> destroy -> detach -> umount, or setup -> umount -> detach -> destroy, losetup -d does not hang. In those orderings something is still using the loop device when losetup -d runs (the mounted filesystem or our module), so loop_clr_fd does not call __loop_clr_fd.
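To tell the two cases apart from userspace, one can look at the loop device's sysfs attributes after losetup -d returns. When the device is still in use, loop_clr_fd only marks it for automatic teardown, so it stays bound with autoclear set. The sketch below assumes the attributes live under /sys/block/loopN/loop/ (backing_file, autoclear) and that this directory is absent once the device is unbound; `loop_detach_state` and `SYSFS_ROOT` are made-up names, the latter only there so the function can be pointed at a fake tree.

```shell
# loop_detach_state and SYSFS_ROOT are illustrative names.
# Assumes loop attributes are exposed under /sys/block/loopN/loop/.
loop_detach_state() {
    local name attr_dir
    name="$(basename "$1")"
    attr_dir="${SYSFS_ROOT:-/sys}/block/$name/loop"
    if [ ! -r "$attr_dir/backing_file" ]; then
        # No backing file attribute: the device is not bound to a file.
        echo "detached"
    elif [ "$(cat "$attr_dir/autoclear" 2>/dev/null)" = "1" ]; then
        # Still bound, will be torn down when the last user goes away.
        echo "detach deferred (autoclear set)"
    else
        echo "attached"
    fi
}
```

For example, `loop_detach_state "$LOOP_DEVICE"` right after losetup -d in the setup -> destroy -> detach -> umount order should report a deferred detach until the umount.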

This may affect kernel versions 5.16 and above. I have confirmed it on Fedora 34 (5.16.19 / 5.17.12) and Fedora 38 (6.2).
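A test script could use a small version check to decide whether the running kernel falls in the suspected range. This is a sketch: `kernel_at_least` is a made-up helper, it relies on `sort -V` from GNU coreutils, and 5.16 is only the lowest version I have tested, not a confirmed boundary.

```shell
# kernel_at_least is an illustrative helper name.
# $1 = kernel release string (e.g. from uname -r), $2 = minimum version.
kernel_at_least() {
    local have
    have="${1%%-*}"  # drop the distribution suffix after the first dash
    # The minimum sorts first (or ties) exactly when have >= minimum.
    [ "$(printf '%s\n%s\n' "$2" "$have" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `kernel_at_least "$(uname -r)" 5.16 && echo "kernel is in the suspected range"`.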

However, the hang does not seem to occur with physical disks, though I am not sure whether the reference count of a physical disk's request queue is affected as well.


Swistusmen · 2024-02-15T12:17:39Z

Hi, thanks for raising this issue. We will look into it. Sorry, the whole team currently has other priorities, but that should change soon, and then we will come back to this and to your PR.

hongyuntw changed the title ~~loop device detach (losetup -d) hung for kernel 5.17+~~ loop device detach (losetup -d) hung Feb 15, 2024

Swistusmen added the bug label Feb 15, 2024
