Description of problem:
Upgrading the "grub2-tools" package during a RHEL 7 to RHEL 8 upgrade causes the RPM transaction to hang when a partition of type "iso9660" is present. In our case, the partition is used by OSP cloud-init as a bare-metal-as-a-service config drive.

Version-Release number of selected component (if applicable):
grub2-tools-1:2.02-66.el8.x86_64

How reproducible:
Whenever a partition of type "iso9660" is present.

Steps to Reproduce:
1. On RHEL 7, provision an OSP cloud whose controller has an "iso9660" config drive.
2. Install the Leapp tool.
3. Try to upgrade from RHEL 7 to RHEL 8.

Actual results:
The "grub2-tools" package upgrade hangs.

Expected results:
The upgrade RPM transaction completes successfully.

Additional info:
We perform the RPM upgrade in our custom (RHEL 8) initramfs, inside a systemd-nspawn container started with the following options:
nspawn_opts="--capability=all --bind=/sys --bind=/dev --bind=/proc --bind=/run/udev --keep-unit --register=no"
From analyzing the memory dump we can see:

crash> ps -p ...
PID: 0 TASK: ffffffff8ea12780 CPU: 0 COMMAND: "swapper/0"
PID: 1 TASK: ffff9205061fc740 CPU: 1 COMMAND: "systemd"
PID: 323 TASK: ffff920720a68000 CPU: 0 COMMAND: "upgrade"
PID: 327 TASK: ffff92071507df00 CPU: 0 COMMAND: "upgrade"
PID: 331 TASK: ffff92071507c740 CPU: 0 COMMAND: "systemd-nspawn"
PID: 332 TASK: ffff9207142d17c0 CPU: 0 COMMAND: "leapp"
PID: 674 TASK: ffff92071124df00 CPU: 0 COMMAND: "leapp"
PID: 680 TASK: ffff92071124af80 CPU: 0 COMMAND: "dnf"
PID: 1329 TASK: ffff9207142d4740 CPU: 1 COMMAND: "sh"
PID: 1330 TASK: ffff92071507af80 CPU: 1 COMMAND: "grub2-switch-to"
PID: 1366 TASK: ffff92071126c740 CPU: 0 COMMAND: "grub2-mkconfig"
PID: 1620 TASK: ffff9207112497c0 CPU: 1 COMMAND: "30_os-prober"
PID: 1626 TASK: ffff92070fcc0000 CPU: 0 COMMAND: "30_os-prober"
PID: 1627 TASK: ffff92070fc92f80 CPU: 1 COMMAND: "os-prober"
PID: 1677 TASK: ffff9207142d5f00 CPU: 0 COMMAND: "os-prober"
PID: 1678 TASK: ffff92070fc90000 CPU: 1 COMMAND: "os-prober"
PID: 1706 TASK: ffff92070fcc2f80 CPU: 0 COMMAND: "50mounted-tests"
PID: 1714 TASK: ffff92070fcc8000 CPU: 1 COMMAND: "50mounted-tests"
PID: 1718 TASK: ffff92070fcc5f00 CPU: 1 COMMAND: "dmsetup"

with "dmsetup" waiting on a semaphore:

crash> bt 1718
PID: 1718 TASK: ffff92070fcc5f00 CPU: 1 COMMAND: "dmsetup"
 #0 [ffff9e65c5c17c28] __schedule at ffffffff8e01b704
 #1 [ffff9e65c5c17cc0] schedule at ffffffff8e01bd18
 #2 [ffff9e65c5c17cc8] do_semtimedop at ffffffff8db4c03a
 #3 [ffff9e65c5c17f38] do_syscall_64 at ffffffff8d80424b
 #4 [ffff9e65c5c17f50] entry_SYSCALL_64_after_hwframe at ffffffff8e2000ad
    RIP: 00007fbbc36ab88b  RSP: 00007ffe5d7cf428  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 000000000d4d6ad7  RCX: 00007fbbc36ab88b
    RDX: 0000000000000001  RSI: 00007ffe5d7cf442  RDI: 0000000000000000
    RBP: 00007ffe5d7cf442  R8:  00007fbbc39b24a0  R9:  00007ffe5d7cf340
    R10: 0000555bca354250  R11: 0000000000000206  R12: 0000555bca354250
    R13: 00007ffe5d7cf6e0  R14: 0000555bca353a48  R15: 0000000000000001
    ORIG_RAX: 0000000000000041  CS: 0033  SS: 002b

ORIG_RAX 0x41 is syscall 65, semop(2) on x86_64, so dmsetup is blocked in a SysV semaphore operation.

After extracting a core dump of dmsetup from the memory dump, we see that it is trying to create a device:

#3 0x000055a50674ce53 in _create_one_device (name=<optimized out>, file=<optimized out>) at dmsetup.c:1204

(core dump attached)
Created attachment 1571672: core_dump_dmsetup_bt
I forgot to mention that setting the "DM_DISABLE_UDEV=1" environment variable avoids the hang.
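Since DM_DISABLE_UDEV=1 avoids the hang, one way to apply it to the whole upgrade container is to export it through systemd-nspawn's --setenv option. This is a minimal sketch, assuming the container is started from a shell script that uses the nspawn_opts variable quoted in the description; the report only confirms that the variable helps, not this particular way of passing it, and it also assumes nothing between leapp and os-prober scrubs the environment. As we understand dmsetup(8), the variable disables udev synchronisation so libdevmapper manages the /dev/mapper nodes itself instead of waiting for udev.

```shell
#!/usr/bin/env bash
# Container options as given in the report.
nspawn_opts="--capability=all --bind=/sys --bind=/dev --bind=/proc --bind=/run/udev --keep-unit --register=no"

# Workaround (assumed placement): pass DM_DISABLE_UDEV=1 to the container's
# init so that dmsetup, started later by os-prober, inherits it and does not
# wait on a udev semaphore.
nspawn_opts="$nspawn_opts --setenv=DM_DISABLE_UDEV=1"
```

An alternative would be dmsetup's --noudevsync option, but os-prober invokes dmsetup itself, so that would mean patching common.sh; the environment variable reaches dmsetup without touching os-prober.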
dmsetup is called by os-prober to create a read-only, linear dm device named osprober-linux-vda1 that maps the vda1 partition (/usr/share/os-prober/common.sh):

    do_dmsetup () {
        local prefix partition dm_device partition_name size_p
        prefix="$1"
        partition="$2"
        dm_device=

        if type dmsetup >/dev/null 2>&1 && \
           type blockdev >/dev/null 2>&1; then
            partition_name="osprober-linux-${partition##*/}"
            dm_device="/dev/mapper/$partition_name"
            size_p=$(blockdev --getsize $partition )
            if [ -e "$dm_device" ]; then
                error "$dm_device already exists"
                dm_device=
            else
                debug "creating device mapper device $dm_device"
                echo "0 $size_p linear $partition 0" | dmsetup create -r $partition_name
            fi
        fi
        echo "$dm_device"
    }

udev and dmsetup synchronise using SysV IPC semaphores. Because this runs inside a systemd-nspawn container, dmsetup does not share an IPC namespace with the udev daemon running on the host. dmsetup creates a semaphore (the udev "cookie") and waits for the host's udev rules to release it via `dmsetup udevcomplete`; that release runs in the host's IPC namespace, so the container's semaphore is never decremented and dmsetup blocks indefinitely, which matches the semop() wait in the backtrace above. If udev is used for activating devices, the udev and dmsetup/lvm processes must share the same SysV IPC namespace.
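To confirm the namespace mismatch on an affected system, one can compare the IPC namespace links of the two processes. A minimal sketch, assuming it is run as root from the host and given the PID of systemd-udevd and the PID of a process inside the upgrade container; the helper name same_ipc_ns is hypothetical. Two processes share a SysV IPC namespace exactly when their /proc/<pid>/ns/ipc links resolve to the same "ipc:[inode]" value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (0) if PIDs $1 and $2 share an IPC namespace,
# fails (1) if they differ, and returns 2 if either link cannot be read
# (no such process, or insufficient privileges).
same_ipc_ns() {
    local a b
    a=$(readlink "/proc/$1/ns/ipc") || return 2
    b=$(readlink "/proc/$2/ns/ipc") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_ipc_ns "$(pidof systemd-udevd)" "$container_pid"` (with `$container_pid` supplied by the caller) should fail for the nspawn setup described here. Inside the container, `ipcs -s` should then show the semaphore that dmsetup is waiting on.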
Could the hang then be caused by how the Leapp upgrade container is configured? Could it be fixed by passing `--ipc=host` to the upgrade container, or something along those lines?
It's certainly worth testing. The container has access to the right parts of the file system, but I can't be certain that the IPC namespace is the only thing preventing this os-prober action from succeeding.
IIRC, systemd-nspawn does not support `--ipc=host`, but we could experiment with "nsenter". So I think this BZ can be closed and the work tracked in bz1709890.
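A rough sketch of the nsenter idea, under stated assumptions: rather than relying on an nspawn option, a command can be run inside the container's mount, UTS and PID namespaces with nsenter while deliberately not passing --ipc, so it stays in the host's SysV IPC namespace alongside udev. The helper name run_in_container_host_ipc and its DRY_RUN switch are hypothetical, and the caller is assumed to already know the PID of a process in the container (with --register=no, the container is not registered with machined, so machinectl cannot look it up). Whether the rest of the upgrade behaves correctly with this mix of namespaces is untested.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "$@" in the mount, UTS and PID namespaces of
# process $1 while staying in the caller's (host) IPC namespace, because
# --ipc is intentionally omitted. With DRY_RUN=1 it only prints the command.
run_in_container_host_ipc() {
    local target=$1
    shift
    local cmd=(nsenter --target "$target" --mount --uts --pid -- "$@")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `run_in_container_host_ipc "$leader_pid" <upgrade command>` (with `$leader_pid` hypothetical) would start the command with the container's file system view but the host's IPC namespace, so dmsetup's semaphore would be visible to the host udev rules.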
*** Bug 1683682 has been marked as a duplicate of this bug. ***