1712456 – Upgrading "grub2-tools" package during RHEL7 to RHEL8 upgrade causes rpm transaction to get stuck when partition with filesystem type "iso9660" is present

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	8.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	8.0
Assignee:	LVM and device-mapper development team
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1683682
Depends On:
Blocks:	1708241 1727807

Reported:	2019-05-21 14:37 UTC by Michal Reznik
Modified:	2019-10-29 16:25 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-22 15:42:29 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
core_dump_dmsetup_bt (3.18 KB, text/plain) 2019-05-21 14:50 UTC, Michal Reznik

Description Michal Reznik 2019-05-21 14:37:51 UTC

Description of problem:

Upgrading the "grub2-tools" package during a RHEL7 to RHEL8 upgrade causes the rpm transaction to get stuck when a partition with filesystem type "iso9660" is present.

In our case the partition is used by OSP cloud-init as a bare-metal-as-a-service config drive.

Version-Release number of selected component (if applicable):

grub2-tools-1:2.02-66.el8.x86_64

How reproducible:

Whenever a partition with filesystem type "iso9660" is present.


Steps to Reproduce:
1. Provision an OSP cloud on RHEL7 with a controller that has an "iso9660" config drive
2. Install the Leapp tool
3. Try to upgrade from RHEL7 to RHEL8

Actual results:

"grub2-tools" package upgrade is stuck

Expected results:

Upgrade rpm transaction passes

Additional info:

We are performing the rpm upgrade in our custom initramfs (RHEL8) using a systemd-nspawn container with the following options:

nspawn_opts="--capability=all --bind=/sys --bind=/dev --bind=/proc --bind=/run/udev --keep-unit --register=no"

Comment 1 Michal Reznik 2019-05-21 14:47:22 UTC

Analyzing the memory dump shows the following process tree:

crash> ps -p
...
PID: 0      TASK: ffffffff8ea12780  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff9205061fc740  CPU: 1   COMMAND: "systemd"
  PID: 323    TASK: ffff920720a68000  CPU: 0   COMMAND: "upgrade"
   PID: 327    TASK: ffff92071507df00  CPU: 0   COMMAND: "upgrade"
    PID: 331    TASK: ffff92071507c740  CPU: 0   COMMAND: "systemd-nspawn"
     PID: 332    TASK: ffff9207142d17c0  CPU: 0   COMMAND: "leapp"
      PID: 674    TASK: ffff92071124df00  CPU: 0   COMMAND: "leapp"
       PID: 680    TASK: ffff92071124af80  CPU: 0   COMMAND: "dnf"
        PID: 1329   TASK: ffff9207142d4740  CPU: 1   COMMAND: "sh"
         PID: 1330   TASK: ffff92071507af80  CPU: 1   COMMAND: "grub2-switch-to"
          PID: 1366   TASK: ffff92071126c740  CPU: 0   COMMAND: "grub2-mkconfig"
           PID: 1620   TASK: ffff9207112497c0  CPU: 1   COMMAND: "30_os-prober"
            PID: 1626   TASK: ffff92070fcc0000  CPU: 0   COMMAND: "30_os-prober"
             PID: 1627   TASK: ffff92070fc92f80  CPU: 1   COMMAND: "os-prober"
              PID: 1677   TASK: ffff9207142d5f00  CPU: 0   COMMAND: "os-prober"
               PID: 1678   TASK: ffff92070fc90000  CPU: 1   COMMAND: "os-prober"
                PID: 1706   TASK: ffff92070fcc2f80  CPU: 0   COMMAND: "50mounted-tests"
                 PID: 1714   TASK: ffff92070fcc8000  CPU: 1   COMMAND: "50mounted-tests"
                  PID: 1718   TASK: ffff92070fcc5f00  CPU: 1   COMMAND: "dmsetup"

with "dmsetup" waiting on a semaphore:

crash> bt 1718
PID: 1718   TASK: ffff92070fcc5f00  CPU: 1   COMMAND: "dmsetup"
 #0 [ffff9e65c5c17c28] __schedule at ffffffff8e01b704
 #1 [ffff9e65c5c17cc0] schedule at ffffffff8e01bd18
 #2 [ffff9e65c5c17cc8] do_semtimedop at ffffffff8db4c03a
 #3 [ffff9e65c5c17f38] do_syscall_64 at ffffffff8d80424b
 #4 [ffff9e65c5c17f50] entry_SYSCALL_64_after_hwframe at ffffffff8e2000ad
    RIP: 00007fbbc36ab88b  RSP: 00007ffe5d7cf428  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 000000000d4d6ad7  RCX: 00007fbbc36ab88b
    RDX: 0000000000000001  RSI: 00007ffe5d7cf442  RDI: 0000000000000000
    RBP: 00007ffe5d7cf442   R8: 00007fbbc39b24a0   R9: 00007ffe5d7cf340
    R10: 0000555bca354250  R11: 0000000000000206  R12: 0000555bca354250
    R13: 00007ffe5d7cf6e0  R14: 0000555bca353a48  R15: 0000000000000001
    ORIG_RAX: 0000000000000041  CS: 0033  SS: 002b


After extracting a core dump of the process from the memory dump, we can see that dmsetup is trying to create a device:

#3  0x000055a50674ce53 in _create_one_device (name=<optimized out>, file=<optimized out>) at dmsetup.c:1204

<coredump attached>

Comment 3 Michal Reznik 2019-05-21 14:50:56 UTC

Created attachment 1571672
core_dump_dmsetup_bt

Comment 4 Michal Reznik 2019-05-22 11:03:42 UTC

Forgot to mention that using "DM_DISABLE_UDEV=1" env var helps.
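The report does not say how the variable was set. One possible way to apply it, sketched here under the assumption that it is passed into the container through systemd-nspawn's --setenv= option, is to extend the option string quoted in the description:

```shell
# Options quoted from the bug description.
nspawn_opts="--capability=all --bind=/sys --bind=/dev --bind=/proc --bind=/run/udev --keep-unit --register=no"

# Assumption (not confirmed in this report): pass the variable via --setenv=
# so that every process in the container, including the dmsetup started by
# os-prober, inherits DM_DISABLE_UDEV=1.
nspawn_opts="$nspawn_opts --setenv=DM_DISABLE_UDEV=1"
```

With DM_DISABLE_UDEV set, libdevmapper is expected to manage the /dev nodes itself instead of waiting for udev, which is why the variable would avoid the semaphore wait shown in comment 1.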

Comment 14 Bryn M. Reeves 2019-10-07 14:07:37 UTC

dmsetup is being called by os-prober to create a read-only, linear dm device named osprober-linux-vda1 mapping the vda1 partition (/usr/share/os-prober/common.sh):

do_dmsetup () {
        local prefix partition dm_device partition_name size_p
        prefix="$1"
        partition="$2"
        dm_device=

        if type dmsetup >/dev/null 2>&1 && \
           type blockdev >/dev/null 2>&1; then
                partition_name="osprober-linux-${partition##*/}"
                dm_device="/dev/mapper/$partition_name"
                size_p=$(blockdev --getsize $partition )
                if [ -e "$dm_device" ]; then
                        error "$dm_device already exists"
                        dm_device=
                else
                        debug "creating device mapper device $dm_device"
                        echo "0 $size_p linear $partition 0" | dmsetup create -r $partition_name
                fi
        fi
        echo "$dm_device"
}

Udev and dmsetup synchronise using SysV IPC semaphores, but because this dmsetup runs inside a systemd-nspawn container, it does not share an IPC namespace with the udev daemon running on the host, so it waits on a semaphore that udev never signals. If udev is used for activating devices then the udev and dmsetup/lvm processes must use the same SysV IPC namespace.
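Whether two processes share an IPC namespace can be checked by comparing their /proc/<pid>/ns/ipc links. A minimal sketch follows; the helper names ipc_ns_of and same_ipc_ns are hypothetical, and which PIDs are visible depends on how /proc is mounted inside the container:

```shell
# Print the IPC namespace identifier of a process, e.g. "ipc:[4026531839]".
# $1 = PID, $2 = procfs mount point (defaults to /proc).
ipc_ns_of () {
        readlink "${2:-/proc}/$1/ns/ipc"
}

# Return 0 if both PIDs are in the same IPC namespace, 1 if they differ,
# and 2 if either namespace link cannot be read.
# $1, $2 = PIDs to compare, $3 = optional procfs mount point.
same_ipc_ns () {
        local a b
        a=$(ipc_ns_of "$1" "$3") || return 2
        b=$(ipc_ns_of "$2" "$3") || return 2
        [ "$a" = "$b" ]
}
```

For example, calling same_ipc_ns with the PID of a shell inside the container and the PID of the host's udev daemon would be expected to return 1 in the setup described in this bug.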

Comment 15 Jiri Stransky 2019-10-16 13:01:23 UTC

Is it possible then that the freeze happens because of how the Leapp upgrade container is configured? Can this be fixed by `--ipc=host` on the upgrade container, or something along those lines?

Comment 16 Bryn M. Reeves 2019-10-16 14:40:11 UTC

It's certainly worth testing - the container has access to the right parts of the file system, but I can't be certain that this is the only blocker to this os-prober action succeeding.

Comment 17 Michal Reznik 2019-10-22 15:42:29 UTC

IIRC systemd-nspawn does not support `--ipc=host`. But we could try playing with "nsenter".

So I think this BZ can be closed and we can track the work in bz1709890.
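As a rough illustration of the "nsenter" idea, the sketch below wraps a command so that it runs in the IPC namespace of another process. It was not tested as part of this bug. The helper name run_in_ipc_ns_of and the NSENTER override are hypothetical, and the approach assumes the caller has sufficient privileges and that the target process (for example the host's udev daemon) is visible under /proc:

```shell
# Hypothetical helper: run a command inside the IPC namespace of process $1.
# Setting NSENTER (for example to "echo") replaces the nsenter binary,
# which allows a dry run that only prints the arguments.
run_in_ipc_ns_of () {
        local pid
        pid="$1"
        shift
        "${NSENTER:-nsenter}" --ipc="/proc/$pid/ns/ipc" -- "$@"
}
```

A dry run such as NSENTER=echo run_in_ipc_ns_of 1 dmsetup ls prints the arguments that would be passed to nsenter without changing namespaces.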

Comment 18 Michal Reznik 2019-10-29 16:25:44 UTC

*** Bug 1683682 has been marked as a duplicate of this bug. ***
