Hide Forgot
During the system_upgrade_run step of the upgrade, the node goes down for the RPM upgrade as expected, but the RPM upgrade itself gets stuck. In upgrade command: TASK [set leapp facts] ********************************************************* Thursday 09 May 2019 11:52:45 +0000 (0:00:00.085) 0:00:11.320 ********** ok: [controller-0] => {"ansible_facts": {"upgrade_leapp_debug": true, "upgrade_leapp_enabled": true, "upgrade_leapp_reboot_timeout": 1800}, "changed": false} TASK [reboot to perform the upgrade] ******************************************* Thursday 09 May 2019 11:52:45 +0000 (0:00:00.093) 0:00:11.413 ********** [DEPRECATION WARNING]: evaluating upgrade_leapp_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. fatal: [controller-0]: FAILED! => {"changed": false, "elapsed": 1946, "msg": "Timed out waiting for last boot time check (timeout=1800)", "rebooted": true} In console log of the machine: [ 100.544016] upgrade[331]: Installing : python3-dnf-plugin-spacewalk-2.8.5-9.module+el 619/1848 [ 100.545208] upgrade[331]: Installing : python3-rhn-check-2.8.16-12.module+el8+2754+6a 620/1848 [ 100.546502] upgrade[331]: Upgrading : rhn-check-2.8.16-12.module+el8+2754+6a08e8f4.x 621/1848 [ 100.547701] upgrade[331]: Upgrading : rhnsd-5.0.35-3.module+el8+2754+6a08e8f4.x86_64 622/1848 [ 100.548906] upgrade[331]: Running scriptlet: rhnsd-5.0.35-3.module+el8+2754+6a08e8f4.x86_64 622/1848 [ 100.550105] upgrade[331]: Installing : python3-rhn-setup-2.8.16-12.module+el8+2754+6a 623/1848 [ 100.551291] upgrade[331]: Upgrading : rhn-setup-2.8.16-12.module+el8+2754+6a08e8f4.x 624/1848 [ 100.552505] upgrade[331]: Upgrading : tuned-2.10.0-15.el8.noarch 625/1848 [ 100.553747] upgrade[331]: Running scriptlet: tuned-2.10.0-15.el8.noarch 625/1848 [ 100.554978] upgrade[331]: Installing : dnf-plugin-subscription-manager-1.23.8-35.el8. 626/1848 [ 100.556284] upgrade[331]: Upgrading : fxload-2008_10_13-10.el8.x86_64 627/1848 [ 100.557473] upgrade[331]: Upgrading : alsa-firmware-1.0.29-6.el8.noarch 628/1848 [ 100.558654] upgrade[331]: Upgrading : alsa-tools-firmware-1.1.6-1.el8.x86_64 629/1848 [ 100.559907] upgrade[331]: Upgrading : oddjob-0.34.4-7.el8.x86_64 630/1848 [ 100.561217] upgrade[331]: Running scriptlet: oddjob-0.34.4-7.el8.x86_64 630/1848 [ 100.562606] upgrade[331]: Upgrading : oddjob-mkhomedir-0.34.4-7.el8.x86_64 631/1848 [ 202.109413] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 302.109316] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 382.109331] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 492.109332] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 562.109448] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 672.109456] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 742.109355] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 852.109356] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 922.109421] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1032.109341] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1102.109443] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1212.109323] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1282.109440] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1392.109342] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1462.109348] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1572.109346] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1642.109473] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1752.109351] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1822.109355] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 1932.109318] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2002.109426] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2112.109334] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2182.109383] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2292.109314] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2362.109418] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2472.109312] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2542.109332] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2652.109306] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2722.109463] systemd-journald[206]: Sent WATCHDOG=1 notification. [ 2832.109341] systemd-journald[206]: Sent WATCHDOG=1 notification.
(Happened in virtual development environment.)
The bug reproduced again, but there doesn't seem to be a connection between the package being upgraded and when the freeze happens: [ 84.177189] upgrade[319]: Running scriptlet: dnf-4.0.9.2-5.el8.noarch 499/1517 [ 84.179274] upgrade[319]: Running scriptlet: samba-common-4.9.1-8.el8.noarch 500/1517 [ 84.180827] upgrade[319]: Upgrading : samba-common-4.9.1-8.el8.noarch 500/1517 [ 84.182388] upgrade[319]: Running scriptlet: samba-common-4.9.1-8.el8.noarch 500/1517 [ 84.183897] upgrade[319]: Upgrading : device-mapper-event-libs-8:1.02.155-6.el8.x86_ 501/1517 [ 84.185475] upgrade[319]: Running scriptlet: device-mapper-event-libs-8:1.02.155-6.el8.x86_ 501/1517 [ 84.187116] upgrade[319]: Upgrading : grub2-tools-extra-1:2.02-66.el8.x86_64 502/1517 [ 84.189373] upgrade[319]: Installing : iptables-ebtables-1.8.2-9.el8.x86_64 503/1517 [ 84.191948] upgrade[319]: Running scriptlet: iptables-ebtables-1.8.2-9.el8.x86_64 503/1517 [ 84.193502] upgrade[319]: Running scriptlet: rpcbind-1.2.5-3.el8.x86_64 504/1517 [ 84.195218] upgrade[319]: Upgrading : rpcbind-1.2.5-3.el8.x86_64 504/1517 [ 126.059762] systemd-journald[201]: Sent WATCHDOG=1 notification. [ 186.059747] systemd-journald[201]: Sent WATCHDOG=1 notification. [ 296.059866] systemd-journald[201]: Sent WATCHDOG=1 notification. [ 366.059756] systemd-journald[201]: Sent WATCHDOG=1 notification. [ 486.059753] systemd-journald[201]: Sent WATCHDOG=1 notification.
The 2nd attempt was with as-many-as-possible OpenStack packages removed before the upgrade. The 3rd attempt was also with the packages removed and also with Leapp taken from the Leapp team's Copr repository instead of from subscription. But it seems to hang always, and it prints a different RPM as upgrading/installing last, so it doesn't look like it's caused by upgrade of any particular RPM. The latest attempt: [ 90.226146] upgrade[319]: Upgrading : device-mapper-multipath-libs-0.7.8-7.el8.x86_6 509/1532 [ 90.227339] upgrade[319]: Running scriptlet: device-mapper-multipath-libs-0.7.8-7.el8.x86_6 509/1532 [ 90.228502] upgrade[319]: Upgrading : device-mapper-event-8:1.02.155-6.el8.x86_64 510/1532 [ 90.229685] upgrade[319]: Running scriptlet: device-mapper-event-8:1.02.155-6.el8.x86_64 510/1532 [ 90.230916] upgrade[319]: Upgrading : lvm2-libs-8:2.03.02-6.el8.x86_64 511/1532 [ 90.232097] upgrade[319]: Running scriptlet: lvm2-libs-8:2.03.02-6.el8.x86_64 511/1532 [ 90.233332] upgrade[319]: Upgrading : lvm2-8:2.03.02-6.el8.x86_64 512/1532 [ 90.234512] upgrade[319]: Running scriptlet: lvm2-8:2.03.02-6.el8.x86_64 512/1532 [ 90.235703] upgrade[319]: Upgrading : samba-common-libs-4.9.1-8.el8.x86_64 513/1532 [ 90.236892] upgrade[319]: Running scriptlet: samba-common-libs-4.9.1-8.el8.x86_64 513/1532 [ 90.238110] upgrade[319]: Upgrading : libwbclient-4.9.1-8.el8.x86_64 514/1532 [ 90.239290] upgrade[319]: Upgrading : samba-client-libs-4.9.1-8.el8.x86_64 515/1532 [ 90.240565] upgrade[319]: Running scriptlet: samba-client-libs-4.9.1-8.el8.x86_64 515/1532 [ 90.241800] upgrade[319]: Upgrading : libsmbclient-4.9.1-8.el8.x86_64 516/1532 [ 90.242996] upgrade[319]: Running scriptlet: libsmbclient-4.9.1-8.el8.x86_64 516/1532 [ 90.244199] upgrade[319]: Installing : network-scripts-10.00.1-1.el8.x86_64 517/1532 [ 90.245442] upgrade[319]: Running scriptlet: network-scripts-10.00.1-1.el8.x86_64 517/1532 [ 90.246640] upgrade[319]: Installing : dhcp-client-12:4.3.6-30.el8.x86_64 518/1532 [ 90.247829] upgrade[319]: Upgrading : gnutls-dane-3.6.5-2.el8.x86_64 519/1532 [ 94.430563] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 224.526152] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 294.526160] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 404.526152] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 474.526173] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 584.526150] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 654.526180] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 764.526172] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 834.526180] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 944.526152] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1014.526147] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1124.526153] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1194.526149] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1304.526152] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1374.526145] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1484.526178] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1554.526188] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1664.526176] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1734.526244] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1844.526156] systemd-journald[200]: Sent WATCHDOG=1 notification. [ 1914.526161] systemd-journald[200]: Sent WATCHDOG=1 notification.
This is blocking further progress on upgrade. Reported bug 1709890 for Leapp.
Took memory dump of the hanging VM and see: crash> ps -p ... PID: 0 TASK: ffffffff8ea12780 CPU: 0 COMMAND: "swapper/0" PID: 1 TASK: ffff9205061fc740 CPU: 1 COMMAND: "systemd" PID: 323 TASK: ffff920720a68000 CPU: 0 COMMAND: "upgrade" PID: 327 TASK: ffff92071507df00 CPU: 0 COMMAND: "upgrade" PID: 331 TASK: ffff92071507c740 CPU: 0 COMMAND: "systemd-nspawn" PID: 332 TASK: ffff9207142d17c0 CPU: 0 COMMAND: "leapp" PID: 674 TASK: ffff92071124df00 CPU: 0 COMMAND: "leapp" PID: 680 TASK: ffff92071124af80 CPU: 0 COMMAND: "dnf" PID: 1329 TASK: ffff9207142d4740 CPU: 1 COMMAND: "sh" PID: 1330 TASK: ffff92071507af80 CPU: 1 COMMAND: "grub2-switch-to" PID: 1366 TASK: ffff92071126c740 CPU: 0 COMMAND: "grub2-mkconfig" PID: 1620 TASK: ffff9207112497c0 CPU: 1 COMMAND: "30_os-prober" PID: 1626 TASK: ffff92070fcc0000 CPU: 0 COMMAND: "30_os-prober" PID: 1627 TASK: ffff92070fc92f80 CPU: 1 COMMAND: "os-prober" PID: 1677 TASK: ffff9207142d5f00 CPU: 0 COMMAND: "os-prober" PID: 1678 TASK: ffff92070fc90000 CPU: 1 COMMAND: "os-prober" PID: 1706 TASK: ffff92070fcc2f80 CPU: 0 COMMAND: "50mounted-tests" PID: 1714 TASK: ffff92070fcc8000 CPU: 1 COMMAND: "50mounted-tests" PID: 1718 TASK: ffff92070fcc5f00 CPU: 1 COMMAND: "dmsetup" "dmsetup" is waiting on semaphore: crash> bt 1718 PID: 1718 TASK: ffff92070fcc5f00 CPU: 1 COMMAND: "dmsetup" #0 [ffff9e65c5c17c28] __schedule at ffffffff8e01b704 #1 [ffff9e65c5c17cc0] schedule at ffffffff8e01bd18 #2 [ffff9e65c5c17cc8] do_semtimedop at ffffffff8db4c03a #3 [ffff9e65c5c17f38] do_syscall_64 at ffffffff8d80424b #4 [ffff9e65c5c17f50] entry_SYSCALL_64_after_hwframe at ffffffff8e2000ad RIP: 00007fbbc36ab88b RSP: 00007ffe5d7cf428 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 000000000d4d6ad7 RCX: 00007fbbc36ab88b RDX: 0000000000000001 RSI: 00007ffe5d7cf442 RDI: 0000000000000000 RBP: 00007ffe5d7cf442 R8: 00007fbbc39b24a0 R9: 00007ffe5d7cf340 R10: 0000555bca354250 R11: 0000000000000206 R12: 0000555bca354250 R13: 00007ffe5d7cf6e0 R14: 0000555bca353a48 R15: 0000000000000001 ORIG_RAX: 0000000000000041 CS: 0033 SS: 002b crash> which is supposed to be released by udev. While taking a look at the system storage: [root@controller-0 ~]# blkid /dev/vda1: UUID="2019-03-08-11-09-02-00" LABEL="config-2" TYPE="iso9660" /dev/vda2: LABEL="img-rootfs" UUID="de7e67c7-6390-4fba-8908-8e6641c90740" TYPE="xfs" /dev/loop2: UUID="4sdrKA-MWNe-hzxS-JdGY-YAM5-9pOB-jcLAzi" TYPE="LVM2_member" I see this /dev/vda1 TYPE="iso9660" device which as explained is some kind of baremetal-as-a-service config drive used by cloud-init. When I remove this partition the rpm upgrade passes (the upgrade process is getting stuck later with not being able to mount root partition). Consulted this issue with Zdenek Kabelac from storage team and as per him "dmsetup" is using a semaphore only in case it is creating a device. The question is why is "grub2-tools" creating a device? Maybe some virtual device to check "iso9660" partition? Anyway as a "workaround", instead of blowing the "vda1" partition up, we can use "DM_DISABLE_UDEV=1" when invoking the rpm upgrade process. Not sure what the implications would be in the upgrade process. Maybe it is even a "solution" in this case. This needs to be investigated further.
We have identified more granular individual issues (see depends-on list in this BZ). I'm renaming this BZ to reflect that it's essentially a tracking BZ for the individual issues we've hit.
Closing this tracking BZ as i've unwrapped the "depends on" list into the main OSP 15 upgrades tracking BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1727807 *** This bug has been marked as a duplicate of bug 1727807 ***