Bug 1708241 - OSP 14->15: Tracking BZ for issues related to Leapp
Summary: OSP 14->15: Tracking BZ for issues related to Leapp
Keywords:
Status: CLOSED DUPLICATE of bug 1727807
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: z1
: ---
Assignee: Jiri Stransky
QA Contact: Ronnie Rasouli
URL:
Whiteboard:
Depends On: 1680572 1709890 1712456 1720578 1739150
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-05-09 12:45 UTC by Jiri Stransky
Modified: 2019-08-28 12:26 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-28 12:25:49 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Jiri Stransky 2019-05-09 12:45:16 UTC
During the system_upgrade_run step of the upgrade, the node goes down for the RPM upgrade as expected, but the RPM upgrade itself gets stuck. In upgrade command:

TASK [set leapp facts] *********************************************************
Thursday 09 May 2019  11:52:45 +0000 (0:00:00.085)       0:00:11.320 ********** 
ok: [controller-0] => {"ansible_facts": {"upgrade_leapp_debug": true, "upgrade_leapp_enabled": true, "upgrade_leapp_reboot_timeout": 1800}, "changed": false}

TASK [reboot to perform the upgrade] *******************************************
Thursday 09 May 2019  11:52:45 +0000 (0:00:00.093)       0:00:11.413 ********** 
[DEPRECATION WARNING]: evaluating upgrade_leapp_enabled as a bare variable, 
this behaviour will go away and you might need to add |bool to the expression 
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This 
feature will be removed in version 2.12. Deprecation warnings can be disabled 
by setting deprecation_warnings=False in ansible.cfg.
fatal: [controller-0]: FAILED! => {"changed": false, "elapsed": 1946, "msg": "Timed out waiting for last boot time check (timeout=1800)", "rebooted": true}


In console log of the machine:

[  100.544016] upgrade[331]:   Installing       : python3-dnf-plugin-spacewalk-2.8.5-9.module+el    619/1848
[  100.545208] upgrade[331]:   Installing       : python3-rhn-check-2.8.16-12.module+el8+2754+6a    620/1848
[  100.546502] upgrade[331]:   Upgrading        : rhn-check-2.8.16-12.module+el8+2754+6a08e8f4.x    621/1848
[  100.547701] upgrade[331]:   Upgrading        : rhnsd-5.0.35-3.module+el8+2754+6a08e8f4.x86_64    622/1848
[  100.548906] upgrade[331]:   Running scriptlet: rhnsd-5.0.35-3.module+el8+2754+6a08e8f4.x86_64    622/1848
[  100.550105] upgrade[331]:   Installing       : python3-rhn-setup-2.8.16-12.module+el8+2754+6a    623/1848
[  100.551291] upgrade[331]:   Upgrading        : rhn-setup-2.8.16-12.module+el8+2754+6a08e8f4.x    624/1848
[  100.552505] upgrade[331]:   Upgrading        : tuned-2.10.0-15.el8.noarch                        625/1848
[  100.553747] upgrade[331]:   Running scriptlet: tuned-2.10.0-15.el8.noarch                        625/1848
[  100.554978] upgrade[331]:   Installing       : dnf-plugin-subscription-manager-1.23.8-35.el8.    626/1848
[  100.556284] upgrade[331]:   Upgrading        : fxload-2008_10_13-10.el8.x86_64                   627/1848
[  100.557473] upgrade[331]:   Upgrading        : alsa-firmware-1.0.29-6.el8.noarch                 628/1848
[  100.558654] upgrade[331]:   Upgrading        : alsa-tools-firmware-1.1.6-1.el8.x86_64            629/1848
[  100.559907] upgrade[331]:   Upgrading        : oddjob-0.34.4-7.el8.x86_64                        630/1848
[  100.561217] upgrade[331]:   Running scriptlet: oddjob-0.34.4-7.el8.x86_64                        630/1848
[  100.562606] upgrade[331]:   Upgrading        : oddjob-mkhomedir-0.34.4-7.el8.x86_64              631/1848
[  202.109413] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  302.109316] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  382.109331] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  492.109332] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  562.109448] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  672.109456] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  742.109355] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  852.109356] systemd-journald[206]: Sent WATCHDOG=1 notification.
[  922.109421] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1032.109341] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1102.109443] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1212.109323] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1282.109440] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1392.109342] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1462.109348] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1572.109346] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1642.109473] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1752.109351] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1822.109355] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 1932.109318] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2002.109426] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2112.109334] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2182.109383] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2292.109314] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2362.109418] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2472.109312] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2542.109332] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2652.109306] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2722.109463] systemd-journald[206]: Sent WATCHDOG=1 notification.
[ 2832.109341] systemd-journald[206]: Sent WATCHDOG=1 notification.

Comment 2 Jiri Stransky 2019-05-09 12:53:31 UTC
(Happened in virtual development environment.)

Comment 4 Jiri Stransky 2019-05-14 11:31:19 UTC
The bug reproduced again, but there doesn't seem to be a connection between the package being upgraded and when the freeze happens:

[   84.177189] upgrade[319]:   Running scriptlet: dnf-4.0.9.2-5.el8.noarch                          499/1517
[   84.179274] upgrade[319]:   Running scriptlet: samba-common-4.9.1-8.el8.noarch                   500/1517
[   84.180827] upgrade[319]:   Upgrading        : samba-common-4.9.1-8.el8.noarch                   500/1517
[   84.182388] upgrade[319]:   Running scriptlet: samba-common-4.9.1-8.el8.noarch                   500/1517
[   84.183897] upgrade[319]:   Upgrading        : device-mapper-event-libs-8:1.02.155-6.el8.x86_    501/1517
[   84.185475] upgrade[319]:   Running scriptlet: device-mapper-event-libs-8:1.02.155-6.el8.x86_    501/1517
[   84.187116] upgrade[319]:   Upgrading        : grub2-tools-extra-1:2.02-66.el8.x86_64            502/1517
[   84.189373] upgrade[319]:   Installing       : iptables-ebtables-1.8.2-9.el8.x86_64              503/1517
[   84.191948] upgrade[319]:   Running scriptlet: iptables-ebtables-1.8.2-9.el8.x86_64              503/1517
[   84.193502] upgrade[319]:   Running scriptlet: rpcbind-1.2.5-3.el8.x86_64                        504/1517
[   84.195218] upgrade[319]:   Upgrading        : rpcbind-1.2.5-3.el8.x86_64                        504/1517
[  126.059762] systemd-journald[201]: Sent WATCHDOG=1 notification.
[  186.059747] systemd-journald[201]: Sent WATCHDOG=1 notification.
[  296.059866] systemd-journald[201]: Sent WATCHDOG=1 notification.
[  366.059756] systemd-journald[201]: Sent WATCHDOG=1 notification.
[  486.059753] systemd-journald[201]: Sent WATCHDOG=1 notification.

Comment 5 Jiri Stransky 2019-05-14 13:22:45 UTC
The 2nd attempt was with as-many-as-possible OpenStack packages removed before the upgrade. The 3rd attempt was also with the packages removed and also with Leapp taken from the Leapp team's Copr repository instead of from subscription. But it seems to hang always, and it prints a different RPM as upgrading/installing last, so it doesn't look like it's caused by upgrade of any particular RPM.

The latest attempt:

[   90.226146] upgrade[319]:   Upgrading        : device-mapper-multipath-libs-0.7.8-7.el8.x86_6    509/1532
[   90.227339] upgrade[319]:   Running scriptlet: device-mapper-multipath-libs-0.7.8-7.el8.x86_6    509/1532
[   90.228502] upgrade[319]:   Upgrading        : device-mapper-event-8:1.02.155-6.el8.x86_64       510/1532
[   90.229685] upgrade[319]:   Running scriptlet: device-mapper-event-8:1.02.155-6.el8.x86_64       510/1532
[   90.230916] upgrade[319]:   Upgrading        : lvm2-libs-8:2.03.02-6.el8.x86_64                  511/1532
[   90.232097] upgrade[319]:   Running scriptlet: lvm2-libs-8:2.03.02-6.el8.x86_64                  511/1532
[   90.233332] upgrade[319]:   Upgrading        : lvm2-8:2.03.02-6.el8.x86_64                       512/1532
[   90.234512] upgrade[319]:   Running scriptlet: lvm2-8:2.03.02-6.el8.x86_64                       512/1532
[   90.235703] upgrade[319]:   Upgrading        : samba-common-libs-4.9.1-8.el8.x86_64              513/1532
[   90.236892] upgrade[319]:   Running scriptlet: samba-common-libs-4.9.1-8.el8.x86_64              513/1532
[   90.238110] upgrade[319]:   Upgrading        : libwbclient-4.9.1-8.el8.x86_64                    514/1532
[   90.239290] upgrade[319]:   Upgrading        : samba-client-libs-4.9.1-8.el8.x86_64              515/1532
[   90.240565] upgrade[319]:   Running scriptlet: samba-client-libs-4.9.1-8.el8.x86_64              515/1532
[   90.241800] upgrade[319]:   Upgrading        : libsmbclient-4.9.1-8.el8.x86_64                   516/1532
[   90.242996] upgrade[319]:   Running scriptlet: libsmbclient-4.9.1-8.el8.x86_64                   516/1532
[   90.244199] upgrade[319]:   Installing       : network-scripts-10.00.1-1.el8.x86_64              517/1532
[   90.245442] upgrade[319]:   Running scriptlet: network-scripts-10.00.1-1.el8.x86_64              517/1532
[   90.246640] upgrade[319]:   Installing       : dhcp-client-12:4.3.6-30.el8.x86_64                518/1532
[   90.247829] upgrade[319]:   Upgrading        : gnutls-dane-3.6.5-2.el8.x86_64                    519/1532
[   94.430563] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  224.526152] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  294.526160] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  404.526152] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  474.526173] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  584.526150] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  654.526180] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  764.526172] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  834.526180] systemd-journald[200]: Sent WATCHDOG=1 notification.
[  944.526152] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1014.526147] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1124.526153] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1194.526149] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1304.526152] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1374.526145] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1484.526178] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1554.526188] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1664.526176] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1734.526244] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1844.526156] systemd-journald[200]: Sent WATCHDOG=1 notification.
[ 1914.526161] systemd-journald[200]: Sent WATCHDOG=1 notification.

Comment 6 Jiri Stransky 2019-05-14 13:44:00 UTC
This is blocking further progress on upgrade. Reported bug 1709890 for Leapp.

Comment 12 Michal Reznik 2019-05-20 13:27:34 UTC
Took memory dump of the hanging VM and see:

crash> ps -p
...
PID: 0      TASK: ffffffff8ea12780  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff9205061fc740  CPU: 1   COMMAND: "systemd"
  PID: 323    TASK: ffff920720a68000  CPU: 0   COMMAND: "upgrade"
   PID: 327    TASK: ffff92071507df00  CPU: 0   COMMAND: "upgrade"
    PID: 331    TASK: ffff92071507c740  CPU: 0   COMMAND: "systemd-nspawn"
     PID: 332    TASK: ffff9207142d17c0  CPU: 0   COMMAND: "leapp"
      PID: 674    TASK: ffff92071124df00  CPU: 0   COMMAND: "leapp"
       PID: 680    TASK: ffff92071124af80  CPU: 0   COMMAND: "dnf"
        PID: 1329   TASK: ffff9207142d4740  CPU: 1   COMMAND: "sh"
         PID: 1330   TASK: ffff92071507af80  CPU: 1   COMMAND: "grub2-switch-to"
          PID: 1366   TASK: ffff92071126c740  CPU: 0   COMMAND: "grub2-mkconfig"
           PID: 1620   TASK: ffff9207112497c0  CPU: 1   COMMAND: "30_os-prober"
            PID: 1626   TASK: ffff92070fcc0000  CPU: 0   COMMAND: "30_os-prober"
             PID: 1627   TASK: ffff92070fc92f80  CPU: 1   COMMAND: "os-prober"
              PID: 1677   TASK: ffff9207142d5f00  CPU: 0   COMMAND: "os-prober"
               PID: 1678   TASK: ffff92070fc90000  CPU: 1   COMMAND: "os-prober"
                PID: 1706   TASK: ffff92070fcc2f80  CPU: 0   COMMAND: "50mounted-tests"
                 PID: 1714   TASK: ffff92070fcc8000  CPU: 1   COMMAND: "50mounted-tests"
                  PID: 1718   TASK: ffff92070fcc5f00  CPU: 1   COMMAND: "dmsetup"


"dmsetup" is waiting on semaphore:

crash> bt 1718
PID: 1718   TASK: ffff92070fcc5f00  CPU: 1   COMMAND: "dmsetup"
 #0 [ffff9e65c5c17c28] __schedule at ffffffff8e01b704
 #1 [ffff9e65c5c17cc0] schedule at ffffffff8e01bd18
 #2 [ffff9e65c5c17cc8] do_semtimedop at ffffffff8db4c03a
 #3 [ffff9e65c5c17f38] do_syscall_64 at ffffffff8d80424b
 #4 [ffff9e65c5c17f50] entry_SYSCALL_64_after_hwframe at ffffffff8e2000ad
    RIP: 00007fbbc36ab88b  RSP: 00007ffe5d7cf428  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 000000000d4d6ad7  RCX: 00007fbbc36ab88b
    RDX: 0000000000000001  RSI: 00007ffe5d7cf442  RDI: 0000000000000000
    RBP: 00007ffe5d7cf442   R8: 00007fbbc39b24a0   R9: 00007ffe5d7cf340
    R10: 0000555bca354250  R11: 0000000000000206  R12: 0000555bca354250
    R13: 00007ffe5d7cf6e0  R14: 0000555bca353a48  R15: 0000000000000001
    ORIG_RAX: 0000000000000041  CS: 0033  SS: 002b
crash> 

which is supposed to be released by udev.

While taking a look at the system storage:

[root@controller-0 ~]# blkid
/dev/vda1: UUID="2019-03-08-11-09-02-00" LABEL="config-2" TYPE="iso9660" 
/dev/vda2: LABEL="img-rootfs" UUID="de7e67c7-6390-4fba-8908-8e6641c90740" TYPE="xfs" 
/dev/loop2: UUID="4sdrKA-MWNe-hzxS-JdGY-YAM5-9pOB-jcLAzi" TYPE="LVM2_member"

I see this /dev/vda1 TYPE="iso9660" device which as explained is some kind of baremetal-as-a-service config drive used by cloud-init. When I remove this partition the rpm upgrade passes (the upgrade process is getting stuck later with not being able to mount root partition). Consulted this issue with Zdenek Kabelac from storage team and as per him "dmsetup" is using a semaphore only in case it is creating a device. The question is why is "grub2-tools" creating a device? Maybe some virtual device to check "iso9660" partition?

Anyway as a "workaround", instead of blowing the "vda1" partition up, we can use "DM_DISABLE_UDEV=1" when invoking the rpm upgrade process. Not sure what the implications would be in the upgrade process. Maybe it is even a "solution" in this case.

This needs to be investigated further.

Comment 14 Jiri Stransky 2019-06-19 12:15:57 UTC
We have identified more granular individual issues (see depends-on list in this BZ). I'm renaming this BZ to reflect that it's essentially a tracking BZ for the individual issues we've hit.

Comment 15 Jiri Stransky 2019-08-28 12:25:49 UTC
Closing this tracking BZ as i've unwrapped the "depends on" list into the main OSP 15 upgrades tracking BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1727807

*** This bug has been marked as a duplicate of bug 1727807 ***


Note You need to log in before you can comment on or make changes to this bug.