Bug 1582338

Summary: OSP10: undercloud update gets stuck when updating from GA to latest
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: rhosp-director
Assignee: Lukas Bezdicka <lbezdick>
Status: CLOSED CURRENTRELEASE
QA Contact: Amit Ugol <augol>
Severity: high
Priority: high
Version: 10.0 (Newton)
CC: ccamacho, cylopez, dbecker, eelena, lbezdick, mburns, morazi, mowens, pliu, rcernin, segutier, sgolovat, skramaja
Target Milestone: z12
Keywords: ReleaseNotes, Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2019-05-16 08:24:36 UTC
Type: Bug
Bug Depends On: 1572543, 1596241
Attachments:
undercloud_Update.log (flags: none)

Description Marius Cornea 2018-05-25 00:15:24 UTC
Created attachment 1441310: undercloud_Update.log

Description of problem:
The OSP10 undercloud update gets stuck when updating from GA (RHEL 7.3) to latest (RHEL 7.5).


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 GA
2. Run a minor update to OSP10 latest (including the RHEL update from 7.3 to 7.5)

Actual results:
Undercloud update gets stuck.

Expected results:
Undercloud update is successful.

Additional info:
Side note: running 'sysctl -a' gets stuck:

[root@undercloud-0 ~]# sysctl -a
abi.vsyscall32 = 1
crypto.fips_enabled = 0
debug.exception-trace = 1
debug.kprobes-optimization = 1
dev.hpet.max-user-freq = 64
dev.mac_hid.mouse_button2_keycode = 97
dev.mac_hid.mouse_button3_keycode = 100
dev.mac_hid.mouse_button_emulation = 0
dev.parport.default.spintime = 500
dev.parport.default.timeslice = 200
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
dev.scsi.logging_level = 0
fs.aio-max-nr = 1048576
fs.aio-nr = 0


Attaching undercloud update output.

Comment 1 Marius Cornea 2018-05-25 00:23:47 UTC
Note: trying to generate the sosreport gets stuck as well.

Comment 2 Lukas Bezdicka 2018-05-29 09:41:02 UTC
[May29 05:09] INFO: task fuser:32191 blocked for more than 120 seconds.
[  +0.003212] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.003369] fuser           D ffff8804fbb14500     0 32191      1 0x00000084
[  +0.003227]  ffff880458f23a30 0000000000000086 ffff880228283ec0 ffff880458f23fd8
[  +0.003467]  ffff880458f23fd8 ffff880458f23fd8 ffff880228283ec0 ffff880512fb0a18
[  +0.003326]  ffff880512fb0a20 7fffffffffffffff ffff880228283ec0 ffff8804fbb14500
[  +0.003653] Call Trace:
[  +0.002500]  [<ffffffff8168c6f9>] schedule+0x29/0x70
[  +0.002937]  [<ffffffff8168a139>] schedule_timeout+0x239/0x2c0
[  +0.003023]  [<ffffffff810b18e6>] ? finish_wait+0x56/0x70
[  +0.002947]  [<ffffffff8168a882>] ? mutex_lock+0x12/0x2f
[  +0.003015]  [<ffffffff8128a052>] ? autofs4_wait+0x3f2/0x900
[  +0.002952]  [<ffffffff8168cad6>] wait_for_completion+0x116/0x170
[  +0.002973]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
[  +0.002821]  [<ffffffff8128b1cb>] autofs4_expire_wait+0x6b/0x110
[  +0.002882]  [<ffffffff81288282>] do_expire_wait+0x172/0x190
[  +0.002746]  [<ffffffff8128847f>] autofs4_d_manage+0x6f/0x170
[  +0.002733]  [<ffffffff812092e5>] follow_managed+0xb5/0x300
[  +0.002744]  [<ffffffff81209c4b>] lookup_fast+0x19b/0x2e0
[  +0.002722]  [<ffffffff8120c535>] path_lookupat+0x165/0x7a0
[  +0.002687]  [<ffffffff81686062>] ? avc_alloc_node+0x116/0x125
[  +0.002677]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[  +0.002715]  [<ffffffff8120f48f>] ? getname_flags+0x4f/0x1a0
[  +0.002705]  [<ffffffff8120cb9b>] filename_lookup+0x2b/0xc0
[  +0.002507]  [<ffffffff812105b7>] user_path_at_empty+0x67/0xc0
[  +0.002541]  [<ffffffff81114bb2>] ? from_kgid_munged+0x12/0x20
[  +0.002511]  [<ffffffff81203f9f>] ? cp_new_stat+0x14f/0x180
[  +0.002502]  [<ffffffff81210621>] user_path_at+0x11/0x20
[  +0.002452]  [<ffffffff81203a93>] vfs_fstatat+0x63/0xc0
[  +0.002338]  [<ffffffff81203ffe>] SYSC_newstat+0x2e/0x60
[  +0.002361]  [<ffffffff8111f486>] ? __audit_syscall_exit+0x1e6/0x280
[  +0.002492]  [<ffffffff812042de>] SyS_newstat+0xe/0x10
[  +0.002272]  [<ffffffff81697709>] system_call_fastpath+0x16/0x1b
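
A minimal sketch for pulling the blocked task names out of a longer kernel log, assuming the log arrives on stdin in the same format as the warning above. `hung_tasks` is a hypothetical helper name, not part of any tool used in this bug.

```shell
# hung_tasks: hypothetical helper. Reads kernel log text on stdin and prints
# "<command> <pid>" for every hung-task warning it finds.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than .*/\1 \2/p'
}

# Example (assumes dmesg is readable by the current user):
#   dmesg | hung_tasks
```

Lines that are not hung-task warnings, such as the call trace frames, are skipped.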

Comment 3 Lukas Bezdicka 2018-05-29 09:52:10 UTC
[root@undercloud-0 ~]# ip netns exec qdhcp-8b0d5db6-e61f-435a-bf0f-fb8b91de50a6 ls /proc/sys/fs
aio-max-nr  binfmt_misc   dir-notify-enable  file-max  inode-nr     inotify           leases-enable  nfs      overflowgid  pipe-max-size         pipe-user-pages-soft  protected_symlinks  suid_dumpable
aio-nr      dentry-state  epoll              file-nr   inode-state  lease-break-time  mqueue         nr_open  overflowuid  pipe-user-pages-hard  protected_hardlinks   quota               xfs
[root@undercloud-0 ~]# ls /proc/sys/fs
... HANGS
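
Because a foreground `ls` on the affected path blocks the shell, a safer way to test a suspect path is to run the command in the background and poll it. The following is a sketch under that assumption; `probe` is a hypothetical helper, and a command that really is stuck is left running in the background rather than waited on.

```shell
# probe: hypothetical helper. Runs a command in the background and reports
# whether it finished within the given number of seconds.
# It never waits on the command, so the calling shell cannot get stuck with it.
probe() {
    local limit=$1; shift
    "$@" >/dev/null 2>&1 &
    local pid=$!
    local waited=0
    # kill -0 only checks that the process still exists; it sends no signal.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "hung: $*"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "ok: $*"
    return 0
}

# Example:
#   probe 5 ls /proc/sys/fs
```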

Comment 4 Lukas Bezdicka 2018-05-29 09:52:46 UTC
[stack@undercloud-0 ~]$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=9818928k,nr_inodes=2454732,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/vda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=1967712k,mode=700,uid=1001,gid=1001)
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
proc on /run/netns/qdhcp-8b0d5db6-e61f-435a-bf0f-fb8b91de50a6 type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/netns/qdhcp-8b0d5db6-e61f-435a-bf0f-fb8b91de50a6 type proc (rw,nosuid,nodev,noexec,relatime)
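
The only autofs entry in this listing is the systemd automount on /proc/sys/fs/binfmt_misc, which lines up with the autofs4 frames in the hung-task trace. A small sketch for picking such entries out of `mount` output, assuming mount points without spaces; `list_autofs` is a hypothetical helper name.

```shell
# list_autofs: hypothetical helper. Reads `mount` output on stdin and prints
# the mount point of every entry whose filesystem type is autofs.
# Assumes the "<source> on <mountpoint> type <fstype> (<options>)" layout
# and mount points that contain no whitespace.
list_autofs() {
    awk '$2 == "on" && $4 == "type" && $5 == "autofs" { print $3 }'
}

# Example:
#   mount | list_autofs
```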

Comment 5 Lukas Bezdicka 2018-05-29 10:04:15 UTC
[root@undercloud-0 ~]# lsof | grep proc | grep fs
systemd       1                   root   23r      DIR               0,36         0      12511 /proc/sys/fs/binfmt_misc
systemd       1 27015             root   23r      DIR               0,36         0      12511 /proc/sys/fs/binfmt_misc
kdevtmpfs    17                   root  txt   unknown                                         /proc/17/exe
fsnotify_    33                   root  txt   unknown                                         /proc/33/exe
xfsalloc    279                   root  txt   unknown                                         /proc/279/exe
xfs_mru_c   280                   root  txt   unknown                                         /proc/280/exe
xfs-buf/v   281                   root  txt   unknown                                         /proc/281/exe
xfs-data/   282                   root  txt   unknown                                         /proc/282/exe
xfs-conv/   283                   root  txt   unknown                                         /proc/283/exe
xfs-cil/v   284                   root  txt   unknown                                         /proc/284/exe
xfs-recla   285                   root  txt   unknown                                         /proc/285/exe
xfs-log/v   286                   root  txt   unknown                                         /proc/286/exe
xfs-eofbl   287                   root  txt   unknown                                         /proc/287/exe
xfsaild/v   288                   root  txt   unknown                                         /proc/288/exe
ls        11180                  stack    3r      DIR                0,3         0       8672 /proc/sys/fs
ls        20238                   root    3r      DIR                0,3         0       8672 /proc/sys/fs
sysctl    24338                   root    4r      DIR                0,3         0       8672 /proc/sys/fs
sysctl    32648                   root    4r      DIR                0,3         0       8672 /proc/sys/fs
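
The `ls` and `sysctl` processes holding /proc/sys/fs open are the ones that hung earlier. A sketch for listing processes in uninterruptible sleep (state D, as shown for `fuser` in the hung-task trace) without touching /proc/sys/fs directly; `d_state` is a hypothetical helper, and it is an assumption that the stuck `ls` and `sysctl` processes show up in that state too.

```shell
# d_state: hypothetical helper. Reads `ps -eo pid,stat,comm` output on stdin
# and prints "<pid> <command>" for every process whose state starts with D.
# The first line is treated as the ps header and skipped.
d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Example:
#   ps -eo pid,stat,comm | d_state
```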

Comment 6 Lukas Bezdicka 2018-05-29 10:08:13 UTC
I realized this is probably related to systemd. I tried a downgrade:
yum downgrade systemd* libgudev*

and the system got unstuck.

Comment 7 Lukas Bezdicka 2018-05-29 10:14:22 UTC
Resolving Dependencies
--> Running transaction check
---> Package libgudev1.x86_64 0:219-42.el7_4.10 will be a downgrade
---> Package libgudev1.x86_64 0:219-57.el7 will be erased
---> Package systemd.x86_64 0:219-42.el7_4.10 will be a downgrade
---> Package systemd.x86_64 0:219-57.el7 will be erased
---> Package systemd-libs.i686 0:219-42.el7_4.10 will be a downgrade
---> Package systemd-libs.x86_64 0:219-42.el7_4.10 will be a downgrade
---> Package systemd-libs.i686 0:219-57.el7 will be erased
---> Package systemd-libs.x86_64 0:219-57.el7 will be erased
---> Package systemd-sysv.x86_64 0:219-42.el7_4.10 will be a downgrade
---> Package systemd-sysv.x86_64 0:219-57.el7 will be erased
--> Finished Dependency Resolution

Dependencies Resolved

===================================================================================================================================================================================================================
 Package                                          Arch                                       Version                                             Repository                                                   Size
===================================================================================================================================================================================================================
Downgrading:
 libgudev1                                        x86_64                                     219-42.el7_4.10                                     rhelosp-rhel-7.5-server                                      85 k
 systemd                                          x86_64                                     219-42.el7_4.10                                     rhelosp-rhel-7.5-server                                     5.2 M
 systemd-libs                                     i686                                       219-42.el7_4.10                                     rhelosp-rhel-7.5-server                                     378 k
 systemd-libs                                     x86_64                                     219-42.el7_4.10                                     rhelosp-rhel-7.5-server                                     378 k
 systemd-sysv                                     x86_64                                     219-42.el7_4.10                                     rhelosp-rhel-7.5-server                                      72 k

Transaction Summary
===================================================================================================================================================================================================================
Downgrade  5 Packages

Comment 8 Lukas Bezdicka 2018-05-30 10:07:10 UTC
Reproduced the issue on the overcloud too.

Comment 9 Lukas Bezdicka 2018-05-30 11:49:40 UTC
# Pin systemd to 219-42.el7_4.10 and downgrade it on every ACTIVE overcloud node
for i in $(nova list | awk '/ACTIVE/ {print $(NF-1)}' | awk -F"=" '{print $NF}'); do
  echo "$i"
  ssh -o StrictHostKeyChecking=no "heat-admin@$i" "sudo yum versionlock del lib* system* ; sudo yum versionlock add systemd-219-42.el7_4.10 systemd-libs-219-42.el7_4.10 libgudev1-219-42.el7_4.10 systemd-sysv-219-42.el7_4.10; sudo yum -y downgrade libgudev1 systemd*"
done

Comment 10 Marius Cornea 2018-06-01 01:47:09 UTC
Workaround that allowed me to complete the undercloud upgrade:

sudo yum install -y yum-plugin-versionlock
sudo yum versionlock add systemd systemd-libs libgudev1 systemd-sysv rsyslog

sudo systemctl stop 'openstack-*' 'neutron-*' httpd
sudo yum update python-tripleoclient -y
openstack undercloud upgrade

## wait for the upgrade to fail because of:
2018-05-31 21:24:06 - Error: Could not start Service[docker]: Execution of '/bin/systemctl start docker' returned 1: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
2018-05-31 21:24:06 - Error: /Stage[main]/Tripleo::Profile::Base::Docker_registry/Service[docker]/ensure: change from stopped to running failed: Could not start Service[docker]: Execution of '/bin/systemctl start docker' returned 1: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

## delete the version lock and update the remaining packages
sudo yum versionlock del systemd systemd-libs libgudev1 systemd-sysv rsyslog
sudo yum update -y
sudo reboot

## re-run undercloud upgrade
openstack undercloud upgrade

Comment 11 Marius Cornea 2018-06-01 22:30:03 UTC
Instructions for overcloud update:

Apply the patch from https://review.openstack.org/#/c/571482/

## Before starting the minor update
source ~/stackrc
for address in $(openstack server list -f json | jq -r -c '.[] | .Networks' | grep -oP '[0-9.]+'); do \
  ssh -q -o StrictHostKeyChecking=no heat-admin@$address \
  'sudo yum install -y yum-plugin-versionlock; \
  sudo yum versionlock add systemd systemd-libs libgudev1 systemd-sysv rsyslog;'
done

## Run the overcloud minor update

## After completing the minor update
source ~/stackrc
for address in $(openstack server list -f json | jq -r -c '.[] | .Networks' | grep -oP '[0-9.]+'); do \
  ssh -q -o StrictHostKeyChecking=no heat-admin@$address \
  'sudo yum versionlock del systemd systemd-libs libgudev1 systemd-sysv rsyslog;
  sudo yum update -y'
done
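
Both loops above run a remote command on every address returned by `openstack server list`. A sketch of the same pattern with a dry-run switch, so the commands can be previewed before they touch any node; `run_on_nodes` and the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
# run_on_nodes: hypothetical helper. Reads one address per line on stdin and
# runs the given command on each node as heat-admin over ssh.
# With DRY_RUN=1 it only prints what it would run.
run_on_nodes() {
    local remote_cmd=$1
    local address
    while IFS= read -r address; do
        # Skip blank lines in the address list.
        [ -n "$address" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh heat-admin@$address '$remote_cmd'"
        else
            # -n keeps ssh from consuming the rest of the address list on stdin.
            ssh -n -q -o StrictHostKeyChecking=no "heat-admin@$address" "$remote_cmd"
        fi
    done
}

# Example, previewing a lock check on every node:
#   openstack server list -f json | jq -r -c '.[] | .Networks' | grep -oP '[0-9.]+' \
#     | DRY_RUN=1 run_on_nodes 'sudo yum versionlock list'
```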

## Reboot

Comment 12 Marius Cornea 2018-06-07 14:39:55 UTC
Note for undercloud upgrade:

If the undercloud upgrade fails with:

2018-06-07 10:35:36 - Error: Could not start Service[nova-compute]: Execution of '/bin/systemctl start openstack-nova-compute' returned 1: Job for openstack-nova-compute.service failed because the control process exited with error code. See "systemctl status openstack-nova-compute.service" and "journalctl -xe" for details.
2018-06-07 10:35:36 - Error: /Stage[main]/Nova::Compute/Nova::Generic_service[compute]/Service[nova-compute]/ensure: change from stopped to running failed: Could not start Service[nova-compute]: Execution of '/bin/systemctl start openstack-nova-compute' returned 1: Job for openstack-nova-compute.service failed because the control process exited with error code. See "systemctl status openstack-nova-compute.service" and "journalctl -xe" for details.

re-running 'openstack undercloud upgrade' after the failure allows it to complete.
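
The re-run can be wrapped in a bounded retry. This is a sketch only: `retry` is a hypothetical helper, and the attempt count of 2 in the example is an assumption, not a value taken from this bug.

```shell
# retry: hypothetical helper. Runs a command up to <attempts> times and
# stops at the first success. Returns 1 if every attempt fails.
retry() {
    local attempts=$1; shift
    local n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
    done
    return 1
}

# Example:
#   retry 2 openstack undercloud upgrade
```

Keeping the attempt count low avoids masking a failure that a plain re-run does not fix.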

Comment 15 Lukas Bezdicka 2019-04-08 11:04:51 UTC
*** Bug 1557176 has been marked as a duplicate of this bug. ***

Comment 16 Lukas Bezdicka 2019-05-16 08:24:36 UTC
This issue was resolved in RHEL 7.6.