1325702 – Undercloud upgrade fails with ['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 6

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	instack-undercloud
Sub Component:
Version:	8.0 (Liberty)
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	async
Target Release:	8.0 (Liberty)
Assignee:	Brad P. Crochet
QA Contact:	Dan Yasny
Docs Contact:
URL:
Whiteboard:
Depends On:	1326644
Blocks:

Reported:	2016-04-10 19:09 UTC by Dan Yasny
Modified:	2016-06-15 12:39 UTC
CC List:	5 users
Fixed In Version:	instack-undercloud-2.2.7-5.el7ost
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-06-15 12:39:06 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1229	0	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 8 director Bug Fix Advisory	2016-06-15 16:38:45 UTC

Description Dan Yasny 2016-04-10 19:09:13 UTC

Description of problem:

Notice: Finished catalog run in 89.68 seconds
+ rc=6
+ set -e
+ echo 'puppet apply exited with exit code 6'
puppet apply exited with exit code 6
+ '[' 6 '!=' 2 -a 6 '!=' 0 ']'
+ exit 6
[2016-04-10 04:43:38,601] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 6]
 
[2016-04-10 04:43:38,602] (os-refresh-config) [ERROR] Aborting...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 815, in install
    _run_orc(instack_env)
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 699, in _run_orc
    _run_live_command(args, instack_env, 'os-refresh-config')
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 370, in _run_live_command
    raise RuntimeError('%s failed. See log for details.' % name)
RuntimeError: os-refresh-config failed. See log for details.
Command 'instack-install-undercloud' returned non-zero exit status 1
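
For context on the exit status: the trace above compares the puppet return code against 0 and 2 and treats anything else as fatal. That matches the convention of `puppet apply --detailed-exitcodes`, where 0 means no changes, 2 means changes were applied, 4 means failures, and 6 means changes were applied and some resources failed. That the configure.d script passes this flag is an assumption inferred from the trace, and `puppet_rc_ok` and `describe_puppet_rc` are hypothetical helper names. A minimal sketch of the same check:

```shell
#!/bin/sh
# Hypothetical helper mirroring the check seen in the trace:
#   '[' $rc '!=' 2 -a $rc '!=' 0 ']'  ->  exit $rc
# Assumes puppet was run with --detailed-exitcodes
# (0 = no changes, 2 = changes applied, 4 = failures, 6 = changes and failures).
puppet_rc_ok() {
    case "$1" in
        0|2) return 0 ;;   # success, with or without changes
        *)   return 1 ;;   # 4, 6, or any other code is treated as a failure
    esac
}

# Describe a return code in words.
describe_puppet_rc() {
    case "$1" in
        0) echo "no changes" ;;
        2) echo "changes applied" ;;
        4) echo "failures" ;;
        6) echo "changes applied and failures" ;;
        *) echo "other exit code" ;;
    esac
}

describe_puppet_rc 6
```

Read this way, exit status 6 says that at least one resource in the catalog failed, so the puppet output earlier in the log is the place to look for the failing resource.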

Version-Release number of selected component (if applicable):
openstack-heat-engine-5.0.1-5.el7ost.noarch
openstack-nova-scheduler-12.0.2-5.el7ost.noarch
openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-selinux-0.6.58-1.el7ost.noarch
openstack-swift-container-2.5.0-2.el7ost.noarch
python-django-openstack-auth-2.0.1-1.2.el7ost.noarch
openstack-neutron-common-7.0.1-15.el7ost.noarch
openstack-dashboard-theme-8.0.1-2.el7ost.noarch
openstack-tempest-liberty-20160317.1.el7ost.noarch
openstack-nova-novncproxy-12.0.2-5.el7ost.noarch
openstack-ceilometer-collector-5.0.2-2.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-2.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.5-1.el7ost.noarch
openstack-swift-plugin-swift3-1.9-1.el7ost.noarch
openstack-ceilometer-common-5.0.2-2.el7ost.noarch
openstack-tripleo-common-0.3.1-1.el7ost.noarch
openstack-heat-api-5.0.1-5.el7ost.noarch
openstack-nova-cert-12.0.2-5.el7ost.noarch
openstack-nova-api-12.0.2-5.el7ost.noarch
openstack-neutron-openvswitch-7.0.1-15.el7ost.noarch
openstack-glance-11.0.1-4.el7ost.noarch
openstack-keystone-8.0.1-1.el7ost.noarch
openstack-swift-proxy-2.5.0-2.el7ost.noarch
openstack-swift-object-2.5.0-2.el7ost.noarch
openstack-tuskar-0.4.18-5.el7ost.noarch
openstack-tripleo-image-elements-0.9.9-1.el7ost.noarch
openstack-swift-2.5.0-2.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-nova-common-12.0.2-5.el7ost.noarch
openstack-heat-common-5.0.1-5.el7ost.noarch
openstack-heat-api-cfn-5.0.1-5.el7ost.noarch
openstack-nova-compute-12.0.2-5.el7ost.noarch
openstack-nova-conductor-12.0.2-5.el7ost.noarch
openstack-neutron-7.0.1-15.el7ost.noarch
openstack-ceilometer-central-5.0.2-2.el7ost.noarch
openstack-ceilometer-alarm-5.0.2-2.el7ost.noarch
openstack-swift-account-2.5.0-2.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-dashboard-8.0.1-2.el7ost.noarch
openstack-neutron-ml2-7.0.1-15.el7ost.noarch
openstack-ceilometer-api-5.0.2-2.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-tuskar-ui-0.4.0-5.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch
openstack-ceilometer-polling-5.0.2-2.el7ost.noarch
python-openstackclient-1.7.2-1.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-5.el7ost.noarch
openstack-nova-console-12.0.2-5.el7ost.noarch
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.5-2.el7ost.noarch
openstack-puppet-modules-7.0.17-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-ceilometer-notification-5.0.2-2.el7ost.noarch
instack-0.0.8-2.el7ost.noarch
instack-undercloud-2.2.7-4.el7ost.noarch



How reproducible:
always

Steps to Reproduce:
1. Deploy 7.3.
2. Switch to the rhos-release 8 puddle or poodle.
3. Run yum -y update.
4. Run openstack undercloud upgrade.

Actual results:
The upgrade fails with the error shown above.

Expected results:
The upgrade should complete successfully.

Additional info:
[stack@instack ~]$ dib-run-parts /usr/libexec/os-refresh-config/configure.d
dib-run-parts Sun Apr 10 15:07:44 EDT 2016 Running /usr/libexec/os-refresh-config/configure.d/00-apply-selinux-policy
+ set -o pipefail
+ '[' -x /usr/sbin/semanage ']'
+ semodule -i /opt/stack/selinux-policy/ipxe.pp
semodule: SELinux policy is not managed or store cannot be accessed.

The result is the same with setenforce 0 and setenforce 1.

I also tried rebooting the undercloud node, rerunning the upgrade command several times, and switching between poodles and puddles.

Comment 2 James Slagle 2016-04-11 13:36:08 UTC

What is the output of "sudo sestatus" on the undercloud?

Comment 4 Dan Yasny 2016-04-12 06:51:07 UTC

My suspicion was that I missed the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1312143, since the UC had SSL enabled.

To get a clean experiment, I redeployed the entire stack and installed 7.3 again as follows:
 Deployment command: openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 1   --neutron-tunnel-types vxlan,gre --neutron-network-type vxlan,gre --neutron-network-vlan-ranges datacentre:118:143 --neutron-bridge-mappings datacentre:br-ex  --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e ~/ssl-heat-templates/environments/enable-tls.yaml -e ~/ssl-heat-templates/environments/inject-trust-anchor.yaml

I then populated the OC with 5 tenants/instances/volumes/etc.

I started a clean session to avoid any remnants of overcloudrc being exported by the population script.

Commands:
sudo rhos-release -P 8-director
sudo yum update -y
sudo cp cacert.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
openstack undercloud upgrade

This failed. I reran the command as a potential workaround for another bug where this command fails on the first run, but it failed again:

Notice: /Stage[main]/Heat::Deps/Anchor[heat::service::end]: Dependency Keystone_user[admin] has failures: true
Warning: /Stage[main]/Heat::Deps/Anchor[heat::service::end]: Skipping because of failed dependencies
Notice: Finished catalog run in 193.55 seconds
+ rc=6
+ set -e
+ echo 'puppet apply exited with exit code 6'
puppet apply exited with exit code 6
+ '[' 6 '!=' 2 -a 6 '!=' 0 ']'
+ exit 6
[2016-04-12 02:27:46,784] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 6]

[2016-04-12 02:27:46,784] (os-refresh-config) [ERROR] Aborting...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 815, in install
    _run_orc(instack_env)
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 699, in _run_orc
    _run_live_command(args, instack_env, 'os-refresh-config')
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 370, in _run_live_command
    raise RuntimeError('%s failed. See log for details.' % name)
RuntimeError: os-refresh-config failed. See log for details.
Command 'instack-install-undercloud' returned non-zero exit status 1

[stack@instack ~]$ dib-run-parts /usr/libexec/os-refresh-config/configure.d
dib-run-parts Tue Apr 12 02:44:37 EDT 2016 Running /usr/libexec/os-refresh-config/configure.d/00-apply-selinux-policy
+ set -o pipefail
+ '[' -x /usr/sbin/semanage ']'
+ semodule -i /opt/stack/selinux-policy/ipxe.pp
semodule: SELinux policy is not managed or store cannot be accessed.

[stack@instack ~]$ sudo sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28
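
A note on the semodule message above: semodule typically prints "SELinux policy is not managed or store cannot be accessed" when it is run without root privileges. That would fit dib-run-parts being invoked here as the stack user without sudo, and it would explain why setenforce 0 and 1 made no difference. This reading is an assumption and is not confirmed anywhere in this report. A minimal sketch of a guard to run before repeating the manual test (`is_root_uid` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical guard: succeed only when the given numeric UID is 0 (root).
is_root_uid() {
    [ "$1" -eq 0 ]
}

if is_root_uid "$(id -u)"; then
    echo "running as root"
else
    # Assumption: the policy store is only accessible to root, so the
    # manual reproduction needs sudo to get past 00-apply-selinux-policy.
    echo "not root: rerun as 'sudo dib-run-parts /usr/libexec/os-refresh-config/configure.d'"
fi
```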

I am keeping the system available for further troubleshooting; please ping me for access.

Comment 5 Marius Cornea 2016-04-12 09:33:25 UTC

I checked the environment and the br-ctlplane interface was missing the undercloud_public_vip address. Because the undercloud is SSL-enabled, it was unable to reach the public APIs, which use undercloud_public_vip. After I manually added the undercloud_public_vip to the br-ctlplane interface and reran the undercloud upgrade command, the upgrade completed successfully:

sudo ip addr add 192.0.2.2/24 dev br-ctlplane
openstack undercloud upgrade
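
The check behind this workaround can be sketched as follows. The address 192.0.2.2 and the interface br-ctlplane are the values from this environment; `vip_present` is a hypothetical helper that parses `ip -o -4 addr show` output from stdin, so it can be tried against sample text before touching a live undercloud.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show dev <iface>` output on stdin
# and succeed if ADDRESS ($1) is assigned with any prefix length.
vip_present() {
    # Escape the dots so they match literally, then require "/<prefix> "
    # right after the address so 192.0.2.2 does not match 192.0.2.21.
    addr_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -Eq "inet ${addr_re}/[0-9]+ "
}

# Demo against a sample line in `ip -o` format (only 192.0.2.1 is assigned).
sample='16: br-ctlplane    inet 192.0.2.1/24 brd 192.0.2.255 scope global br-ctlplane'
if printf '%s\n' "$sample" | vip_present 192.0.2.2; then
    echo "VIP present"
else
    echo "VIP missing"
fi

# On the undercloud itself (values from this report, adjust as needed):
#   ip -o -4 addr show dev br-ctlplane | vip_present 192.0.2.2 \
#       || sudo ip addr add 192.0.2.2/24 dev br-ctlplane
```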

Comment 6 Dan Yasny 2016-04-12 14:54:06 UTC

We just confirmed it: it looks like the undercloud upgrade, or the yum update beforehand, removes the UC VIP address.

Maybe the BZ should be renamed...

Also, if this is the case, why does the UC upgrade work on most setups?

Comment 7 Marius Cornea 2016-04-12 15:12:37 UTC

Indeed, we can see that the br-ctlplane interface is missing the VIPs on all environments, but the undercloud upgrade fails only on SSL deployments, since these use a VIP for the OS_AUTH_URL.

The VIPs are set by keepalived on the br-ctlplane interface, and on my upgraded overcloud keepalived was reporting FAULT state for the VIP instances after the upgrade:

 Keepalived_vrrp[1256]: Kernel is reporting: interface br-ctlplane DOWN
 Keepalived_vrrp[1256]: VRRP_Instance(51) Entering FAULT STATE
 Keepalived_vrrp[1256]: VRRP_Instance(51) removing protocol VIPs.
 Keepalived_vrrp[1256]: Netlink: error: No such device, type=(21), seq=1460464571, pid=0
 Keepalived_vrrp[1256]: VRRP_Instance(51) Now in FAULT state
 Keepalived_vrrp[1256]: Kernel is reporting: interface br-ctlplane DOWN
 Keepalived_vrrp[1256]: VRRP_Instance(52) Entering FAULT STATE
 Keepalived_vrrp[1256]: VRRP_Instance(52) removing protocol VIPs.
 Keepalived_vrrp[1256]: Netlink: error: No such device, type=(21), seq=1460464572, pid=0
 Keepalived_vrrp[1256]: VRRP_Instance(52) Now in FAULT state


After running 'systemctl restart keepalived', the VIPs were set on the br-ctlplane interface:

stack@instack:~>>> ip a s dev br-ctlplane
16: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:c2:0a:86:bf:66 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.1/24 brd 192.0.2.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.0.2.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.0.2.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::2c2:aff:fe86:bf66/64 scope link 
       valid_lft forever preferred_lft forever
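
The diagnosis above can be sketched as a small log filter. `vrrp_fault_instances` is a hypothetical helper that reads keepalived log lines on stdin and prints the IDs of VRRP instances that logged "Now in FAULT state". It reports instances that entered FAULT at some point in the supplied log, not necessarily their current state, and reading the log through journalctl is an assumption about how this host collects logs.

```shell
#!/bin/sh
# Hypothetical helper: print the unique VRRP instance IDs that logged
# "Now in FAULT state" in the keepalived log lines supplied on stdin.
vrrp_fault_instances() {
    sed -n 's/.*VRRP_Instance(\([^)]*\)) Now in FAULT state.*/\1/p' | sort -u
}

# Demo with lines taken from the log excerpt in this comment.
printf '%s\n' \
    ' Keepalived_vrrp[1256]: VRRP_Instance(51) Entering FAULT STATE' \
    ' Keepalived_vrrp[1256]: VRRP_Instance(51) Now in FAULT state' \
    ' Keepalived_vrrp[1256]: VRRP_Instance(52) Now in FAULT state' \
    | vrrp_fault_instances

# On the undercloud (assumes the systemd journal holds the keepalived log):
#   sudo journalctl -u keepalived --no-pager | vrrp_fault_instances
#   sudo systemctl restart keepalived
#   ip a s dev br-ctlplane
```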

Comment 8 Omri Hochman 2016-04-15 15:02:48 UTC

The first failure occurred due to missing steps (for an undercloud with SSL).

Note: if the undercloud uses SSL, run:
   -  sudo cp cacert.pem /etc/pki/ca-trust/source/anchors/
   -  sudo update-ca-trust extract

The second part of this bug is, I believe (I need to check on that), handled in:
https://bugzilla.redhat.com/show_bug.cgi?id=1326644

Switching QA Contact to Dan Yasny to verify it.

Comment 9 Dan Yasny 2016-04-28 14:17:16 UTC

The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1326644 will resolve this issue as well.

Comment 11 Omri Hochman 2016-06-10 14:38:05 UTC

Verified with instack-undercloud-2.2.7-6.el7ost.noarch.

The upgrade from 7.3 GA to 8.0 worked: https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/E2E/view/rhos7-upgrade-on-BM/job/BM_rhos18_Upgrade_7.3_to_8.0_UCSSL_OCSSL/7/consoleFull

Comment 13 errata-xmlrpc 2016-06-15 12:39:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1229
