Description of problem:
-----------------------
RHOS-11 minor update fails:

openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 124cac89-bd17-44d9-8192-02d21bdfb6d0
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/10-gnocchi_wsgi.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/10-horizon_vhost.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/10-panko_wsgi.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/openstack-dashboard.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/remoteip.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/remoteip.load]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/status.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/status.load]/ensure: removed
    Notice: /Stage[main]/Apache/Concat[/etc/httpd/conf/ports.conf]/File[/etc/httpd/conf/ports.conf]/content: content changed '{md5}35f25b87e1f8ad89b39518066549ea6e' to '{md5}5d60f0a60394ddc109afc8df5168fa5b'
    Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 1 events
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Warning: Scope(Oslo::Messaging::Rabbit[keystone_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]/transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[glance_api_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]/transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[glance_registry_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]/transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[neutron_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]/transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[ceilometer_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]/transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[aodh_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]/transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[sahara_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]/transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
    Error: /Stage[main]/Nova::Cell_v2::Simple_setup/Nova::Cell_v2::Cell[default]/Exec[nova-cell_v2-cell-default]/unless: Check "nova-manage cell_v2 list_cells | grep -q default" exceeded timeout
    Error: Failed to apply catalog: Command: 'openstack ["domain", "list", "--quiet", "--format", "csv", []]' has been running for more than 40 seconds (tried 4, for a total of 170 seconds)
    (truncated, view all with --long)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-6.0.0-0.20170222195630.46117f4.el7ost.noarch
openstack-nova-cert-15.0.1-0.20170224183627.6087675.el7ost.noarch
python-novaclient-7.1.0-0.20170208162119.f6e0128.el7ost.noarch
python-nova-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-placement-api-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-common-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-scheduler-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-novncproxy-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-conductor-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-api-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-console-15.0.1-0.20170224183627.6087675.el7ost.noarch
puppet-nova-10.3.0-0.20170220173041.97656fb.el7ost.noarch
openstack-nova-compute-15.0.1-0.20170224183627.6087675.el7ost.noarch

Steps to Reproduce:
-------------------
1. Deploy RHOS-11 (2017-02-24.2)
2. Set up the latest repos on the undercloud and overcloud
3. Update the undercloud
4. Update the overcloud

Actual results:
---------------
Update fails
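For reference, steps 3 and 4 correspond to the standard minor-update commands for this release; a minimal sketch, assuming the Ocata-era python-tripleoclient workflow (exact commands and flags may vary by version):

    # On the undercloud: refresh the packages, then re-run the installer
    # so the undercloud services pick up the new versions.
    sudo yum -y update
    openstack undercloud upgrade

    # Then drive the overcloud minor update; -i runs it interactively so
    # the per-node update breakpoints can be acknowledged as they are reached.
    openstack overcloud update stack -i overcloud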
So the error came from the openstack command timing out while listing the keystone domains:

Mar  3 10:08:24 localhost os-collect-config: #033[1;31mError: Failed to apply catalog: Command: 'openstack ["domain", "list", "--quiet", "--format", "csv", []]' has been running for more than 40 seconds (tried 4, for a total of 170 seconds)#033[0m
Mar  3 10:08:24 localhost os-collect-config: [2017-03-03 15:08:24,550] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/57c9d1cf-5bf6-4074-93a9-37215261fc1e.pp. [1]

The cluster is okay except for haproxy (which has constraints with the VIPs, so those are down as a consequence):

Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Fri Mar  3 15:34:07 2017
Last change: Fri Mar  3 14:51:10 2017 by root via cibadmin on controller-0

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

3 nodes and 20 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 Master/Slave Set: galera-master [galera] (unmanaged)
     galera  (ocf::heartbeat:galera):  Master controller-2 (unmanaged)
     galera  (ocf::heartbeat:galera):  Master controller-1 (unmanaged)
     galera  (ocf::heartbeat:galera):  Master controller-0 (unmanaged)
 Clone Set: rabbitmq-clone [rabbitmq] (unmanaged)
     rabbitmq  (ocf::heartbeat:rabbitmq-cluster):  Started controller-2 (unmanaged)
     rabbitmq  (ocf::heartbeat:rabbitmq-cluster):  Started controller-1 (unmanaged)
     rabbitmq  (ocf::heartbeat:rabbitmq-cluster):  Started controller-0 (unmanaged)
 Master/Slave Set: redis-master [redis] (unmanaged)
     redis  (ocf::heartbeat:redis):  Slave controller-2 (unmanaged)
     redis  (ocf::heartbeat:redis):  Master controller-1 (unmanaged)
     redis  (ocf::heartbeat:redis):  Slave controller-0 (unmanaged)
 ip-192.168.24.11  (ocf::heartbeat:IPaddr2):  Stopped (unmanaged)
 ip-10.0.0.101  (ocf::heartbeat:IPaddr2):  Stopped (unmanaged)
 ip-172.17.1.18  (ocf::heartbeat:IPaddr2):  Stopped (unmanaged)
 ip-172.17.1.11  (ocf::heartbeat:IPaddr2):  Stopped (unmanaged)
 ip-172.17.3.12  (ocf::heartbeat:IPaddr2):  Stopped (unmanaged)
 ip-172.17.4.18  (ocf::heartbeat:IPaddr2):  Stopped (unmanaged)
 Clone Set: haproxy-clone [haproxy] (unmanaged)
     Stopped: [ controller-0 controller-1 controller-2 ]
 openstack-cinder-backup  (systemd:openstack-cinder-backup):  Started controller-1 (unmanaged)
 openstack-cinder-volume  (systemd:openstack-cinder-volume):  Started controller-0 (unmanaged)
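A quick way to confirm the cluster state shown above (a sketch; "Resource management is DISABLED" corresponds to the cluster-wide maintenance mode that the update workflow enables while it runs):

    # Full resource listing, as quoted above.
    pcs status

    # Expected to report "maintenance-mode: true" while the update is in flight.
    pcs property show maintenance-mode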
Haproxy fails to start because:

messages:Mar  3 09:15:21 localhost systemd: Started Cluster Controlled haproxy.
messages:Mar  3 09:15:21 localhost systemd: Starting Cluster Controlled haproxy...
messages:Mar  3 09:15:21 localhost haproxy-systemd-wrapper: [WARNING] 061/141521 (284010) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits it you should set it to at least 2048. Please set a value >= 1024 to make this warning disappear.
messages:Mar  3 09:15:21 localhost haproxy-systemd-wrapper: [ALERT] 061/141521 (284010) : Starting frontend GLOBAL: error when trying to preserve previous UNIX socket [/var/run/haproxy.sock]
messages:Mar  3 09:15:21 localhost haproxy-systemd-wrapper: haproxy-systemd-wrapper: exit, haproxy RC=1
messages:Mar  3 09:15:21 localhost systemd: haproxy.service: main process exited, code=exited, status=1/FAILURE
messages:Mar  3 09:15:21 localhost systemd: Unit haproxy.service entered failed state.
messages:Mar  3 09:15:21 localhost systemd: haproxy.service failed.
messages:Mar  3 09:15:23 localhost crmd[283248]: notice: Result of start operation for haproxy on controller-0: 7 (not running)
messages:Mar  3 09:15:26 localhost crmd[283248]: notice: Result of stop operation for haproxy on controller-0: 0 (ok)

The corresponding haproxy configuration is:

  ssl-default-bind-ciphers !SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!DES
  ssl-default-bind-options no-sslv3
  stats socket /var/run/haproxy.sock mode 600 level user
  stats timeout 2m
  user haproxy

This looks like an SELinux issue:

var/log/audit/audit.log:type=AVC msg=audit(1488550521.483:4475): avc: denied { link } for pid=284010 comm="haproxy" name="haproxy.sock" dev="tmpfs" ino=330803 scontext=system_u:system_r:haproxy_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=sock_file
var/log/messages:Mar  3 09:15:21 localhost haproxy-systemd-wrapper: [ALERT] 061/141521 (284010) : Starting frontend GLOBAL: error when trying to preserve previous UNIX socket [/var/run/haproxy.sock]
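The denial can be pulled straight out of the audit log; a minimal triage sketch using the standard audit tooling (ausearch from audit, audit2allow from policycoreutils-python):

    # Show AVC denials recorded for the haproxy command.
    ausearch -m avc -c haproxy

    # Explain why the access was denied; -w makes audit2allow behave like audit2why.
    ausearch -m avc -c haproxy | audit2allow -w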
The file is mislabeled. We'll have to assign a context to it; let me check upstream.
Upstream policy already has a context for it:

./policy/modules/contrib/rhcs.fc:/var/run/haproxy\.sock.* -- gen_context(system_u:object_r:haproxy_var_run_t,s0)

... and it goes back to RHEL 7.2.z. Whatever's creating the file needs to be calling restorecon; it's being created with the incorrect label:

[root@localhost ~]# touch /var/run/haproxy.sock
[root@localhost ~]# ls -lZ !$
ls -lZ /var/run/haproxy.sock
-rw-r--r--. root root unconfined_u:object_r:var_run_t:s0 /var/run/haproxy.sock
[root@localhost ~]# restorecon !$
restorecon /var/run/haproxy.sock
[root@localhost ~]# ls -lZ /var/run/haproxy.sock
-rw-r--r--. root root unconfined_u:object_r:haproxy_var_run_t:s0 /var/run/haproxy.sock
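The same mapping can be confirmed from the policy installed on the node; a sketch using semanage (policycoreutils-python):

    # List the file-context entries the loaded policy carries for haproxy paths.
    semanage fcontext -l | grep haproxy
    # Expected to include an entry along the lines of:
    # /var/run/haproxy\.sock.*   regular file   system_u:object_r:haproxy_var_run_t:s0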
So we added the stats socket by adding this to haproxy's global section:

global
  ...
  stats socket /var/run/haproxy.sock mode 600 level user
  stats timeout 2m
  ...

Does haproxy need to be modified to call restorecon, or is something else needed here?
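For context, the stats socket is what operators and monitoring tools query for runtime state; a usage sketch, assuming socat is installed (with "level user" only unprivileged commands such as show info/show stat are accepted):

    # Query runtime counters over the stats socket.
    echo "show info" | socat stdio unix-connect:/var/run/haproxy.sock
    echo "show stat" | socat stdio unix-connect:/var/run/haproxy.sock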
The stats socket differs from the default in the RHEL-7.3 branch:

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
[root@localhost ~]# touch /var/lib/haproxy/sock
[root@localhost ~]# ls -lZ !$
ls -lZ /var/lib/haproxy/sock
-rw-r--r--. root root unconfined_u:object_r:haproxy_var_lib_t:s0 /var/lib/haproxy/sock
[root@localhost ~]# ls -lZ /var/lib/haproxy
[root@localhost ~]# touch /var/lib/haproxy/stats
[root@localhost ~]# ls -lZ /var/lib/haproxy
-rw-r--r--. root root unconfined_u:object_r:haproxy_var_lib_t:s0 stats
There are two options:

1) Follow the RHEL location for the stats socket (/var/lib/haproxy/stats), or
2) Add a patch to puppet-haproxy that creates the stats socket file and calls restorecon on it.

Either of these patches is likely specific to running on RHEL. haproxy_var_run_t and haproxy_var_lib_t are both allowed to be used by haproxy (see the check below).
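One way to confirm that both types are usable by the haproxy domain is to query the loaded policy with sesearch (setools-console); a sketch, with the exact output depending on the installed selinux-policy version:

    # Allow rules from the haproxy domain to each candidate type for socket files.
    sesearch --allow -s haproxy_t -t haproxy_var_run_t -c sock_file
    sesearch --allow -s haproxy_t -t haproxy_var_lib_t -c sock_file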
The problem with (2) is that it might not work if something unlinks /var/run/haproxy.sock: a recreated socket would pick up the default var_run_t label again until the next restorecon.
I assume you'd want to change this downstream-only. Other distributions might be fine with /var/run/haproxy.sock.
Thanks Lon. Since the wrong path is generated via puppet-tripleo, I will fix it there.
Note that I will go with option 1) and then later make the path a parameter, in case other distros/operators need a different path.
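Once that puppet-tripleo change lands, the rendered configuration and label can be checked on a controller; a sketch, assuming the fix moves the socket to the RHEL default path:

    # The global section should now reference the /var/lib path, e.g.:
    # stats socket /var/lib/haproxy/stats mode 600 level user
    grep 'stats socket' /etc/haproxy/haproxy.cfg

    # ...and the socket should carry the haproxy_var_lib_t label.
    ls -lZ /var/lib/haproxy/stats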
ACK, sounds good.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245