Bug 1428915
| Summary: | [UPDATES] selinux prevents haproxy stat socket creation | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yurii Prokulevych <yprokule> |
| Component: | puppet-tripleo | Assignee: | RHOS Maint <rhos-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Tomas Jamrisko <tjamrisk> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 11.0 (Ocata) | CC: | berrange, bperkins, dasmith, eglynn, fdinitto, jjoyce, jschluet, kchamart, lbezdick, lhh, mburns, mcornea, mgrepl, michele, rhallise, rohara, royoung, sbauza, sferdjao, sgordon, slinaber, srevivo, tvignaud, ushkalim, vromanso |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 11.0 (Ocata) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | puppet-tripleo-6.3.0-5.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-05-17 20:04:43 UTC | Type: | Bug |
| Bug Blocks: | 1394025 | ||
So the error was from the nova cell command failing to list the domains:
Mar 3 10:08:24 localhost os-collect-config: #033[1;31mError: Failed to apply catalog: Command: 'openstack ["domain", "list", "--quiet", "--format", "csv", []]' has been running for more than 40 seconds (tried 4, for a total of 170 seconds)#033[0m
Mar 3 10:08:24 localhost os-collect-config: [2017-03-03 15:08:24,550] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/57c9d1cf-5bf6-4074-93a9-37215261fc1e.pp. [1]
The cluster is okay except for haproxy (which has constraints with the VIPs so those are down as a consequence):
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Fri Mar 3 15:34:07 2017 Last change: Fri Mar 3 14:51:10 2017 by root via cibadmin on controller-0
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
3 nodes and 20 resources configured
Online: [ controller-0 controller-1 controller-2 ]
Full list of resources:
Master/Slave Set: galera-master [galera] (unmanaged)
galera (ocf::heartbeat:galera): Master controller-2 (unmanaged)
galera (ocf::heartbeat:galera): Master controller-1 (unmanaged)
galera (ocf::heartbeat:galera): Master controller-0 (unmanaged)
Clone Set: rabbitmq-clone [rabbitmq] (unmanaged)
rabbitmq (ocf::heartbeat:rabbitmq-cluster): Started controller-2 (unmanaged)
rabbitmq (ocf::heartbeat:rabbitmq-cluster): Started controller-1 (unmanaged)
rabbitmq (ocf::heartbeat:rabbitmq-cluster): Started controller-0 (unmanaged)
Master/Slave Set: redis-master [redis] (unmanaged)
redis (ocf::heartbeat:redis): Slave controller-2 (unmanaged)
redis (ocf::heartbeat:redis): Master controller-1 (unmanaged)
redis (ocf::heartbeat:redis): Slave controller-0 (unmanaged)
ip-192.168.24.11 (ocf::heartbeat:IPaddr2): Stopped (unmanaged)
ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Stopped (unmanaged)
ip-172.17.1.18 (ocf::heartbeat:IPaddr2): Stopped (unmanaged)
ip-172.17.1.11 (ocf::heartbeat:IPaddr2): Stopped (unmanaged)
ip-172.17.3.12 (ocf::heartbeat:IPaddr2): Stopped (unmanaged)
ip-172.17.4.18 (ocf::heartbeat:IPaddr2): Stopped (unmanaged)
Clone Set: haproxy-clone [haproxy] (unmanaged)
Stopped: [ controller-0 controller-1 controller-2 ]
openstack-cinder-backup (systemd:openstack-cinder-backup): Started controller-1 (unmanaged)
openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0 (unmanaged)
Haproxy fails to start because:
messages:Mar 3 09:15:21 localhost systemd: Started Cluster Controlled haproxy.
messages:Mar 3 09:15:21 localhost systemd: Starting Cluster Controlled haproxy...
messages:Mar 3 09:15:21 localhost haproxy-systemd-wrapper: [WARNING] 061/141521 (284010) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits it you should set it to at least 2048. Please set a value >= 1024 to make this warning disappear.
messages:Mar 3 09:15:21 localhost haproxy-systemd-wrapper: [ALERT] 061/141521 (284010) : Starting frontend GLOBAL: error when trying to preserve previous UNIX socket [/var/run/haproxy.sock]
messages:Mar 3 09:15:21 localhost haproxy-systemd-wrapper: haproxy-systemd-wrapper: exit, haproxy RC=1
messages:Mar 3 09:15:21 localhost systemd: haproxy.service: main process exited, code=exited, status=1/FAILURE
messages:Mar 3 09:15:21 localhost systemd: Unit haproxy.service entered failed state.
messages:Mar 3 09:15:21 localhost systemd: haproxy.service failed.
messages:Mar 3 09:15:23 localhost crmd[283248]: notice: Result of start operation for haproxy on controller-0: 7 (not running)
messages:Mar 3 09:15:26 localhost crmd[283248]: notice: Result of stop operation for haproxy on controller-0: 0 (ok)
The corresponding haproxy configuration is:
ssl-default-bind-ciphers !SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!DES
ssl-default-bind-options no-sslv3
stats socket /var/run/haproxy.sock mode 600 level user
stats timeout 2m
user haproxy
Seems like a selinux issue:
var/log/audit/audit.log:type=AVC msg=audit(1488550521.483:4475): avc: denied { link } for pid=284010 comm="haproxy" name="haproxy.sock" dev="tmpfs" ino=330803 scontext=system_u:system_r:haproxy_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=sock_file
var/log/messages:Mar 3 09:15:21 localhost haproxy-systemd-wrapper: [ALERT] 061/141521 (284010) : Starting frontend GLOBAL: error when trying to preserve previous UNIX socket [/var/run/haproxy.sock]
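To make the denial above easier to read, here is a minimal sketch that splits the AVC record into its fields and pulls the SELinux type out of the source and target contexts. The helper names (parse_avc, selinux_type) are hypothetical and exist only for this illustration; the log line is the one quoted above. It is not a replacement for tools such as ausearch or audit2why.

```python
import re

# The AVC record quoted in this bug report.
AVC_LINE = (
    'type=AVC msg=audit(1488550521.483:4475): avc: denied { link } for pid=284010 '
    'comm="haproxy" name="haproxy.sock" dev="tmpfs" ino=330803 '
    'scontext=system_u:system_r:haproxy_t:s0 '
    'tcontext=system_u:object_r:var_run_t:s0 tclass=sock_file'
)


def parse_avc(line):
    """Return the key=value fields of one AVC line plus the denied permissions."""
    perms = re.search(r"denied\s+\{([^}]*)\}", line)
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    fields = {key: value.strip('"') for key, value in fields.items()}
    fields["perms"] = perms.group(1).split() if perms else []
    return fields


def selinux_type(context):
    """Extract the type (third field) from a user:role:type:level context."""
    return context.split(":")[2]


avc = parse_avc(AVC_LINE)
print("process:", avc["comm"], "denied:", avc["perms"], "class:", avc["tclass"])
print("source type:", selinux_type(avc["scontext"]))
print("target type:", selinux_type(avc["tcontext"]))
```

Read this way, the record says that a process running as haproxy_t was denied link on a sock_file whose type is the generic var_run_t, which is consistent with the socket carrying the wrong label.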
The file is mislabeled. We'll have to assign a context to it; let me check upstream.

./policy/modules/contrib/rhcs.fc:/var/run/haproxy\.sock.* -- gen_context(system_u:object_r:haproxy_var_run_t,s0)

... going back to RHEL 7.2.z

Whatever's creating the file needs to be calling restorecon; it's being created with the incorrect label.

[root@localhost ~]# touch /var/run/haproxy.sock
[root@localhost ~]# ls -lZ !$
ls -lZ /var/run/haproxy.sock
-rw-r--r--. root root unconfined_u:object_r:var_run_t:s0 /var/run/haproxy.sock
[root@localhost ~]# restorecon !$
restorecon /var/run/haproxy.sock
[root@localhost ~]# ls -lZ /var/run/haproxy.sock
-rw-r--r--. root root unconfined_u:object_r:haproxy_var_run_t:s0 /var/run/haproxy.sock

So we added the stat socket for stats in haproxy, by adding this to the global section:

global
...
stats socket /var/run/haproxy.sock mode 600 level user
stats timeout 2m
...

Does haproxy need to be modified to call restorecon or is something else needed here?

The stats socket differs from the default in RHEL-7.3 branch:
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
[root@localhost ~]# touch /var/lib/haproxy/sock
[root@localhost ~]# ls -lZ !$
ls -lZ /var/lib/haproxy/sock
-rw-r--r--. root root unconfined_u:object_r:haproxy_var_lib_t:s0 /var/lib/haproxy/sock
[root@localhost ~]# ls -lZ /var/lib/haproxy
[root@localhost ~]# touch /var/lib/haproxy/stats
[root@localhost ~]# ls -lZ /var/lib/haproxy
-rw-r--r--. root root unconfined_u:object_r:haproxy_var_lib_t:s0 stats

1) Follow the location in RHEL for the stats socket (/var/lib/haproxy/stats), or
2) Add a patch to create and call restorecon somewhere in puppet-haproxy for the stats socket.

I believe that either of these patches is likely specific to running on RHEL. haproxy_var_run_t and haproxy_var_lib_t are both allowed to be utilized by haproxy.

The problem with (2) is that it might not work if something unlinks /var/run/haproxy.sock.

I assume you'd want to change this downstream-only. Other distributions might be fine with /var/run/haproxy.sock.

Thanks Lon, since the wrong path is generated via puppet-tripleo I will fix it there.

Note that I will go for option 1) and then later on make the path a parameter in case other distros/operators might need a different path.

ACK, sounds good.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:1245

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
Description of problem:
-----------------------
RHOS-11 minor updates fails:

openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 124cac89-bd17-44d9-8192-02d21bdfb6d0
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/10-gnocchi_wsgi.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/10-horizon_vhost.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/10-panko_wsgi.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.d/openstack-dashboard.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/remoteip.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/remoteip.load]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/status.conf]/ensure: removed
    Notice: /Stage[main]/Apache/File[/etc/httpd/conf.modules.d/status.load]/ensure: removed
    Notice: /Stage[main]/Apache/Concat[/etc/httpd/conf/ports.conf]/File[/etc/httpd/conf/ports.conf]/content: content changed '{md5}35f25b87e1f8ad89b39518066549ea6e' to '{md5}5d60f0a60394ddc109afc8df5168fa5b'
    Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 1 events
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Warning: Scope(Oslo::Messaging::Rabbit[keystone_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]\transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[glance_api_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]\transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[glance_registry_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]\transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[neutron_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]\transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[ceilometer_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]\transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[aodh_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]\transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Oslo::Messaging::Rabbit[sahara_config]): The oslo_messaging rabbit_host, rabbit_hosts, rabbit_port, rabbit_userid, rabbit_password, rabbit_virtual_host parameters have been deprecated by the [DEFAULT]\transport_url. Please use oslo::messaging::default::transport_url instead.
    Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
    Error: /Stage[main]/Nova::Cell_v2::Simple_setup/Nova::Cell_v2::Cell[default]/Exec[nova-cell_v2-cell-default]/unless: Check "nova-manage cell_v2 list_cells | grep -q default" exceeded timeout
    Error: Failed to apply catalog: Command: 'openstack ["domain", "list", "--quiet", "--format", "csv", []]' has been running for more than 40 seconds (tried 4, for a total of 170 seconds)
    (truncated, view all with --long)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-6.0.0-0.20170222195630.46117f4.el7ost.noarch
openstack-nova-cert-15.0.1-0.20170224183627.6087675.el7ost.noarch
python-novaclient-7.1.0-0.20170208162119.f6e0128.el7ost.noarch
python-nova-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-placement-api-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-common-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-scheduler-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-novncproxy-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-conductor-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-api-15.0.1-0.20170224183627.6087675.el7ost.noarch
openstack-nova-console-15.0.1-0.20170224183627.6087675.el7ost.noarch
puppet-nova-10.3.0-0.20170220173041.97656fb.el7ost.noarch
openstack-nova-compute-15.0.1-0.20170224183627.6087675.el7ost.noarch

Steps to Reproduce:
-------------------
1. Deploy RHOS-1 (2017-02-24.2)
2. Setup latest repos on undercloud and overcloud
3. Update undercloud
4. Update overcloud

Actual results:
---------------
Update fails