Description of problem:

During an HA deployment, the overcloud deployment command returns zero, but the heat stack ends up in CREATE_FAILED state. The failed resource is "overcloud-ControllerNodesPostDeployment-qi55q7oqwo26-ControllerServicesBaseDeployment_Step2-cfncnhcqxihv". The deployment fails on the clustercheck command:

Error: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
Error: /Stage[main]/Main/Exec[galera-ready]/returns: change from notrun to 0 failed: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
Apr 26 05:21:07 overcloud-controller-1.localdomain os-collect-config[3013]: "deploy_status_code": 6

And on a reproducer system, galera is indeed in a degraded state:

# pcs status
[..]
 Master/Slave Set: galera-master [galera]
     galera (ocf::heartbeat:galera): FAILED Master overcloud-controller-0 (unmanaged)
     galera (ocf::heartbeat:galera): FAILED Master overcloud-controller-1 (unmanaged)
     Masters: [ overcloud-controller-2 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Failed Actions:
* galera_promote_0 on overcloud-controller-0 'unknown error' (1): call=71, status=complete, exitreason='MySQL server failed to start (pid=16096) (rc=0), please check your installation', last-rc-change='Wed Apr 27 15:27:07 2016', queued=0ms, exec=14362ms
* galera_promote_0 on overcloud-controller-1 'unknown error' (1): call=69, status=complete, exitreason='MySQL server failed to start (pid=15015) (rc=0), please check your installation', last-rc-change='Wed Apr 27 15:27:07 2016', queued=0ms, exec=14366ms

Version-Release number of selected component (if applicable):

[root@overcloud-controller-0 ~]# rpm -qa | grep galera
galera-25.3.5-7.el7.x86_64
mariadb-server-galera-10.1.12-4.el7.x86_64

[stack@instack ~]$ rpm -qa | grep tripleo
openstack-tripleo-image-elements-0.9.10-0.20160419165211.fdf717f.el7.centos.noarch
tripleo-common-1.0.1-0.20160323101840.d52d04b.el7.centos.noarch
openstack-tripleo-heat-templates-2.0.1-0.20160423124014.671f5c8.el7.centos.noarch
openstack-tripleo-0.0.1-0.20160411152951.b076a5a.el7.centos.noarch
python-tripleoclient-2.0.1-0.20160415042551.c084825.el7.centos.noarch
openstack-tripleo-puppet-elements-2.0.1-0.20160415124916.75e3610.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an HA overcloud of RDO Mitaka delorean on RHEL 7.2

Additional info:
There are downstream gate jobs that can reproduce the issue.
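For context, /usr/bin/clustercheck essentially asks the local MariaDB node for its Galera sync state and returns 0 only when the node is synced. A minimal sketch of an equivalent check (an assumption about the environment: local mysql client access is configured, as TripleO sets up on the controllers; on a box without MySQL the state simply comes back empty):

```shell
# Query the Galera sync state; wsrep_local_state must be 4 ("Synced")
# for clustercheck to report the node as healthy.
state=$(mysql -N -s -e "SHOW STATUS LIKE 'wsrep_local_state';" 2>/dev/null | awk '{print $2}')
if [ "$state" = "4" ]; then
    msg="node is synced"
else
    msg="node not synced (wsrep_local_state=${state:-unknown})"
fi
echo "$msg"
```

On the degraded reproducer above, the two FAILED controllers would report a state other than 4 (or none at all, since mysqld never started).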
Likely a simple SELinux relabel is missing when building the image:

type=AVC msg=audit(1461646313.398:171): avc: denied { setpgid } for pid=12510 comm="mysqld" scontext=system_u:system_r:mysqld_t:s0 tcontext=system_u:system_r:mysqld_t:s0 tclass=process
type=SYSCALL msg=audit(1461646313.398:171): arch=c000003e syscall=109 success=no exit=-13 a0=0 a1=0 a2=1 a3=8 items=0 ppid=12502 pid=12510 auid=4294967295 uid=27 gid=27 euid=27 suid=27 fsuid=27 egid=27 sgid=27 fsgid=27 tty=(none) ses=4294967295 comm="mysqld" exe="/usr/libexec/mysqld" subj=system_u:system_r:mysqld_t:s0 key=(null)
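For anyone decoding the raw audit record: the interesting fields are the denied permission and the process name. A small sketch (assuming only the AVC line quoted above, pasted in as a string) pulls them out:

```shell
# The AVC line quoted in the comment above, as a string for illustration.
avc='type=AVC msg=audit(1461646313.398:171): avc: denied { setpgid } for pid=12510 comm="mysqld" scontext=system_u:system_r:mysqld_t:s0 tcontext=system_u:system_r:mysqld_t:s0 tclass=process'

# Extract the denied permission (the token inside "{ }") and the command.
perm=$(echo "$avc" | sed -n 's/.*denied { \([^ ]*\) }.*/\1/p')
comm=$(echo "$avc" | sed -n 's/.*comm="\([^"]*\)".*/\1/p')

echo "denied=$perm comm=$comm"
```

Here mysqld was refused the setpgid syscall, which is consistent with a mislabeled filesystem rather than a policy gap.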
Is this with the tripleo-quickstart image or with from-scratch custom-built images?
So I looked at the image building logs and it does seem to set the SELinux file contexts:

+ echo dib-run-parts Tue Apr 26 00:15:04 EDT 2016 Running /tmp/in_target.d/finalise.d/90-selinux-fixfiles-restore
dib-run-parts Tue Apr 26 00:15:04 EDT 2016 Running /tmp/in_target.d/finalise.d/90-selinux-fixfiles-restore
+ target_tag=90-selinux-fixfiles-restore
+ date +%s.%N
+ /tmp/in_target.d/finalise.d/90-selinux-fixfiles-restore
+ set -eu
+ set -o pipefail
++ which setfiles
+ SETFILES=/usr/sbin/setfiles
+ '[' -e /etc/selinux/targeted/contexts/files/file_contexts -a -x /usr/sbin/setfiles ']'
+ setfiles /etc/selinux/targeted/contexts/files/file_contexts /
+ target_tag=90-selinux-fixfiles-restore
+ date +%s.%N
+ output '90-selinux-fixfiles-restore completed'

Could we do one of the two following steps to troubleshoot further?

a) Add -v to the 90-selinux-fixfiles-restore dib element
b) Call virt-customize --selinux-relabel on the produced overcloud image and see if the issue is still there
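For step (b), the relabel is a single virt-customize invocation; a sketch, assuming the produced image carries the usual TripleO name overcloud-full.qcow2 (substitute whatever your build produced):

```shell
# Reapply SELinux file contexts inside the built image offline. This has
# the same effect setfiles should have had at image-build time.
image="overcloud-full.qcow2"
cmd="virt-customize -a $image --selinux-relabel"
echo "$cmd"
```

After relabeling, re-upload the image and redeploy; if the galera promotion then succeeds, the build-time relabel step is the culprit.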
This bug is against a Version which has reached End of Life. If it's still present in a supported release (http://releases.openstack.org), please update Version and reopen.
Hi,

We currently have an RDO Mitaka environment experiencing this issue. Was the root cause ever accurately determined? It was closed as EOL, but I'm not entirely sure that is accurate.

Regards,
Graeme
Graeme, did you observe the denials in comment 2?

cheers,
Michele
I see some similar denials for mysql and haproxy:

type=AVC msg=audit(1467790984.329:131): avc: denied { name_bind } for pid=9632 comm="haproxy" src=3306 scontext=system_u:system_r:haproxy_t:s0 tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket
type=AVC msg=audit(1467790992.061:163): avc: denied { write } for pid=10505 comm="mysqld_safe" path="/tmp/tmp.c2XPC6oag1" dev="sda2" ino=16900546 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=SYSCALL msg=audit(1467790992.061:163): arch=c000003e syscall=59 success=yes exit=0 a0=9aa610 a1=93a010 a2=976640 a3=7ffd526d34a0 items=0 ppid=10386 pid=10505 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mysqld_safe" exe="/usr/bin/bash" subj=system_u:system_r:mysqld_safe_t:s0 key=(null)
type=AVC msg=audit(1467790994.678:166): avc: denied { read } for pid=10865 comm="mysqld_safe" name="cores" dev="sda2" ino=19519726 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=unconfined_u:object_r:cluster_var_lib_t:s0 tclass=dir
type=SYSCALL msg=audit(1467790994.678:166): arch=c000003e syscall=257 success=yes exit=3 a0=ffffffffffffff9c a1=4a9e9f a2=90800 a3=0 items=0 ppid=10864 pid=10865 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mysqld_safe" exe="/usr/bin/bash" subj=system_u:system_r:mysqld_safe_t:s0 key=(null)
type=AVC msg=audit(1467790994.679:167): avc: denied { write } for pid=10505 comm="mysqld_safe" path="/tmp/tmp.c2XPC6oag1" dev="sda2" ino=16900546 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=SYSCALL msg=audit(1467790994.679:167): arch=c000003e syscall=1 success=yes exit=94 a0=1 a1=7fa2e2821000 a2=5e a3=5d items=0 ppid=10386 pid=10505 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mysqld_safe" exe="/usr/bin/bash" subj=system_u:system_r:mysqld_safe_t:s0 key=(null)
type=AVC msg=audit(1467791012.260:199): avc: denied { read } for pid=12255 comm="mysqld_safe" name="cores" dev="sda2" ino=19519726 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=unconfined_u:object_r:cluster_var_lib_t:s0 tclass=dir
type=SYSCALL msg=audit(1467791012.260:199): arch=c000003e syscall=257 success=yes exit=3 a0=ffffffffffffff9c a1=4a9e9f a2=90800 a3=0 items=0 ppid=12254 pid=12255 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mysqld_safe" exe="/usr/bin/bash" subj=system_u:system_r:mysqld_safe_t:s0 key=(null)

But note that all nodes are running in SELinux permissive mode, so this shouldn't affect anything, right?
Correct: if your systems are in permissive mode, these denials do not apply. If we could get sosreports from all three nodes, we can take a look and see what is going on there.
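On the permissive-mode point, it is worth double-checking on each controller, since an image problem could leave nodes in different modes. A sketch that works whether or not the getenforce tool is installed:

```shell
# Report the SELinux mode without relying on getenforce being present.
# /sys/fs/selinux/enforce holds 1 (enforcing) or 0 (permissive) when
# SELinux is enabled; if the file is absent, SELinux is not active.
if [ -r /sys/fs/selinux/enforce ]; then
    if [ "$(cat /sys/fs/selinux/enforce)" = "1" ]; then
        mode="enforcing"
    else
        mode="permissive"
    fi
else
    mode="disabled"
fi
echo "SELinux mode: $mode"
```

Only on a node reporting "enforcing" would the denials above actually block mysqld.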