Bug 1464114

Summary: Failed to connect to mysql at overcloud deploy [pacemaker]

Product: Red Hat OpenStack
Component: openstack-selinux
Version: 12.0 (Pike)
Target Release: 12.0 (Pike)
Target Milestone: Upstream M2
Hardware: Unspecified
OS: Unspecified

Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Keywords: AutomationBlocker, Regression, Triaged

Reporter: Attila Fazekas <afazekas>
Assignee: Lon Hohberger <lhh>
QA Contact: Attila Fazekas <afazekas>
CC: afazekas, chjones, dciabrin, dsavinea, emacchi, gael_rehault, jjoyce, jschluet, lhh, mbayer, mgrepl, oblaut, rhallise, slinaber, srevivo, tvignaud

Fixed In Version: openstack-selinux-0.8.7-2.el7ost
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2017-12-13 21:33:29 UTC
Attachments:
selinux audit.log from enforcing run on controller

Description Attila Fazekas 2017-06-22 12:52:53 UTC
Overcloud deployment fails in both 1-controller and 3-controller setups (without containers):

22:35:21     Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: Failed to call refresh: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
22:35:21     Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
22:35:21     Error: Failed to apply catalog: Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

Snapshot: 2017-06-21.5
openstack-tripleo-heat-templates-7.0.0-0.20170616123155.el7ost
puppet-tripleo-7.1.1-0.20170615141731.el7ost

Comment 2 Damien Ciabrini 2017-06-22 12:58:26 UTC
Looking at the captured /var/log/cluster/corosync.log on the overcloud, I see:

Jun 21 22:04:49  galera(galera)[23595]:    INFO: now attempting to detect last commit version using 'mysqld_safe --wsrep-recover'
Jun 21 22:04:49  galera(galera)[23595]:    ERROR: Unable to detect last known write sequence number
Jun 21 18:04:49 [16546] controller-0       lrmd:   notice: operation_finished:  galera_start_0:23595:stderr [ cat: /var/lib/mysql/grastate.dat: No such file or directory ]
Jun 21 18:04:49 [16546] controller-0       lrmd:   notice: operation_finished:  galera_start_0:23595:stderr [ ocf-exit-reason:Unable to detect last known write sequence number ]

This shows that the galera resource agent could not parse the initial (empty) last-seqno from mysql. Consequently, the resource start operation failed and galera never came up; hence the reported clustercheck failure.
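
For context: on a node that has shut down cleanly, the last committed write sequence number is recorded in /var/lib/mysql/grastate.dat, roughly like this (illustrative contents; the uuid is a placeholder):

    # GALERA saved state
    version: 2.1
    uuid:    00000000-0000-0000-0000-000000000000
    seqno:   1234

On a freshly deployed node that file does not exist yet, which is why the agent falls back to 'mysqld_safe --wsrep-recover' to detect the last commit, as seen in the log above.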

At this stage it's unclear why; investigation is ongoing.

Comment 4 Damien Ciabrini 2017-06-22 13:48:32 UTC
OK, it's probably an SELinux issue:

type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.659:318): avc:  denied  { setattr } for  pid=24496 comm="chown" name="tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.682:319): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.682:320): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.689:321): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.689:322): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.691:323): avc:  denied  { open } for  pid=24546 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.691:324): avc:  denied  { open } for  pid=24546 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.695:325): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.695:326): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file

...ignoring

Access to the temporary file that the resource agent creates to hold the output of mysqld_safe is being denied.
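
For illustration, the denied sequence corresponds roughly to the following (a simplified sketch of the agent's temp-file handling, not the actual resource-agents code):

    tmp=$(mktemp)                                   # created in /tmp, labeled cluster_tmp_t
    touch "$tmp"                                    # AVC: open denied from mysqld_safe_t
    chown mysql:mysql "$tmp"                        # AVC: setattr denied
    mysqld_safe --wsrep-recover --log-error="$tmp"  # AVC: open denied while writing recovery output
    grep 'Recovered position' "$tmp"                # parse the last committed seqno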

SELinux issue seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1439182

Comment 5 Damien Ciabrini 2017-06-22 14:39:38 UTC
Attila, could you tell if openstack-selinux-0.8.5-6.el7ost or greater was available in the overcloud image that you used for deployment?
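
For reference, one way to check this inside the image without booting it (assuming libguestfs tools and the usual overcloud-full.qcow2 image name) would be:

    guestmount -a overcloud-full.qcow2 -i --ro /mnt
    rpm -q --root /mnt openstack-selinux
    guestunmount /mnt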

Comment 6 Dimitri Savineau 2017-06-22 15:14:26 UTC
I have exactly the same error.

# rpm -qa openstack-selinux
openstack-selinux-0.8.7-1.el7ost.noarch

Comment 7 Jon Schlueter 2017-06-22 15:19:22 UTC
Lon, can you take a peek at the AVCs and say whether this is real?

Also, can we get the job run in permissive mode and capture the /var/log/audit/audit.log file from that run, please?
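
Roughly, on the controller that would be (setenforce does not persist across reboots, which is fine for a one-off capture):

    setenforce 0                # switch to permissive for the next deploy attempt
    # ... re-run the deployment ...
    ausearch -m avc -ts recent  # or simply collect /var/log/audit/audit.log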

Comment 8 Jon Schlueter 2017-06-22 15:55:44 UTC
Created attachment 1290761 [details]
selinux audit.log from enforcing run on controller

Comment 9 Lon Hohberger 2017-06-22 17:29:09 UTC
Are there logs from a permissive run?  Without them, this is going to end up being whack-a-mole with AVCs...

On the surface, this looks like a base selinux-policy issue concerning clustering and management of mysqld.

Granting access to cluster_tmp_t isn't likely to be dangerous, but doing it within openstack-selinux should be a last resort, as this will likely affect RHEL customers who are not using OpenStack at all.
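
If a local stopgap is needed in the meantime, something like the following would build and load a one-off module from the captured denials (the module name is arbitrary; this is a workaround, not the fix):

    grep mysqld_safe /var/log/audit/audit.log | audit2allow -M local_mysqld_cluster_tmp
    semodule -i local_mysqld_cluster_tmp.pp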

Comment 10 Lon Hohberger 2017-06-22 18:04:31 UTC
Hmm.

[root@localhost ~]# audit2why
type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file

	Was caused by:
		Unknown - would be allowed by active policy
		Possible mismatch between this policy and the one under which the audit message was generated.

		Possible mismatch between current in-memory boolean settings vs. permanent ones.

[root@localhost ~]# rpm -q selinux-policy
selinux-policy-3.13.1-102.el7_3.15.noarch
[root@localhost ~]# rpm -q openstack-selinux
openstack-selinux-0.8.6-2.el7ost.noarch

Comment 11 Lon Hohberger 2017-06-22 18:07:16 UTC
Updating to openstack-selinux-0.8.7-1 causes this to begin occurring.
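
One way to confirm what the loaded policy actually allows (assuming setools is installed) would be:

    sesearch --allow -s mysqld_safe_t -t cluster_tmp_t -c file

An empty result after the update, versus matching allow rules under 0.8.6-2, would be consistent with the denials above and with audit2why reporting a policy mismatch.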

Comment 20 errata-xmlrpc 2017-12-13 21:33:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462