Bug 1464114 - Failed to connect to mysql at overcloud deploy [pacemaker]
Summary: Failed to connect to mysql at overcloud deploy [pacemaker]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-selinux
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: Upstream M2
Target Release: 12.0 (Pike)
Assignee: Lon Hohberger
QA Contact: Attila Fazekas
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-22 12:52 UTC by Attila Fazekas
Modified: 2018-02-05 19:07 UTC
CC: 16 users

Fixed In Version: openstack-selinux-0.8.7-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 21:33:29 UTC
Target Upstream Version:
Embargoed:


Attachments
selinux audit.log from enforcing run on controller (441.53 KB, text/plain)
2017-06-22 15:55 UTC, Jon Schlueter


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Internal Links: 1516415

Description Attila Fazekas 2017-06-22 12:52:53 UTC
Overcloud deployment fails in both 1-controller and 3-controller setups (without containers):

22:35:21     Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: Failed to call refresh: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
22:35:21     Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
22:35:21     Error: Failed to apply catalog: Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

Snapshot: 2017-06-21.5
openstack-tripleo-heat-templates-7.0.0-0.20170616123155.el7ost

puppet-tripleo-7.1.1-0.20170615141731.el7ost
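
For anyone triaging, the failing checks can be re-run by hand on a controller node. A minimal sketch, using the same commands quoted in the errors above (pcs status is an extra step, not from the log, to see what pacemaker thinks of the galera resource):

# the health check that puppet's Exec[galera-ready] runs
/usr/bin/clustercheck; echo "exit code: $?"

# the query the catalog run failed on
/usr/bin/mysql -NBe "SELECT CONCAT(User, '@', Host) AS User FROM mysql.user"

# cluster-level view of the galera resource
pcs status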

Comment 2 Damien Ciabrini 2017-06-22 12:58:26 UTC
Looking at the captured var/log/cluster/corosync.log on the overcloud, I see:

Jun 21 22:04:49  galera(galera)[23595]:    INFO: now attempting to detect last commit version using 'mysqld_safe --wsrep-recover'
Jun 21 22:04:49  galera(galera)[23595]:    ERROR: Unable to detect last known write sequence number
Jun 21 18:04:49 [16546] controller-0       lrmd:   notice: operation_finished:  galera_start_0:23595:stderr [ cat: /var/lib/mysql/grastate.dat: No such file or directory ]
Jun 21 18:04:49 [16546] controller-0       lrmd:   notice: operation_finished:  galera_start_0:23595:stderr [ ocf-exit-reason:Unable to detect last known write sequence number ]

This shows that the galera resource agent could not parse the initial, empty last-seqno from mysql. Consequently, the resource start operation failed and galera never came up, hence the reported clustercheck failure.
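
For context, the agent's last-seqno detection works roughly like the sketch below (simplified, not the agent's exact code; the failing 'cat' in the stderr above is the first step, and the temporary file used in the fallback is the one that matters later in this bug):

# read the state file that galera writes on clean shutdown
seqno=$(cat /var/lib/mysql/grastate.dat | sed -n 's/^seqno:\s*//p')

# missing or -1 means no clean state; fall back to crash recovery,
# capturing mysqld_safe's output in a temporary file and parsing it
if [ -z "$seqno" ] || [ "$seqno" = "-1" ]; then
    tmp=$(mktemp)
    /usr/bin/mysqld_safe --wsrep-recover --log-error="$tmp" >/dev/null 2>&1
    seqno=$(sed -n 's/.*WSREP: Recovered position:.*:\(-*[0-9][0-9]*\).*/\1/p' "$tmp")
    rm -f "$tmp"
fi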

At this stage it's unclear why. Investigation ongoing.

Comment 4 Damien Ciabrini 2017-06-22 13:48:32 UTC
OK, it's probably an SELinux issue:

type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.659:318): avc:  denied  { setattr } for  pid=24496 comm="chown" name="tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.682:319): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.682:320): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.689:321): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.689:322): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.691:323): avc:  denied  { open } for  pid=24546 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.691:324): avc:  denied  { open } for  pid=24546 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.695:325): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.695:326): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file

...ignoring

Access to the temporary file that the resource agent creates to hold the output of mysqld_safe is being denied.

The SELinux issue seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1439182
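
To confirm the denials against the running policy, a sketch using the standard audit tooling (the module name local_mysqld_cluster_tmp below is just a placeholder):

# recent AVC denials for mysqld_safe
ausearch -m AVC -ts recent -c mysqld_safe

# ask the loaded policy why they were denied
ausearch -m AVC -ts recent | audit2why

# preview (do not blindly install) the allow rules that would silence them
ausearch -m AVC -ts recent | audit2allow -m local_mysqld_cluster_tmp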

Comment 5 Damien Ciabrini 2017-06-22 14:39:38 UTC
Attila, could you tell if openstack-selinux-0.8.5-6.el7ost or greater was available in the overcloud image that you used for deployment?

Comment 6 Dimitri Savineau 2017-06-22 15:14:26 UTC
I have exactly the same error.

# rpm -qa openstack-selinux
openstack-selinux-0.8.7-1.el7ost.noarch

Comment 7 Jon Schlueter 2017-06-22 15:19:22 UTC
Lon, can you peek at the AVCs and tell whether this is real?

Also, can we get the job run in permissive mode and capture the /var/log/audit/audit.log file from that run, please?
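
A minimal sketch of doing that on a controller, assuming console access:

setenforce 0                 # permissive until reboot: denials are logged, not enforced
# ... re-run the failing deployment step ...
ausearch -m AVC -ts recent   # or attach /var/log/audit/audit.log wholesale
setenforce 1                 # back to enforcing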

Comment 8 Jon Schlueter 2017-06-22 15:55:44 UTC
Created attachment 1290761 [details]
selinux audit.log from enforcing run on controller

Comment 9 Lon Hohberger 2017-06-22 17:29:09 UTC
Are there logs from a permissive run?  Without them, this is going to end up being whack-a-mole with AVCs...

On the surface, this looks like a base selinux-policy issue concerning clustering and management of mysqld.

Granting access to cluster_tmp_t isn't likely to be dangerous, but doing it within openstack-selinux should be a last resort, as this will likely affect RHEL customers who are not using OpenStack at all.
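
If a local workaround is needed in the meantime, a sketch of a minimal local module covering just the denied permissions from comment 4 (module name is hypothetical; per the Fixed In Version above, the proper fix eventually shipped in openstack-selinux-0.8.7-2.el7ost):

cat > local_mysqld_cluster_tmp.te <<'EOF'
module local_mysqld_cluster_tmp 1.0;

require {
    type mysqld_safe_t;
    type cluster_tmp_t;
    class file { open setattr };
}

# allow mysqld_safe to use the temp file the galera resource agent creates
allow mysqld_safe_t cluster_tmp_t:file { open setattr };
EOF

checkmodule -M -m -o local_mysqld_cluster_tmp.mod local_mysqld_cluster_tmp.te
semodule_package -o local_mysqld_cluster_tmp.pp -m local_mysqld_cluster_tmp.mod
semodule -i local_mysqld_cluster_tmp.pp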

Comment 10 Lon Hohberger 2017-06-22 18:04:31 UTC
Hmm.

[root@localhost ~]# audit2why
type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file

	Was caused by:
		Unknown - would be allowed by active policy
		Possible mismatch between this policy and the one under which the audit message was generated.

		Possible mismatch between current in-memory boolean settings vs. permanent ones.

[root@localhost ~]# rpm -q selinux-policy
selinux-policy-3.13.1-102.el7_3.15.noarch
[root@localhost ~]# rpm -q openstack-selinux
openstack-selinux-0.8.6-2.el7ost.noarch
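
Since audit2why reports a possible mismatch between the policy that generated the denials and the one currently loaded, a sketch of checks to pin down the versions involved:

# package versions on the box that generated the denials
rpm -q selinux-policy openstack-selinux

# policy modules actually loaded
semodule -l | grep -i openstack

# re-run the captured denials through the currently loaded policy
grep 'tmp.cgvP0Qe4oc' /var/log/audit/audit.log | audit2why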

Comment 11 Lon Hohberger 2017-06-22 18:07:16 UTC
Updating to 0.8.7-1 causes this to begin occurring.

Comment 20 errata-xmlrpc 2017-12-13 21:33:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

