Bug 1464114 - Failed to connect to mysql at overcloud deploy [pacemaker]
Failed to connect to mysql at overcloud deploy [pacemaker]
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-selinux (Show other bugs)
12.0 (Pike)
Unspecified Unspecified
urgent Severity urgent
: Upstream M2
: 12.0 (Pike)
Assigned To: Lon Hohberger
Attila Fazekas
: AutomationBlocker, Regression, Triaged
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2017-06-22 08:52 EDT by Attila Fazekas
Modified: 2018-02-05 14:07 EST (History)
16 users (show)

See Also:
Fixed In Version: openstack-selinux-0.8.7-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 16:33:29 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
selinux audit.log from enforcing run on controller (441.53 KB, text/plain)
2017-06-22 11:55 EDT, Jon Schlueter
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:3462 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-15 20:43:25 EST

  None (edit)
Description Attila Fazekas 2017-06-22 08:52:53 EDT
Overcloud deployment fails in both 1 and 3 controller setup(s) (without containers):

22:35:21     Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: Failed to call refresh: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
22:35:21     Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
22:35:21     Error: Failed to apply catalog: Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

Snapshot: 2017-06-21.5
openstack-tripleo-heat-templates                                   -7.0.0-0.20170616123155.el7ost

puppet-tripleo-7.1.1-0.20170615141731.el7ost
Comment 2 Damien Ciabrini 2017-06-22 08:58:26 EDT
looking at the captured var/log/cluster/corosync.log on the overcloud, I see:

Jun 21 22:04:49  galera(galera)[23595]:    INFO: now attempting to detect last commit version using 'mysqld_safe --wsrep-recover'
Jun 21 22:04:49  galera(galera)[23595]:    ERROR: Unable to detect last known write sequence number
Jun 21 18:04:49 [16546] controller-0       lrmd:   notice: operation_finished:  galera_start_0:23595:stderr [ cat: /var/lib/mysql/grastate.dat: No such file or directory ]
Jun 21 18:04:49 [16546] controller-0       lrmd:   notice: operation_finished:  galera_start_0:23595:stderr [ ocf-exit-reason:Unable to detect last known write sequence number ]

Which shows that the galera resource agent could not parse the initial empty last-seqno from mysql. Consequently resource start operation failed, and galera never went up. hence the reported clustercheck failure.

At this stage it's unclear why. Investigation ongoing
Comment 4 Damien Ciabrini 2017-06-22 09:48:32 EDT
OK, it's probably a SELinux issue:

[0;31mtype=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.659:318): avc:  denied  { setattr } for  pid=24496 comm="chown" name="tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.682:319): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.682:320): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.689:321): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.689:322): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.691:323): avc:  denied  { open } for  pid=24546 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.691:324): avc:  denied  { open } for  pid=24546 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.695:325): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31mtype=AVC msg=audit(1498082689.695:326): avc:  denied  { open } for  pid=23728 comm="mysqld_safe" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file[0m
[0;31m[0m
[0;36m...ignoring[0m

The temporary file that the resource agent needs to create to hold the output of mysqld_safe is being denied.

SELinux issue seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1439182
Comment 5 Damien Ciabrini 2017-06-22 10:39:38 EDT
Attila, could you tell if openstack-selinux-0.8.5-6.el7ost or greater was available in the overcloud image that you used for deployment?
Comment 6 Dimitri Savineau 2017-06-22 11:14:26 EDT
I have exactly the same error.

# rpm -qa openstack-selinux
openstack-selinux-0.8.7-1.el7ost.noarch
Comment 7 Jon Schlueter 2017-06-22 11:19:22 EDT
Lon can you peek at the AVC and if this is real?

Also can we get the job run in permissive and capture the /var/log/audit/audit.log file from that run please?
Comment 8 Jon Schlueter 2017-06-22 11:55 EDT
Created attachment 1290761 [details]
selinux audit.log from enforcing run on controller
Comment 9 Lon Hohberger 2017-06-22 13:29:09 EDT
Are there logs from a permissive run?  Without them, this is going to end up being whack-a-mole with AVCs...

On the surface, this looks like a base selinux-policy issue concerning clustering and management of mysqld.

Granting access to cluster_tmp_t isn't likely to be dangerous, but, doing it within openstack-selinux should be a last resort (as this will likely affect RHEL customers who are not using OpenStack at all)
Comment 10 Lon Hohberger 2017-06-22 14:04:31 EDT
Hmm.

[root@localhost ~]# audit2why
type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file
type=AVC msg=audit(1498082689.658:317): avc:  denied  { open } for  pid=24495 comm="touch" path="/tmp/tmp.cgvP0Qe4oc" dev="vda2" ino=910 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:cluster_tmp_t:s0 tclass=file

	Was caused by:
		Unknown - would be allowed by active policy
		Possible mismatch between this policy and the one under which the audit message was generated.

		Possible mismatch between current in-memory boolean settings vs. permanent ones.

[root@localhost ~]# rpm -q selinux-policy
selinux-policy-3.13.1-102.el7_3.15.noarch
[root@localhost ~]# rpm -q openstack-selinux
openstack-selinux-0.8.6-2.el7ost.noarch
Comment 11 Lon Hohberger 2017-06-22 14:07:16 EDT
Updating to 0.8.7-1 causes this to begin occurring.
Comment 20 errata-xmlrpc 2017-12-13 16:33:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

Note You need to log in before you can comment on or make changes to this bug.