Bug 1375184 - mysqld service prevents haproxy to get started and deployment fails
Summary: mysqld service prevents haproxy to get started and deployment fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: mariadb-galera
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 10.0 (Newton)
Assignee: Michael Bayer
QA Contact: Asaf Hirshberg
URL:
Whiteboard:
Depends On:
Blocks: 1376908 1376909 1376910 1376912 1376913
 
Reported: 2016-09-12 12:29 UTC by Marius Cornea
Modified: 2016-12-14 16:00 UTC (History)
CC List: 14 users

Fixed In Version: mariadb-galera-5.5.42-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Because Red Hat Enterprise Linux 7.3 changed the return format of the "systemctl is-enabled" command as consumed by shell scripts, the mariadb-galera RPM package, upon installation, erroneously detected that the MariaDB service was enabled when it was not. As a result, the Red Hat OpenStack Platform installer, which then tried to run mariadb-galera using Pacemaker and not systemd, failed to start Galera. With this update, mariadb-galera's RPM installation scripts now use a different systemctl command, correctly detecting the default MariaDB as disabled, and the installer can succeed.
Clone Of:
Clones: 1376908 1376909 1376910 1376912 1376913
Environment:
Last Closed: 2016-12-14 16:00:57 UTC
Target Upstream Version:
Embargoed:


Attachments
overcloud image rebuild logs (2.12 MB, text/plain)
2016-09-14 15:31 UTC, Damien Ciabrini


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:2948 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC

Description Marius Cornea 2016-09-12 12:29:39 UTC
Description of problem:
The mysqld service is enabled on boot and binds on all interfaces, which prevents haproxy from starting because it cannot bind to the internal_api_virtual_ip on port 3306.

Version-Release number of selected component (if applicable):
rhosp-director-images-10.0-20160907.1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy HA overcloud:
source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
-e ~/templates/wipe-disk-env.yaml \
--control-scale 3 \
--control-flavor controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484 \
--compute-scale 1 \
--compute-flavor compute-b634c10a-570f-59ba-bdbf-0c313d745a10 \
--ceph-storage-scale 1 \
--ceph-storage-flavor ceph-cf1f074b-dadb-5eb8-9eb0-55828273fab7 \
--ntp-server clock.redhat.com 

2. SSH to one of the controllers
 
3. Check PCS status

Actual results:
Full list of resources:

 ip-10.0.0.15	(ocf::heartbeat:IPaddr2):	Stopped
 ip-10.0.1.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 ip-172.16.18.25	(ocf::heartbeat:IPaddr2):	Stopped
 ip-192.168.0.20	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 Clone Set: haproxy-clone [haproxy]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]


Expected results:
The resources are started.

Additional info:

mysqld should be disabled but it's enabled and running:

systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-09-12 11:45:12 UTC; 41min ago
 Main PID: 5270 (mysqld_safe)

[root@overcloud-controller-0 heat-admin]#  lsof -i :3306 -P 
COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
mysqld  7980 mysql   14u  IPv4  28126      0t0  TCP *:3306 (LISTEN)

HAProxy fails to start because it's unable to bind on the internal api vip, port 3306:

Sep 12 11:57:23 overcloud-controller-0.localdomain haproxy-systemd-wrapper[3499]: [ALERT] 255/115723 (3500) : Starting proxy mysql: cannot bind socket [10.0.0.15:3306]

Workaround:
Run 'systemctl disable mysqld' on the image so mysqld isn't loaded on boot.
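
The conflict described above (a wildcard listener on port 3306 blocking a bind to a specific VIP on the same port) can be checked with a small filter. This is only a sketch: the function name wildcard_listeners is hypothetical, and it assumes input in the column layout printed by `ss -ltn` (state in field 1, local address:port in field 4).

```shell
#!/bin/sh
# Hypothetical helper (not part of any package mentioned in this bug).
# Reads `ss -ltn`-style lines on stdin and prints every local address that
# listens on the given port on all interfaces.
wildcard_listeners() {
    port="$1"
    awk -v port="$port" '
        # Only LISTEN rows matter; this also skips the header row.
        $1 == "LISTEN" {
            addr = $4   # local address:port column
            if (addr == ("*:" port) || addr == ("0.0.0.0:" port) ||
                addr == (":::" port) || addr == ("[::]:" port))
                print addr
        }'
}

# Usage on a live controller (assumes the iproute2 `ss` tool is available):
#   ss -ltn | wildcard_listeners 3306
```

Any output means some process already holds the port on every address, so a later bind to 10.0.0.15:3306 would be expected to fail the way the haproxy log shows.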

Comment 2 Marius Cornea 2016-09-12 13:05:34 UTC
Workaround command for disabling mysqld:

virt-customize -a overcloud-full.qcow2  --selinux-relabel --run-command 'systemctl stop mysqld; systemctl disable mysqld'

Comment 4 Damien Ciabrini 2016-09-13 15:10:49 UTC
Seeing the info in the description, the service is enabled locally on the image, but this is not the default in package mariadb-galera-server.

Nothing changed regarding enabling service at startup in package mariadb-galera-server between 7.2 and 7.3, which is the version targeted by rhos10.

At install time, postinst script can enable mariadb.service at startup _if and only if_ mysqld.service was present and already enabled before install time. So I would rule out that possibility.
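
As an illustration of the "if and only if" rule described above, here is a minimal sketch of the decision. It is not the actual mariadb-galera scriptlet; the function name and its two yes/no arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the rule stated above, not the real postinst code.
# $1: "yes" if mysqld.service was present before the install
# $2: "yes" if mysqld.service was already enabled before the install
should_enable_after_install() {
    was_present="$1"
    was_enabled="$2"
    if [ "$was_present" = "yes" ] && [ "$was_enabled" = "yes" ]; then
        echo enable
    else
        # Fresh installs and previously disabled services stay disabled.
        echo leave-disabled
    fi
}
```

Under this rule a fresh install onto an image without mysqld.service should always end in the leave-disabled branch.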

Some script may have been run during the creation of the overcloud image that ended up "systemctl enable"-ing mariadb.

Could you provide access to the image, or content of file /var/log/yum.log and timestamps of files in /etc/systemd/system/multi-user.target.wants/ ?

Comment 5 Mike Burns 2016-09-13 15:39:42 UTC
(In reply to Damien Ciabrini from comment #4)

> Could you provide access to the image, or content of file /var/log/yum.log
> and timestamps of files in /etc/systemd/system/multi-user.target.wants/ ?

Image:  

http://download.eng.bos.redhat.com/brewroot/packages/overcloud-full/10.0/20160907.1/images/overcloud-full.tar

yum.log is empty in the image


# ls -l /etc/systemd/system/multi-user.target.wants/
total 128
lrwxrwxrwx. 1 root root 35 Sep  7 16:10 atd.service -> /usr/lib/systemd/system/atd.service
lrwxrwxrwx. 1 root root 38 Sep  7 15:42 auditd.service -> /usr/lib/systemd/system/auditd.service
lrwxrwxrwx. 1 root root 39 Sep  7 16:34 ceph-mon.target -> /usr/lib/systemd/system/ceph-mon.target
lrwxrwxrwx. 1 root root 39 Sep  7 16:34 ceph-osd.target -> /usr/lib/systemd/system/ceph-osd.target
lrwxrwxrwx. 1 root root 43 Sep  7 16:34 ceph-radosgw.target -> /usr/lib/systemd/system/ceph-radosgw.target
lrwxrwxrwx. 1 root root 35 Sep  7 16:29 ceph.target -> /usr/lib/systemd/system/ceph.target
lrwxrwxrwx. 1 root root 39 Sep  7 15:43 chronyd.service -> /usr/lib/systemd/system/chronyd.service
lrwxrwxrwx. 1 root root 44 Sep  7 15:42 cloud-config.service -> /usr/lib/systemd/system/cloud-config.service
lrwxrwxrwx. 1 root root 43 Sep  7 15:42 cloud-final.service -> /usr/lib/systemd/system/cloud-final.service
lrwxrwxrwx. 1 root root 48 Sep  7 15:42 cloud-init-local.service -> /usr/lib/systemd/system/cloud-init-local.service
lrwxrwxrwx. 1 root root 42 Sep  7 15:42 cloud-init.service -> /usr/lib/systemd/system/cloud-init.service
lrwxrwxrwx. 1 root root 37 Sep  7 15:40 crond.service -> /usr/lib/systemd/system/crond.service
lrwxrwxrwx. 1 root root 45 Sep  7 16:43 dynamic-login.service -> /usr/lib/systemd/system/dynamic-login.service
lrwxrwxrwx. 1 root root 42 Sep  7 15:42 irqbalance.service -> /usr/lib/systemd/system/irqbalance.service
lrwxrwxrwx. 1 root root 37 Sep  7 15:42 kdump.service -> /usr/lib/systemd/system/kdump.service
lrwxrwxrwx. 1 root root 35 Sep  7 16:29 ksm.service -> /usr/lib/systemd/system/ksm.service
lrwxrwxrwx. 1 root root 40 Sep  7 16:29 ksmtuned.service -> /usr/lib/systemd/system/ksmtuned.service
lrwxrwxrwx. 1 root root 40 Sep  7 16:29 libvirtd.service -> /usr/lib/systemd/system/libvirtd.service
lrwxrwxrwx. 1 root root 39 Sep  7 16:36 mariadb.service -> /usr/lib/systemd/system/mariadb.service
lrwxrwxrwx. 1 root root 41 Sep  7 16:26 mdmonitor.service -> /usr/lib/systemd/system/mdmonitor.service
lrwxrwxrwx. 1 root root 49 Sep  7 16:21 netcf-transaction.service -> /usr/lib/systemd/system/netcf-transaction.service
lrwxrwxrwx. 1 root root 46 Sep  7 15:42 NetworkManager.service -> /usr/lib/systemd/system/NetworkManager.service
lrwxrwxrwx. 1 root root 41 Sep  7 16:32 nfs-client.target -> /usr/lib/systemd/system/nfs-client.target
lrwxrwxrwx. 1 root root 43 Sep  7 16:43 openvswitch.service -> /usr/lib/systemd/system/openvswitch.service
lrwxrwxrwx. 1 root root 49 Sep  7 16:42 os-collect-config.service -> /usr/lib/systemd/system/os-collect-config.service
lrwxrwxrwx. 1 root root 39 Sep  7 15:42 postfix.service -> /usr/lib/systemd/system/postfix.service
lrwxrwxrwx. 1 root root 40 Sep  7 15:40 remote-fs.target -> /usr/lib/systemd/system/remote-fs.target
lrwxrwxrwx. 1 root root 41 Sep  7 15:43 rhsmcertd.service -> /usr/lib/systemd/system/rhsmcertd.service
lrwxrwxrwx. 1 root root 39 Sep  7 15:42 rsyslog.service -> /usr/lib/systemd/system/rsyslog.service
lrwxrwxrwx. 1 root root 36 Sep  7 15:42 sshd.service -> /usr/lib/systemd/system/sshd.service
lrwxrwxrwx. 1 root root 37 Sep  7 15:42 tuned.service -> /usr/lib/systemd/system/tuned.service
lrwxrwxrwx. 1 root root 38 Sep  7 16:36 xinetd.service -> /usr/lib/systemd/system/xinetd.service

Comment 6 Damien Ciabrini 2016-09-13 17:04:07 UTC
When looking into the overcloud image in ~stack/images, I can see that mariadb.service is enabled at VM start, and this is what probably breaks the overcloud installation.

[stack@undercloud ~]$ virt-ls -a images/overcloud-full.qcow2 /etc/systemd/system/multi-user.target.wants/
NetworkManager.service
atd.service
auditd.service
ceph-mon.target
ceph-osd.target
ceph-radosgw.target
ceph.target
chronyd.service
cloud-config.service
cloud-final.service
cloud-init-local.service
cloud-init.service
crond.service
dynamic-login.service
irqbalance.service
kdump.service
ksm.service
ksmtuned.service
libvirtd.service
mariadb.service
mdmonitor.service
netcf-transaction.service
nfs-client.target
openvswitch.service
os-collect-config.service
postfix.service
remote-fs.target
rhsmcertd.service
rsyslog.service
sshd.service
tuned.service
xinetd.service


The initial description only shows the overcloud install steps; it does not elaborate on how the image was built. Do you know how the image got generated?
If we can reproduce we'll find out which component is wrongly enabling mariadb at startup.
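
The symlink check done by hand in this comment could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and it expects the image's root filesystem to be reachable as a directory (for example through a mount point), which the thread does not describe.

```shell
#!/bin/sh
# Hypothetical helper: report whether a unit is enabled for multi-user.target
# inside an image root directory, by looking for its "wants" symlink.
unit_enabled_in_root() {
    root="$1"
    unit="$2"
    link="$root/etc/systemd/system/multi-user.target.wants/$unit"
    # -L is true for a symlink even if its target does not resolve; the
    # targets are absolute paths that only make sense inside the image.
    if [ -L "$link" ]; then
        echo enabled
    else
        echo disabled
    fi
}

# Usage, with /mnt/image as an assumed mount point of the image root:
#   unit_enabled_in_root /mnt/image mariadb.service
```

Checking the symlink directly avoids asking systemd inside the image and only reflects enablement through multi-user.target.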

Comment 7 Marius Cornea 2016-09-13 17:55:30 UTC
The images are provided via rpm by the rhosp-director-images package in /usr/share/rhosp-director-images/ and I just take from there the overcloud-full.tar and ironic-python-agent.tar archives. I don't have details of the build process.

Comment 8 Mike Burns 2016-09-13 17:58:20 UTC
Image builds are a bit convoluted.  They're built by running a script called tripleo-build-images from tripleo-common upstream.  This script is basically a wrapper for diskimage-builder.

The inputs to the script are the base and -rhel7 yaml files from [1].  The base image that diskimage-builder uses is something we build internally.  It can be found at [2].  I've looked at that image and confirmed that it does not have mariadb or mysql installed or configured to start on boot.

[1] https://github.com/openstack/tripleo-common/tree/master/image-yaml
[2] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=512415

Comment 9 Damien Ciabrini 2016-09-14 15:31:16 UTC
Created attachment 1200871
overcloud image rebuild logs

Logs captured after a manual rebuild of the overcloud image
DIB settings:
export DIB_CLOUD_INIT_ETC_HOSTS=false
export DIB_LOCAL_IMAGE=/home/stack/dciabrin/images/director-input-10.0-20160907.1.x86_64.qcow2
export DIB_YUM_REPO_CONF="/etc/yum.repos.d/rhos-release-10.repo /etc/yum.repos.d/rhos-release-rhel-7.3.repo"
export NO_SOURCE_REPOSITORIES=1
export RHOS=1
export USE_DELOREAN_TRUNK=0


rebuild command:
tripleo-build-images --image-config-file  /usr/share/openstack-tripleo-common/image-yaml/overcloud-images.yaml --image-config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-rhel7.yaml --verbose --debug

Comment 10 Damien Ciabrini 2016-09-14 15:50:04 UTC
So after rebuilding an overcloud image manually with tripleo-build-images as per comment #9, I can confirm that mariadb is enabled at startup in the resulting image:

$ virt-ls -a overcloud-full.qcow2 /etc/systemd/system/multi-user.target.wants
NetworkManager.service
atd.service
auditd.service
ceph-mon.target
ceph-osd.target
ceph-radosgw.target
ceph.target
chronyd.service
cloud-config.service
cloud-final.service
cloud-init-local.service
cloud-init.service
crond.service
dynamic-login.service
irqbalance.service
kdump.service
ksm.service
ksmtuned.service
libvirtd.service
mariadb.service
mdmonitor.service
netcf-transaction.service
nfs-client.target
openvswitch.service
os-collect-config.service
postfix.service
remote-fs.target
rhsmcertd.service
rsyslog.service
sshd.service
tuned.service
xinetd.service

I've tried to narrow down the problem by creating a new VM out of base image director-input-10.0-20160907.1.x86_64.qcow2 and doing only the "yum install" part of the image creation.
I've installed the 609 packages found in the attached build log (see comment #9) and right after this installation mariadb is _not_ enabled at startup:

[cloud-user@localhost ~]$ systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

[cloud-user@localhost ~]$ ls -1 /etc/systemd/system/multi-user.target.wants/
atd.service
auditd.service
ceph-mon.target
ceph-osd.target
ceph-radosgw.target
ceph.target
chronyd.service
cloud-config.service
cloud-final.service
cloud-init-local.service
crond.service
irqbalance.service
kdump.service
ksm.service
ksmtuned.service
libvirtd.service
mdmonitor.service
netcf-transaction.service
nfs-client.target
postfix.service
remote-fs.target
rhsmcertd.service
rsyslog.service
sshd.service
tuned.service
xinetd.service

So at this stage, I would say that during the image creation process, there are steps which are executed by dib-run-parts after package installation, and those are enabling mariadb at startup while they shouldn't.
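
The comparison between the two listings in this comment can be automated. The sketch below assumes each listing has been saved to a file with one unit name per line; the function name newly_enabled and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the units that appear in the second listing
# but not in the first one.
newly_enabled() {
    before="$1"
    after="$2"
    b=$(mktemp)
    a=$(mktemp)
    # comm needs both inputs sorted the same way, so force the C locale.
    LC_ALL=C sort -u "$before" > "$b"
    LC_ALL=C sort -u "$after" > "$a"
    # -13 suppresses lines unique to the first file and lines common to both.
    LC_ALL=C comm -13 "$b" "$a"
    rm -f "$b" "$a"
}

# Usage (file names are placeholders):
#   newly_enabled after-yum-install.txt final-image.txt
```

Applied to the two listings above, this would report NetworkManager.service, cloud-init.service, dynamic-login.service, mariadb.service, openvswitch.service and os-collect-config.service, which narrows the search to the steps that touch those units.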

Comment 11 Damien Ciabrini 2016-09-15 12:12:23 UTC
Actually, when installing mariadb-galera packages directly into the base image mentioned in comment #8, I can reproduce the unwanted behaviour:

[stack@undercloud images]$ virt-ls -l -a director-input-10.0-20160907.1.x86_64.qcow2 /etc/systemd/system/multi-user.target.wants | grep mariadb

[stack@undercloud images]$ virt-customize -a director-input-10.0-20160907.1.x86_64.qcow2 --selinux-relabel --run-command "rpm -ivh http://download.eng.bos.redhat.com/brewroot/packages/mariadb-galera/5.5.42/3.el7ost/x86_64/mariadb-galera-server-5.5.42-3.el7ost.x86_64.rpm --nodeps"
[   0.0] Examining the guest ...
[   8.0] Setting a random seed
[   8.0] Running: rpm -ivh http://download.eng.bos.redhat.com/brewroot/packages/mariadb-galera/5.5.42/3.el7ost/x86_64/mariadb-galera-server-5.5.42-3.el7ost.x86_64.rpm --nodeps
[  16.0] SELinux relabelling
[  16.0] Finishing off

[stack@undercloud images]$ virt-ls -l -a director-input-10.0-20160907.1.x86_64.qcow2 /etc/systemd/system/multi-user.target.wants | grep mariadb
lrwxrwxrwx  1 root root   39 Sep 15 12:11 mariadb.service -> /usr/lib/systemd/system/mariadb.service

So something is misbehaving in package mariadb-galera. Investigating...

Comment 12 Michael Bayer 2016-09-15 14:41:10 UTC
We have a theory on why this is happening: it concerns the behavior of the "systemctl is-enabled" command on RHEL 7.3 in this environment. The specfile uses that command to decide whether it should automatically re-enable mariadb-galera on installation, and its return code may be misinterpreted. Damien is looking into it and we'll produce a patch soon.

Comment 15 Michael Bayer 2016-09-15 22:58:39 UTC
ready for testing, please try it out in your env over there so we can see it's doing the right thing now.

Comment 16 Marius Cornea 2016-09-16 12:32:12 UTC
I manually updated the image with mariadb-galera-5.5.42-4 but I can still see the issue:

[root@overcloud-controller-0 heat-admin]# rpm -qa | grep mariadb-galera
mariadb-galera-common-5.5.42-4.el7ost.x86_64
mariadb-galera-server-5.5.42-4.el7ost.x86_64

[root@overcloud-controller-0 heat-admin]# systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-09-16 12:16:27 UTC; 15min ago
 Main PID: 4978 (mysqld_safe)
   CGroup: /system.slice/mariadb.service
           ├─4978 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─6190 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --wsrep-provider=none --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/...

Comment 17 Damien Ciabrini 2016-09-16 13:22:45 UTC
Note the new package should be installed on an image where mariadb is not already enabled, because the install scriptlets in mariadb-galera keep the existing settings when the package is installed.

So one would need to either start from an image with no mariadb installed, or make sure service is disabled.

Comment 19 Marius Cornea 2016-09-16 14:44:17 UTC
(In reply to Damien Ciabrini from comment #17)
> Note the new package should be installed on an image where mariadb is not
> already enabled, because the install scriptlets in mariadb-galera keep the
> existing settings when the package is installed.
> 
> So one would need to either start from an image with no mariadb installed,
> or make sure service is disabled.

Right, I was updating the overcloud-full image which already contained the old package installed. When installing the new package from scratch the service is indeed disabled.

Comment 20 Asaf Hirshberg 2016-09-25 08:10:55 UTC
OpenStack-10.0-RHEL-7 Puddle: 2016-09-22.2

[root@overcloud-controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-0 (version 1.1.15-9.el7-e174ec8) - partition with quorum
Last updated: Sun Sep 25 04:01:25 2016		Last change: Thu Sep 22 15:40:19 2016 by root via cibadmin on overcloud-controller-0

3 nodes and 19 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 ip-10.35.180.19	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-172.18.0.14	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-172.17.0.13	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
 ip-192.0.2.9	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-0 ]
     Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
 ip-172.19.0.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 ip-172.17.0.15	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started overcloud-controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@overcloud-controller-0 ~]# systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Comment 23 errata-xmlrpc 2016-12-14 16:00:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

