Description of problem:
The mysqld service is enabled on boot and binds on all interfaces. This prevents haproxy from starting, because haproxy cannot bind to the internal_api_virtual_ip on port 3306.

Version-Release number of selected component (if applicable):
rhosp-director-images-10.0-20160907.1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy HA overcloud:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates \
  -e $THT/environments/network-isolation.yaml \
  -e $THT/environments/network-management.yaml \
  -e ~/templates/network-environment.yaml \
  -e $THT/environments/storage-environment.yaml \
  -e ~/templates/disk-layout.yaml \
  -e ~/templates/wipe-disk-env.yaml \
  --control-scale 3 \
  --control-flavor controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484 \
  --compute-scale 1 \
  --compute-flavor compute-b634c10a-570f-59ba-bdbf-0c313d745a10 \
  --ceph-storage-scale 1 \
  --ceph-storage-flavor ceph-cf1f074b-dadb-5eb8-9eb0-55828273fab7 \
  --ntp-server clock.redhat.com

2. SSH to one of the controllers
3. Check PCS status

Actual results:

Full list of resources:

 ip-10.0.0.15 (ocf::heartbeat:IPaddr2): Stopped
 ip-10.0.1.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.16.18.25 (ocf::heartbeat:IPaddr2): Stopped
 ip-192.168.0.20 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 Clone Set: haproxy-clone [haproxy]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Expected results:
The resources are started.
Additional info:
mysqld should be disabled, but it is enabled and running:

systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-09-12 11:45:12 UTC; 41min ago
 Main PID: 5270 (mysqld_safe)

[root@overcloud-controller-0 heat-admin]# lsof -i :3306 -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld 7980 mysql 14u IPv4 28126 0t0 TCP *:3306 (LISTEN)

HAProxy fails to start because it's unable to bind on the internal api vip, port 3306:

Sep 12 11:57:23 overcloud-controller-0.localdomain haproxy-systemd-wrapper[3499]: [ALERT] 255/115723 (3500) : Starting proxy mysql: cannot bind socket [10.0.0.15:3306]

Workaround:
Run 'systemctl disable mysqld' on the image so mysqld isn't loaded on boot.
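The conflict can also be confirmed from the socket table: as the haproxy log shows, a listener on the wildcard address for port 3306 gets in the way of a later bind to the VIP on that port. A minimal sketch, assuming `ss` from iproute2 is available on the controller; the helper name `has_wildcard_listener` is hypothetical and not part of any package:

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds if some
# socket is listening on the wildcard address for the given port.
has_wildcard_listener() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # Column 4 of `ss -ltn` output is the local address:port.
            if ($4 == "*:" port || $4 == "0.0.0.0:" port ||
                $4 == "[::]:" port || $4 == ":::" port)
                found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on a controller:
#   ss -ltn | has_wildcard_listener 3306 && echo "3306 is bound on all interfaces"
```

The function only parses text, so it can be tried against saved output as well as a live system.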
Workaround command for disabling mysqld: virt-customize -a overcloud-full.qcow2 --selinux-relabel --run-command 'systemctl stop mysqld; systemctl disable mysqld'
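To check that the workaround took effect without booting the image, the listing from `virt-ls` can be searched for the unit name. This is only a sketch; `unit_listed` is a hypothetical helper name, and it assumes the listing has one unit per line:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the unit name given as $1 appears as a
# whole line in the listing read from stdin.
unit_listed() {
    # -x: match the whole line, -F: fixed string (no regex), -q: no output
    grep -qxF -- "$1"
}

# Usage against the customized image:
#   virt-ls -a overcloud-full.qcow2 /etc/systemd/system/multi-user.target.wants/ \
#     | unit_listed mariadb.service && echo "mariadb.service is still enabled"
```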
Seeing the info in the description, the service is enabled locally on the image, but this is not the default in package mariadb-galera-server.

Nothing changed regarding enabling the service at startup in package mariadb-galera-server between 7.2 and 7.3, which is the version targeted by rhos10.

At install time, the postinst script can enable mariadb.service at startup _if and only if_ mysqld.service was present and already enabled before install time. So I would rule out that possibility.

Some script may have been run during the creation of the overcloud image that ended up "systemctl enable"-ing mariadb.

Could you provide access to the image, or the content of file /var/log/yum.log and the timestamps of files in /etc/systemd/system/multi-user.target.wants/ ?
(In reply to Damien Ciabrini from comment #4)
> Could you provide access to the image, or content of file /var/log/yum.log
> and timestamps of files in /etc/systemd/system/multi-user.target.wants/ ?

Image:
http://download.eng.bos.redhat.com/brewroot/packages/overcloud-full/10.0/20160907.1/images/overcloud-full.tar

yum.log is empty in the image

# ls -l /etc/systemd/system/multi-user.target.wants/
total 128
lrwxrwxrwx. 1 root root 35 Sep 7 16:10 atd.service -> /usr/lib/systemd/system/atd.service
lrwxrwxrwx. 1 root root 38 Sep 7 15:42 auditd.service -> /usr/lib/systemd/system/auditd.service
lrwxrwxrwx. 1 root root 39 Sep 7 16:34 ceph-mon.target -> /usr/lib/systemd/system/ceph-mon.target
lrwxrwxrwx. 1 root root 39 Sep 7 16:34 ceph-osd.target -> /usr/lib/systemd/system/ceph-osd.target
lrwxrwxrwx. 1 root root 43 Sep 7 16:34 ceph-radosgw.target -> /usr/lib/systemd/system/ceph-radosgw.target
lrwxrwxrwx. 1 root root 35 Sep 7 16:29 ceph.target -> /usr/lib/systemd/system/ceph.target
lrwxrwxrwx. 1 root root 39 Sep 7 15:43 chronyd.service -> /usr/lib/systemd/system/chronyd.service
lrwxrwxrwx. 1 root root 44 Sep 7 15:42 cloud-config.service -> /usr/lib/systemd/system/cloud-config.service
lrwxrwxrwx. 1 root root 43 Sep 7 15:42 cloud-final.service -> /usr/lib/systemd/system/cloud-final.service
lrwxrwxrwx. 1 root root 48 Sep 7 15:42 cloud-init-local.service -> /usr/lib/systemd/system/cloud-init-local.service
lrwxrwxrwx. 1 root root 42 Sep 7 15:42 cloud-init.service -> /usr/lib/systemd/system/cloud-init.service
lrwxrwxrwx. 1 root root 37 Sep 7 15:40 crond.service -> /usr/lib/systemd/system/crond.service
lrwxrwxrwx. 1 root root 45 Sep 7 16:43 dynamic-login.service -> /usr/lib/systemd/system/dynamic-login.service
lrwxrwxrwx. 1 root root 42 Sep 7 15:42 irqbalance.service -> /usr/lib/systemd/system/irqbalance.service
lrwxrwxrwx. 1 root root 37 Sep 7 15:42 kdump.service -> /usr/lib/systemd/system/kdump.service
lrwxrwxrwx. 1 root root 35 Sep 7 16:29 ksm.service -> /usr/lib/systemd/system/ksm.service
lrwxrwxrwx. 1 root root 40 Sep 7 16:29 ksmtuned.service -> /usr/lib/systemd/system/ksmtuned.service
lrwxrwxrwx. 1 root root 40 Sep 7 16:29 libvirtd.service -> /usr/lib/systemd/system/libvirtd.service
lrwxrwxrwx. 1 root root 39 Sep 7 16:36 mariadb.service -> /usr/lib/systemd/system/mariadb.service
lrwxrwxrwx. 1 root root 41 Sep 7 16:26 mdmonitor.service -> /usr/lib/systemd/system/mdmonitor.service
lrwxrwxrwx. 1 root root 49 Sep 7 16:21 netcf-transaction.service -> /usr/lib/systemd/system/netcf-transaction.service
lrwxrwxrwx. 1 root root 46 Sep 7 15:42 NetworkManager.service -> /usr/lib/systemd/system/NetworkManager.service
lrwxrwxrwx. 1 root root 41 Sep 7 16:32 nfs-client.target -> /usr/lib/systemd/system/nfs-client.target
lrwxrwxrwx. 1 root root 43 Sep 7 16:43 openvswitch.service -> /usr/lib/systemd/system/openvswitch.service
lrwxrwxrwx. 1 root root 49 Sep 7 16:42 os-collect-config.service -> /usr/lib/systemd/system/os-collect-config.service
lrwxrwxrwx. 1 root root 39 Sep 7 15:42 postfix.service -> /usr/lib/systemd/system/postfix.service
lrwxrwxrwx. 1 root root 40 Sep 7 15:40 remote-fs.target -> /usr/lib/systemd/system/remote-fs.target
lrwxrwxrwx. 1 root root 41 Sep 7 15:43 rhsmcertd.service -> /usr/lib/systemd/system/rhsmcertd.service
lrwxrwxrwx. 1 root root 39 Sep 7 15:42 rsyslog.service -> /usr/lib/systemd/system/rsyslog.service
lrwxrwxrwx. 1 root root 36 Sep 7 15:42 sshd.service -> /usr/lib/systemd/system/sshd.service
lrwxrwxrwx. 1 root root 37 Sep 7 15:42 tuned.service -> /usr/lib/systemd/system/tuned.service
lrwxrwxrwx. 1 root root 38 Sep 7 16:36 xinetd.service -> /usr/lib/systemd/system/xinetd.service
Looking into the overcloud image in ~stack/images, I can see that mariadb.service is enabled at VM start, and this is probably what breaks the overcloud installation.

[stack@undercloud ~]$ virt-ls -a images/overcloud-full.qcow2 /etc/systemd/system/multi-user.target.wants/
NetworkManager.service
atd.service
auditd.service
ceph-mon.target
ceph-osd.target
ceph-radosgw.target
ceph.target
chronyd.service
cloud-config.service
cloud-final.service
cloud-init-local.service
cloud-init.service
crond.service
dynamic-login.service
irqbalance.service
kdump.service
ksm.service
ksmtuned.service
libvirtd.service
mariadb.service
mdmonitor.service
netcf-transaction.service
nfs-client.target
openvswitch.service
os-collect-config.service
postfix.service
remote-fs.target
rhsmcertd.service
rsyslog.service
sshd.service
tuned.service
xinetd.service

The initial description only shows the overcloud install steps; it does not explain how the image was built. Do you know how the image was generated? If we can reproduce the build, we'll find out which component is wrongly enabling mariadb at startup.
The images are provided via rpm by the rhosp-director-images package in /usr/share/rhosp-director-images/ and I just take from there the overcloud-full.tar and ironic-python-agent.tar archives. I don't have details of the build process.
Image builds are a bit convoluted. They're built by running a script called tripleo-build-images from tripleo-common upstream. This script is basically a wrapper for diskimage-builder. The inputs to the script are the base and -rhel7 yaml files from [1].

The base image that diskimage-builder uses is something we build internally. It can be found at [2]. I've looked at that image and confirmed that it does not have mariadb or mysql installed or configured to start on boot.

[1] https://github.com/openstack/tripleo-common/tree/master/image-yaml
[2] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=512415
Created attachment 1200871 [details]
overcloud image rebuild logs

Logs captured after a manual rebuild of the overcloud image

DIB settings:
export DIB_CLOUD_INIT_ETC_HOSTS=false
export DIB_LOCAL_IMAGE=/home/stack/dciabrin/images/director-input-10.0-20160907.1.x86_64.qcow2
export DIB_YUM_REPO_CONF="/etc/yum.repos.d/rhos-release-10.repo /etc/yum.repos.d/rhos-release-rhel-7.3.repo"
export NO_SOURCE_REPOSITORIES=1
export RHOS=1
export USE_DELOREAN_TRUNK=0

rebuild command:
tripleo-build-images --image-config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images.yaml --image-config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-rhel7.yaml --verbose --debug
So after rebuilding an overcloud image manually with tripleo-build-images as per comment #9, I can confirm that mariadb is enabled at startup in the resulting image:

$ virt-ls -a overcloud-full.qcow2 /etc/systemd/system/multi-user.target.wants
NetworkManager.service
atd.service
auditd.service
ceph-mon.target
ceph-osd.target
ceph-radosgw.target
ceph.target
chronyd.service
cloud-config.service
cloud-final.service
cloud-init-local.service
cloud-init.service
crond.service
dynamic-login.service
irqbalance.service
kdump.service
ksm.service
ksmtuned.service
libvirtd.service
mariadb.service
mdmonitor.service
netcf-transaction.service
nfs-client.target
openvswitch.service
os-collect-config.service
postfix.service
remote-fs.target
rhsmcertd.service
rsyslog.service
sshd.service
tuned.service
xinetd.service

I've tried to narrow down the problem by creating a new VM from the base image director-input-10.0-20160907.1.x86_64.qcow2 and doing only the "yum install" part of the image creation. I've installed the 609 packages found in the attached build log (see comment #9), and right after this installation mariadb is _not_ enabled at startup:

[cloud-user@localhost ~]$ systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

[cloud-user@localhost ~]$ ls -1 /etc/systemd/system/multi-user.target.wants/
atd.service
auditd.service
ceph-mon.target
ceph-osd.target
ceph-radosgw.target
ceph.target
chronyd.service
cloud-config.service
cloud-final.service
cloud-init-local.service
crond.service
irqbalance.service
kdump.service
ksm.service
ksmtuned.service
libvirtd.service
mdmonitor.service
netcf-transaction.service
nfs-client.target
postfix.service
remote-fs.target
rhsmcertd.service
rsyslog.service
sshd.service
tuned.service
xinetd.service

So at this stage, I would say that some of the steps executed by dib-run-parts after package installation during the image creation process are enabling mariadb at startup when they shouldn't.
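Comparing the two listings by eye is error-prone, so here is a small sketch that prints the units present in one listing but not in the other. It assumes each listing has been saved to a file with one unit per line; the function name `newly_enabled_units` and the file names in the usage comment are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: prints units listed in file $2 ("after") that do not
# appear in file $1 ("before").
newly_enabled_units() {
    # -v: invert match, -x: whole line, -F: fixed strings, -f: patterns from file
    grep -vxFf "$1" "$2" | sort
}

# Usage, with the listings saved beforehand:
#   ls -1 /etc/systemd/system/multi-user.target.wants/ > after-yum-install.txt
#   virt-ls -a overcloud-full.qcow2 /etc/systemd/system/multi-user.target.wants > full-image.txt
#   newly_enabled_units after-yum-install.txt full-image.txt
```

For the two listings above, the units that appear only in the rebuilt image are NetworkManager.service, cloud-init.service, dynamic-login.service, mariadb.service, openvswitch.service and os-collect-config.service.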
Actually, when installing the mariadb-galera packages directly into the base image mentioned in comment #8, I can reproduce the unwanted behaviour:

[stack@undercloud images]$ virt-ls -l -a director-input-10.0-20160907.1.x86_64.qcow2 /etc/systemd/system/multi-user.target.wants | grep mariadb
[stack@undercloud images]$ virt-customize -a director-input-10.0-20160907.1.x86_64.qcow2 --selinux-relabel --run-command "rpm -ivh http://download.eng.bos.redhat.com/brewroot/packages/mariadb-galera/5.5.42/3.el7ost/x86_64/mariadb-galera-server-5.5.42-3.el7ost.x86_64.rpm --nodeps"
[ 0.0] Examining the guest ...
[ 8.0] Setting a random seed
[ 8.0] Running: rpm -ivh http://download.eng.bos.redhat.com/brewroot/packages/mariadb-galera/5.5.42/3.el7ost/x86_64/mariadb-galera-server-5.5.42-3.el7ost.x86_64.rpm --nodeps
[ 16.0] SELinux relabelling
[ 16.0] Finishing off
[stack@undercloud images]$ virt-ls -l -a director-input-10.0-20160907.1.x86_64.qcow2 /etc/systemd/system/multi-user.target.wants | grep mariadb
lrwxrwxrwx 1 root root 39 Sep 15 12:11 mariadb.service -> /usr/lib/systemd/system/mariadb.service

So something is misbehaving in package mariadb-galera. Investigating...
We have a theory about why this is happening. It concerns the behavior of the "systemctl is-enabled" command on RHEL 7.3 in this environment: the specfile uses it to decide whether to automatically re-enable mariadb-galera on installation, and the return code may be misinterpreted. Damien is looking into it and we'll produce a patch soon.
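One general way such a check can go wrong: `systemctl is-enabled` exits 0 for several states besides "enabled" (the systemctl man page lists "static" and "indirect" among them), so a scriptlet that tests only the exit status can record a unit as enabled when it was not. The sketch below is not the actual mariadb-galera scriptlet or the eventual patch; it is a hypothetical defensive variant that compares the printed state instead. The `SYSTEMCTL` override is an assumption added only so the function can be exercised without systemd:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the unit's reported state is exactly
# "enabled". SYSTEMCTL may point at a stand-in command for testing.
was_explicitly_enabled() {
    # Capture the state word; the exit status is deliberately ignored.
    state=$("${SYSTEMCTL:-systemctl}" is-enabled "$1" 2>/dev/null) || true
    [ "$state" = "enabled" ]
}

# Usage in a pre-install step:
#   if was_explicitly_enabled mysqld.service; then
#       echo "service was enabled before install, keep it enabled"
#   fi
```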
ready for testing, please try it out in your env over there so we can see it's doing the right thing now.
I manually updated the image with mariadb-galera-5.5.42-4 but I can still see the issue:

[root@overcloud-controller-0 heat-admin]# rpm -qa | grep mariadb-galera
mariadb-galera-common-5.5.42-4.el7ost.x86_64
mariadb-galera-server-5.5.42-4.el7ost.x86_64

[root@overcloud-controller-0 heat-admin]# systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-09-16 12:16:27 UTC; 15min ago
 Main PID: 4978 (mysqld_safe)
   CGroup: /system.slice/mariadb.service
           ├─4978 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─6190 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --wsrep-provider=none --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/...
Note the new package should be installed on an image where mariadb is not already enabled, because the install scriptlets in mariadb-galera keep the existing settings when the package is installed. So one would need to either start from an image with no mariadb installed, or make sure service is disabled.
(In reply to Damien Ciabrini from comment #17)
> Note the new package should be installed on an image where mariadb is not
> already enabled, because the install scriptlets in mariadb-galera keep the
> existing settings when the package is installed.
>
> So one would need to either start from an image with no mariadb installed,
> or make sure service is disabled.

Right, I was updating the overcloud-full image, which already had the old package installed. When installing the new package from scratch, the service is indeed disabled.
OpenStack-10.0-RHEL-7 Puddle: 2016-09-22.2

[root@overcloud-controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-0 (version 1.1.15-9.el7-e174ec8) - partition with quorum
Last updated: Sun Sep 25 04:01:25 2016
Last change: Thu Sep 22 15:40:19 2016 by root via cibadmin on overcloud-controller-0

3 nodes and 19 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 ip-10.35.180.19 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 ip-172.18.0.14 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-172.17.0.13 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 ip-192.0.2.9 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-0 ]
     Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
 ip-172.19.0.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.17.0.15 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[root@overcloud-controller-0 ~]# systemctl status mysqld
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
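For repeated verification runs, the same check can be scripted by filtering `pcs status` for stopped entries. A rough sketch only; `stopped_lines` is a hypothetical name, and it simply matches the word "Stopped" as it appears in the outputs quoted in this report:

```shell
#!/bin/sh
# Hypothetical helper: prints the lines of `pcs status` output (read from
# stdin) that mention a stopped resource; fails if there are none.
stopped_lines() {
    # -w: match "Stopped" as a whole word, so both "...: Stopped" resource
    # lines and "Stopped: [ ... ]" clone lines are reported.
    grep -w 'Stopped'
}

# Usage on a controller:
#   pcs status | stopped_lines || echo "no stopped resources"
```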
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html