rubygem-staypuft: deployment doesn't complete, checking pcs status: galera_promote_0 on pcmk-maca25400702875 'unknown error' (1):

Environment:
openstack-foreman-installer-3.0.6-1.el7ost.noarch
ruby193-rubygem-staypuft-0.5.5-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
rhel-osp-installer-client-0.5.3-1.el7ost.noarch
openstack-puppet-modules-2014.2.7-2.el7ost.noarch
rhel-osp-installer-0.5.3-1.el7ost.noarch

Steps to reproduce:
1. Install rhel-osp-installer
2. Run the haneutron (HA neutron) deployment.

Result:
The deployment doesn't complete for a long time. Trying to run the puppet agent manually: it completes on 2 controllers, but the 3rd controller seems stuck after:

Notice: /Stage[main]/Quickstack::Pacemaker::Galera/Exec[all-mysqlinit-nodes-are-up]/returns: executed successfully

The output from "pcs status" shows:

galera_promote_0 on pcmk-maca25400702875 'unknown error' (1): call=384, status=complete, last-rc-change='Mon Dec 15 17:19:36 2014', queued=205043ms, exec=0ms

Expected result:
No galera issues.
Created attachment 969325 [details] logs from controller1
Created attachment 969326 [details] logs from controller2
Created attachment 969327 [details] logs from controller3
The problem is that both pacemaker and systemd are launching a galera instance at the same time. Looking at the controller1 logs, I can see systemd attempting to start MariaDB while pacemaker is promoting the galera instance. This is wrong: systemd shouldn't be touching anything galera-related when pacemaker is managing the database.

In the galera logs I see a bunch of these errors right after systemd starts doing things. There are two instances of mariadb colliding: one under pacemaker control, one under systemd control.

141215 17:20:41 InnoDB: Completed initialization of buffer pool
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
141215 17:20:41 InnoDB: Retrying to lock the first data file
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11

-- David
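For anyone hitting the same symptom, a quick way to check for this double-management is below (a minimal sketch assuming the stock systemctl and pcs CLIs; the unit and resource names may differ per deployment):

  # pacemaker should be the only manager of the database; the systemd
  # unit must be disabled and inactive on every controller:
  systemctl is-enabled mariadb   # expect: disabled
  systemctl is-active mariadb    # expect: inactive
  # the galera resource should appear only under pacemaker:
  pcs status resources
  # if systemd started a stray instance, stop it and disable the unit:
  systemctl stop mariadb
  systemctl disable mariadb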
Sasha, do you know if someone attempted to start mariadb by hand at some point? The errors David refers to occurred after a couple of puppet runs, and I'm not sure why mariadb would be running on its own.
Crag, I tried to manually start mariadb, thinking it might help the deployment. Thanks.
https://github.com/redhat-openstack/astapor/pull/433
merged
This BZ is now marked modified. Did the SSL change fix the problem? I am not convinced this is the root of the problem, but I am/was concerned that galera gets mildly confused if you specify an ssl key/cert but disable ssl.
(In reply to Ryan O'Hara from comment #17)
> This BZ is now marked modified. Did the SSL change fix the problem? I am not
> convinced this is the root of the problem, but I am/was concerned that
> galera gets mildly confused if you specify an ssl key/cert but disable ssl.

There were two things identified that needed attention:
1. ssl cert specified while ssl was disabled (we enable ssl now)
2. galera starting under both systemd and pacemaker (this was determined to be user error; it was manually started by QE)

I don't know if this is 100% resolved, but I wanted to get it tested with these things fixed to see if that resolves the issue. If it fails, we'll investigate again.
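For illustration, the inconsistent configuration in item 1 would look roughly like this in the Galera provider options (a hypothetical my.cnf fragment; socket.ssl* are standard wsrep provider option names, but the paths here are made up):

  # broken: ssl disabled while a key/cert pair is still handed to the
  # provider -- the combination galera reportedly gets confused by:
  # wsrep_provider_options="socket.ssl=no;socket.ssl_key=/etc/pki/galera/galera.key;socket.ssl_cert=/etc/pki/galera/galera.crt"
  # fixed: ssl enabled so the options agree (what the installer does now):
  wsrep_provider_options="socket.ssl=yes;socket.ssl_key=/etc/pki/galera/galera.key;socket.ssl_cert=/etc/pki/galera/galera.crt"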
Verified:

Environment:
ruby193-rubygem-staypuft-0.5.12-1.el7ost.noarch
openstack-puppet-modules-2014.2.8-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
openstack-foreman-installer-3.0.10-2.el7ost.noarch
rhel-osp-installer-0.5.5-1.el7ost.noarch
rhel-osp-installer-client-0.5.5-1.el7ost.noarch

The reported issue doesn't reproduce.
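For reference, a minimal spot-check is the following (a sketch assuming the stock pcs CLI; the resource name follows the pcs output quoted above):

  # all three controllers should be listed as galera masters, and the
  # "Failed actions" section should no longer show galera_promote errors:
  pcs status | grep -i -A 3 galera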
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0156.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days