rubygem-staypuft: deployment doesn't complete, checking pcs status: galera_promote_0 on pcmk-maca25400702875 'unknown error' (1):

Environment:
openstack-foreman-installer-3.0.6-1.el7ost.noarch
ruby193-rubygem-staypuft-0.5.5-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
rhel-osp-installer-client-0.5.3-1.el7ost.noarch
openstack-puppet-modules-2014.2.7-2.el7ost.noarch
rhel-osp-installer-0.5.3-1.el7ost.noarch

Steps to reproduce:
1. Install rhel-osp-installer
2. Run the haneutron (HA neutron) deployment.

Result:
The deployment doesn't complete for a long time. Trying to run the puppet agent manually: it completes on 2 controllers, but the 3rd controller seems stuck after:

Notice: /Stage[main]/Quickstack::Pacemaker::Galera/Exec[all-mysqlinit-nodes-are-up]/returns: executed successfully

The output from "pcs status" shows:

galera_promote_0 on pcmk-maca25400702875 'unknown error' (1): call=384, status=complete, last-rc-change='Mon Dec 15 17:19:36 2014', queued=205043ms, exec=0ms

Expected result:
No galera issues.
Created attachment 969325 [details] logs from controller1
Created attachment 969326 [details] logs from controller2
Created attachment 969327 [details] logs from controller3
The problem is that both pacemaker and systemd are launching a galera instance at the same time. Looking at the controller1 logs, I can see systemd attempting to start MariaDB while pacemaker is promoting the galera instance. This is wrong: systemd shouldn't be touching anything galera-related when pacemaker is managing the database.

In the galera logs I see a bunch of these errors right after systemd starts doing things. There are two instances of mariadb colliding: one under pacemaker control, one under systemd control.

141215 17:20:41 InnoDB: Completed initialization of buffer pool
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
141215 17:20:41 InnoDB: Retrying to lock the first data file
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11

-- David
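For anyone hitting the same symptom, a quick way to check for this double-management is below (a minimal sketch assuming the stock systemctl and pcs CLIs; the unit and resource names may differ per deployment):

  # pacemaker should be the only manager of the database; the systemd
  # unit must be disabled and inactive on every controller:
  systemctl is-enabled mariadb   # expect: disabled
  systemctl is-active mariadb    # expect: inactive
  # the galera resource should appear only under pacemaker:
  pcs status resources
  # if systemd started a stray instance, stop it and disable the unit:
  systemctl stop mariadb
  systemctl disable mariadb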
Sasha, do you know if someone attempted to start mariadb by hand at some point? The errors David refers to occurred after a couple of puppet runs, and I'm not sure why mariadb would be running on its own.
Crag, I tried to manually start mariadb, thinking it might help the deployment. Thanks.
https://github.com/redhat-openstack/astapor/pull/433
merged
This BZ is now marked modified. Did the SSL change fix the problem? I am not convinced this is the root of the problem, but I am/was concerned that galera gets mildly confused if you specify an ssl key/cert but disable ssl.
(In reply to Ryan O'Hara from comment #17)
> This BZ is now marked modified. Did the SSL change fix the problem? I am not
> convinced this is the root of the problem, but I am/was concerned that
> galera gets mildly confused if you specify an ssl key/cert but disable ssl.

There were two things identified that needed attention:
1. ssl cert specified while ssl was disabled (we enable ssl now)
2. galera starting under both systemd and pacemaker (this was determined to be user error; it was manually started by QE)

I don't know if this is 100% resolved, but I wanted to get it tested with these things fixed to see if that resolves the issue. If it fails, we'll investigate again.
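For illustration, the inconsistent configuration in item 1 would look roughly like this in the Galera provider options (a hypothetical my.cnf fragment; socket.ssl* are standard wsrep provider option names, but the paths here are made up):

  # broken: ssl disabled while a key/cert pair is still handed to the
  # provider -- the combination galera reportedly gets confused by:
  # wsrep_provider_options="socket.ssl=no;socket.ssl_key=/etc/pki/galera/galera.key;socket.ssl_cert=/etc/pki/galera/galera.crt"
  # fixed: ssl enabled so the options agree (what the installer does now):
  wsrep_provider_options="socket.ssl=yes;socket.ssl_key=/etc/pki/galera/galera.key;socket.ssl_cert=/etc/pki/galera/galera.crt"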
Verified:

Environment:
ruby193-rubygem-staypuft-0.5.12-1.el7ost.noarch
openstack-puppet-modules-2014.2.8-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
openstack-foreman-installer-3.0.10-2.el7ost.noarch
rhel-osp-installer-0.5.5-1.el7ost.noarch
rhel-osp-installer-client-0.5.5-1.el7ost.noarch

The reported issue doesn't reproduce.
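For reference, a minimal spot-check is the following (a sketch assuming the stock pcs CLI; the resource name follows the pcs output quoted above):

  # all three controllers should be listed as galera masters, and the
  # "Failed actions" section should no longer show galera_promote errors:
  pcs status | grep -i -A 3 galera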
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0156.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days