Description of problem: See the upstream bug.
Hi, an OSP9 to OSP10 upgrade fails because the cluster doesn't restart. The upgrade trace shows:

  + node_states=' galera (ocf::heartbeat:galera): Started overcloud-controller-0
   * galera_start_0 on overcloud-controller-2 '\''not installed'\'' (5): call=240, status=complete, exitreason='\''Datadir /var/lib/mysql doesn'\''t exist'\'',
   * galera_start_0 on overcloud-controller-1 '\''not installed'\'' (5): call=240, status=complete, exitreason='\''Datadir /var/lib/mysql doesn'\''t exist'\'','
  + echo ' galera (ocf::heartbeat:galera): Started overcloud-controller-0
   * galera_start_0 on overcloud-controller-2 '\''not installed'\'' (5): call=240, status=complete, exitreason='\''Datadir /var/lib/mysql doesn'\''t exist'\'',
   * galera_start_0 on overcloud-controller-1 '\''not installed'\'' (5): call=240, status=complete, exitreason='\''Datadir /var/lib/mysql doesn'\''t exist'\'','

And I can confirm that on the nodes other than the bootstrap node, the /var/lib/mysql directory has been moved aside entirely as a backup directory.
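For reference, the failed-action check can be reproduced offline with a small grep over the captured trace text. This is a minimal sketch, not the upgrade script itself: the `node_states` sample below is quoted from the trace above, and the variable name is simply reused from it.

```shell
# Hypothetical offline reproduction of the failed-action check; the sample
# text is copied from the upgrade trace, not queried from a live cluster.
node_states="* galera_start_0 on overcloud-controller-2 'not installed' (5): call=240, status=complete, exitreason='Datadir /var/lib/mysql doesn't exist',
* galera_start_0 on overcloud-controller-1 'not installed' (5): call=240, status=complete, exitreason='Datadir /var/lib/mysql doesn't exist',"

# Count controllers whose galera start failed because the datadir is gone
failed=$(printf '%s\n' "$node_states" | grep -c "Datadir /var/lib/mysql doesn't exist")
echo "$failed"   # prints 2: both non-bootstrap controllers hit the error
```

Any non-zero count here means at least one node lost its /var/lib/mysql datadir during the upgrade, which is exactly the symptom reported.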
Deployed RHOS 9 latest, upgraded to RHOS 10 with latest puddle (2016-11-14.1). I no longer see this issue.

[stack@undercloud-0 ~]$ ssh heat-admin.2.10
Last login: Tue Nov 15 19:04:39 2016 from gateway
[heat-admin@controller-0 ~]$ sudo -i
[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Tue Nov 15 19:08:40 2016
Last change: Tue Nov 15 01:10:37 2016 by root via crm_resource on controller-0

3 nodes and 19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-fd00.fd00.fd00.4000..10    (ocf::heartbeat:IPaddr2):    Started controller-0
 ip-192.0.2.6    (ocf::heartbeat:IPaddr2):    Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-2620.52.0.13b8.5054.ff.fe3e.1    (ocf::heartbeat:IPaddr2):    Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-0 ]
     Slaves: [ controller-1 controller-2 ]
 ip-fd00.fd00.fd00.3000..10    (ocf::heartbeat:IPaddr2):    Started controller-0
 ip-fd00.fd00.fd00.2000..10    (ocf::heartbeat:IPaddr2):    Started controller-1
 ip-fd00.fd00.fd00.2000..11    (ocf::heartbeat:IPaddr2):    Started controller-2
 openstack-cinder-volume    (systemd:openstack-cinder-volume):    Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
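The key verification point in that output is the galera-master set: all three controllers must be promoted. As a hedged sketch, that check can be scripted against captured `pcs status` text; the `status` sample below is copied from the output above rather than queried live.

```shell
# Offline sanity check of the galera state; the text is a sample copied from
# the pcs status output, not a live cluster query.
status="Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]"

# Extract the Masters list; a healthy 3-node galera cluster should report
# all three controllers as masters after the upgrade.
masters=$(printf '%s\n' "$status" | sed -n 's/.*Masters: \[ \(.*\) \].*/\1/p')
echo "$masters" | wc -w   # prints 3
```

If this count drops below the controller count after an upgrade, the missing nodes are the ones to inspect for a moved-aside /var/lib/mysql datadir.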
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html