Bug 1085447

Summary: CTDB rebase to 2.5.x requires a cluster restart
Product: Red Hat Enterprise Linux 6
Reporter: Abhijith Das <adas>
Component: ctdb
Assignee: Sumit Bose <sbose>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: unspecified
Priority: medium
Version: 6.6
CC: adas, dpal, fdinitto, jpayne, mnavrati, sbradley, swhiteho
Target Milestone: rc
Keywords: Rebase
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ctdb-2.5.1-1.el6
Doc Type: Release Note
Doc Text:
CTDB Upgrade: Red Hat Enterprise Linux 6.6 contains a new version of the CTDB agent, in which some internal operations have changed to improve stability and reliability. As a consequence, the new version cannot be mixed with older versions running in parallel in the same cluster. To update CTDB in an existing cluster, CTDB must be stopped on all nodes before the upgrade starts; the nodes can then be updated one by one and started again.
Last Closed: 2014-10-14 06:47:42 UTC
Type: Bug

Description Abhijith Das 2014-04-08 15:47:07 UTC
Discussion about this rebase in 6.6, which requires the ctdb service to be shut down, the ctdb packages upgraded, and the ctdb service brought back up.

Comment 1 Abhijith Das 2014-04-08 15:54:55 UTC
Here's the discussion I had with Sumit on IRC yesterday:

<sbose> abhi, hi, we are planning a rebase of ctdb in 6.6. I already talked with RHS and we think going to version 2.5.x would be the best move, but it will require a complete shutdown of a cluster to upgrade. Do you think this will work for you (gfs2/cluster team) as well?
<abhi> sbose, does it break an existing cluster if the package is upgraded and the cluster not restarted?
<sbose> abhi, you mean the package is replaced on-disk, but the processes are not restarted?
<abhi> I think it should be ok, as long as the user is aware of this cluster restart requirement before upgrading
<abhi> sbose, yeah
<sbose> abhi, I guess it would work; the old binary will call some new scripts on shutdown, but this should work (it needs to be tested). Nevertheless, I would recommend in the release note to shut down the cluster first, then upgrade the packages and restart the cluster. Since you have to take down all nodes, I think you won't save much time by updating the packages first.
<abhi> sbose, ok... that sounds reasonable to me
<sbose> abhi, ok, thanks, other benefits of the rebase would be then the versions in 6.6 and 7.0 are nearly in sync and we are working on the same tree as upstream, additionally all reported RHEL-6 issues are fixed in 2.5.x.
<crh> abhi: sbose:  I'm in Westford, and just spoke to Ira a short while ago about going with 2.5.x.  We'll be moving to 2.5 in RHS, so it seems. 
<sbose> crh, yes, that's what I heard from Jose as well. 
<crh> Good news if you're planning on moving that way too.  It will make things easier for all, in the long run.
<abhi> sbose, wouldn't simply doing a 'service ctdb stop', followed by upgrade and 'service ctdb start' do the trick?
<sbose> abhi, 'service ctdb stop' on all nodes.
<abhi> yeah.
<sbose> abhi, then you can upgrade and start one after the other, or do all the upgrades first and then restart them all again.
<abhi> sbose, I'm thinking...1. "service ctdb stop" on all nodes, 2. upgrade ctdb on all nodes, 3. "service ctdb start" on all nodes
<sbose> abhi, yes, sorry, by cluster I only meant the ctdb part of a cluster, not everything running in the cluster.
<abhi> sbose, out of curiosity, what's new in 2.5.x?
<sbose> abhi, it's mostly performance and stability improvements. There was a bit of chaos with the different trees in 1.x: there was 1.0.114.x, 1.2, 1.10, and some others. The new maintainer did a herculean task merging all the good parts of the branches together to get 2.0. Some of the more visible improvements are better systemd integration than my crude patches, pcp support, improved man pages, and self tests.
<abhi> sbose, having to restart ctdb is not ideal for our customers, but if it's critical for this new version to go into 6.6, we must document the upgrade procedure appropriately and make sure GSS is aware of this because they'll be the ones fielding customer calls.
<sbose> abhi, yes I understand, but given the lifetime of RHEL6 I think we have to do this step sooner or later. Btw with the next major release of samba (4.2) ctdb will be part of samba and not stand alone anymore.
<abhi> sbose, yeah... crh and jarrpa told me about that (ctdb integrated into samba) last week

Sumit, in talking with Steve and Shane, it looks like this rebase is going to interfere with our ability to support rolling upgrades:
https://access.redhat.com/site/articles/40753
https://access.redhat.com/site/solutions/39903

Questions:
Are the fixes you mentioned above critical enough to warrant this rebase?

What would happen if a customer is unaware of the procedure and performs a rolling upgrade, i.e. upgrades one node at a time?

What happens if one node is upgraded and rebooted (or its cluster services restarted)? Will ctdb break?

What happens if the packages are updated on all the nodes and the cluster is NOT restarted before or after the upgrade? Will ctdb still function correctly?
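The stop-all-first sequence agreed on in the IRC discussion above (stop CTDB on every node, upgrade the package everywhere, then start CTDB again) can be sketched as a short shell script. The node names and the ssh fan-out are illustrative assumptions, not taken from this bug; the sketch only prints the command plan so the required ordering is visible.

```shell
#!/bin/sh
# Sketch of the stop-all / upgrade / start-all CTDB upgrade sequence.
# NODES and the ssh fan-out are hypothetical; adapt to your cluster.
NODES="dash-01 dash-02 dash-03"

plan() {
    # Phase 1: stop ctdb on every node first -- old and new versions
    # must never run in parallel in the same cluster.
    for n in $NODES; do echo "ssh root@$n service ctdb stop"; done
    # Phase 2: with ctdb down everywhere, upgrade the package on each node.
    for n in $NODES; do echo "ssh root@$n yum -y update ctdb"; done
    # Phase 3: only after all upgrades finish, start ctdb again.
    for n in $NODES; do echo "ssh root@$n service ctdb start"; done
}

# Print the plan; to actually run it, execute the commands instead of echoing.
plan
```

The point of the three separate phases is exactly the constraint from the release note: no node may start the new ctdb while any node still runs the old one.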

Comment 4 Justin Payne 2014-06-30 17:21:20 UTC
Verified SanityOnly in ctdb-2.5.1-1.el6 from RHEL-6.6-candidate-20140526 via upgrade from 6.5 to 6.6:

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i rpm -q ctdb; done
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i wget -P /etc/yum.repos.d http://sts.lab.msp.redhat.com/dist/brewroot/repos/RHEL-6.6-candidate-20140526.repo; done

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i yum -y update ctdb; done

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i rpm -q ctdb; done
ctdb-2.5.1-1.el6.x86_64
ctdb-2.5.1-1.el6.x86_64
ctdb-2.5.1-1.el6.x86_64

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i clustat; done
Cluster Status for dash @ Tue May 27 12:00:40 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online, Local
 dash-02                                     2 Online
 dash-03                                     3 Online

Cluster Status for dash @ Tue May 27 12:00:40 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online
 dash-02                                     2 Online, Local
 dash-03                                     3 Online

Cluster Status for dash @ Tue May 27 12:00:40 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online
 dash-02                                     2 Online
 dash-03                                     3 Online, Local

Comment 5 errata-xmlrpc 2014-10-14 06:47:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1488.html