Bug 1085447 - CTDB rebase to 2.5.x requires a cluster restart
Summary: CTDB rebase to 2.5.x requires a cluster restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ctdb
Version: 6.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Sumit Bose
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-04-08 15:47 UTC by Abhijith Das
Modified: 2014-10-14 06:47 UTC
CC List: 7 users

Fixed In Version: ctdb-2.5.1-1.el6
Doc Type: Release Note
Doc Text:
CTDB Upgrade
Red Hat Enterprise Linux 6.6 contains a new version of the CTDB agent, in which some internal operations have changed to improve stability and reliability. As a consequence, the new version cannot be mixed with older versions running in parallel in the same cluster. To update CTDB in an existing cluster, CTDB must be stopped on all nodes in the cluster before the upgrade starts, and the nodes can then be updated one by one and started again.
Clone Of:
Environment:
Last Closed: 2014-10-14 06:47:42 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Product Errata RHEA-2014:1488 (normal, SHIPPED_LIVE): ctdb bug fix and enhancement update - last updated 2014-10-14 01:28:29 UTC

Description Abhijith Das 2014-04-08 15:47:07 UTC
Discussion about this rebase in 6.6, which requires the ctdb service to be shut down, the ctdb packages upgraded, and the ctdb service brought back up.
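
A minimal sketch of that shutdown/upgrade/restart sequence, assuming RHEL 6 sysvinit service scripts and ssh access from an admin host; the node names are placeholders, and ctdb must be stopped on every node before any node is upgraded:

# Stop ctdb everywhere first -- old and new ctdb versions cannot run in the same cluster.
for node in node1 node2 node3; do ssh root@$node 'service ctdb stop'; done

# Upgrade the ctdb package on each node.
for node in node1 node2 node3; do ssh root@$node 'yum -y update ctdb'; done

# Start ctdb again once all nodes have been upgraded.
for node in node1 node2 node3; do ssh root@$node 'service ctdb start'; done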

Comment 1 Abhijith Das 2014-04-08 15:54:55 UTC
Here's the discussion I had with Sumit on IRC yesterday:

<sbose> abhi, hi, we are planning a rebase of ctdb in 6.6. I already talked with RHS and we think going to version 2.5.x would be the best move, but it will require a complete shutdown of a cluster to upgrade. Do you think this will work for you (gfs2/cluster team) as well?
<abhi> sbose, does it break an existing cluster if the package is upgraded and the cluster not restarted?
<sbose> abhi, you mean the package is replaced on-disk, but the processes are not restarted?
<abhi> I think it should be ok, as long as the user is aware of this cluster restart requirement before upgrading
<abhi> sbose, yeah
<sbose> abhi, I guess it would work, the old binary will call some new scripts on shutdown but this should work (needs to be tested). Nevertheless I would recommend in the release note to shut down the cluster first, then upgrade the packages and restart the cluster. Since you have to take down all nodes, I think you won't save much time by updating the packages first.
<abhi> sbose, ok... that sounds reasonable to me
<sbose> abhi, ok, thanks, other benefits of the rebase would be then the versions in 6.6 and 7.0 are nearly in sync and we are working on the same tree as upstream, additionally all reported RHEL-6 issues are fixed in 2.5.x.
<crh> abhi: sbose:  I'm in Westford, and just spoke to Ira a short while ago about going with 2.5.x.  We'll be moving to 2.5 in RHS, so it seems. 
<sbose> crh, yes, that's what I heard from Jose as well. 
<crh> Good news if you're planning on moving that way too.  It will make things easier for all, in the long run.
<abhi> sbose, wouldn't simply doing a 'service ctdb stop', followed by upgrade and 'service ctdb start' do the trick?
<sbose> abhi, 'service ctdb stop' on all nodes.
<abhi> yeah.
<sbose> abhi, then you can upgrade and start them one after the other, or do all the upgrades first and then restart all of them again.
<abhi> sbose, I'm thinking...1. "service ctdb stop" on all nodes, 2. upgrade ctdb on all nodes, 3. "service ctdb start" on all nodes
<sbose> abhi, yes, sorry, by cluster I only meant the ctdb part of a cluster, not everything running on the cluster.
<abhi> sbose, out of curiosity, what's new in 2.5.x?
<sbose> abhi, it's mostly performance and stability improvements. There was a bit of chaos with the different trees in 1.x: there was 1.0.114.x, 1.2, 1.10 and some others. The new maintainer did a herculean task to merge all the good parts of the branches together to get 2.0. Some of the more visible improvements are better systemd integration than my crude patches, pcp support, improved man pages and self tests.
<abhi> sbose, having to restart ctdb is not ideal for our customers, but if it's critical for this new version to go into 6.6, we must document the upgrade procedure appropriately and make sure GSS is aware of this because they'll be the ones fielding customer calls.
<sbose> abhi, yes I understand, but given the lifetime of RHEL6 I think we have to do this step sooner or later. Btw with the next major release of samba (4.2) ctdb will be part of samba and not stand alone anymore.
<abhi> sbose, yeah... crh and jarrpa told me about that (ctdb integrated into samba) last week

Sumit, in talking with Steve and Shane, it looks like this rebase is going to interfere with our ability to support rolling upgrades:
https://access.redhat.com/site/articles/40753
https://access.redhat.com/site/solutions/39903

Questions:
Are the fixes you mentioned above critical enough to warrant this rebase?

What would happen if a customer is unaware of the procedure and performs a rolling upgrade, i.e. upgrades one node at a time?

What happens if one node is upgraded and rebooted (or the cluster restarted)? Will ctdb break?

What happens if the packages are updated on all the nodes and the cluster is NOT restarted before or after the upgrade? Will ctdb still function correctly?

Comment 4 Justin Payne 2014-06-30 17:21:20 UTC
Verified SanityOnly in ctdb-2.5.1-1.el6 from RHEL-6.6-candidate-20140526 via upgrade from 6.5 to 6.6:

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i rpm -q ctdb; done
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i wget -P /etc/yum.repos.d http://sts.lab.msp.redhat.com/dist/brewroot/repos/RHEL-6.6-candidate-20140526.repo; done

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i yum -y update ctdb; done

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i rpm -q ctdb; done
ctdb-2.5.1-1.el6.x86_64
ctdb-2.5.1-1.el6.x86_64
ctdb-2.5.1-1.el6.x86_64

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i clustat; done
Cluster Status for dash @ Tue May 27 12:00:40 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online, Local
 dash-02                                     2 Online
 dash-03                                     3 Online

Cluster Status for dash @ Tue May 27 12:00:40 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online
 dash-02                                     2 Online, Local
 dash-03                                     3 Online

Cluster Status for dash @ Tue May 27 12:00:40 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online
 dash-02                                     2 Online
 dash-03                                     3 Online, Local
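
In addition to clustat, the upgraded ctdb's own view of the cluster could be checked the same way; a sketch mirroring the loops above, where 'ctdb status' is ctdb's standard health query and the qarsh invocation and host names simply follow the verification run:

[root@host-033 ~]# for i in `seq 1 3`; do qarsh root@dash-0$i ctdb status; done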

Comment 5 errata-xmlrpc 2014-10-14 06:47:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1488.html

