Bug 1035013

Summary: Document disabling of automatic updates for cluster
Product: Red Hat Enterprise Linux 6
Component: doc-Cluster_Administration
Version: 6.5
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Tuomo Soini <tis>
Assignee: Steven J. Levine <slevine>
QA Contact: ecs-bugs
CC: abeekhof, cluster-maint, dvossel, fdinitto, slevine, toracat
Target Milestone: rc
Keywords: Documentation
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-10-24 21:40:42 UTC

Attachments:
- patch to spec file to correct this issue
- Alternate patch which makes sure the cluster is restarted on update

Description Tuomo Soini 2013-11-26 20:49:00 UTC
Created attachment 829468 [details]
patch to spec file to correct this issue

Since 1.1.10-1.el6_4.4 there has been this %preun scriptlet:

----
%preun
/sbin/service pacemaker stop || :
if [ $1 -eq 0 ]; then
    # Package removal, not upgrade
    /sbin/chkconfig --del pacemaker || :
fi
----

and

----
%preun remote
/sbin/service pacemaker_remote stop &>/dev/null || :
if [ $1 -eq 0 ]; then
    # Package removal, not upgrade
    /sbin/chkconfig --del pacemaker_remote || :
fi
----

These scriptlets cause the pacemaker cluster to stop unconditionally on every package update, not only on removal.

The correct fix is to move the service stop command inside the if check.
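The rpm scriptlet argument convention behind that check can be sketched as a plain shell function (illustrative only; `preun` here is a stand-in for the scriptlet, not the actual spec code): in `%preun`, `$1` is the number of package instances that will remain after the transaction, so 0 means removal and 1 means upgrade.

```shell
# Stand-in for the %preun scriptlet: $1 mimics rpm's argument, which is
# the number of package instances remaining after the transaction
# (0 = package removal, 1 = upgrade).
preun() {
    if [ "$1" -eq 0 ]; then
        # Removal: stop the service and deregister it (as the proposed fix does)
        echo "removal: stop service and chkconfig --del"
    else
        # Upgrade: leave the running cluster alone
        echo "upgrade: leave service running"
    fi
}

preun 0   # prints "removal: stop service and chkconfig --del"
preun 1   # prints "upgrade: leave service running"
```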

Comment 1 Tuomo Soini 2013-11-26 20:50:19 UTC
Note: to fix this issue for already-running clusters, some trigger magic is needed, or the next update will take all clusters down.

Comment 3 Andrew Beekhof 2013-11-26 21:14:31 UTC
No, the unconditional stop is intentional.
Trying to use a new client with an old daemon is unsupported.

I'd agree that it should come back up though

Comment 4 Tuomo Soini 2013-11-26 21:18:40 UTC
Created attachment 829493 [details]
Alternate patch which makes sure the cluster is restarted on update

Comment 5 RHEL Program Management 2013-11-29 23:47:05 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 6 Fabio Massimo Di Nitto 2013-11-30 08:04:22 UTC
RHEL documentation explicitly requires the cluster to be stopped on a given node before an upgrade and then started again manually after the upgrade.

There are corner cases that even the cman/pacemaker init scripts / rpm spec might not catch or handle correctly, due to the order in which rpms are updated on the OS, that could potentially leave the system in a much worse state.

I think the only correct solution is to have a global wrapper to stop/start all cluster components, which every package involved needs to hook into during the upgrade (something similar to triggers). Though this would be a rather intrusive change across the board.

David can you please do some investigation and see if it's reasonable?

Comment 7 Tuomo Soini 2013-11-30 08:24:12 UTC
If the documentation clearly states the cluster must be stopped, you should be comfortable with condrestart in pacemaker - that would handle most update cases where the cluster was running when the software was updated.

Always taking the whole cluster down to its knees when automatic updates are left enabled by mistake, and not starting the cluster again, is worse. Since the documentation clearly states the cluster must be stopped on update, I wouldn't even consider adding triggers to my last patch.

service pacemaker condrestart does a very good job of handling all parts of cluster startup - and in the case of yum update, all files are already updated on disk by the time the %postun scripts are run. So all cluster components are updated when this happens.

corosync and cman don't have a condrestart after update, so pacemaker is clearly the place to handle a condrestart of the cluster on update.

Comment 8 Fabio Massimo Di Nitto 2013-11-30 08:52:30 UTC
(In reply to Tuomo Soini from comment #7)
> If the documentation clearly states the cluster must be stopped, you should
> be comfortable with condrestart in pacemaker - that would handle most update
> cases where the cluster was running when the software was updated.

Just to be clear, it's only the node being updated that needs to be stopped.

stop on node A -> upgrade node A -> start on node A.

repeat for all cluster nodes.
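The per-node procedure above can be sketched as commands (illustrative only; the exact service and package names depend on the stack in use, and a cman-based stack also involves stopping cman itself):

```
service pacemaker stop        # stop the cluster stack on this node only
yum update pacemaker\*        # upgrade the packages on this node
service pacemaker start       # rejoin this node to the cluster
```

Then repeat on each node in turn, so the cluster as a whole stays up.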

> 
> Always taking the whole cluster down to its knees when automatic updates are
> left enabled by mistake, and not starting the cluster again, is worse. Since
> the documentation clearly states the cluster must be stopped on update, I
> wouldn't even consider adding triggers to my last patch.

A trigger would eventually stop the stack on that node when the first cluster package is updated by yum, and start it again after the last one is done.

> 
> service pacemaker condrestart does a very good job of handling all parts of
> cluster startup - and in the case of yum update, all files are already
> updated on disk by the time the %postun scripts are run. So all cluster
> components are updated when this happens.
> 
> corosync and cman don't have a condrestart after update, so pacemaker is
> clearly the place to handle a condrestart of the cluster on update.

That's the incorrect assumption.

The case I am talking about is:

pacemaker stops all
pacemaker is updated on disk
...
pacemaker starts again
cman is updated on disk

^^^ at this point you have an old cman running in memory with a new version on disk, while believing you are running the latest and greatest.

rpm/yum's Requires: foo guarantees that foo is installed, but not necessarily the latest version - the dependency on foo is already satisfied by the old installed version.

Now, adding a Requires: foo >= $ver would be suicidal, because it means rebuilding the whole stack for every single change (as you can understand yourself).
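In spec terms, the two dependency forms being contrasted look like this (package name and version are illustrative):

```
Requires: cman              # satisfied by any installed version of cman
Requires: cman >= 1.1.10    # forces a lockstep rebuild of dependents on every change
```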

Comment 9 Andrew Beekhof 2014-01-14 05:15:15 UTC
All things considered, I think leaving the cluster stopped is the best/least fragile approach.

Comment 10 David Vossel 2014-01-21 18:52:54 UTC
(In reply to Andrew Beekhof from comment #9)
> All things considered, I think leaving the cluster stopped is the best/least
> fragile approach.

I also think our best option here is to leave the cluster stopped. It is very complex to correctly support any other method for the rpm update. The resource agents that pacemaker manages use the pacemaker client tools. If we update and then restart pacemaker, we run the risk that the updated client tools will be used by resource agents against an old version of pacemaker before the restart completes. We don't support or test new client tools interacting with outdated server components. This can cause resource failures and other unexpected behavior.

If the documentation does not already state so, we should strongly discourage automated updates in a cluster environment.

-- Vossel

Comment 11 David Vossel 2014-02-05 15:35:10 UTC
I'm reassigning this to documentation. This may require nothing on the documentation side; in that case the issue can be cloned.

If we don't already, somewhere in the documentation we need to explicitly say "Disable automatic updates" as the cluster will be disabled during a pacemaker rpm update.

-- Vossel
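For the documentation side, one concrete way to keep cluster packages out of unattended updates on RHEL 6 is yum's exclude option (the package globs are illustrative; an administrator could equally disable the automatic-update job itself):

```
# /etc/yum.conf - keep cluster packages out of blanket/automatic updates
[main]
exclude=pacemaker* cman* corosync*
```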