Bug 1035013

Summary: Document disabling of automatic updates for cluster
Product: Red Hat Enterprise Linux 6
Component: doc-Cluster_Administration
Version: 6.5
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Tuomo Soini <tis>
Assignee: Steven J. Levine <slevine>
QA Contact: ecs-bugs
CC: abeekhof, cluster-maint, dvossel, fdinitto, slevine, toracat
Target Milestone: rc
Keywords: Documentation
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-10-24 21:40:42 UTC

Attachments:
- patch to spec file to correct this issue
- Alternate patch which makes sure the cluster is restarted on update

Description Tuomo Soini 2013-11-26 20:49:00 UTC
Created attachment 829468 [details]
patch to spec file to correct this issue

Since 1.1.10-1.el6_4.4 there has been this %preun scriptlet:

----
%preun
/sbin/service pacemaker stop || :
if [ $1 -eq 0 ]; then
    # Package removal, not upgrade
    /sbin/chkconfig --del pacemaker || :
fi
----

and

----
%preun remote
/sbin/service pacemaker_remote stop &>/dev/null || :
if [ $1 -eq 0 ]; then
    # Package removal, not upgrade
    /sbin/chkconfig --del pacemaker_remote || :
fi
----

These scriptlets cause the pacemaker cluster to stop unconditionally on every package update, not only on removal.

The correct fix is to move the service stop command inside the if check.
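The rpm scriptlet argument convention behind that check can be sketched as a plain shell function (illustrative only; `preun` here is a stand-in for the scriptlet, not the actual spec code): in `%preun`, `$1` is the number of package instances that will remain after the transaction, so 0 means removal and 1 means upgrade.

```shell
# Stand-in for the %preun scriptlet: $1 mimics rpm's argument, which is
# the number of package instances remaining after the transaction
# (0 = package removal, 1 = upgrade).
preun() {
    if [ "$1" -eq 0 ]; then
        # Removal: stop the service and deregister it (as the proposed fix does)
        echo "removal: stop service and chkconfig --del"
    else
        # Upgrade: leave the running cluster alone
        echo "upgrade: leave service running"
    fi
}

preun 0   # prints "removal: stop service and chkconfig --del"
preun 1   # prints "upgrade: leave service running"
```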

Comment 1 Tuomo Soini 2013-11-26 20:50:19 UTC
Note: to fix this issue for already-running clusters, some trigger magic is needed, or the next update will take all clusters down.

Comment 3 Andrew Beekhof 2013-11-26 21:14:31 UTC
No, the unconditional stop is intentional.
Trying to use a new client with an old daemon is unsupported.

I'd agree that it should come back up though

Comment 4 Tuomo Soini 2013-11-26 21:18:40 UTC
Created attachment 829493 [details]
Alternate patch which makes sure the cluster is restarted on update

Comment 5 RHEL Program Management 2013-11-29 23:47:05 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 6 Fabio Massimo Di Nitto 2013-11-30 08:04:22 UTC
RHEL documentation explicitly requires the cluster to be stopped on a given node before an upgrade and then started again manually after the upgrade.

There are corner cases that even the cman/pacemaker init scripts / rpm spec might not catch or handle correctly, due to the order in which rpms are updated on the OS, that could potentially leave the system in a much worse state.

I think the only correct solution is to have a global wrapper to stop/start all cluster components, which every package involved needs to hook into during the upgrade (something similar to triggers). Though this would be a rather intrusive change across the board.

David can you please do some investigation and see if it's reasonable?

Comment 7 Tuomo Soini 2013-11-30 08:24:12 UTC
If the documentation clearly states the cluster must be stopped, you should be comfortable with condrestart in pacemaker - that would handle most update cases where the cluster was running when the software was updated.

Always taking the whole cluster down to its knees when automatic updates are left enabled by mistake, and not starting the cluster again, is worse. Since the documentation clearly states the cluster must be stopped on update, I wouldn't even consider adding triggers to my last patch.

service pacemaker condrestart does a very good job of handling all parts of cluster startup - and in the case of yum update, all files are already updated on disk by the time the %postun scripts are run. So all cluster components are updated when this happens.

corosync and cman don't have a condrestart after update, so pacemaker is clearly the place to handle a condrestart of the cluster on update.

Comment 8 Fabio Massimo Di Nitto 2013-11-30 08:52:30 UTC
(In reply to Tuomo Soini from comment #7)
> If the documentation clearly states the cluster must be stopped, you should
> be comfortable with condrestart in pacemaker - that would handle most update
> cases where the cluster was running when the software was updated.

Just to be clear, it's only the node being updated that needs to be stopped.

stop on node A -> upgrade node A -> start on node A.

repeat for all cluster nodes.
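The per-node procedure above can be sketched as commands (illustrative only; the exact service and package names depend on the stack in use, and a cman-based stack also involves stopping cman itself):

```
service pacemaker stop        # stop the cluster stack on this node only
yum update pacemaker\*        # upgrade the packages on this node
service pacemaker start       # rejoin this node to the cluster
```

Then repeat on each node in turn, so the cluster as a whole stays up.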

> 
> Always taking the whole cluster down to its knees when automatic updates are
> left enabled by mistake, and not starting the cluster again, is worse. Since
> the documentation clearly states the cluster must be stopped on update, I
> wouldn't even consider adding triggers to my last patch.

A trigger would eventually stop the stack on that node when the first cluster package is updated by yum, and start it again after the last one is done.

> 
> service pacemaker condrestart does a very good job of handling all parts of
> cluster startup - and in the case of yum update, all files are already
> updated on disk by the time the %postun scripts are run. So all cluster
> components are updated when this happens.
> 
> corosync and cman don't have a condrestart after update, so pacemaker is
> clearly the place to handle a condrestart of the cluster on update.

That's the incorrect assumption.

The case I am talking about is:

pacemaker stops all
pacemaker is updated on disk
...
pacemaker starts again
cman is updated on disk

^^^ at this point you have an old cman running in memory with a new version on disk, while believing you are running the latest and greatest.

rpm/yum's Requires: foo guarantees that foo is installed, but not necessarily the latest version - the dependency on foo is already satisfied by the old installed version.

Now, adding a Requires: foo >= $ver would be suicidal, because it means rebuilding the whole stack for every single change (as you can understand yourself).
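In spec terms, the two dependency forms being contrasted look like this (package name and version are illustrative):

```
Requires: cman              # satisfied by any installed version of cman
Requires: cman >= 1.1.10    # forces a lockstep rebuild of dependents on every change
```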

Comment 9 Andrew Beekhof 2014-01-14 05:15:15 UTC
All things considered, I think leaving the cluster stopped is the best/least fragile approach.

Comment 10 David Vossel 2014-01-21 18:52:54 UTC
(In reply to Andrew Beekhof from comment #9)
> All things considered, I think leaving the cluster stopped is the best/least
> fragile approach.

I also think our best option here is to leave the cluster stopped. It is very complex to correctly support any other method for the rpm update. The resource agents that pacemaker manages use the pacemaker client tools. If we update and then restart pacemaker, we run the risk that the updated client tools will be used by resource agents against an old version of pacemaker before the restart completes. We don't support or test new client tools interacting with outdated server components. This can cause resource failures and other unexpected behavior.

If the documentation does not already state so, we should strongly discourage automated updates in a cluster environment.

-- Vossel

Comment 11 David Vossel 2014-02-05 15:35:10 UTC
I'm reassigning this to documentation. This may require nothing on the documentation side; in that case the issue can be cloned.

If we don't already, somewhere in the documentation we need to explicitly say "Disable automatic updates" as the cluster will be disabled during a pacemaker rpm update.

-- Vossel
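For the documentation side, one concrete way to keep cluster packages out of unattended updates on RHEL 6 is yum's exclude option (the package globs are illustrative; an administrator could equally disable the automatic-update job itself):

```
# /etc/yum.conf - keep cluster packages out of blanket/automatic updates
[main]
exclude=pacemaker* cman* corosync*
```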