| Summary: | Document disabling of automatic updates for cluster | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Tuomo Soini <tis> |
| Component: | doc-Cluster_Administration | Assignee: | Steven J. Levine <slevine> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ecs-bugs |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.5 | CC: | abeekhof, cluster-maint, dvossel, fdinitto, slevine, toracat |
| Target Milestone: | rc | Keywords: | Documentation |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-10-24 21:40:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Note: to fix this issue for running clusters, there needs to be some trigger magic, or the next update will take all clusters down.

No, the unconditional stop is intentional. Trying to use a new client with an old daemon is unsupported. I'd agree that it should come back up, though.

Created attachment 829493 [details]
Alternate patch which makes sure the cluster is restarted on update
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

RHEL documentation explicitly requires the cluster to be stopped on a given node before an upgrade and then started again manually after the upgrade. There are corner cases that even the cman/pacemaker init scripts and rpm specs might not catch or handle correctly, due to the order in which rpms are updated on the OS, that could potentially leave the system in a much worse state. I think the only correct solution is a global wrapper that stops/starts all cluster components and that every package involved pokes into during the upgrade (something similar to triggers). Though this would be a rather intrusive change across the board. David, can you please do some investigation and see if it's reasonable?

If documentation clearly states the cluster must be stopped, you should be comfortable with condrestart in pacemaker - that would handle most update cases where the cluster was running when the software was updated. Always taking the whole cluster down when automatic updates are left enabled, and not starting the cluster again, is worse. When documentation clearly states the cluster must be stopped on update, I wouldn't even consider adding triggers to my last patch.

`service pacemaker condrestart` does a very good job of handling all parts of cluster startup - and in the case of a yum update all files are already updated on disk when the %postun scripts are run, so all cluster components are updated when this happens. corosync and cman don't condrestart after an update, so pacemaker is clearly the place to handle a condrestart of the cluster on update.
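As a sketch only (not the attached patch itself, whose contents are not shown here), the condrestart approach described above would amount to a %postun fragment along these lines; the service name and the error-suppression style mirror the existing scriptlets:

```
%postun
if [ "$1" -ge 1 ]; then
    # Upgrade, not removal: restart pacemaker only if it is
    # currently running, so it picks up the freshly installed files
    /sbin/service pacemaker condrestart &>/dev/null || :
fi
```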
(In reply to Tuomo Soini from comment #7)

> If documentation clearly states the cluster must be stopped, you should be
> comfortable with condrestart in pacemaker - that would handle most update
> cases where the cluster was running when the software was updated.

Just to be clear: it's only the node being updated that needs to be stopped. Stop on node A -> upgrade node A -> start on node A; repeat for all cluster nodes.

> Always taking the whole cluster down when automatic updates are left
> enabled, and not starting the cluster again, is worse. When documentation
> clearly states the cluster must be stopped on update, I wouldn't even
> consider adding triggers to my last patch.

A trigger would eventually stop the stack on that node when the first cluster package is updated by yum, and start it again after the last one is done.

> service pacemaker condrestart does a very good job of handling all parts of
> cluster startup - and in the case of a yum update all files are already
> updated on disk when the %postun scripts are run, so all cluster components
> are updated when this happens.
>
> corosync and cman don't condrestart after an update, so pacemaker is clearly
> the place to handle a condrestart of the cluster on update.

That's the incorrect assumption. The case I am talking about is:

1. pacemaker stops
2. pacemaker is updated on disk
3. ...
4. pacemaker starts again
5. cman is updated on disk

At this point you have an old cman running in memory with a new version on disk, yet you believe you are running the latest and greatest. An rpm/yum `Requires: foo` guarantees that foo is installed, but not necessarily the latest version; the dependency is already satisfied by the old installed version. Adding `Requires: foo >= $ver` is suicidal because it means rebuilding the whole stack for every single change (as you can understand yourself).

All things considered, I think leaving the cluster stopped is the best/least fragile approach.
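To illustrate the dependency point above (the package name is real, but the version number shown is hypothetical):

```
# Versionless dependency: satisfied by whatever cman is already
# installed, even if an older build is still running in memory.
Requires: cman

# Versioned dependency: would force a rebuild of the whole stack
# for every single change - the option ruled out above.
# The version shown is purely illustrative.
#Requires: cman >= 3.0.12.1-60
```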
(In reply to Andrew Beekhof from comment #9)

> All things considered, I think leaving the cluster stopped is the best/least
> fragile approach.

I also think our best option here is to leave the cluster stopped. It is very complex to correctly support any other method for the rpm update.

The resource agents pacemaker manages use the pacemaker client tools. If we update and then restart pacemaker, we run the risk that those updated client tools will be used by resource agents against an old version of pacemaker before the restart completes. We don't support or test new client tools interacting with outdated server components; this can cause resource failures and other unexpected behaviors.

If the documentation does not already state so, we should strongly discourage automated updates in a cluster environment.

-- Vossel

I'm reassigning this to documentation. This could require nothing on the documentation side, and in that case the issue can be cloned. If we don't already, somewhere in the documentation we need to explicitly say "disable automatic updates", as the cluster will be disabled during a pacemaker rpm update.

-- Vossel
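One hedged way an administrator could act on the "disable automatic updates" recommendation, short of turning off update jobs entirely, is to exclude the cluster stack from yum via the standard `exclude` option (the package list here is illustrative, not from this bug):

```
# /etc/yum.conf
[main]
exclude=pacemaker* corosync* cman* resource-agents*
```

With this in place, unattended update jobs skip the cluster packages, and they can still be updated deliberately with `yum --disableexcludes=main update` after stopping the cluster on that node.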
Created attachment 829468 [details]
patch to spec file to correct this issue

Since 1.1.10-1.el6_4.4 there has been this %preun scriptlet:

```
%preun
/sbin/service pacemaker stop || :
if [ $1 -eq 0 ]; then
    # Package removal, not upgrade
    /sbin/chkconfig --del pacemaker || :
fi
```

and

```
%preun remote
/sbin/service pacemaker_remote stop &>/dev/null || :
if [ $1 -eq 0 ]; then
    # Package removal, not upgrade
    /sbin/chkconfig --del pacemaker_remote || :
fi
```

These cause the pacemaker cluster to stop unconditionally on a package update. The correct fix is to move the service stop command inside the if check.
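The logic of the proposed fix - moving the service stop inside the removal check - can be sketched as runnable shell. In %preun, rpm passes the number of package instances that will remain after the operation as `$1` (0 means full removal, 1 or more means upgrade); here, echoes stand in for the real /sbin/service and /sbin/chkconfig calls:

```shell
#!/bin/sh
# Minimal sketch of the corrected %preun behavior.
preun() {
    if [ "$1" -eq 0 ]; then
        # Package removal, not upgrade: only now stop the cluster
        # and deregister the init script.
        echo "service pacemaker stop"
        echo "chkconfig --del pacemaker"
    fi
    # On upgrade ($1 >= 1) nothing runs, so a cluster that was
    # running when "yum update" started keeps running.
}

preun 1   # upgrade: prints nothing
preun 0   # removal: prints the stop and chkconfig commands
```

The same change would apply to the `%preun remote` scriptlet for pacemaker_remote.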