Bug 212121 - rgmanager stops the resources in wrong order
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 231411
Depends On: 232139
Blocks:
Reported: 2006-10-25 07:57 UTC by Falk Hackenberger
Modified: 2009-04-16 20:21 UTC (History)

Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-10 21:19:12 UTC
Embargoed:


Attachments
Patch. (8.12 KB, patch), 2007-03-08 22:28 UTC, Lon Hohberger
Patch (21.88 KB, patch), 2007-03-19 18:29 UTC, Lon Hohberger
Incremental patch against 150405 which fixes incorrect start problem (1.64 KB, text/x-patch), 2007-03-22 22:14 UTC, Lon Hohberger
Incremental patch which fixes the following case (776 bytes, patch), 2007-05-02 22:37 UTC, Lon Hohberger


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0149 0 normal SHIPPED_LIVE rgmanager bug fix update 2007-05-10 21:16:41 UTC

Description Falk Hackenberger 2006-10-25 07:57:01 UTC
the cluster.conf is:
...
<resources>
...
 <fs device="/dev/data/mt-daten" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/exports.smb/mt-daten" name="mt-daten" options="acl" self_fence="1"/>
 <fs device="/dev/data/zMuell" force_fsck="0" force_unmount="1" fsid="17217" fstype="ext3" mountpoint="/exports.smb/mt-daten/zMuell" name="zMuell" options="acl" self_fence="1"/>
...
</resources>
<service autostart="1" domain="storage" exclusive="1" name="storage" recovery="restart">
...
<fs ref="mt-daten"/>
<fs ref="zMuell"/>
...
</service>
...


When I stop rgmanager, it tries to unmount <fs ref="mt-daten"/> before it
unmounts <fs ref="zMuell"/>. That is not possible, because zMuell is mounted
beneath mt-daten, so it reboots the host (both file systems have
self_fence="1").

The correct behavior is to unmount <fs ref="zMuell"/> before <fs ref="mt-daten"/>.

When rgmanager starts, it does the right thing:
it mounts <fs ref="mt-daten"/> before it mounts <fs ref="zMuell"/>.

Comment 1 Lon Hohberger 2006-10-25 15:58:32 UTC
Ordering is currently not guaranteed within a list of like-typed resources.  If
you have an ordering dependency between two <fs> resources, the way to guarantee
it (right now) is to nest the dependent resource inside the other:

   <service>
     <fs name="foo">
       <fs name="bar"/>
     </fs>
   </service>

If you structure your service this way, bar will always be started after foo but
stopped before foo.
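
As an illustration only (this is not rgmanager code), the parent/child rule can be sketched in Python. The sketch assumes the reporter's two file systems are nested, with zMuell as a child of mt-daten; the function names start_order and stop_order are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed restructuring of the reporter's service: zMuell nested in mt-daten.
SERVICE = """
<service name="storage">
  <fs ref="mt-daten">
    <fs ref="zMuell"/>
  </fs>
</service>
"""

def start_order(node):
    """Return resource refs in start order: parent first, then its children."""
    order = []
    for child in node:
        order.append(child.get("ref"))
        order.extend(start_order(child))
    return order

def stop_order(node):
    """Stop order is the exact reverse of start order."""
    return list(reversed(start_order(node)))

service = ET.fromstring(SERVICE)
print("start:", start_order(service))
print("stop: ", stop_order(service))
```

With this nesting, the child file system is always unmounted before the parent it is mounted under.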

Now, the historical reason for this non-guarantee was the idea that it might be
possible in the future to branch during starting/stopping of complex services -
i.e. perform operations on multiple non-codependent resources in parallel.  For
example, consider a service where two non-codependent scripts are needed which,
although not I/O or CPU intensive, each take five minutes to complete:

  <service>
    <script name="foo"/>
    <script name="bar"/>
  </service>

We could start foo and bar simultaneously, saving just about 5 minutes.
However, the actual, *practical* use of this is very limited.  More important
is the fact that implementing this functionality is very likely to be
destabilizing.  It would also very probably break existing start-ordering
behaviors upon which, no doubt, people have already developed an
expectation.

Additionally, the practical uses of having implicit ordering guarantees vastly
exceed the theoretical "performance gain" which might (at some point) have been
attained by starting resources in parallel.

Therefore, I think we should implement implicit ordering guarantees as described.

Comment 4 Lon Hohberger 2007-01-30 14:54:23 UTC
Falk,

Did you intentionally file this against RHCS 5, or was it supposed to be against
RHCS4?

Comment 5 Falk Hackenberger 2007-01-30 15:23:35 UTC
Wrong version, you are right... corrected now.

Comment 6 Lon Hohberger 2007-02-01 16:47:37 UTC
Ok, I know how to fix this, but it requires a surprising amount of code change
to make it work correctly.

Comment 7 Lon Hohberger 2007-03-08 22:28:00 UTC
Created attachment 149651
Patch.

Comment 8 Lon Hohberger 2007-03-09 16:58:33 UTC
*** Bug 231411 has been marked as a duplicate of this bug. ***

Comment 9 Kiersten (Kerri) Anderson 2007-03-09 19:02:43 UTC
Devel ACK for 4.5.

Comment 11 Lon Hohberger 2007-03-13 22:55:03 UTC
The patch attached only ensures ordering within a given type (i.e. file systems
or scripts).  It does not fix ordering in the case that a user has mixed
resource types, for example:

   <fs name="a"/>
   <script name="1"/>
   <fs name="b"/>
   <script name="2"/>

The patch only ensures that a starts before b (and the reverse on stopping), and
that 1 starts before 2, but it does not ensure that a starts before 1.

Addressing this requires fixing #232139, which is a bug in ccsd.

Comment 13 Lon Hohberger 2007-03-13 22:59:38 UTC
rgmanager currently searches for children by known resource types.  In order to
blindly search and discover children based on the content of cluster.conf, it is
required that ccsd return information even if the tag has no child nodes - which
is addressed with this patch: 

https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=149999

Comment 14 Lon Hohberger 2007-03-19 18:29:48 UTC
Created attachment 150405
Patch

* Includes the functionality of previous patch 149651 (i.e. preserves ordering
within defined resource types).  That is, <fs/> children of <service> are
started in the order in which they appear in cluster.conf (and stopped in
reverse order).  Example:

<service>
  <ip address="10.1.1.2"/>
  <fs name="a"/>
  <script name="1"/>
  <fs name="b"/>
  <script name="2"/>
</service>

Because scripts are ordered after file systems in the service.sh meta-data (and
IPs are started after fs, but before script), the order of start in this block
becomes a, b, 10.1.1.2, 1, 2; and stop is the reverse (2, 1, 10.1.1.2, b, a).


* Preserves ordering of all undefined child resource types in the order they
appear in cluster.conf.  For example:

<service>
   <ip address="10.1.1.2">
      <fs name="a"/>
      <script name="1"/>
      <fs name="b"/>
      <script name="2"/>
   </ip>
</service>

Because "fs" and "script" are not defined children in the ip.sh meta-data,
their ordering is preserved verbatim, i.e. on start: ip 10.1.1.2, fs a, script
1, fs b, script 2; exactly reversed on stop (2, b, 1, a, 10.1.1.2).
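
A rough Python sketch of the two rules above, for illustration only (it is not the patch itself). The type list for <service> is reduced to the three types named in this comment; the real service.sh meta-data may define more. Placing defined types ahead of undefined ones when both are present is an assumption, since neither example mixes them. The names start_order, stop_order, SERVICE_TYPES and IP_TYPES are hypothetical.

```python
def start_order(children, defined_types):
    """Order the children of one parent resource for start.

    children: (type, name) pairs in the order they appear in cluster.conf.
    defined_types: child types from the parent's meta-data, in start order.
    """
    rank = {t: i for i, t in enumerate(defined_types)}
    typed = [c for c in children if c[0] in rank]
    untyped = [c for c in children if c[0] not in rank]
    # list.sort is stable, so cluster.conf order is kept within each type.
    typed.sort(key=lambda c: rank[c[0]])
    # Assumption: children of defined types go before undefined ones.
    return [name for _, name in typed + untyped]

def stop_order(children, defined_types):
    """Stop order is the exact reverse of start order."""
    return start_order(children, defined_types)[::-1]

# Assumed start order for <service> children, per the first example.
SERVICE_TYPES = ["fs", "ip", "script"]
# ip.sh defines neither fs nor script as child types, per the second example.
IP_TYPES = []

service_children = [("ip", "10.1.1.2"), ("fs", "a"), ("script", "1"),
                    ("fs", "b"), ("script", "2")]
ip_children = [("fs", "a"), ("script", "1"), ("fs", "b"), ("script", "2")]

print(start_order(service_children, SERVICE_TYPES))
print(stop_order(service_children, SERVICE_TYPES))
print(start_order(ip_children, IP_TYPES))
print(stop_order(ip_children, IP_TYPES))
```

In the second example the ip resource itself, as the parent, starts before these children and stops after them.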

All defined children's ordering is preserved by type:

Comment 15 Lon Hohberger 2007-03-19 18:31:29 UTC
(scratch the last line of the prev. comment)

Comment 18 Lon Hohberger 2007-03-20 19:47:47 UTC
fixes in CVS

Comment 19 Lon Hohberger 2007-03-22 22:00:52 UTC
Fails QA.  Children of other services are started (incorrectly).

Comment 20 Lon Hohberger 2007-03-22 22:14:41 UTC
Created attachment 150702
Incremental patch against 150405 which fixes incorrect start problem

Comment 21 Lon Hohberger 2007-03-23 00:08:46 UTC
Incremental patch in CVS (along with automated test cases).

Comment 23 Lon Hohberger 2007-05-02 22:37:46 UTC
Created attachment 153997
Incremental patch which fixes the following case

	<service ref="test1">
		<script ref="initscript">
			<clusterfs ref="argle"/>
		</script>
		<fs ref="mount1">
			<nfsexport ref="Dummy Export">
				<nfsclient ref="Admin group"/>
				<nfsclient ref="User group"/>
				<nfsclient ref="red"/>
			</nfsexport>
		</fs>
	</service>
	<service ref="test2">
		<script ref="initscript">
			<clusterfs ref="argle"/>
			<ip ref="192.168.1.3"/>
			<fs ref="mount2">
				<nfsexport ref="Dummy Export">
					<nfsclient ref="Admin group"/>
					<nfsclient ref="User group"/>
					<nfsclient ref="red"/>
				</nfsexport>
			</fs>
			<script ref="script2"/>
			<ip ref="192.168.1.4"/>
		</script>
		<script ref="script3"/>
	</service>

With the current code, the clusterfs ref in the test2 service is duplicated due
to the old code which added child types when found.  Since we look for untyped
children explicitly in the new code, adding untyped children would cause the
clusterfs resource to be duplicated.

Comment 26 Red Hat Bugzilla 2007-05-10 21:19:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html
