492828 – RFE: priorities for services/virtual machines

Summary: RFE: priorities for services/virtual machines

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	rgmanager
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	714671
Depends On:
Blocks:

Reported:	2009-03-30 08:16 UTC by Marc Grimme
Modified:	2018-10-20 04:04 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-09-02 11:05:03 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Add priority attribute to the service definition (565 bytes, patch) 2009-04-03 05:56 UTC, Marc Grimme	no flags
Optional patch for default_event_script.sl (5.96 KB, patch) 2009-04-03 05:57 UTC, Marc Grimme	no flags
priority_service.sl is the standalone implementation (6.61 KB, text/plain) 2009-04-03 05:59 UTC, Marc Grimme	no flags
Patch against current rhel5 branch (6.46 KB, patch) 2009-04-29 17:29 UTC, Lon Hohberger	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1339	0	normal	SHIPPED_LIVE	Low: rgmanager security, bug fix, and enhancement update	2009-09-01 10:42:29 UTC

Description Marc Grimme 2009-03-30 08:16:02 UTC

Description of problem:
When multiple services/virtual machines are under cluster control, it might be useful to fail them over in a user-defined order when a node goes down.

That way, the more critical services are failed over first, while more resources are still available.

This use case is best described with virtual machines, but it could also be extended to services.

Let's say you have a cluster of two nodes with different virtual machines running on each node. When one node goes down, you want to be sure that the most important virtual machines are started first. The others should only be failed over if enough resources are still available.

Version-Release number of selected component (if applicable):
n.a.

How reproducible:
With multiple VMs, the order in which they are failed over after a failure is not predictable.

Steps to Reproduce:
1.
2.
3.

Actual results:
The failover order is not consistently predictable.

Expected results:

Additional info:
I have a customer who is requesting this feature, and since I'm aware of the RIND implementation for rgmanager, I think it can "easily" be implemented by extending the current failover policy.

My idea is to give every service/virtual machine a priority attribute and, on a NODE_DOWN event, fail over the services in that order (lowest priority value first).
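A minimal sketch of the proposed ordering, written in Python rather than rgmanager's S-Lang event scripting. It assumes each service is just a name plus an optional integer priority, with a missing priority treated as 0 (the convention described in comment 5). `Service` and `failover_order` are hypothetical names; this models the idea, not rgmanager's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    name: str
    priority: Optional[int] = None  # None models a missing priority attribute


def failover_order(services: list[Service]) -> list[str]:
    """Return service names in the order a NODE_DOWN handler would process them.

    Lowest priority value first; a missing priority counts as 0. Python's
    sort is stable, so services with equal priority keep their original
    (cluster.conf) order.
    """
    ordered = sorted(
        services, key=lambda s: s.priority if s.priority is not None else 0
    )
    return [s.name for s in ordered]


if __name__ == "__main__":
    svcs = [Service("db", 2), Service("web", 1), Service("mail"), Service("cache", 1)]
    print(failover_order(svcs))  # ['mail', 'web', 'cache', 'db']
```

Relying on a stable sort is what keeps the existing behavior intact when no priorities are set: every key is 0, so the list comes back in its original order.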

Since I have to implement it anyway, I would like to make this feature officially available and discuss whether you think this is a good approach.

Marc.

Comment 1 Marc Grimme 2009-03-30 08:17:24 UTC

Lon, since you are on this bug, what do you think?

Regards Marc.

Comment 2 Lon Hohberger 2009-03-30 13:15:44 UTC

In the simplest case, we could add an attribute to the service and sort based on this attribute before running through the service list in the event handler(s).

Comment 3 Marc Grimme 2009-03-30 13:56:50 UTC

Yes, that was my idea as well. I'll attach the patch when I'm done. OK?

Comment 4 Lon Hohberger 2009-04-02 17:19:10 UTC

Sure. :)

Comment 5 Marc Grimme 2009-04-03 05:54:02 UTC

So this is my first try. I've tested it with two nodes and different services, with and without priorities. When I trigger a NODE_EVENT, the services get failed over in order. If no priority is specified, the service order stays the same as before.

The idea is that if no priority is specified, 0 is assumed. Services are ordered with the lowest priority value first. This means a service/VM without a priority is always processed first, i.e. it effectively gets the "highest" priority.

The relevant parts of the cluster.conf look as follows:

  <rm central_processing="1" log_facility="local4" log_level="8">
    <events>
       <event name="node" class="node">
          notice("Event node triggered!");
          evalfile("/usr/local/cluster/priority_services.sl");
       </event>
    </events>
    <failoverdomains>
       <failoverdomain name="all">
          <failoverdomainnode name="axqa03-1" priority="1"/>
          <failoverdomainnode name="axqa03-1" priority="1"/>
       </failoverdomain>
    </failoverdomains>
    <service name="test1" domain="all" autostart="0" priority="5">
       <script name="/usr/local/test/test1.sh"/>
    </service>
    <service name="test2" domain="all" autostart="0" priority="4">
       <script name="/usr/local/test/test2.sh"/>
    </service>
    <service name="test3" domain="all" autostart="0" priority="3">
       <script name="/usr/local/test/test3.sh"/>
    </service>
    <service name="test4" domain="all" autostart="0" priority="2">
       <script name="/usr/local/test/test4.sh"/>
    </service>
    <service name="test5" domain="all" autostart="0" priority="1">
       <script name="/usr/local/test/test5.sh"/>
    </service>
    <vm name="axqad101_2" path="/etc/xen" domain="all" autostart="0"/>
    <resources/>
  </rm>
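To see which order the fragment above would produce, the following sketch reads the `<service>` and `<vm>` children of `<rm>` with Python's `xml.etree.ElementTree` and sorts them by their `priority` attribute. It assumes, as this comment describes, that a missing priority means 0, and that `<vm>` entries are ordered the same way as `<service>` entries; both are assumptions about the patch, not verified rgmanager behavior. `priority_order` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <rm> fragment above (events, domains and scripts omitted).
CONF = """<rm central_processing="1">
  <service name="test1" domain="all" autostart="0" priority="5"/>
  <service name="test2" domain="all" autostart="0" priority="4"/>
  <service name="test3" domain="all" autostart="0" priority="3"/>
  <service name="test4" domain="all" autostart="0" priority="2"/>
  <service name="test5" domain="all" autostart="0" priority="1"/>
  <vm name="axqad101_2" path="/etc/xen" domain="all" autostart="0"/>
</rm>"""


def priority_order(rm: ET.Element) -> list[str]:
    """Names of <service>/<vm> children, lowest priority value first (missing = 0)."""
    entries = [e for e in rm if e.tag in ("service", "vm")]
    # list.sort is stable, so equal priorities keep their cluster.conf order.
    entries.sort(key=lambda e: int(e.get("priority", "0")))
    return [e.get("name") for e in entries]


if __name__ == "__main__":
    print(priority_order(ET.fromstring(CONF)))
    # ['axqad101_2', 'test5', 'test4', 'test3', 'test2', 'test1']
```

Under these assumptions the VM, having no priority, is handled before all five test services, which are handled in reverse of their listing order.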

Patches follow.

Comment 6 Marc Grimme 2009-04-03 05:56:05 UTC

Created attachment 337979 [details]
Add priority attribute to the service definition

This patch adds the priority attribute to /usr/share/cluster/service.sh in order to make it available to rgmanager and to the service_property S-Lang function.

Comment 7 Marc Grimme 2009-04-03 05:57:43 UTC

Created attachment 337980 [details]
Optional patch for default_event_script.sl

This patch is optional. If you want to make this concept part of the default behavior, apply it to default_event_script.sl.

Comment 8 Marc Grimme 2009-04-03 05:59:26 UTC

Created attachment 337981 [details]
priority_service.sl is the standalone implementation

This file can be used as a standalone implementation of this concept, as described in a previous comment.

Comment 9 Marc Grimme 2009-04-03 06:00:57 UTC

If you use this implementation in default_event_script.sl, the relevant part of cluster.conf might look as follows:

  <rm central_processing="1" log_facility="local4" log_level="8">
    <failoverdomains>
       <failoverdomain name="all">
          <failoverdomainnode name="axqa03-1" priority="1"/>
          <failoverdomainnode name="axqa03-1" priority="1"/>
       </failoverdomain>
    </failoverdomains>
    <service name="test1" domain="all" autostart="0" priority="5">
       <script name="/usr/local/test/test1.sh"/>
    </service>
    <service name="test2" domain="all" autostart="0" priority="4">
       <script name="/usr/local/test/test2.sh"/>
    </service>
    <service name="test3" domain="all" autostart="0" priority="3">
       <script name="/usr/local/test/test3.sh"/>
    </service>
    <service name="test4" domain="all" autostart="0" priority="2">
       <script name="/usr/local/test/test4.sh"/>
    </service>
    <service name="test5" domain="all" autostart="0" priority="1">
       <script name="/usr/local/test/test5.sh"/>
    </service>
    <vm name="axqad101_2" path="/etc/xen" domain="all" autostart="0"/>
    <resources/>
  </rm>

What do you think?

Marc.

Comment 10 Lon Hohberger 2009-04-03 13:11:48 UTC

Patch nuked freeze/unfreeze:

@@ -296,15 +434,7 @@
 
 		ret = service_stop(service_name);
 
-	} else if (user_request == USER_FREEZE) {
-
-		ret = service_freeze(service_name);
-
-	} else if (user_request == USER_UNFREEZE) {
-
-		ret = service_unfreeze(service_name);
-
-	}
+	} 
 
 	%
 	% todo - migrate

Aside from that, I'd say it's pretty good.  I'll try to test it today.

Comment 11 Lon Hohberger 2009-04-09 21:28:43 UTC

I haven't tested this yet due to other priorities.  My apologies.  I will test it as soon as I return from vacation.

Comment 12 Lon Hohberger 2009-04-29 17:16:19 UTC

I:

* added back the bits relating to FREEZE/UNFREEZE

* changed the description of the 'priority' field to the following to note that it only has an effect with central_processing turned on:

   Priority for the service.  In a failover scenario, this
   indicates the ordering of the service (1 is processed
   first, 2 is processed second, etc.).  This overrides the
   order presented in cluster.conf.  This option only has
   an effect if central processing within rgmanager is turned
   on.

* changed <content type="string"... to <content type="integer"... in the new service attribute
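As a general illustration of why a numeric type matters for an ordering key: if priorities were compared as text, "10" would sort before "2". The Python sketch below contrasts the two orderings. It demonstrates the general pitfall only and does not claim to reproduce how rgmanager or the S-Lang scripts compare attribute values; both function names are hypothetical.

```python
def order_as_strings(prios: dict[str, str]) -> list[str]:
    """Sort service names by priority compared as text (lexicographic)."""
    return sorted(prios, key=lambda name: prios[name])


def order_as_integers(prios: dict[str, str]) -> list[str]:
    """Sort service names by priority parsed as integers (numeric)."""
    return sorted(prios, key=lambda name: int(prios[name]))


if __name__ == "__main__":
    prios = {"a": "10", "b": "2", "c": "1"}
    print(order_as_strings(prios))   # ['c', 'a', 'b']  ("10" < "2" as text)
    print(order_as_integers(prios))  # ['c', 'b', 'a']
```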

Testing went fine.  Note that administrators can achieve the same goal by sorting the services the way they want in cluster.conf directly.
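The alternative mentioned here (listing services in the desired order in cluster.conf) can be automated. The sketch below uses Python's `xml.etree.ElementTree` to reorder the `<service>` children of an `<rm>` element by an existing `priority` attribute, leaving every other child in its position and refilling only the slots that already held services. `reorder_services` is a hypothetical helper; it ignores `<vm>` entries for simplicity, and note that ElementTree's default parser discards XML comments, so applying it to a real cluster.conf would drop any comments in the file.

```python
import xml.etree.ElementTree as ET


def reorder_services(rm: ET.Element) -> None:
    """Reorder the <service> children of rm in place by priority (missing = 0).

    Non-service children keep their positions; the slots previously held by
    <service> elements are refilled in sorted (stable) order.
    """
    children = list(rm)
    slots = [i for i, e in enumerate(children) if e.tag == "service"]
    services = sorted(
        (children[i] for i in slots), key=lambda e: int(e.get("priority", "0"))
    )
    for i, svc in zip(slots, services):
        children[i] = svc
    # Replace the children without touching rm's own attributes or text.
    for child in list(rm):
        rm.remove(child)
    rm.extend(children)


if __name__ == "__main__":
    rm = ET.fromstring(
        '<rm><failoverdomains/><service name="b" priority="2"/>'
        '<service name="a" priority="1"/><resources/></rm>'
    )
    reorder_services(rm)
    print(ET.tostring(rm, encoding="unicode"))
```

Keeping the non-service elements in place avoids moving `<failoverdomains>` or `<resources>` relative to the services, so the only change an administrator sees in the file is the order of the `<service>` blocks.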

Comment 13 Lon Hohberger 2009-04-29 17:29:56 UTC

Created attachment 341796 [details]
Patch against current rhel5 branch

Comment 14 Lon Hohberger 2009-05-21 14:40:58 UTC

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=4d9b91ea4c230c9e10d0e510a68b3e3898132de7

Comment 18 errata-xmlrpc 2009-09-02 11:05:03 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1339.html

Comment 21 Lon Hohberger 2011-06-21 13:43:05 UTC

*** Bug 714671 has been marked as a duplicate of this bug. ***
