Description of problem:
We would like to request a "restart" option for script resources in Cluster Suite. Currently, Cluster Suite can issue "start", "stop" and "status" to scripts, but not "restart". It treats a restart as a "stop" followed by a "start".
There are a couple of things which may exist now which could implement the desired feature:

(1) __independent_subtree tag on nodes in a service tree:

    <script name="my_script" file="/usr/share/do-this" __independent_subtree="1"/>

    https://bugzilla.redhat.com/show_bug.cgi?id=239594

This allows rgmanager to recover parts of a service without affecting others. If the script or any of its children fail, the service is restarted from the closest parent which has __independent_subtree specified. Note that if a parent fails, its children are *always* restarted in typical order.

Examples:

    <script name="1" __independent_subtree="1"/>
    <script name="2" __independent_subtree="1">
        <script name="2.1" __independent_subtree="1"/>
        <script name="2.2" __independent_subtree="0">
            <script name="2.2.1" __independent_subtree="0"/>
        </script>
    </script>
    <script name="3" __independent_subtree="0">
        <script name="3.1" __independent_subtree="1"/>
        <script name="3.2" __independent_subtree="0">
            <script name="3.2.1" __independent_subtree="0"/>
        </script>
    </script>

    On a failure of...    ... these are stop/started
    1                     1
    2                     2, 2.1, 2.2, 2.2.1
    2.1                   2.1
    2.2                   2.1, 2.2, 2.2.1
    2.2.1                 2.1, 2.2, 2.2.1
    3                     (*WHOLE SERVICE*)
    3.1                   3.1
    3.2                   (*WHOLE SERVICE*)
    3.2.1                 (*WHOLE SERVICE*)

(2) Adding a special "recover" action as a child of the script, for example:

    <script name="my_script" file="/usr/share/do-this">
        <action name="recover"/>
    </script>

Ordinarily, the <script> agent does not provide the "recover" option; only stop/start semantics. Adding it in cluster.conf will cause the script agent to receive the "recover" argument after a status check failure. In our example, it would be the same as doing:

    /usr/share/do-this recover

This differs from __independent_subtree in that if a "recover" action is called, child resources are not affected.
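For option (2), the script itself has to understand the extra "recover" argument. Below is a minimal sketch of what such a script might look like. It is not the real /usr/share/do-this: the STATE_FILE marker, the do_* function names and the return codes are all assumptions made for illustration, and the placeholder bodies would need to be replaced with the actual service logic.

```shell
#!/bin/sh
# Hypothetical script resource handling start/stop/status plus "recover".
# STATE_FILE and every do_* body are placeholders, not part of rgmanager.

STATE_FILE="${STATE_FILE:-/tmp/do-this.state}"

do_start()  { echo running > "$STATE_FILE"; }
do_stop()   { rm -f "$STATE_FILE"; }
do_status() { [ -f "$STATE_FILE" ]; }

# Try to repair the service in place instead of a full stop/start cycle.
# Returning non-zero on failure is an assumption of this sketch.
do_recover() {
    do_status && return 0   # already healthy, nothing to do
    do_start                # placeholder for a lighter-weight repair step
}

# Map the action argument to the matching handler.
dispatch() {
    case "$1" in
        start)   do_start ;;
        stop)    do_stop ;;
        status)  do_status ;;
        recover) do_recover ;;
        *)       echo "usage: $0 {start|stop|status|recover}" >&2
                 return 2 ;;
    esac
}

# Act only when an action argument was given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    dispatch "$1"
    exit $?
fi
```

Saved under the path used in the example above, running "/usr/share/do-this recover" would then go through do_recover rather than through a stop followed by a start.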
The Oracle 10g failover agent here: http://people.redhat.com/lhh/oracle-rhcs4-notes-0.5/oracle-notes.html ... uses internal check-restarts for some components of the system. That's one other option that's always available.