Bug 1030680
| Summary: | depend_mode="soft" is not honoured in vm.sh (also, a typo) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Madison Kelly <mkelly> |
| Component: | rgmanager | Assignee: | Ryan McCabe <rmccabe> |
| Status: | CLOSED WONTFIX | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 6.4 | CC: | agk, cluster-maint, fdinitto |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-02-28 14:50:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
Ryan, did we ever support depend_mode for VM at all? AFAIR it's only exposed for services.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

RHEL never supported service interdependency. I strongly recommend using pacemaker to address this issue.
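For anyone landing here with the same need: pacemaker can express a start-only ("soft") dependency as an optional ordering constraint with no colocation. A minimal sketch, assuming the storage service and the VM have already been defined as pacemaker resources under the hypothetical names storage_n01 and vm01-win2008:

====
# Hypothetical resource names; assumes both already exist as pacemaker resources.
# kind=Optional constrains start order only: stopping storage_n01 later
# does not stop vm01-win2008 (the "soft" semantics requested in this bug).
pcs constraint order start storage_n01 then start vm01-win2008 kind=Optional
====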
Description of problem:

When a VM service is set to soft-depend on another service, stopping that service causes the VM to stop as well, even if it is running on another node. According to vm.sh's meta-data:

====
<parameter name="depend_mode">
    <longdesc lang="en">
        Service dependency mode.
        hard - This service is stopped/started if its dependency
               is stopped/started
        soft - This service only depends on the other service for
               initial startip. If the other service stops, this
               service is not stopped.
    </longdesc>
====

(note: s/startip/startup/)

Here are the relevant cluster.conf entries:

====
<service autostart="1" domain="only_n01" exclusive="0" name="libvirtd_n01" recovery="restart">
    <script ref="libvirtd"/>
</service>
<service autostart="1" domain="only_n02" exclusive="0" name="libvirtd_n02" recovery="restart">
    <script ref="libvirtd"/>
</service>
<vm autostart="1" depend="service:storage_n01" depend_mode="soft" domain="primary_n01" exclusive="0" max_restarts="2" name="vm01-win2008" path="/shared/definitions/" recovery="restart" restart_expire_time="600">
    <action name="stop" timeout="30m"/>
</vm>
<vm autostart="1" depend="service:storage_n02" depend_mode="soft" domain="primary_n02" exclusive="0" max_restarts="2" name="vm02-win2012" path="/shared/definitions/" recovery="restart" restart_expire_time="600">
    <action name="stop" timeout="30m"/>
</vm>
====

When I stop 'rgmanager' on 'an-c05n01' while 'vm01-win2008' is running on 'an-c05n02', it shuts down the server, despite 'depend_mode="soft"':

====
Nov 14 17:43:16 an-c05n01 rgmanager[2963]: Service service:storage_n01 is stopped
Nov 14 17:43:16 an-c05n01 rgmanager[2963]: Shutting down
Nov 14 17:43:17 an-c05n01 rgmanager[2963]: Disconnecting from CMAN
Nov 14 17:43:17 an-c05n01 rgmanager[2963]: Exiting
Nov 14 17:43:19 an-c05n01 kernel: dlm: closing connection to node 2
Nov 14 17:43:19 an-c05n01 kernel: dlm: closing connection to node 1
---
Nov 14 17:43:16 an-c05n02 rgmanager[14414]: Member 1 shutting down
Nov 14 17:43:17 an-c05n02 rgmanager[14414]: Marking service:storage_n01 as stopped: Restricted domain unavailable
Nov 14 17:43:17 an-c05n02 rgmanager[14414]: Marking service:libvirtd_n01 as stopped: Restricted domain unavailable
Nov 14 17:43:19 an-c05n02 corosync[14181]: [QUORUM] Members[1]: 2
Nov 14 17:43:19 an-c05n02 corosync[14181]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 14 17:43:19 an-c05n02 corosync[14181]: [CPG ] chosen downlist: sender r(0) ip(10.20.50.2) ; members(old:2 left:1)
Nov 14 17:43:19 an-c05n02 corosync[14181]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 14 17:43:19 an-c05n02 kernel: dlm: closing connection to node 1
Nov 14 17:43:20 an-c05n02 kernel: vbr2: port 2(vnet0) entering disabled state
Nov 14 17:43:20 an-c05n02 kernel: device vnet0 left promiscuous mode
Nov 14 17:43:20 an-c05n02 kernel: vbr2: port 2(vnet0) entering disabled state
Nov 14 17:43:21 an-c05n02 rgmanager[14414]: Service vm:vm01-win2008 is stopped
Nov 14 17:43:22 an-c05n02 ntpd[2318]: Deleting interface #15 vnet0, fe80::fc54:ff:fe8e:6732#123, interface stats: received=0, sent=0, dropped=0, active_time=22 secs
====

This is a pretty big problem, as it makes 'depend' useless.
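A minimal sketch of the reproduction, using the node and service names above (assumes vm:vm01-win2008 is currently running on an-c05n02):

====
# On an-c05n01: stop rgmanager, which takes service:storage_n01 down with it.
# Disabling just the dependency ('clusvcadm -d service:storage_n01') should
# exercise the same code path.
service rgmanager stop

# On an-c05n02: watch the service states, refreshing every 2 seconds. With
# depend_mode="soft" the VM should keep running, but vm:vm01-win2008 stops.
clustat -i 2
====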
Version-Release number of selected component (if applicable):

Fully updated RHEL 6.4:

====
[root@an-c05n01 ~]# rpm -q rgmanager cman resource-agents
rgmanager-3.0.12.1-17.el6.x86_64
cman-3.0.12.1-49.el6_4.2.x86_64
resource-agents-3.9.2-21.el6_4.8.x86_64

[root@an-c05n02 ~]# rpm -q rgmanager cman resource-agents
rgmanager-3.0.12.1-17.el6.x86_64
cman-3.0.12.1-49.el6_4.2.x86_64
resource-agents-3.9.2-21.el6_4.8.x86_64
====

How reproducible:

100%

Steps to Reproduce:
1. Set up a VM service that soft-depends on another service in a restricted failover domain.
2. Stop rgmanager, or disable the service being depended on.
3. The soft-dependent VM service stops.

Actual results:

The dependent service stops, despite the soft dependency.

Expected results:

Once the service has started, a change in the state of its dependency should no longer affect the running service.

Additional info:

Full cluster.conf:

====
<?xml version="1.0"?>
<cluster config_version="13" name="an-cluster-05">
    <cman expected_votes="1" two_node="1"/>
    <clusternodes>
        <clusternode name="an-c05n01.alteeve.ca" nodeid="1">
            <fence>
                <method name="ipmi">
                    <device action="reboot" delay="15" name="ipmi_n01"/>
                </method>
                <method name="pdu">
                    <device action="reboot" name="pdu1" port="1"/>
                    <device action="reboot" name="pdu2" port="1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="an-c05n02.alteeve.ca" nodeid="2">
            <fence>
                <method name="ipmi">
                    <device action="reboot" name="ipmi_n02"/>
                </method>
                <method name="pdu">
                    <device action="reboot" name="pdu1" port="2"/>
                    <device action="reboot" name="pdu2" port="2"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" ipaddr="an-c05n01.ipmi" login="admin" name="ipmi_n01" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" ipaddr="an-c05n02.ipmi" login="admin" name="ipmi_n02" passwd="secret"/>
        <fencedevice agent="fence_apc_snmp" ipaddr="an-p01.alteeve.ca" name="pdu1"/>
        <fencedevice agent="fence_apc_snmp" ipaddr="an-p02.alteeve.ca" name="pdu2"/>
    </fencedevices>
    <fence_daemon post_join_delay="30"/>
    <totem rrp_mode="none" secauth="off"/>
    <rm log_level="5">
        <resources>
            <script file="/etc/init.d/drbd" name="drbd"/>
            <script file="/etc/init.d/clvmd" name="clvmd"/>
            <script file="/etc/init.d/libvirtd" name="libvirtd"/>
            <clusterfs device="/dev/an-c05n01_vg0/shared" force_unmount="1" fstype="gfs2" mountpoint="/shared" name="sharedfs"/>
        </resources>
        <failoverdomains>
            <failoverdomain name="only_n01" nofailback="1" ordered="0" restricted="1">
                <failoverdomainnode name="an-c05n01.alteeve.ca"/>
            </failoverdomain>
            <failoverdomain name="only_n02" nofailback="1" ordered="0" restricted="1">
                <failoverdomainnode name="an-c05n02.alteeve.ca"/>
            </failoverdomain>
            <failoverdomain name="primary_n01" nofailback="1" ordered="1" restricted="1">
                <failoverdomainnode name="an-c05n01.alteeve.ca" priority="1"/>
                <failoverdomainnode name="an-c05n02.alteeve.ca" priority="2"/>
            </failoverdomain>
            <failoverdomain name="primary_n02" nofailback="1" ordered="1" restricted="1">
                <failoverdomainnode name="an-c05n01.alteeve.ca" priority="2"/>
                <failoverdomainnode name="an-c05n02.alteeve.ca" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <service autostart="1" domain="only_n01" exclusive="0" name="storage_n01" recovery="restart">
            <script ref="drbd">
                <script ref="clvmd">
                    <clusterfs ref="sharedfs"/>
                </script>
            </script>
        </service>
        <service autostart="1" domain="only_n02" exclusive="0" name="storage_n02" recovery="restart">
            <script ref="drbd">
                <script ref="clvmd">
                    <clusterfs ref="sharedfs"/>
                </script>
            </script>
        </service>
        <service autostart="1" domain="only_n01" exclusive="0" name="libvirtd_n01" recovery="restart">
            <script ref="libvirtd"/>
        </service>
        <service autostart="1" domain="only_n02" exclusive="0" name="libvirtd_n02" recovery="restart">
            <script ref="libvirtd"/>
        </service>
        <vm autostart="1" depend="service:storage_n01" depend_mode="soft" domain="primary_n01" exclusive="0" max_restarts="2" name="vm01-win2008" path="/shared/definitions/" recovery="restart" restart_expire_time="600">
            <action name="stop" timeout="30m"/>
        </vm>
        <vm autostart="1" depend="service:storage_n02" depend_mode="soft" domain="primary_n02" exclusive="0" max_restarts="2" name="vm02-win2012" path="/shared/definitions/" recovery="restart" restart_expire_time="600">
            <action name="stop" timeout="30m"/>
        </vm>
    </rm>
</cluster>
====
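For completeness, the depend_mode description quoted in the report can be dumped straight from the shipped agent; a sketch, assuming the stock RHEL 6 rgmanager agent path and that the agent prints its metadata when invoked directly:

====
# Dump the agent's metadata and show the depend_mode parameter block,
# including the "startip" typo noted in the description.
bash /usr/share/cluster/vm.sh meta-data | grep -A 8 'name="depend_mode"'
====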