Bug 570373
| Summary: | fencing virtual machine using fence_xvm/fence_xvmd fails when physical host for the virtual machine goes down | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Bernard Chew <bernardchew> |
| Component: | cman | Assignee: | Ryan McCabe <rmccabe> |
| Status: | CLOSED WORKSFORME | QA Contact: | Chris Mackowski <cmackows> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.2 | CC: | cluster-maint, cmackows, djansa, iannis, jentrena, mnovacek |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-10-11 21:07:31 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 807971 | | |
Description
Bernard Chew
2010-03-04 03:43:00 UTC
I'm not sure if this is your bug or not, Lon. The way fence_xvmd is designed:

1. VM information is stored in checkpoints.
2. fence_xvm uses multicast, so all hosts receive the fencing request.
3. When a fencing request comes in, the node with the lowest node ID reads the checkpoint.
4. If the last-known owner is dead (and fenced), fence_xvmd on the lowest node ID responds to the originating host with a successful fencing operation.
5. If the last-known owner is alive (or not fenced), fence_xvmd does nothing.
6. If _we_ receive the packet and _we_ are the owner of the VM, then we take the fencing action (virDomainDestroy()).

My guess is that (2) is not working: not all hosts are receiving the request. Consequently, can you run

```
# fence_xvmd -fddddddddddddddddddddd &> fence_xvmd.log
```

on all nodes, reproduce the failure, and upload the log files when the fencing request fails? (A sketch for checking multicast delivery directly is included at the end of this report.)

Sorry for the late reply, guys. I can't perform the tests now, as these are production servers, but I did run the command before, and the output indicated that communication between the nodes is taking place.

With regard to the multicast settings, I have the settings below in cluster.conf on both the physical-host and virtual-guest clusters (the snippet below is taken from the physical-host cluster.conf):

```
<clusternode name="node_name.domain_name" nodeid="5" votes="1">
    <fence>
        <method name="1">
            <device modulename="" name="drac5"/>
        </method>
    </fence>
    <multicast addr="225.0.0.12" interface="eth3"/>
</clusternode>

<cman>
    <multicast addr="225.0.0.12"/>
</cman>
```

In addition, a static route for multicast is added to ensure the traffic goes through the private Ethernet interface eth3, and the firewall (iptables) is configured to allow such traffic to pass (a sample of what these settings might look like follows at the end of this report).

Lastly, I remember that when I encountered this situation previously, I had to run the command below on one of the remaining virtual guests for the cluster to continue operating:

```
echo failed_virtual_node_name > /var/run/cluster/fenced_override
```

Regards,
Bernard Chew

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

We have attempted, unsuccessfully, to reproduce this bug many times.
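If step (2) above is indeed the failure mode, multicast delivery can be checked directly on each host. Below is a minimal sketch, assuming the 225.0.0.12/eth3 configuration from the cluster.conf snippet above; `guest1` is a placeholder domain name, and port 1229 is the fence_xvm default.

```
# On each physical host, watch for the fencing multicast on the
# private interface (225.0.0.12 is the address from cluster.conf;
# fence_xvm defaults to port 1229):
tcpdump -i eth3 -n host 225.0.0.12

# Confirm that eth3 has actually joined the multicast group:
ip maddr show dev eth3

# From a virtual guest, send a harmless "null" fencing request with
# extra debugging ("guest1" is a placeholder domain name):
fence_xvm -H guest1 -o null -ddd
```

A host that never shows the request in tcpdump while the others do would point at routing, IGMP snooping on the switch, or the firewall on that host rather than at fence_xvmd itself.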
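For completeness, the static route and iptables rules Bernard describes might look like the following. This is a hedged reconstruction using the address and interface from the snippet above, not the actual configuration from the affected cluster.

```
# Send all multicast traffic (224.0.0.0/4) out the private
# interface eth3 (assumes the address/interface shown above):
route add -net 224.0.0.0 netmask 240.0.0.0 dev eth3

# Permit the fencing multicast and fence_xvm's default port 1229
# (both TCP and UDP) on every node:
iptables -I INPUT -d 225.0.0.12 -j ACCEPT
iptables -I INPUT -p tcp --dport 1229 -j ACCEPT
iptables -I INPUT -p udp --dport 1229 -j ACCEPT
```

With multiple interfaces, it is the per-host multicast route, not cluster.conf alone, that decides which interface carries the request, which is why the tcpdump check above is worth running on every host.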