Description of problem:

fence_xvm and fence_virtd both have options for setting hashes for verification. According to the man pages, these should include "none", sha1, sha256 and sha512. I am setting them in matching pairs for auth and signing (e.g. sha256/sha256), and sha256 seems to be the only one that works. Moreover, there are combinations that work although I would expect them to fail, for example sha256/sha256 in fence_virtd and sha256 (auth)/sha512 (signing) in fence_xvm. And last but not least, one combination (sha256/sha256 in fence_virtd, sha256 (auth)/sha512 (signing) in fence_xvm) fails when fence_xvm is called manually ("Remote failed challenge") but succeeds via fence_node (i.e. when the cluster decides to fence the node). This is the worst case, as we cannot pretend something was fenced; it must really be dead :).

Version-Release number of selected component (if applicable):
fence-virt-0.2.1-8.el6.x86_64

How reproducible:
100%

Steps to Reproduce (auth hash first):
1. Set up correct (matching) pairs on both sides and see that it does not work.
2. Set up fence_virtd with sha256/sha256 and fence_xvm with sha256/sha512 and see it rebooting the domain.
3. Set up fence_virtd with sha256/sha256 and see fence_node returning success while the domain is not rebooted.

Actual results:
See above.

Expected results:
1. Matching pairs should always work.
2. Fencing with a different signing key is not supposed to work (maybe my expectation just differs here, please clarify).
3. Return success if and only if the domain is really rebooting.

Additional info:

######### host node fence_virt.conf ##########

[root@marathon-03:~]$ cat /etc/fence_virt.conf
fence_virtd {
    listener = "multicast";
    backend = "libvirt";
}

listeners {
    multicast {
        key_file = "/etc/cluster/fence_xvm.key";
        address = "225.0.0.12";
        hash = "sha256";
        auth = "sha256";
        interface = "virbr1";
    }
}

backends {
    libvirt {
        uri = "qemu:///system";
    }
}

########## virtual node01 cluster.conf #############

[root@node01:~]$ cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="9" name="STSRHTS14642">
    <cman/>
    <fence_daemon clean_start="0" post_join_delay="20"/>
    <clusternodes>
        <clusternode name="node01" nodeid="1" votes="1">
            <fence>
                <method name="virt">
                    <device domain="node01" name="xvm"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node02" nodeid="2" votes="1">
            <fence>
                <method name="virt">
                    <device domain="node02" name="xvm"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node03" nodeid="3" votes="1">
            <fence>
                <method name="virt">
                    <device domain="node03" name="xvm"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_xvm" auth="sha512" hash="sha512" key_file="/etc/cluster/fence_xvm.key" name="xvm" timeout="5"/>
    </fencedevices>
</cluster>

fence_virtd started as: fence_virtd -F -d 99 -f /etc/fence_virt.conf
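To make the pairing requirement concrete, here is a small, purely illustrative Python sketch of a digest-based challenge/response. The framing digest(challenge || key) and the helper names are assumptions for illustration only, not the actual fence-virt wire protocol; the point is simply that client and daemon must agree on the digest algorithm, otherwise the daemon sees a hash mismatch:

import hashlib
import os

def respond(challenge: bytes, key: bytes, algo: str) -> bytes:
    # Client (fence_xvm) side in this toy model: answer with digest(challenge || key).
    return hashlib.new(algo, challenge + key).digest()

def verify(challenge: bytes, response: bytes, key: bytes, algo: str) -> bool:
    # Daemon (fence_virtd) side: recompute with *its* configured algorithm and compare.
    return response == hashlib.new(algo, challenge + key).digest()

key = b"example-shared-key"   # stands in for /etc/cluster/fence_xvm.key
challenge = os.urandom(32)

# Matching pair (sha256/sha256): verification succeeds.
assert verify(challenge, respond(challenge, key, "sha256"), key, "sha256")

# Mismatched pair (client sha512, daemon sha256): failed challenge, as in the logs below.
assert not verify(challenge, respond(challenge, key, "sha512"), key, "sha256")

The rest of this report is about what the client does after such a failed verification, i.e. which exit status it reports.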
fake fencing example:

$ fence_virtd -F -d 99 -f /etc/fence_virt.conf
Background mode disabled
Using /etc/fence_virt.conf
Debugging threshold is now 99
backends {
    libvirt {
        uri = "qemu:///system";
    }
}

listeners {
    multicast {
        interface = "virbr1";
        auth = "sha256";
        hash = "sha256";
        address = "225.0.0.12";
        key_file = "/etc/cluster/fence_xvm.key";
    }
}

fence_virtd {
    debug = "99";
    backend = "libvirt";
    listener = "multicast";
}

Backend plugin: libvirt
Listener plugin: multicast
Searching /usr/lib64/fence-virt for plugins...
Searching for plugins in /usr/lib64/fence-virt
Loading plugin from /usr/lib64/fence-virt/multicast.so
Failed to map backend_plugin_version
Registered listener plugin multicast 1.0
Loading plugin from /usr/lib64/fence-virt/checkpoint.so
Registered backend plugin checkpoint 0.8
Loading plugin from /usr/lib64/fence-virt/libvirt.so
Registered backend plugin libvirt 0.1
3 plugins found
Available backends:
    checkpoint 0.8
    libvirt 0.1
Available listeners:
    multicast 1.0
Debugging threshold is now 99
Using qemu:///system
Debugging threshold is now 99
Got /etc/cluster/fence_xvm.key for key_file
Got sha256 for hash
Got sha256 for auth
Got 225.0.0.12 for address
Got virbr1 for interface
Reading in key file /etc/cluster/fence_xvm.key into 0x206eb80 (4096 max size)
Stopped reading @ 17 bytes
Actual key length = 17 bytes
Setting up ipv4 multicast receive (225.0.0.12:1229)
Joining multicast group
ipv4_recv_sk: success, fd = 6
Request 2 seqno 776444 domain node02
Plain TCP request
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 7
Hash mismatch:
C = 8f5a3ddc9bdaff8440cb8f2c0482bc3b2f3e0be7a9bbdfbab6dc0097ab1e0739f383934161f463ccbd4785ebacaad8aee418c53204d10353906599158949e5b9
H = b8761d8db7bbb78c196d18c274517756ac289962a7c20a836a041f23b02982490000000000000000000000000000000000000000000000000000000000000000
R = 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Remote failed challenge
Could call back for fence request: Connection reset by peer

Manual attempt:

$ fence_xvm -H node02 -C sha512 -c sha512 -o reboot -t 10 -dddd -k /etc/cluster/fence_xvm.key
Debugging threshold is now 4
-- args @ 0x7fff24072060 --
  args->domain = node02
  args->op = 2
  args->net.key_file = /etc/cluster/fence_xvm.key
  args->net.hash = 3
  args->net.addr = 225.0.0.12
  args->net.auth = 3
  args->net.port = 1229
  args->net.ifindex = 0
  args->net.family = 2
  args->timeout = 10
  args->retr_time = 20
  args->flags = 0
  args->debug = 4
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fff24070f30 (4096 max size)
Stopped reading @ 17 bytes
Actual key length = 17 bytes
Adding IP 127.0.0.1 to list (family 2)
Adding IP 192.168.122.116 to list (family 2)
Adding IP 192.168.100.101 to list (family 2)
ipv4_listen: Setting up ipv4 listen socket
ipv4_listen: Success; fd = 3
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
Opening /dev/urandom
Sending to 225.0.0.12 via 127.0.0.1
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
Opening /dev/urandom
Sending to 225.0.0.12 via 192.168.122.116
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
Opening /dev/urandom
Sending to 225.0.0.12 via 192.168.100.101
Waiting for connection from XVM host daemon.
Issuing TCP challenge
Hash mismatch:
C = b3f9592dc1322003f9dffc01e18d30bec7d0a4cffe5b7e86966d85fd06eb994280c9c603a2729d120fdd587261c7e2ac23179ec4d4ba285c35b3dcc5c4e1a0e2
H = 335822bbe84f125c9e2951e8cc090d9585d053c878085441a5844fd38a0db14cb4f8660ae5dfaa99abd4d0fda6448c5237a6d789151d3836050cc073c1a0b4b3
R = 762029f2195976077950a59d5ba12e56f318aaa51441d8ce2f81841d7e013d900000000000000000000000000000000000000000000000000000000000000000
Invalid response to challenge

(12:46:02) [root@node01:~]$ echo $?
0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Maybe this is the problem right here!

$ cat /etc/cluster/cluster.conf | grep fence_xvm
        <fencedevice agent="fence_xvm" auth="sha512" hash="sha512" key_file="/etc/cluster/fence_xvm.key" name="xvm" timeout="5"/>

And now fence_node:

$ fence_node node02
fence node02 success

(fence_virtd output as above)
fence_xvm must not return success on hash mismatches.
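To spell out what "must not return success" means in practice, here is a minimal sketch of the desired exit-status handling (the function and variable names are hypothetical, not taken from the fence-virt sources): a rejected challenge has to surface as a non-zero exit status, because fenced/fence_node treat the agent's exit status as proof that the victim is really dead.

import sys

def fence_action(challenge_ok: bool) -> int:
    # If the daemon rejected our challenge response, report the error and
    # return non-zero. Returning 0 here is exactly the false success that
    # lets the cluster carry on against a node that is still alive.
    if not challenge_ok:
        print("Invalid response to challenge", file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    # Simulate the failed manual attempt above: the process must exit 1, not 0.
    sys.exit(fence_action(challenge_ok=False))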
Upstream fix: http://fence-virt.git.sourceforge.net/git/gitweb.cgi?p=fence-virt/fence-virt;a=commit;h=19858bfbded34cb9923d67d572c83244d2e567d7
With fix from upstream:

Hash mismatch:
C = 10c57b3860e26cd131c66c78c6521efa15525f57b77144d2849c247267ca98164e353406fa179cce36f732d872548532c5517e0de9c9c042fbb9105e0d3093b1
H = 32ba316f796adaf839e2954aaf50ddedada9b60f0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
R = 4d4dc594d49f77ad540d60191ef99a3d6a346ac297f0c155cdeb1ab17046cd370000000000000000000000000000000000000000000000000000000000000000
Invalid response to challenge
Operation failed

[root@ayanami client]# echo $?
1
Created attachment 516527: Full logs from test run
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: A hash handling mismatch.
Consequence: False success for fencing in some cases. This has the potential to cause data corruption in live-hang scenarios.
Fix: Correct the hash handling mismatch.
Result: No more false successes for fencing, thereby preserving data integrity.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1566.html