Bug 720767

Summary: fence_xvm + fence_virtd hash-handling mismatch leads to fake fencing
Product: Red Hat Enterprise Linux 6 Reporter: Jaroslav Kortus <jkortus>
Component: fence-virt Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 6.1 CC: cluster-maint, djansa, mgrac
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: fence-virt-0.2.3-2.el6 Doc Type: Bug Fix
Doc Text:
Cause: A hash handling mismatch. Consequence: False success for fencing in some cases. This has the potential to cause data corruption in live-hang scenarios. Fix: Correct hash handling mismatch. Result: No more false successes for fencing, thereby preserving data integrity.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 11:38:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Full logs from test run (flags: none)

Description Jaroslav Kortus 2011-07-12 17:44:33 UTC
Description of problem:
fence_xvm and fence_virtd both have options for setting the hashes used for verification. According to the man pages, these should include "none", "sha1", "sha256" and "sha512".

I'm setting them in matching pairs for auth and signing (e.g. sha256/sha256), and sha256 seems to be the only value that works.

What's more, some combinations that I'd expect to fail actually work: for example, sha256/sha256 in fence_virtd together with sha256 (auth)/sha512 (signing) in fence_xvm.

And last but not least, one combination (sha256/sha256 in fence_virtd, sha256 (auth)/sha512 (signing) in fence_xvm) fails when fence_xvm is called manually ("Remote failed challenge"), but succeeds via fence_node (i.e. when the cluster decides to fence the node). This is the worst case, as we can't pretend something was fenced; it must really be dead :).
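A quick way to exercise each hash value documented in the man pages, with matching auth/signing settings on the client side, is a loop like the one below. This is only a rough sketch: it assumes fence_virtd on the host has been reconfigured and restarted with the same value before each iteration, and it reuses the -C/-c/-k/-t options shown in the manual attempt further down.

for h in none sha1 sha256 sha512; do
	# fence_virtd must already be running with auth/hash set to $h
	fence_xvm -o list -C $h -c $h -k /etc/cluster/fence_xvm.key -t 5
	echo "auth/hash=$h -> exit code $?"
done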

Version-Release number of selected component (if applicable):
fence-virt-0.2.1-8.el6.x86_64

How reproducible:
100%

Steps to Reproduce (auth hash listed first in each pair):
1. set up matching pairs on both sides and see that they do not work (only sha256/sha256 does)
2. set up fence_virtd with sha256/sha256 and fence_xvm with sha256/sha512 and see it reboot the domain anyway
3. set up fence_virtd with sha256/sha256 and see fence_node return success while the domain is not rebooted
  
Actual results:
see above

Expected results:
1. matching pairs should always work
2. fencing with a different signing key is not supposed to work (maybe just my expectation differs here, please clarify)
3. success should be returned if and only if the domain is really rebooting (see the sketch below)
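
A minimal manual check for point 3 (a sketch only; node02 is one of the domains from the cluster.conf below, and virsh has to be run on the host against the same qemu:///system URI):

fence_xvm -H node02 -C sha256 -c sha256 -o reboot -t 10 -k /etc/cluster/fence_xvm.key
echo "fence_xvm exit code: $?"
# on the host, confirm the domain really went down / came back:
virsh -c qemu:///system domstate node02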


Additional info:

######### host node fence_virt.conf ##########
[root@marathon-03:~]$ cat /etc/fence_virt.conf 
fence_virtd {
	listener = "multicast";
	backend = "libvirt";
}

listeners {
	multicast {
		key_file = "/etc/cluster/fence_xvm.key";
		address = "225.0.0.12";
		hash="sha256";
		auth="sha256";
		interface="virbr1";
	}
}

backends {
	libvirt { 
		uri = "qemu:///system";
	}
}


########## virtual node01 cluster.conf #############

[root@node01:~]$ cat /etc/cluster/cluster.conf 
<?xml version="1.0"?>
<cluster config_version="9" name="STSRHTS14642">
	<cman/>
	<fence_daemon clean_start="0" post_join_delay="20"/>
	<clusternodes>
		<clusternode name="node01" nodeid="1" votes="1">
			<fence>
				<method name="virt">
					<device domain="node01" name="xvm"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="node02" nodeid="2" votes="1">
			<fence>
				<method name="virt">
					<device domain="node02" name="xvm"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="node03" nodeid="3" votes="1">
			<fence>
				<method name="virt">
					<device domain="node03" name="xvm"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<fencedevices>
		<fencedevice agent="fence_xvm" auth="sha512" hash="sha512" key_file="/etc/cluster/fence_xvm.key" name="xvm" timeout="5"/>
	</fencedevices>
</cluster>
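
Note that the two sides above do not match: fence_virt.conf on the host uses sha256/sha256 while cluster.conf passes auth="sha512" hash="sha512" to fence_xvm; this is the combination behind the fake fencing example in the next comment. A plain grep on each side is enough to spot such a mismatch:

grep -E 'hash|auth' /etc/fence_virt.conf       # on the host (marathon-03)
grep fence_xvm /etc/cluster/cluster.conf       # on a cluster node (node01)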


fence_virtd started as:
fence_virtd  -F -d 99 -f /etc/fence_virt.conf

Comment 1 Jaroslav Kortus 2011-07-12 17:48:22 UTC
fake fencing example:

$ fence_virtd  -F -d 99 -f /etc/fence_virt.conf 
Background mode disabled
Using /etc/fence_virt.conf
Debugging threshold is now 99
backends {
	libvirt {
		uri = "qemu:///system";
	}

}

listeners {
	multicast {
		interface = "virbr1";
		auth = "sha256";
		hash = "sha256";
		address = "225.0.0.12";
		key_file = "/etc/cluster/fence_xvm.key";
	}

}

fence_virtd {
	debug = "99";
	backend = "libvirt";
	listener = "multicast";
}

Backend plugin: libvirt
Listener plugin: multicast
Searching /usr/lib64/fence-virt for plugins...
Searching for plugins in /usr/lib64/fence-virt
Loading plugin from /usr/lib64/fence-virt/multicast.so
Failed to map backend_plugin_version
Registered listener plugin multicast 1.0
Loading plugin from /usr/lib64/fence-virt/checkpoint.so
Registered backend plugin checkpoint 0.8
Loading plugin from /usr/lib64/fence-virt/libvirt.so
Registered backend plugin libvirt 0.1
3 plugins found
Available backends:
    checkpoint 0.8
    libvirt 0.1
Available listeners:
    multicast 1.0
Debugging threshold is now 99
Using qemu:///system
Debugging threshold is now 99
Got /etc/cluster/fence_xvm.key for key_file
Got sha256 for hash
Got sha256 for auth
Got 225.0.0.12 for address
Got virbr1 for interface
Reading in key file /etc/cluster/fence_xvm.key into 0x206eb80 (4096 max size)
Stopped reading @ 17 bytes
Actual key length = 17 bytes
Setting up ipv4 multicast receive (225.0.0.12:1229)
Joining multicast group
ipv4_recv_sk: success, fd = 6
Request 2 seqno 776444 domain node02
Plain TCP request
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 7
Hash mismatch:
C = 8f5a3ddc9bdaff8440cb8f2c0482bc3b2f3e0be7a9bbdfbab6dc0097ab1e0739f383934161f463ccbd4785ebacaad8aee418c53204d10353906599158949e5b9
H = b8761d8db7bbb78c196d18c274517756ac289962a7c20a836a041f23b02982490000000000000000000000000000000000000000000000000000000000000000
R = 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Remote failed challenge
Could call back for fence request: Connection reset by peer
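
For context: the C/H/R lines are fence_virtd's debug dump of the failed challenge. The exact wire format is internal to fence-virt, but the general keyed challenge-response idea can be sketched with standard tools; the construction below is an illustrative assumption, not fence-virt's actual scheme.

# the daemon sends a random challenge; only a holder of the shared key file
# should be able to compute the matching digest
challenge=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
expected=$(printf '%s' "$challenge" | cat - /etc/cluster/fence_xvm.key | sha512sum | cut -d' ' -f1)
echo "expected response: $expected"
# a client hashing with a different algorithm (e.g. sha256) produces a shorter,
# different digest, so the comparison fails - hence the "Hash mismatch" above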




Manual attempt:
$ fence_xvm -H node02 -C sha512 -c sha512 -o reboot -t 10 -dddd -k /etc/cluster/fence_xvm.key
Debugging threshold is now 4
-- args @ 0x7fff24072060 --
  args->domain = node02
  args->op = 2
  args->net.key_file = /etc/cluster/fence_xvm.key
  args->net.hash = 3
  args->net.addr = 225.0.0.12
  args->net.auth = 3
  args->net.port = 1229
  args->net.ifindex = 0
  args->net.family = 2
  args->timeout = 10
  args->retr_time = 20
  args->flags = 0
  args->debug = 4
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fff24070f30 (4096 max size)
Stopped reading @ 17 bytes
Actual key length = 17 bytes
Adding IP 127.0.0.1 to list (family 2)
Adding IP 192.168.122.116 to list (family 2)
Adding IP 192.168.100.101 to list (family 2)
ipv4_listen: Setting up ipv4 listen socket
ipv4_listen: Success; fd = 3
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
Opening /dev/urandom
Sending to 225.0.0.12 via 127.0.0.1
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
Opening /dev/urandom
Sending to 225.0.0.12 via 192.168.122.116
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
Opening /dev/urandom
Sending to 225.0.0.12 via 192.168.100.101
Waiting for connection from XVM host daemon.
Issuing TCP challenge
Hash mismatch:
C = b3f9592dc1322003f9dffc01e18d30bec7d0a4cffe5b7e86966d85fd06eb994280c9c603a2729d120fdd587261c7e2ac23179ec4d4ba285c35b3dcc5c4e1a0e2
H = 335822bbe84f125c9e2951e8cc090d9585d053c878085441a5844fd38a0db14cb4f8660ae5dfaa99abd4d0fda6448c5237a6d789151d3836050cc073c1a0b4b3
R = 762029f2195976077950a59d5ba12e56f318aaa51441d8ce2f81841d7e013d900000000000000000000000000000000000000000000000000000000000000000
Invalid response to challenge

(12:46:02) [root@node01:~]$ echo $?
0

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Maybe this is the problem right here!
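
fence_node and fenced judge a fence agent purely by its exit status, so an agent that prints "Invalid response to challenge" but still exits 0 gets reported as a successful fence. Just to illustrate that point (this is not a fix, only a hypothetical strict wrapper around the agent):

cat > /tmp/fence_xvm_strict <<'EOF'
#!/bin/sh
# refuse to report success if fence_xvm itself logged a failed challenge
out=$(fence_xvm "$@" 2>&1); rc=$?
echo "$out"
case "$out" in
	*"Invalid response to challenge"*|*"Remote failed challenge"*) exit 1 ;;
esac
exit $rc
EOF
chmod +x /tmp/fence_xvm_strict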



And now fence_node:

$ cat /etc/cluster/cluster.conf | grep fence_xvm
		<fencedevice agent="fence_xvm" auth="sha512" hash="sha512" key_file="/etc/cluster/fence_xvm.key" name="xvm" timeout="5"/>
$ fence_node node02
fence node02 success
(fence_virtd output as above)


Comment 4 Lon Hohberger 2011-08-02 21:09:16 UTC
fence_xvm must not return success on hash mismatches.

Comment 6 Lon Hohberger 2011-08-03 14:26:57 UTC
With fix from upstream:


Hash mismatch:
C = 10c57b3860e26cd131c66c78c6521efa15525f57b77144d2849c247267ca98164e353406fa179cce36f732d872548532c5517e0de9c9c042fbb9105e0d3093b1
H = 32ba316f796adaf839e2954aaf50ddedada9b60f0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
R = 4d4dc594d49f77ad540d60191ef99a3d6a346ac297f0c155cdeb1ab17046cd370000000000000000000000000000000000000000000000000000000000000000
Invalid response to challenge
Operation failed
[root@ayanami client]# echo $?
1
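
For completeness, the cluster-driven path can be re-checked the same way once the fixed build is installed, using the same deliberately mismatched hash settings as in the original report (sketch; node name as in comment 0):

fence_node node02
echo "fence_node rc=$?"
# with the agent now exiting non-zero on a failed challenge, fence_node should
# report the fence attempt as failed rather than the false "fence node02 success"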

Comment 7 Lon Hohberger 2011-08-03 14:32:17 UTC
Created attachment 516527 [details]
Full logs from test run

Comment 9 Lon Hohberger 2011-10-26 22:19:29 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: A hash handling mismatch.

Consequence: False success for fencing in some cases.  This has the potential to cause data corruption in live-hang scenarios.

Fix: Correct hash handling mismatch.

Result: No more false successes for fencing, thereby preserving data integrity.

Comment 11 errata-xmlrpc 2011-12-06 11:38:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1566.html