Created attachment 354249
Fix

Description of problem:

Using the latest STABLE2 release, simulating a node failure in a two-node cluster results in successful fencing (using the ipmilan agent), but the cman status is not updated. As a result, rgmanager keeps waiting for the fence operation to finish, so services are not relocated.

cman_tool -f nodes says that the victim node has not been fenced since it went offline. clustat shows the node offline, but its services are still in the started state on that node. Meanwhile, the logs show:

fenced[9592]: node2 not a cluster member after 0 sec post_fail_delay
fenced[9592]: fencing node "node2"
fenced[9592]: can't get node number for node <garbage_here>
fenced[9592]: fence "node2" success

where <garbage_here> is a string of random characters.

I tried to trace the problem in the code, and found that in cluster-2.03.11/fence/fenced/agent.c

313 if (ccs_lookup_nodename(cd, victim, &victim_nodename) == 0)
314         victim = victim_nodename;

then on line 358 victim_nodename is freed

357 if (victim_nodename)
358         free(victim_nodename);

and then update_cman is called with "victim" as the node name. Since victim now points at freed memory, the nodeid cannot be retrieved and the call fails (and garbage is printed to syslog):

361 if (!error) {
362         update_cman(victim, good_device);
363         break;

I admit I do not understand why ccs_lookup_nodename returns 0, but moving the free call to after the update_cman call makes everything work: services relocate to the other node, and when node2 comes back and rejoins the cluster they migrate back to the original node, as expected.

A very simple patch (which fixes everything for me) is attached.

Version-Release number of selected component (if applicable):
cluster-2.03.11

How reproducible:
Always

Steps to Reproduce:
1. simulate a failure on a node
2. wait for fence to finish

Actual results:
Cman node status not updated and services not relocated

Expected results:
Cman node status updated and services relocated

Additional info:
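To illustrate the ordering issue described above, here is a minimal standalone sketch in C. It is not the fenced source and not the attached patch: lookup_nodename, update_node_status, last_updated and fence_and_update are hypothetical stand-ins for ccs_lookup_nodename, update_cman and the surrounding logic in agent.c. It only shows the general pattern of keeping the buffer alive until the aliasing pointer has been used.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ccs_lookup_nodename(): on success returns 0 and
 * stores a heap-allocated name in *out, which the caller must free. */
static int lookup_nodename(const char *name, char **out)
{
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (!copy)
        return -1;
    memcpy(copy, name, len);
    *out = copy;
    return 0;
}

/* Hypothetical stand-in for update_cman(): records the name it was given. */
static char last_updated[64];

static void update_node_status(const char *victim)
{
    snprintf(last_updated, sizeof(last_updated), "%s", victim);
}

/* Use the alias first, release the buffer afterwards. */
static int fence_and_update(const char *requested)
{
    const char *victim = requested;
    char *victim_nodename = NULL;

    if (lookup_nodename(requested, &victim_nodename) == 0)
        victim = victim_nodename;   /* victim now aliases the heap buffer */

    assert(victim != NULL);
    update_node_status(victim);     /* buffer is still valid at this point */

    free(victim_nodename);          /* free(NULL) is a no-op */
    victim_nodename = NULL;
    return 0;
}
```

Calling free(victim_nodename) before update_node_status(victim) in this sketch would hand a dangling pointer to the status update, which matches the garbage node name seen in the log.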
Your patch is correct, but this was already fixed in the STABLE2 branch in March: http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=aee97b180e80c9f8b90b8fca63004afe3b289962