Description of problem:
At nearly the same time as node3 was fenced, clustat was being run on node1 and segfault'd.

Version-Release number of selected component (if applicable):
1.9.46-1.3speed

Actual results:
May 15 19:18:45 sqaone01 kernel: CMAN: removing node sqaone03 from the cluster : Missed too many heartbeats
May 15 19:18:45 sqaone01 kernel: clustat[22394]: segfault at 000000000000002a rip 0000003765bb1463 rsp 0000007fbffffa90 error 4
May 15 19:18:46 sqaone01 fenced: sqaone03 not a cluster member after 0 sec post_fail_delay
May 15 19:18:46 sqaone01 fenced: fencing node "sqaone03"
May 15 19:18:46 sqaone01 fenced: fence "sqaone03" success

Expected results:
no segfault
Did ccsd die as well?
Nothing else died. The node was still running OK after the segfault.
Thank you. I will keep looking. So far I have not been able to reproduce it, so I think it is a timing issue of some sort (e.g., getting a member list while cman is handling the transition).
I think the new rgmanager (now using 1.9.46-1.4.2x) has fixed this.