Created attachment 505774 [details]
Configuration

Description of problem:
Hi all. This will be a little long, but I hope you will get the point. I have a testing environment that I am working on: 4 nodes running the latest packages of the RH cluster suite. I am testing clvmd behavior in different situations. Each node hosts 4 KVM virtual machines.

Now the problem: when one node loses its network connection, openais reaches its token timeout and tries to fence the node. I have manual fencing set up because I do not want that node to be powered down; I need to migrate its VMs to another node first. The problem is that when the AIS token times out, clvmd hangs across the whole cluster. When I try to run fence_ack_manual, it will not let me fence that node because it reports that the fifo file does not exist. It works when I power the node down, but not while the node is still running. I want to tell the cluster that this node is out of the cluster without powering it down.

Version-Release number of selected component (if applicable):
Linux 2.6.18-238.12.1.el5 #1 SMP Tue May 31 13:22:04 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
lvm2-2.02.74-5.el5_6.1
lvm2-cluster-2.02.74-3.el5_6.1

How reproducible:
Always

Steps to Reproduce:
1. Have a working cluster with 4 participating nodes.
2. Cut off the network connection on one of the nodes.
3. clvmd hangs.

Actual results:
On the working nodes:

  service clvmd status
  clvmd (pid 6527) is running...

and it hangs at this point. On the cut-off node:

  clvmd dead but pid file exists

Expected results:

  service clvmd status
  clvmd (pid 6527) is running...
  Clustered Volume Groups: vmvg isovg cfgvg
  Active clustered Logical Volumes: /dev/vmvg/shittest /dev/isovg/isolv /dev/cfgvg/cfglv

Additional info:
I would like to manually fence this cut-off node with fence_ack_manual. clvmd unhangs after the node is rebooted, but I do not want to reboot it; I just want to cut it off from the cluster and then rejoin it later. One more thing: when I cut the node off from the network, its VMs keep working fine. It is only clvmd that shows nothing and hangs.
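For reference, my fencing setup follows the usual fence_manual pattern in /etc/cluster/cluster.conf. The stanza below is only an illustrative sketch with placeholder node names and IDs; the real values are in the attached configuration.

  <!-- illustrative sketch, not the attached configuration -->
  <cluster name="V5" config_version="1">
    <clusternodes>
      <clusternode name="node4" nodeid="4">
        <fence>
          <method name="1">
            <!-- manual fencing: fenced waits for a human acknowledgement -->
            <device name="manual" nodename="node4"/>
          </method>
        </fence>
      </clusternode>
      <!-- the other nodes are defined the same way -->
    </clusternodes>
    <fencedevices>
      <fencedevice agent="fence_manual" name="manual"/>
    </fencedevices>
  </cluster>

With this setup the fence is acknowledged by hand with something like "fence_ack_manual -n node4" on the node running fenced; the exact options differ between cluster releases, so check fence_ack_manual(8) for the version in use.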
Hi again. I have done some further testing. After cutting off the network on node4 I was able to fence_ack_manual that node, but the clvmd problem still persists on it. Then I re-enabled the network and used cman_tool join -c V5, and it joined. The other nodes reported the clustered volumes without a problem, but node4 printed out this:

  service clvmd status
  clvmd dead but pid file exists

  vgs
    connect() failed on local socket: Connection refused
    Internal cluster locking initialisation failed.
    WARNING: Falling back to local file-based locking.
    Volume Groups with the clustered attribute will be inaccessible.
    Skipping clustered volume group vmvg
    Skipping clustered volume group isovg
    Skipping clustered volume group cfgvg
    VG    #PV #LV #SN Attr   VSize  VFree
    sysvg   1   2   0 wz--n- 11.06G    0

  lvs
    connect() failed on local socket: Connection refused
    Internal cluster locking initialisation failed.
    WARNING: Falling back to local file-based locking.
    Volume Groups with the clustered attribute will be inaccessible.
    Skipping clustered volume group vmvg
    Skipping volume group vmvg
    Skipping clustered volume group isovg
    Skipping clustered volume group cfgvg
    LV     VG    Attr   LSize  Origin Snap% Move Log Copy% Convert
    rootlv sysvg -wi-ao 10.06G
    swaplv sysvg -wi-ao  1.00G

  pvs
    connect() failed on local socket: Connection refused
    Internal cluster locking initialisation failed.
    WARNING: Falling back to local file-based locking.
    Volume Groups with the clustered attribute will be inaccessible.
    Skipping clustered volume group vmvg
    Skipping volume group vmvg
    Skipping clustered volume group isovg
    Skipping volume group isovg
    Skipping clustered volume group cfgvg
    Skipping volume group cfgvg
    PV                  VG    Fmt  Attr PSize  PFree
    /dev/mpath/mpath0p2 sysvg lvm2 a-   11.06G    0
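To make the sequence explicit, this is roughly what was run on node4 after re-enabling the network (just a condensed sketch of the transcript above, nothing new):

  cman_tool join -c V5      # rejoin cluster V5; the other nodes see node4 again
  service clvmd status      # reports: clvmd dead but pid file exists
  vgs                       # connect() failed on local socket; falls back to local locking
  lvs                       # same warnings; only the local sysvg LVs are listed
  pvs                       # same warnings; only /dev/mpath/mpath0p2 is listed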
Manual fencing is not a supported fencing method in RHEL cluster.

Please note that Red Hat Bugzilla is not an avenue for technical assistance or support, but simply a bug tracking system. As such, there are no service level agreements or other guarantees associated with defects reported in Bugzilla.

If you have active support entitlements for the systems mentioned in this report, please file a technical support case with Red Hat Global Support Services, either via your normal support representative or via the customer portal located at the following URL:

https://access.redhat.com/support/

This will enable a Red Hat technical support engineer to follow up directly on the problems reported here.