Bug 714881 - OpenAIS hangs clvmd daemon when the connection to the cluster is lost.
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: lvm2-cluster
Version: 4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Cluster QE
Reported: 2011-06-21 08:01 UTC by Martin
Modified: 2011-10-21 16:35 UTC

Doc Type: Bug Fix
Last Closed: 2011-10-21 16:35:21 UTC


Attachments
Configuration (7.15 KB, text/plain)
2011-06-21 08:01 UTC, Martin

Description Martin 2011-06-21 08:01:53 UTC
Created attachment 505774
Configuration

Description of problem:

 Hi all. This will be a little long, but I hope you will get the point. I have a test environment that I am working on: 4 nodes with the latest Red Hat Cluster Suite packages. I am testing clvmd behavior in different situations. Each node hosts 4 virtual machines running on KVM. The problem occurs when one node loses its network connection: OpenAIS reaches its timeout and tries to fence the node. I have manual fencing set up because I do not want that node to go down, as I have to migrate those VMs to another node. The problem is that when the AIS token times out, the whole clvmd hangs. When I try to run fence_ack_manual, it won't let me fence that node because it reports that the FIFO file does not exist. It works when I take the node down, but not while it is running.
 I want to tell the cluster that this node is out of the cluster, but not power it down.


Version-Release number of selected component (if applicable):

Linux 2.6.18-238.12.1.el5 #1 SMP Tue May 31 13:22:04 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

lvm2-2.02.74-5.el5_6.1
lvm2-cluster-2.02.74-3.el5_6.1


How reproducible:

Always

Steps to Reproduce:

1. Have a working cluster with 4 participating nodes.
2. Cut off the network connection on one of the nodes.
3. clvmd hangs.
  
Actual results:
On working nodes: 
service clvmd status
clvmd (pid  6527) is running...

And it hangs at this point.
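
To tell a real hang apart from a slow answer when reproducing this, the status command can be run under a timeout. The following is only a minimal sketch, assuming a Python 3 interpreter is available on the node (stock RHEL 5 ships an older Python). The helper name `probe` and the 10-second default are illustrative assumptions, not part of any cluster tool.

```python
import subprocess
import sys


def probe(cmd, timeout_s=10.0):
    """Run cmd and return 'ok', 'failed', or 'hung'.

    'hung' means the command did not exit within timeout_s seconds;
    subprocess.run() kills the child in that case before raising.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    # Exit status 0 is treated as healthy, anything else as a failure.
    return "ok" if result.returncode == 0 else "failed"
```

On a working node this would be called as `probe(["service", "clvmd", "status"])`. A result of `hung` would match the behavior described above, while `failed` would match a non-zero status such as the one on the cut-off node.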

On the cut-off node:
clvmd dead but pid file exists
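
The "dead but pid file exists" status conventionally means that a pid file was left behind but no matching process is running. A rough sketch of that check follows, assuming Python 3. The function name `daemon_state` and its return labels are made up for illustration, and the pid file location mentioned in the note below is an assumption that should be verified on the affected node.

```python
import os


def daemon_state(pid_file):
    """Classify a daemon as 'running', 'dead-with-pidfile', or 'stopped'."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "stopped"  # no pid file at all
    except ValueError:
        return "dead-with-pidfile"  # file exists but holds no usable pid
    if pid <= 0:
        return "dead-with-pidfile"  # never signal pid 0 or a process group
    try:
        os.kill(pid, 0)  # signal 0 checks for existence, delivers nothing
    except ProcessLookupError:
        return "dead-with-pidfile"
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"
```

Called as `daemon_state("/var/run/clvmd.pid")` (path assumed), a result of `dead-with-pidfile` would correspond to the status line reported on the cut-off node.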


Expected results:
 service clvmd status
clvmd (pid  6527) is running...
Clustered Volume Groups: vmvg isovg cfgvg
Active clustered Logical Volumes: /dev/vmvg/shittest /dev/isovg/isolv /dev/cfgvg/cfglv

Additional info:
I would like to manually fence this cut-off node with fence_ack_manual. clvmd recovers after the node is rebooted, but I do not want to reboot it; I just want to cut it off from the cluster and then rejoin it. One more thing: when I cut the node off from the network, the VMs keep working fine. It is only clvmd that shows nothing and just hangs.

Comment 1 Martin 2011-06-21 08:27:46 UTC
Hi again.

I have done some further testing. After cutting off the network on node4, I was able to run fence_ack_manual for node4, but the clvmd problem still persists on that node.

Then I re-enabled the network and ran cman_tool join -c V5, and the node joined.

The other nodes reported the clustered volumes without a problem, but node4 printed this:

service clvmd status
clvmd dead but pid file exists

 vgs
  connect() failed on local socket: Connection refused
  Internal cluster locking initialisation failed.
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  Skipping clustered volume group vmvg
  Skipping clustered volume group isovg
  Skipping clustered volume group cfgvg
  VG    #PV #LV #SN Attr   VSize  VFree
  sysvg   1   2   0 wz--n- 11.06G    0

lvs
  connect() failed on local socket: Connection refused
  Internal cluster locking initialisation failed.
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  Skipping clustered volume group vmvg
  Skipping volume group vmvg
  Skipping clustered volume group isovg
  Skipping clustered volume group cfgvg
  LV     VG    Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  rootlv sysvg -wi-ao 10.06G
  swaplv sysvg -wi-ao  1.00G

 pvs
  connect() failed on local socket: Connection refused
  Internal cluster locking initialisation failed.
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  Skipping clustered volume group vmvg
  Skipping volume group vmvg
  Skipping clustered volume group isovg
  Skipping volume group isovg
  Skipping clustered volume group cfgvg
  Skipping volume group cfgvg
  PV                  VG    Fmt  Attr PSize  PFree
  /dev/mpath/mpath0p2 sysvg lvm2 a-   11.06G    0
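
The outputs above share two signals: the fallback to local file-based locking and the list of skipped clustered volume groups. A small sketch of how they could be extracted from captured output when checking several nodes is shown below, assuming Python 3. The function name `summarize_lvm_output` is hypothetical, and the matched strings are taken verbatim from the messages in this comment, so other LVM versions may word them differently.

```python
import re

FALLBACK_MARKER = "Falling back to local file-based locking"
SKIPPED_RE = re.compile(r"^\s*Skipping clustered volume group (\S+)\s*$")


def summarize_lvm_output(text):
    """Return (fell_back, skipped_vgs) extracted from LVM command output."""
    fell_back = FALLBACK_MARKER in text
    skipped = []
    for line in text.splitlines():
        match = SKIPPED_RE.match(line)
        # Keep first-seen order and drop repeats.
        if match and match.group(1) not in skipped:
            skipped.append(match.group(1))
    return fell_back, skipped


# Sample copied from the vgs output in this comment.
sample = """\
  connect() failed on local socket: Connection refused
  Internal cluster locking initialisation failed.
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  Skipping clustered volume group vmvg
  Skipping clustered volume group isovg
  Skipping clustered volume group cfgvg
  VG    #PV #LV #SN Attr   VSize  VFree
  sysvg   1   2   0 wz--n- 11.06G    0
"""
print(summarize_lvm_output(sample))  # (True, ['vmvg', 'isovg', 'cfgvg'])
```

The plain "Skipping volume group" lines that lvs and pvs also print are deliberately not matched, since they repeat the same volume group names.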

Comment 2 Milan Broz 2011-10-21 16:35:21 UTC
Manual fencing is not a supported way of fencing in a RHEL cluster.

Please note that Red Hat Bugzilla is not an avenue for technical assistance or
support, but simply a bug tracking system. As such there are no service level
agreements or other guarantees associated with defects reported in Bugzilla.

If you have active support entitlements for the systems mentioned in this
report please file a technical support case with Red Hat Global Support
Services either via your normal support representative or via the customer
portal located at the following URL:

  https://access.redhat.com/support/

This will enable a Red Hat technical support engineer to follow up on the
problems reported here directly.

