Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 714881

Summary:

OpenAIS hangs the clvmd daemon when the connection to the cluster is lost.

Product:

[Retired] Red Hat Cluster Suite

Reporter:

Martin <mjakmarcin>

Component:

lvm2-cluster

Assignee:

LVM and device-mapper development team <lvm-team>

Status:

CLOSED INSUFFICIENT_DATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

medium

Priority:

unspecified

CC:

agk, ccaulfie, dwysocha, edamato, jbrassow, mbroz, prockai

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2011-10-21 16:35:21 UTC

Attachments:

Description	Flags
Configuration	none

Description Martin 2011-06-21 08:01:53 UTC

Created attachment 505774
Configuration

Description of problem:

Hi all. This is a little long, but I hope you will get the point. I am working on a test environment: 4 nodes with the latest Red Hat Cluster Suite packages. I am testing clvmd behavior in different situations. Each node hosts 4 KVM virtual machines. Here is the problem: when one node loses its network connection, openais reaches its token timeout and tries to fence the node. I have manual fencing set up because I do not want that node to go down, as I have to migrate those VMs to another node. The problem is that when the ais token times out, the whole clvmd hangs. When I try to run fence_ack_manual, it won't let me fence that node, because it reports that the fifo file does not exist. It works when I bring the node down, but not while it is running.
I want to tell the cluster that this node is out of the cluster, but not power it down.


Version-Release number of selected component (if applicable):

Linux 2.6.18-238.12.1.el5 #1 SMP Tue May 31 13:22:04 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

lvm2-2.02.74-5.el5_6.1
lvm2-cluster-2.02.74-3.el5_6.1


How reproducible:

Always

Steps to Reproduce:

1. Have a working cluster with 4 nodes participating.
2. Cut off the network connection on one of the nodes.
3. clvmd hangs.
  
Actual results:
On working nodes: 
service clvmd status
clvmd (pid  6527) is running...

And it hangs at this point.

On cut off node:
clvmd dead but pid file exists


Expected results:
service clvmd status
clvmd (pid  6527) is running...
Clustered Volume Groups: vmvg isovg cfgvg
Active clustered Logical Volumes: /dev/vmvg/shittest /dev/isovg/isolv /dev/cfgvg/cfglv

Additional info:
I would like to manually fence the cut-off node with fence_ack_manual. clvmd stops hanging after the node is rebooted. I do not want to reboot it, just cut it off from the cluster and then rejoin it. One more thing: when I cut the node off from the network, the VMs keep working fine; it is only clvmd that shows nothing and just hangs.
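
The hang of `service clvmd status` described above would also block any script that calls it. As a minimal sketch, not part of the original report, a monitoring script could bound the wait with a small polling watchdog. The function name `run_with_deadline` and the return code 124 are assumptions chosen for illustration, and a process stuck in uninterruptible kernel sleep may not react to the signal at all.

```shell
#!/bin/bash
# Hypothetical helper: run a command in the background and give up
# after a deadline (whole seconds). Returns the command's own exit
# status, or 124 (an arbitrary marker value) if the deadline expired.
run_with_deadline() {
    local deadline=$1
    shift
    "$@" &
    local pid=$!
    local waited=0
    # Poll once per second while the command is still alive.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            # Ask the command to stop, but do not wait for it:
            # a process blocked inside the kernel may never exit.
            kill -TERM "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished in time; hand back its exit status.
    wait "$pid"
}

# Example call (assumes the init script from this report is installed):
#   run_with_deadline 10 service clvmd status
```

If the status command blocks as reported, the example call would return 124 after about ten seconds instead of hanging the caller.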

Comment 1 Martin 2011-06-21 08:27:46 UTC

Hi again.

I have done some further testing. After cutting off the network on node4 I was able to run fence_ack_manual for node4, but the problem with clvmd still persists on this node.

Then I re-enabled the network and ran cman_tool join -c V5, and it joined.

The other nodes reported the clustered volumes without a problem, but node4 printed this:

service clvmd status
clvmd dead but pid file exists

vgs
connect() failed on local socket: Connection refused
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group vmvg
Skipping clustered volume group isovg
Skipping clustered volume group cfgvg
VG #PV #LV #SN Attr VSize VFree
sysvg 1 2 0 wz--n- 11.06G 0

lvs
connect() failed on local socket: Connection refused
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group vmvg
Skipping volume group vmvg
Skipping clustered volume group isovg
Skipping clustered volume group cfgvg
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
rootlv sysvg -wi-ao 10.06G
swaplv sysvg -wi-ao 1.00G

pvs
connect() failed on local socket: Connection refused
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group vmvg
Skipping volume group vmvg
Skipping clustered volume group isovg
Skipping volume group isovg
Skipping clustered volume group cfgvg
Skipping volume group cfgvg
PV VG Fmt Attr PSize PFree
/dev/mpath/mpath0p2 sysvg lvm2 a- 11.06G 0
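
The output above shows the LVM tools falling back to local locking and skipping every clustered volume group. As a rough sketch, not part of the original comment, the skipped groups can be pulled out of such output with a small filter. The function name `skipped_clustered_vgs` is hypothetical, and the pattern assumes the message wording shown in this report.

```shell
#!/bin/bash
# Hypothetical filter: read LVM command output on stdin and print the
# name of each clustered volume group that was skipped, one per line,
# sorted and without duplicates. Lines such as "Skipping volume group
# vmvg" (without the word "clustered") are ignored on purpose.
skipped_clustered_vgs() {
    sed -n 's/^[[:space:]]*Skipping clustered volume group \([^[:space:]]*\).*$/\1/p' |
        sort -u
}

# Example call, merging both output streams so the messages are seen
# regardless of which stream the tool writes them to:
#   vgs 2>&1 | skipped_clustered_vgs
```

An empty result on a node would suggest that no clustered volume group was skipped there, under the assumption that the message text matches.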

Comment 2 Milan Broz 2011-10-21 16:35:21 UTC

Manual fencing is not a supported method of fencing in RHEL clusters.

Please note that Red Hat Bugzilla is not an avenue for technical assistance or
support, but simply a bug tracking system. As such there are no service level
agreements or other guarantees associated with defects reported in Bugzilla.

If you have active support entitlements for the systems mentioned in this
report please file a technical support case with Red Hat Global Support
Services either via your normal support representative or via the customer
portal located at the following URL:

  https://access.redhat.com/support/

This will enable a Red Hat technical support engineer to follow up on the
problems reported here directly.