Bug 1009341

Summary: clvmd no longer works when nodes are offline
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Default / Unclassified
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Keywords: Regression, Triaged
Target Milestone: rc
Reporter: Fabio Massimo Di Nitto <fdinitto>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, thornber, zkabelac
Doc Type: Bug Fix
Fixed In Version: lvm2-2.02.102-1.el7
Type: Bug
Last Closed: 2014-06-13 10:50:18 UTC
Attachments:
   lvremove -vvvv on rhel6
   lvremove -vvvv on rhel7
   clvmd debugging logs from node1 (node2 was poweroff)
   rhel7 logs with syslog=1 loglevel debug
   another attempt to capture logs

Description Fabio Massimo Di Nitto 2013-09-18 09:02:09 UTC
Description of problem:

This is a regression from RHEL 6: it makes it impossible to use clvmd in RHEL 7 when cluster nodes are offline.

Version-Release number of selected component (if applicable):

lvm2-cluster-2.02.99-1.el7.x86_64

How reproducible:

always

Steps to Reproduce:
1. start a 2-node cluster with clvmd and create a simple clustered VG/LV
2. cleanly shut down one node (poweroff is also fine), for example:
   [root@rhel7-ha-node2 ~]# systemctl stop corosync
3. verify that the node has left the cluster membership (this is the important bit)
4. try to remove the clustered LV (a command-level sketch of these steps follows)
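
A command-level sketch of the reproduction (the disk path /dev/vdb, hostnames, and LV size are placeholders, and clvmd is assumed to already be running on both nodes):

   # on either node: create a clustered VG and LV on a shared disk
   pvcreate /dev/vdb
   vgcreate --clustered y cluster_vg /dev/vdb
   lvcreate -L 1G -n cluster_lv cluster_vg

   # on node2: cleanly leave the cluster membership
   systemctl stop corosync

   # on node1: confirm node2 is no longer a member, then attempt the removal
   corosync-quorumtool -s            # the membership list should no longer show node2
   lvremove /dev/cluster_vg/cluster_lv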

Actual results:

[root@rhel7-ha-node1 ~]# lvremove /dev/cluster_vg/cluster_lv 
Do you really want to remove active clustered logical volume cluster_lv? [y/n]: y
  cluster request failed: Host is down
  Unable to deactivate logical volume "cluster_lv"
  cluster request failed: Host is down

The LV is not removed.

Expected results:

Similar behaviour to RHEL 6:


[root@rhel6-ha-node1 ~]# lvremove /dev/cluster_vg/cluster_lv 
Do you really want to remove active clustered logical volume cluster_lv? [y/n]: y
  Logical volume "cluster_lv" successfully removed

Additional info:

Comment 3 Fabio Massimo Di Nitto 2013-09-18 10:45:05 UTC
On agk's request:

It is not a regression within 6.5; it is a regression observed between RHEL 6.* and RHEL 7.

lvm.conf is the default, with the only exception being locking_type set to 3.
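
A quick, hedged way to confirm that single deviation on a node (lvm dumpconfig ships with the lvm2 package in this release):

   lvm dumpconfig global/locking_type
   # expected output; 3 selects the built-in clustered locking via clvmd:
   #   locking_type=3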

Comment 4 Fabio Massimo Di Nitto 2013-09-18 11:12:45 UTC
Created attachment 799327 [details]
lvremove -vvvv on rhel6

lvremove -vvvv on rhel6

Comment 5 Fabio Massimo Di Nitto 2013-09-18 11:13:18 UTC
Created attachment 799328 [details]
lvremove -vvvv on rhel7

lvremove -vvvv on rhel7

Comment 6 Fabio Massimo Di Nitto 2013-09-18 11:14:16 UTC
The RHEL 7 node is running the latest nightly build of lvm2: lvm2-cluster-2.02.101-0.157.el7.x86_64.

Comment 7 Fabio Massimo Di Nitto 2013-09-18 11:22:23 UTC
Created attachment 799333 [details]
clvmd debugging logs from node1 (node2 was poweroff)

Comment 8 Alasdair Kergon 2013-09-18 11:31:38 UTC
lvm client side:

Successful case:

#locking/cluster_locking.c:502       Locking LV EQ4qhf7TgdAMeBaCOgZ0M57mqiIBTXIEhUdwleLadJmtgkYMEFu0Doqrw7k9OsAb NL (LV|NONBLOCK|CLUSTER) (0x98)


Failure case:

#locking/cluster_locking.c:502       Locking LV yDC7vdTMn3TGdEdEBGD3DPBcTFzHdR0tnKBwNY62WULjrIf9fUZ6vvFvcSb7gwO7 NL (LV|NONBLOCK|CLUSTER) (0x98)
#locking/cluster_locking.c:161   cluster request failed: Host is down

Comment 9 Fabio Massimo Di Nitto 2013-09-18 13:02:34 UTC
Created attachment 799356 [details]
rhel7 logs with syslog=1 loglevel debug

Comment 10 Fabio Massimo Di Nitto 2013-09-18 13:29:08 UTC
Created attachment 799383 [details]
another attempt to capture logs

Comment 13 Christine Caulfield 2013-09-23 13:02:16 UTC
commit 431eda63cc0ebff7c62dacb313cabcffbda6573a
Author: Christine Caulfield <ccaulfie>
Date:   Mon Sep 23 13:23:00 2013 +0100

    clvmd: Fix node up/down handing in corosync module

Comment 14 Alasdair Kergon 2013-09-23 15:03:18 UTC
In release 2.02.102.
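
For anyone verifying the fix downstream, a minimal check that a node carries the fixed build (assuming the standard RHEL 7 package names and the "Fixed In Version" above):

   rpm -q lvm2 lvm2-cluster          # expect version >= 2.02.102-1.el7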

Comment 16 Nenad Peric 2013-11-20 15:19:18 UTC
As long as the cluster is quorate, there are no issues removing the clustered LV.
Tested and verified with lvm2-2.02.103-5.el7:

[root@virt-002 pacemaker]# lvremove clustered/mirror
Do you really want to remove active clustered logical volume mirror? [y/n]: y
  Logical volume "mirror" successfully removed


[root@virt-002 pacemaker]# pcs status
Cluster name: STSRHTS10638
Last updated: Wed Nov 20 15:20:31 2013
Last change: Wed Nov 20 14:41:21 2013 via cibadmin on virt-002.cluster-qe.lab.eng.brq.redhat.com
Stack: corosync
Current DC: virt-002.cluster-qe.lab.eng.brq.redhat.com (1) - partition with quorum
Version: 1.1.10-20.el7-368c726
3 Nodes configured
1 Resources configured


Online: [ virt-002.cluster-qe.lab.eng.brq.redhat.com ]
OFFLINE: [ virt-003.cluster-qe.lab.eng.brq.redhat.com virt-004.cluster-qe.lab.eng.brq.redhat.com ]
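
Since the result depends on the surviving partition being quorate, a quick hedged way to confirm that before attempting the removal (both commands ship with a standard RHEL 7 HA stack):

   corosync-quorumtool -s            # look for "Quorate: Yes"
   pcs status | grep -i quorum       # look for "partition with quorum"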

Comment 17 Ludek Smid 2014-06-13 10:50:18 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.