Bug 1009341 - clvmd no longer works when nodes are offline
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Keywords: Regression
Depends On:
Blocks:
 
Reported: 2013-09-18 05:02 EDT by Fabio Massimo Di Nitto
Modified: 2014-06-17 21:19 EDT
CC List: 10 users

See Also:
Fixed In Version: lvm2-2.02.102-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 06:50:18 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
lvremove -vvvv on rhel6 (30.39 KB, text/plain), 2013-09-18 07:12 EDT, Fabio Massimo Di Nitto
lvremove -vvvv on rhel7 (12.39 KB, text/plain), 2013-09-18 07:13 EDT, Fabio Massimo Di Nitto
clvmd debugging logs from node1 (node2 was poweroff) (38.18 KB, application/x-bzip), 2013-09-18 07:22 EDT, Fabio Massimo Di Nitto
rhel7 logs with syslog=1 loglevel debug (15.86 KB, application/x-bzip), 2013-09-18 09:02 EDT, Fabio Massimo Di Nitto
another attempt to capture logs (10.58 KB, application/x-bzip), 2013-09-18 09:29 EDT, Fabio Massimo Di Nitto

Description Fabio Massimo Di Nitto 2013-09-18 05:02:09 EDT
Description of problem:

Regression from RHEL 6 that makes it impossible to use clvmd on RHEL 7 when cluster nodes are offline.

Version-Release number of selected component (if applicable):

lvm2-cluster-2.02.99-1.el7.x86_64
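
(A trivial check, noted only for completeness: the installed package version on each node can be confirmed with rpm.)

   rpm -q lvm2-cluster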

How reproducible:

always

Steps to Reproduce:
1. Start a two-node cluster with clvmd and create a simple clustered VG/LV.
2. Cleanly shut down one node (poweroff is fine):
   [root@rhel7-ha-node2 ~]# systemctl stop corosync
   (for example)
3. Verify the cluster node has left the membership (important bit).
4. Try to remove the clustered LV (see the command sketch below).
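
For reference, a command-level sketch of the steps above (the device /dev/vdb and the cluster_vg/cluster_lv names are illustrative; any clustered VG/LV will do):

   # node1: create the clustered VG/LV while both nodes are members
   pvcreate /dev/vdb
   vgcreate --clustered y cluster_vg /dev/vdb
   lvcreate -L 1G -n cluster_lv cluster_vg

   # node2: leave the membership cleanly
   systemctl stop corosync

   # node1: confirm node2 has really left before proceeding
   corosync-quorumtool -l      # node2 should no longer be listed
   pcs status                  # node2 should show as OFFLINE

   # node1: attempt the removal that triggers the failure
   lvremove /dev/cluster_vg/cluster_lv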

Actual results:

[root@rhel7-ha-node1 ~]# lvremove /dev/cluster_vg/cluster_lv 
Do you really want to remove active clustered logical volume cluster_lv? [y/n]: y
  cluster request failed: Host is down
  Unable to deactivate logical volume "cluster_lv"
  cluster request failed: Host is down

The LV is not removed.

Expected results:

Similar behaviour to RHEL 6:


[root@rhel6-ha-node1 ~]# lvremove /dev/cluster_vg/cluster_lv 
Do you really want to remove active clustered logical volume cluster_lv? [y/n]: y
  Logical volume "cluster_lv" successfully removed

Additional info:
Comment 3 Fabio Massimo Di Nitto 2013-09-18 06:45:05 EDT
At agk's (Alasdair Kergon's) request:

It is not a regression within 6.5; it is a regression observed between RHEL 6.* and RHEL 7.

lvm.conf is the default, with the only exception being locking_type set to 3.
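
(For completeness, the effective value can usually be checked on a node with lvm's own config dump; in lvm.conf itself this corresponds to locking_type = 3 in the global {} section.)

   lvm dumpconfig global/locking_type
   # expected on this setup: locking_type=3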
Comment 4 Fabio Massimo Di Nitto 2013-09-18 07:12:45 EDT
Created attachment 799327 [details]
lvremove -vvvv on rhel6

lvremove -vvvv on rhel6
Comment 5 Fabio Massimo Di Nitto 2013-09-18 07:13:18 EDT
Created attachment 799328 [details]
lvremove -vvvv on rhel7

lvremove -vvvv on rhel7
Comment 6 Fabio Massimo Di Nitto 2013-09-18 07:14:16 EDT
The RHEL 7 node is running the latest nightly build of lvm2: lvm2-cluster-2.02.101-0.157.el7.x86_64.
Comment 7 Fabio Massimo Di Nitto 2013-09-18 07:22:23 EDT
Created attachment 799333 [details]
clvmd debugging logs from node1 (node2 was poweroff)
Comment 8 Alasdair Kergon 2013-09-18 07:31:38 EDT
lvm client side:

Successful case:

#locking/cluster_locking.c:502       Locking LV EQ4qhf7TgdAMeBaCOgZ0M57mqiIBTXIEhUdwleLadJmtgkYMEFu0Doqrw7k9OsAb NL (LV|NONBLOCK|CLUSTER) (0x98)


Failure case:

#locking/cluster_locking.c:502       Locking LV yDC7vdTMn3TGdEdEBGD3DPBcTFzHdR0tnKBwNY62WULjrIf9fUZ6vvFvcSb7gwO7 NL (LV|NONBLOCK|CLUSTER) (0x98)
#locking/cluster_locking.c:161   cluster request failed: Host is down
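
For anyone lining these traces up themselves: the quoted lines come from maximum-verbosity client runs along the lines of the following (the output file name is illustrative; the -vvvv debug output goes to stderr):

   lvremove -vvvv /dev/cluster_vg/cluster_lv 2> /tmp/lvremove-trace.txt
   grep 'cluster request failed' /tmp/lvremove-trace.txt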
Comment 9 Fabio Massimo Di Nitto 2013-09-18 09:02:34 EDT
Created attachment 799356 [details]
rhel7 logs with syslog=1 loglevel debug
Comment 10 Fabio Massimo Di Nitto 2013-09-18 09:29:08 EDT
Created attachment 799383 [details]
another attempt to capture logs
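
A note on how debug logs like those in comments 7, 9 and 10 can be gathered (a sketch only; the exact settings behind the attachments are not recorded here): clvmd can send its own debug output to syslog, and lvm.conf can raise the client-side log level.

   # clvmd daemon debug output to syslog (when starting clvmd by hand)
   clvmd -d2

   # lvm.conf, log {} section (sketch)
   #   syslog = 1
   #   level = 7    # most verbose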
Comment 13 Christine Caulfield 2013-09-23 09:02:16 EDT
commit 431eda63cc0ebff7c62dacb313cabcffbda6573a
Author: Christine Caulfield <ccaulfie@redhat.com>
Date:   Mon Sep 23 13:23:00 2013 +0100

    clvmd: Fix node up/down handing in corosync module
Comment 14 Alasdair Kergon 2013-09-23 11:03:18 EDT
In release 2.02.102.
Comment 16 Nenad Peric 2013-11-20 10:19:18 EST
As long as the cluster is quorate, there are no issues removing the clustered LV.
Tested and verified with lvm2-2.02.103-5.el7.

[root@virt-002 pacemaker]# lvremove clustered/mirror
Do you really want to remove active clustered logical volume mirror? [y/n]: y
  Logical volume "mirror" successfully removed


[root@virt-002 pacemaker]# pcs status
Cluster name: STSRHTS10638
Last updated: Wed Nov 20 15:20:31 2013
Last change: Wed Nov 20 14:41:21 2013 via cibadmin on virt-002.cluster-qe.lab.eng.brq.redhat.com
Stack: corosync
Current DC: virt-002.cluster-qe.lab.eng.brq.redhat.com (1) - partition with quorum
Version: 1.1.10-20.el7-368c726
3 Nodes configured
1 Resources configured


Online: [ virt-002.cluster-qe.lab.eng.brq.redhat.com ]
OFFLINE: [ virt-003.cluster-qe.lab.eng.brq.redhat.com virt-004.cluster-qe.lab.eng.brq.redhat.com ]
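
(For the record, quorum can also be confirmed directly before attempting the removal; a minimal check, with output varying per cluster: look for "Quorate: Yes". The "partition with quorum" line in the pcs status output above carries the same information.)

   corosync-quorumtool -s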
Comment 17 Ludek Smid 2014-06-13 06:50:18 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
