Bug 1009341 - clvmd no longer works when nodes are offline
Summary: clvmd no longer works when nodes are offline
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-18 09:02 UTC by Fabio Massimo Di Nitto
Modified: 2021-09-08 20:47 UTC
CC List: 10 users

Fixed In Version: lvm2-2.02.102-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-13 10:50:18 UTC
Target Upstream Version:


Attachments (Terms of Use)
lvremove -vvvv on rhel6 (30.39 KB, text/plain)
2013-09-18 11:12 UTC, Fabio Massimo Di Nitto
lvremove -vvvv on rhel7 (12.39 KB, text/plain)
2013-09-18 11:13 UTC, Fabio Massimo Di Nitto
clvmd debugging logs from node1 (node2 was poweroff) (38.18 KB, application/x-bzip)
2013-09-18 11:22 UTC, Fabio Massimo Di Nitto
rhel7 logs with syslog=1 loglevel debug (15.86 KB, application/x-bzip)
2013-09-18 13:02 UTC, Fabio Massimo Di Nitto
another attempt to capture logs (10.58 KB, application/x-bzip)
2013-09-18 13:29 UTC, Fabio Massimo Di Nitto

Description Fabio Massimo Di Nitto 2013-09-18 09:02:09 UTC
Description of problem:

Regression from RHEL6: clvmd on RHEL7 is unusable when cluster nodes are offline.

Version-Release number of selected component (if applicable):

lvm2-cluster-2.02.99-1.el7.x86_64

How reproducible:

always

Steps to Reproduce:
1. start a 2 node cluster with clvmd and create a simple clustered vg/lv
2. clean shutdown one node (poweroff is fine)
   [root@rhel7-ha-node2 ~]# systemctl stop corosync
   (for example)
3. verify cluster node has left the membership (important bit)
4. try to remove the clustered lv.

Actual results:

[root@rhel7-ha-node1 ~]# lvremove /dev/cluster_vg/cluster_lv 
Do you really want to remove active clustered logical volume cluster_lv? [y/n]: y
  cluster request failed: Host is down
  Unable to deactivate logical volume "cluster_lv"
  cluster request failed: Host is down

The LV is not removed.

Expected results:

Similar behaviour to RHEL6:


[root@rhel6-ha-node1 ~]# lvremove /dev/cluster_vg/cluster_lv 
Do you really want to remove active clustered logical volume cluster_lv? [y/n]: y
  Logical volume "cluster_lv" successfully removed

Additional info:

Comment 3 Fabio Massimo Di Nitto 2013-09-18 10:45:05 UTC
At agk's request:

It's not a regression in 6.5; it's a regression observed between rhel6.* and rhel7.

lvm.conf is default with the only exception of locking_type set to 3.
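For reference, the single non-default setting described above corresponds to a fragment like the following in /etc/lvm/lvm.conf (a sketch of just that setting; everything else stays at the packaged defaults):

```
# /etc/lvm/lvm.conf -- only the non-default setting from this report.
global {
    # 3 = built-in clustered locking via clvmd (requires lvm2-cluster)
    locking_type = 3
}
```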

Comment 4 Fabio Massimo Di Nitto 2013-09-18 11:12:45 UTC
Created attachment 799327 [details]
lvremove -vvvv on rhel6

lvremove -vvvv on rhel6

Comment 5 Fabio Massimo Di Nitto 2013-09-18 11:13:18 UTC
Created attachment 799328 [details]
lvremove -vvvv on rhel7

lvremove -vvvv on rhel7

Comment 6 Fabio Massimo Di Nitto 2013-09-18 11:14:16 UTC
The rhel7 node is running the latest nightly build of lvm2: lvm2-cluster-2.02.101-0.157.el7.x86_64.

Comment 7 Fabio Massimo Di Nitto 2013-09-18 11:22:23 UTC
Created attachment 799333 [details]
clvmd debugging logs from node1 (node2 was poweroff)

Comment 8 Alasdair Kergon 2013-09-18 11:31:38 UTC
lvm client side:

Successful case:

#locking/cluster_locking.c:502       Locking LV EQ4qhf7TgdAMeBaCOgZ0M57mqiIBTXIEhUdwleLadJmtgkYMEFu0Doqrw7k9OsAb NL (LV|NONBLOCK|CLUSTER) (0x98)


Failure case:

#locking/cluster_locking.c:502       Locking LV yDC7vdTMn3TGdEdEBGD3DPBcTFzHdR0tnKBwNY62WULjrIf9fUZ6vvFvcSb7gwO7 NL (LV|NONBLOCK|CLUSTER) (0x98)
#locking/cluster_locking.c:161   cluster request failed: Host is down

Comment 9 Fabio Massimo Di Nitto 2013-09-18 13:02:34 UTC
Created attachment 799356 [details]
rhel7 logs with syslog=1 loglevel debug

Comment 10 Fabio Massimo Di Nitto 2013-09-18 13:29:08 UTC
Created attachment 799383 [details]
another attempt to capture logs

Comment 13 Christine Caulfield 2013-09-23 13:02:16 UTC
commit 431eda63cc0ebff7c62dacb313cabcffbda6573a
Author: Christine Caulfield <ccaulfie>
Date:   Mon Sep 23 13:23:00 2013 +0100

    clvmd: Fix node up/down handing in corosync module

Comment 14 Alasdair Kergon 2013-09-23 15:03:18 UTC
In release 2.02.102.

Comment 16 Nenad Peric 2013-11-20 15:19:18 UTC
As long as the cluster is quorate, there are no issues removing the clustered LV.
Tested and verified with lvm2-2.02.103-5.el7:

[root@virt-002 pacemaker]# lvremove clustered/mirror
Do you really want to remove active clustered logical volume mirror? [y/n]: y
  Logical volume "mirror" successfully removed


[root@virt-002 pacemaker]# pcs status
Cluster name: STSRHTS10638
Last updated: Wed Nov 20 15:20:31 2013
Last change: Wed Nov 20 14:41:21 2013 via cibadmin on virt-002.cluster-qe.lab.eng.brq.redhat.com
Stack: corosync
Current DC: virt-002.cluster-qe.lab.eng.brq.redhat.com (1) - partition with quorum
Version: 1.1.10-20.el7-368c726
3 Nodes configured
1 Resources configured


Online: [ virt-002.cluster-qe.lab.eng.brq.redhat.com ]
OFFLINE: [ virt-003.cluster-qe.lab.eng.brq.redhat.com virt-004.cluster-qe.lab.eng.brq.redhat.com ]

Comment 17 Ludek Smid 2014-06-13 10:50:18 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

