Created attachment 1202480 [details]
blk-availability debug

Description of problem:
In a 2-node cluster with clvmd, after adding a new LUN to the server, re-scanning SCSI, creating a PV and extending the existing VG onto the newly added PV, reboot fails - the node fails to stop clvmd, which leaves the cluster stack running while other daemons are stopping (including the network service). This results in corosync trying to send multicast messages despite the fact that the network is already down, and ends with the node being fenced instead of gracefully rebooting. Why does clvmd fail to stop after extending the VG? All subsequent reboots (after the 1st failed reboot) work as expected.

Observations from testing:
* If the order of the init scripts is changed so that K75blk-availability runs before K76clvmd, then reboot works fine even after extending the VG.
* When the cluster is stopped manually (service <rgmanager, gfs2, clvmd, cman> stop) before rebooting, reboot works fine even after extending the VG.

So it looks like extending the VG creates some condition which then affects the behaviour of K75blk-availability.

Version-Release number of selected component (if applicable):
RHEL 6.6
lvm2-2.02.111-2.el6.x86_64
lvm2-cluster-2.02.111-2.el6.x86_64

How reproducible:
Always - in the customer's PROD cluster. In the customer's test cluster, which runs the same package versions as the PROD cluster, this behaviour is not reproducible.

Steps to Reproduce:
1) extend vg
2) service blk-availability stop
3) vgs
4) sleep 180
5) vgs
6) service clvmd stop  [FAILED]

Actual results:
Server is fenced during a graceful reboot because the cluster stack fails to stop.

Expected results:
Cluster stops without issues and the server reboots gracefully.

Additional info:
Attaching debug output of a manual run of the blk-availability script.
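For reference, a rough sketch of the "extend vg" preparation step described above; the device name /dev/mapper/mpath_newlun, the SCSI host number and the VG name clustervg are placeholders, not taken from the customer's setup:

  # rescan the SCSI host(s) so the newly presented LUN shows up
  echo "- - -" > /sys/class/scsi_host/host0/scan
  # create a PV on the new multipath device and extend the clustered VG onto it
  pvcreate /dev/mapper/mpath_newlun
  vgextend clustervg /dev/mapper/mpath_newlun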
Created attachment 1202482 [details]
blk-availability debug node 05
I've managed to reproduce. The problem here is that LVM doesn't have an up-to-date view of the VGs which are available after PVs are removed from the system.

[root@rhel6-b ~]# rpm -q lvm2
lvm2-2.02.111-2.el6.x86_64

[root@rhel6-b ~]# lvm dumpconfig --type diff
global {
    locking_type=3
}
devices {
    preferred_names=["^/dev/mpath/", "^/dev/mapper/mpath", "^/dev/[hs]d"]
    filter=["a|/dev/mapper|", "r|.*|"]
}

[root@rhel6-b ~]# lsblk -s
NAME                  MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
mpath_dev1 (dm-2)     253:2    0   4G  0 mpath
|-sda                   8:0    0   4G  0 disk
`-sdd                   8:48   0   4G  0 disk
vg-lvol0 (dm-4)       253:4    0   4M  0 lvm
`-mpath_dev2 (dm-3)   253:3    0   4G  0 mpath
  |-sdb                 8:16   0   4G  0 disk
  `-sdc                 8:32   0   4G  0 disk

[root@rhel6-b ~]# blkdeactivate -u -l wholevg
Deactivating block devices:
  [DM]: deactivating mpath device mpath_dev1 (dm-2)... done
  [LVM]: deactivating Volume Group vg... done
  [DM]: deactivating mpath device mpath_dev2 (dm-3)... done

[root@rhel6-b ~]# vgs
  Couldn't find device with uuid 4ybKz5-93UJ-kNFk-a3iS-NH3j-gLf4-aJxNpf.
  VG   #PV #LV #SN Attr   VSize VFree
  vg     2   1   0 wz-pnc 7.99g 7.99g

[root@rhel6-b ~]# vgs
  No volume groups found

That also means:

[root@rhel6-b ~]# service blk-availability stop
Stopping block device availability: Deactivating block devices:
  [DM]: deactivating mpath device mpath_dev1 (dm-2)... done
  [LVM]: deactivating Volume Group vg... done
  [DM]: deactivating mpath device mpath_dev2 (dm-3)... done
                                                           [  OK  ]
[root@rhel6-b ~]# service clvmd stop
Deactivating clustered VG(s):   Volume group "vg" not found
  Skipping volume group vg
                                                           [FAILED]

The "vgs" (or vgdisplay) is used within the clvmd init script to collect all clustered VGs to deactivate, and it then passes this list of VGs to the "vgchange -an" command. However, the VGs are already deactivated and the underlying PVs are gone too.

As visible from the example above, this is some form of caching issue, because the first "vgs" still displays the VG (while it shouldn't), whereas the second "vgs" doesn't see it anymore (because the cache has been updated by then).

========

We've fixed several caching issues in z-stream versions - the last one is lvm2-2.02.111-2.el6_6.6 - and with this build the problem is fixed already:

[root@rhel6-b ~]# rpm -q lvm2
lvm2-2.02.111-2.el6_6.6.x86_64

[root@rhel6-b ~]# blkdeactivate -u -l wholevg
Deactivating block devices:
  [DM]: deactivating mpath device mpath_dev1 (dm-2)... done
  [LVM]: deactivating Volume Group vg... done
  [DM]: deactivating mpath device mpath_dev2 (dm-3)... done

[root@rhel6-b ~]# vgs
  No volume groups found

That also means:

[root@rhel6-b ~]# service blk-availability stop
Stopping block device availability: Deactivating block devices:
  [DM]: deactivating mpath device mpath_dev1 (dm-2)... done
  [LVM]: deactivating Volume Group vg... done
  [DM]: deactivating mpath device mpath_dev2 (dm-3)... done
                                                           [  OK  ]
[root@rhel6-b ~]# service clvmd stop
Signaling clvmd to exit                                    [  OK  ]
clvmd terminated                                           [  OK  ]

Please update to the latest 6.6.z lvm2 release (lvm2-2.02.111-2.el6_6.6) and let me know if this resolves the issue. If yes, we'll close this bug as CURRENTRELEASE.
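For illustration, a rough sketch of the deactivation logic described above; this is not the verbatim RHEL 6 clvmd init script, just an approximation of the mechanism, and the clustered_vgs function name is made up here:

  #!/bin/sh
  # Collect the names of clustered VGs as reported by vgdisplay (illustrative only;
  # the real init script uses vgs/vgdisplay in a similar way).
  clustered_vgs() {
      vgdisplay 2>/dev/null | awk 'BEGIN {RS="VG Name"} {if (/Clustered/) print $1}'
  }
  # Try to deactivate them. If the list above comes from a stale metadata cache,
  # the VG may already be gone, vgchange reports "Volume group not found", and
  # the init script ends up in the [FAILED] state shown above.
  vgs_to_stop=$(clustered_vgs)
  if [ -n "$vgs_to_stop" ]; then
      vgchange -an $vgs_to_stop
  fi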
(In reply to Peter Rajnoha from comment #4)
> Please, update to latest 6.6.z lvm2 release (lvm2-2.02.111-2.el6_6.6) and
> let me know if this resolves the issue.

(In the RHEL 6.7 release, this is also a package released via z-stream, as lvm2-2.02.118-3.el6_7.3 and higher; in RHEL 6.8 it's already resolved within the main release.)
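For completeness, one way to pull in and verify the fixed build on a 6.6.z system; the yum invocation is a generic assumption, while the package names and the expected NVR come from the comments above:

  yum update lvm2 lvm2-cluster
  rpm -q lvm2 lvm2-cluster
  # expected on 6.6.z: lvm2-2.02.111-2.el6_6.6.x86_64 (or later)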