Bug 960396

Summary:	pvmove on clustered VG should check the modules and the service on all the nodes before proceeding
Product:	Red Hat Enterprise Linux 6	Reporter:	Pierguido Lambri <plambri>
Component:	lvm2	Assignee:	Jonathan Earl Brassow <jbrassow>
lvm2 sub component:	Clustering / clvmd (RHEL6)	QA Contact:	Cluster QE <mspqa-list>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	medium	CC:	agk, cmarthal, dwysocha, heinzm, jbrassow, kvanwess, msnitzer, nperic, prajnoha, prockai, thornber, zkabelac
Version:	6.5
Target Milestone:	pre-dev-freeze
Target Release:	6.6
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	lvm2-2.02.107-1.el6	Doc Type:	Bug Fix
Doc Text:	No documentation needed.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-10-14 08:24:25 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1075263

Comment 8 Jonathan Earl Brassow 2014-04-29 20:13:29 UTC

I have a cluster of three nodes with shared storage. The last node (bp-03) does not have 'cmirrord' running.

Attempting 'pvmove' from node3 gives:
---------------------------
[root@bp-03 ~]# devices vg
LV Attr Cpy%Sync Devices
lv -wi-a----- /dev/sdb1(0)
[root@bp-03 ~]# pvmove /dev/sdb1 /dev/sdc1
Cannot move in clustered VG vg, clustered mirror (cmirror) not detected and LVs are activated non-exclusively.
---------------------------
The error message could be cleaned up a bit, but no action is performed and the system continues to operate just fine.

Attempting 'pvmove' from node1 gives:
---------------------------
[root@bp-01 ~]# pvmove /dev/sdb1 /dev/sdc1
Error locking on node bp-03: device-mapper: reload ioctl on failed: Invalid argument
Failed to suspend lv
ABORTING: Temporary pvmove mirror activation failed.
---------------------------
This is a bit more cryptic, but we can see the problem was on bp-03. Going there and looking at the logs, we find:
---------------------------
device-mapper: dm-log-userspace: Unable to send3
device-mapper: dm-log-userspace: Userspace log server not found
device-mapper: table: 253:4: mirror: Error creating mirror dirty log
device-mapper: ioctl: error adding target to table
---------------------------

So far, I don't think this is too much for the user to handle; but the real problem is the state the machines are left in. Nodes that do not have 'cmirrord' running on them are left with a residual DM device (vg-pvmove0 in this case) that LVM is unaware of. You can only discover this if you use 'dmsetup ls' and check all the nodes. What's worse, cleaning up that device, starting 'cmirrord' on bp-03, and attempting pvmove again causes cluster mirror requests to retry indefinitely with no way to kill pvmove.

The error path in this case must be cleaned up - even though it is unlikely that cmirrord would be running on only a subset of nodes.
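
As a rough illustration of the per-node check described above, the sketch below scans the text printed by 'dmsetup ls' for device names that look like leftover pvmove devices. The function name and the sample listing are hypothetical. It assumes each output line begins with the device name followed by whitespace, and that LVM doubles any hyphen in the VG name when it builds the DM name.

```python
import re


def find_pvmove_devices(dmsetup_ls_output, vg_name):
    """Return DM device names that look like pvmove devices of vg_name."""
    # Assumption: hyphens inside the VG name appear doubled in the DM name.
    dm_vg = vg_name.replace("-", "--")
    pattern = re.compile(r"^" + re.escape(dm_vg) + r"-pvmove\d+$")
    found = []
    for line in dmsetup_ls_output.splitlines():
        fields = line.split()
        # Assumption: the first whitespace-separated field is the device name.
        if fields and pattern.match(fields[0]):
            found.append(fields[0])
    return found


# Illustrative listing only, not captured from the cluster in this report.
sample = "vg-lv\t(253:3)\nvg-pvmove0\t(253:4)\nvg_root-lv_root\t(253:0)\n"
print(find_pvmove_devices(sample, "vg"))
```

The listing still has to be collected on every node, because the residual device exists only on the nodes where the table load failed.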

Comment 9 Jonathan Earl Brassow 2014-04-30 17:12:02 UTC

(In reply to Jonathan Earl Brassow from comment #8)
> What's worse, cleaning up that
> device, starting 'cmirrord' on bp-03, and attempting pvmove again causes
> cluster mirror requests to retry indefinitely with no way to kill pvmove.

This is not true.  After cleaning up the residual DM device and starting 'cmirrord' on bp-03, the pvmove executes just fine.  I had conflicting config files due to debugging a different problem with this cluster earlier.

It still holds that we should clean up the error path so that the residual DM device is not there, if possible.

Comment 10 Jonathan Earl Brassow 2014-05-01 15:36:20 UTC

pvmove does try to revert the VG after the error occurs and is successful on all the nodes except for the one where the problem originates.  This is because the node not running cmirrord fails to load the table for the pvmove mirror.  This means that the LV that would contain the pvmove segment does not get a new (inactive) table.

The revert attempts to clear any inactive tables.  The nodes that have 'cmirrord' running have inactive tables that point to the pvmove segment, so they properly clear the inactive tables and remove the pvmove segment.  The remaining nodes have no inactive tables to clear and thus no pointers to the pvmove segment (which has been created but has no table).  So, the pvmove DM device stays around.

I've tried to clean this up by tweaking the code in _lv_resume when laopts->revert is set.  However, this didn't work because there are no links to the pvmove segment from any inactive tables.  I'm going to move on and see if there is something that can be done to remove a DM device if a table fails to load - that will hopefully get rid of the residual device.
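
The reasoning in this comment can be pictured with a small toy model. This is not LVM code: the dictionaries and the revert() helper are invented for illustration, and they only encode the rule stated above, that the revert removes what the inactive tables point to.

```python
def revert(node):
    """Toy revert: clear inactive tables and drop the devices they referenced."""
    referenced = set()
    for table in node["inactive_tables"].values():
        referenced.update(table)
    node["inactive_tables"].clear()
    # Only devices reachable from an inactive table get removed.
    node["devices"] -= referenced
    return node


# Node running cmirrord: the LV received an inactive table pointing at pvmove0.
with_cmirrord = {
    "devices": {"vg-lv", "vg-pvmove0"},
    "inactive_tables": {"vg-lv": ["vg-pvmove0"]},
}
# Node without cmirrord: the pvmove table load failed, so the LV never got
# an inactive table, yet the tableless pvmove device was already created.
without_cmirrord = {
    "devices": {"vg-lv", "vg-pvmove0"},
    "inactive_tables": {},
}

print(sorted(revert(with_cmirrord)["devices"]))
print(sorted(revert(without_cmirrord)["devices"]))
```

In this model the second node keeps vg-pvmove0, which mirrors the residual device seen on bp-03.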

Comment 12 Jonathan Earl Brassow 2014-05-02 20:41:49 UTC

In dm_tree_preload_children, when a _create_node succeeds but a _load_node fails, I've added (essentially) a call to '_deactivate_node' to remove the tableless device.  dm_tree_preload_children still returns error in this case as it always has.
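
The shape of that change, sketched in Python rather than in the C of libdevmapper: create, try to load, and undo the create if the load fails, while still reporting the failure. The preload() helper, the LoadError class, and the set standing in for the node's DM devices are all hypothetical; only the function names in the comments come from the description above.

```python
class LoadError(Exception):
    """Stands in for a failed table load."""


def preload(devices, name, load_table):
    """Create a device, load its table, and roll back the create on failure."""
    devices.add(name)          # stands in for _create_node succeeding
    try:
        load_table(name)       # stands in for _load_node
    except LoadError:
        devices.discard(name)  # stands in for the added _deactivate_node call
        return False           # the caller still sees an error, as before
    return True


def failing_load(name):
    # Mimics the load failing because no userspace log server is running.
    raise LoadError("Userspace log server not found")


devices = {"vg-lv"}
ok = preload(devices, "vg-pvmove0", failing_load)
print(ok, sorted(devices))
```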

When "update_metadata -> _suspend_lvs -> suspend_lvs -> suspend_lv" is called to load the pvmove segment, the above change ensures that the node that fails to load the table (due to cmirrord not running) will remove the associated device.  For a reason I can't explain, this causes the subsequent operation (vg_revert + revert_lv) to have no effect in removing the pvmove segment on the nodes that /are/ running cmirrord - something it did just fine before the above changes were made.

I can't see anything that would cause this change in behavior.  The initial 'suspend_lv' gets an error from the node not running cmirrord just as it did before.  The state of the DM devices is exactly the same before the 'revert_lv' is called, with the exception that the pvmove device with the empty table on the node(s) not running cmirrord is not present anymore.

Comment 13 Jonathan Earl Brassow 2014-05-02 21:35:35 UTC

One difference: when revert_lv is called and clvmd gets down to calling dm_task_run(dmt->type == DM_DEVICE_REMOVE) -> _do_dm_ioctl:

This one works (unpatched code):
1753            if (ioctl(_control_fd, command, dmi) < 0 &&
(gdb) p *dmi
$1 = {version = {4, 0, 0}, data_size = 16384, data_start = 312, 
  target_count = 0, open_count = 0, flags = 524, event_nr = 6327708, 
  padding = 0, dev = 64772, name = '\000' <repeats 127 times>, 
  uuid = '\000' <repeats 128 times>, data = "\000\000\000\000\000\000"}

And this one fails with errno=6 (No such device or address):
1753            if (ioctl(_control_fd, command, dmi) < 0 &&
(gdb) p *dmi
$3 = {version = {4, 0, 0}, data_size = 16384, data_start = 312, 
  target_count = 0, open_count = 0, flags = 524, event_nr = 6322248, 
  padding = 0, dev = 64772, name = "vg-pvmove0", '\000' <repeats 117 times>, 
  uuid = '\000' <repeats 128 times>, data = "\000\000\000\000\000\000"}

The difference seems to be that one is passing in a name.
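
To make the comparison mechanical, the two gdb dumps can be transcribed into dictionaries and diffed field by field. This is only a reading aid: the all-NUL name and uuid arrays are written as empty strings, the data array is left out, and diff_fields() is a hypothetical helper. The major/minor split assumes the usual encoding for small device numbers (major in the bits above the low 8, minor in the low 8 bits).

```python
# Field values transcribed from the first gdb dump (the call that works).
works = {
    "version": (4, 0, 0), "data_size": 16384, "data_start": 312,
    "target_count": 0, "open_count": 0, "flags": 524,
    "event_nr": 6327708, "padding": 0, "dev": 64772,
    "name": "", "uuid": "",
}
# The failing call differs only in the fields overridden here.
fails = dict(works, event_nr=6322248, name="vg-pvmove0")


def diff_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


print(diff_fields(works, fails))

# Assumption: small device numbers are encoded as (major << 8) | minor.
major, minor = divmod(works["dev"], 256)
print(major, minor)
```

Besides the name, event_nr also differs between the two dumps, while dev is the same in both.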

Comment 14 Jonathan Earl Brassow 2014-05-02 21:59:26 UTC

Ok, I've got a working patch.  I'll clean it up before posting.

Comment 15 Jonathan Earl Brassow 2014-05-28 17:54:35 UTC

Fix committed upstream:
commit 442820aae3648e1846417d1248fa36030eba4bd8
Author: Jonathan Brassow <jbrassow>
Date:   Wed May 28 10:17:15 2014 -0500

Comment 16 Jonathan Earl Brassow 2014-05-28 20:33:34 UTC

The solution for this bug is to clean up any residual device-mapper devices left over from the failed pvmove attempt.  The user is then able to check the logs for the reason for the failure and discover that cmirrord is not running on a subset of the nodes.  They can then start cmirrord on those nodes and try again.  They should /not/ need to reboot or clean up residual device-mapper devices before trying again.

Previous steps that would reproduce the problem:
1) Set up a cluster, but do not start cmirrord on a subset of nodes
2) Attempt pvmove on an LV from a node that /is/ running cmirrord
*) Previously, this would leave residual pvmove DM devices on the non-cmirrord nodes that would interfere with subsequent LVM operations.

Comment 19 Nenad Peric 2014-07-17 12:23:45 UTC

I executed a pvmove on a cluster node which had cmirrord running, but it reports that it is skipping mirror, and yet it moves the PV. 

This can be seen below. What does the "Skipping mirror LV" message mean?
The PV belonging to a mirror gets moved, so I do not understand this message.


[root@virt-064 ~]# lvs -a -o+devices
  LV                VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log         Cpy%Sync Convert Devices                                                                    
  linear            cluster    -wi-a----- 100.00m                                                             /dev/sda(0)                                                                
  mirror            cluster    mwi-a-m--- 500.00m                                mirror_mlog 100.00           mirror_mimage_0(0),mirror_mimage_1(0),mirror_mimage_2(0),mirror_mimage_3(0)
  [mirror_mimage_0] cluster    iwi-aom--- 500.00m                                                             /dev/sda(25)                                                               
  [mirror_mimage_1] cluster    iwi-aom--- 500.00m                                                             /dev/sdd(0)                                                                
  [mirror_mimage_2] cluster    iwi-aom--- 500.00m                                                             /dev/sdi(0)                                                                
  [mirror_mimage_3] cluster    iwi-aom--- 500.00m                                                             /dev/sdh(0)                                                                
  [mirror_mlog]     cluster    lwi-aom---   4.00m                                                             /dev/sdh(125)                                                              
  lv_root           vg_virt064 -wi-ao----   6.71g                                                             /dev/vda2(0)                                                               
  lv_swap           vg_virt064 -wi-ao---- 816.00m                       



[root@virt-064 ~]# vgextend cluster /dev/sdb
  Physical volume "/dev/sdb" successfully created
  Volume group "cluster" successfully extended
[root@virt-064 ~]# pvmove /dev/sdh /dev/sdb
  Skipping mirror LV mirror
  /dev/sdh: Moved: 0.8%
  /dev/sdh: Moved: 99.2%
  /dev/sdh: Moved: 100.0%


[root@virt-064 ~]# lvs -a -o+devices
  LV                VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log         Cpy%Sync Convert Devices                                                                    
  linear            cluster    -wi-a----- 100.00m                                                             /dev/sda(0)                                                                
  mirror            cluster    mwi-a-m--- 500.00m                                mirror_mlog 100.00           mirror_mimage_0(0),mirror_mimage_1(0),mirror_mimage_2(0),mirror_mimage_3(0)
  [mirror_mimage_0] cluster    iwi-aom--- 500.00m                                                             /dev/sda(25)                                                               
  [mirror_mimage_1] cluster    iwi-aom--- 500.00m                                                             /dev/sdd(0)                                                                
  [mirror_mimage_2] cluster    iwi-aom--- 500.00m                                                             /dev/sdi(0)                                                                
  [mirror_mimage_3] cluster    iwi-aom--- 500.00m                                                             /dev/sdb(0)                                                                
  [mirror_mlog]     cluster    lwi-aom---   4.00m                                                             /dev/sdb(125)                                                              
  lv_root           vg_virt064 -wi-ao----   6.71g                                                             /dev/vda2(0)                                                               
  lv_swap           vg_virt064 -wi-ao---- 816.00m                                                             /dev/vda2(1718)

Comment 21 Nenad Peric 2014-08-06 11:35:48 UTC

Opening a new bug for the message. 

Marking this one as VERIFIED with:

lvm2-2.02.108-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014
lvm2-libs-2.02.108-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014
lvm2-cluster-2.02.108-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014
udev-147-2.57.el6    BUILT: Thu Jul 24 15:48:47 CEST 2014
device-mapper-1.02.87-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014
device-mapper-libs-1.02.87-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014
device-mapper-event-1.02.87-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014
device-mapper-event-libs-1.02.87-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 15:43:06 CEST 2014
cmirror-2.02.108-1.el6    BUILT: Thu Jul 24 17:29:50 CEST 2014

Comment 22 Jonathan Earl Brassow 2014-08-26 14:51:46 UTC

Please update this bug with the pointer to the new bug.

That strange message is a consequence of the structure of a mirror.  In the case of comment 19, it is not the LV "mirror" that gets moved, but rather its sub-LV "mirror_mimage_3".  The code just needs to do a better job with the messaging, as you say.
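
The structure can be read straight off the first 'lvs -a -o+devices' listing in comment 19. The sketch below copies the LV and Devices columns for the "cluster" VG into a dictionary and asks which LVs name /dev/sdh directly; lvs_on_pv() is a hypothetical helper, not an LVM interface.

```python
def lvs_on_pv(lv_devices, pv):
    """Return the LVs whose device list names pv directly."""
    return [
        lv
        for lv, devs in lv_devices.items()
        # "/dev/sdh(125)" -> "/dev/sdh": drop the starting-extent suffix.
        if any(dev.split("(")[0] == pv for dev in devs)
    ]


# LV -> Devices, copied from the listing taken before the pvmove.
before = {
    "linear": ["/dev/sda(0)"],
    "mirror": [
        "mirror_mimage_0(0)", "mirror_mimage_1(0)",
        "mirror_mimage_2(0)", "mirror_mimage_3(0)",
    ],
    "mirror_mimage_0": ["/dev/sda(25)"],
    "mirror_mimage_1": ["/dev/sdd(0)"],
    "mirror_mimage_2": ["/dev/sdi(0)"],
    "mirror_mimage_3": ["/dev/sdh(0)"],
    "mirror_mlog": ["/dev/sdh(125)"],
}

print(lvs_on_pv(before, "/dev/sdh"))
```

The top-level LV "mirror" lists only its sub-LVs as devices, so it never names /dev/sdh itself; the extents on that PV belong to mirror_mimage_3 and mirror_mlog, which are the two entries that show /dev/sdb in the second listing.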

Comment 23 errata-xmlrpc 2014-10-14 08:24:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1387.html