Bug 1287972

Summary: vgimportclone fails because of duplicate PV
Product: Red Hat Enterprise Linux 6
Reporter: Shivasharan <sharan8989>
Component: lvm2
Assignee: David Teigland <teigland>
lvm2 sub component: Scripts / lvmdump / vgimportclone (RHEL6)
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX
Docs Contact:
Severity: urgent
Priority: unspecified
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, sharan8989, zkabelac
Version: 6.4
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2017-12-06 10:59:58 UTC
Type: Bug
Attachments (Description / Flags):
vgimportclone output with -d and -vvvv options / none
lsblk command output / none
output of vgscan command / none
vgscan -vvvv output / none
vgimportclone output for mpathj / none
lsblk output when issue was seen on mpathj / none

Description Shivasharan 2015-12-03 08:06:41 UTC
Description of problem:
This issue is similar to Bug 1012161. While trying to import a VG using the vgimportclone command, it throws the following error:
Fatal: /dev/mapper/mpathjp1 is not in a VG.


Version-Release number of selected component (if applicable):
2.6.32-358.el6.x86_64

How reproducible:
By exposing two clones of the same VG from one host to a different host.

Steps to Reproduce:
1. Take two clones of a VG.
2. Expose the clone devices on a different host.
3. Try to import a VG under a different name with vgimportclone, using the duplicate PV names.

Actual results:
vgimportclone on a duplicate PV fails with the following error:
Fatal: /dev/mapper/mpathjp1 is not in a VG.

Expected results:
vgimportclone should detect the PVID clash, since there are two sets of devices belonging to the same source VG, rename the duplicate PVID, and import the VG successfully.

Additional info:
I saw similar bugs (697959 and 1012161). Both were raised against RHEL 5.x, and I am not sure whether the fixes made it into RHEL 6.x.

Comment 2 Peter Rajnoha 2015-12-03 10:03:19 UTC
What's the version of the lvm2 package? Also, please attach the full output from the vgimportclone command with the "-d -vvvv" options added for debug output, and also the output from the lsblk command. Thanks.

Comment 3 Shivasharan 2015-12-03 12:50:20 UTC
Here is lvm2 info:

LVM version:     2.02.98(2)-RHEL6 (2012-10-15)
  Library version: 1.02.77-RHEL6 (2012-10-15)
  Driver version:  4.23.6

I will provide the output of the two commands shortly. Thanks.

Comment 4 Peter Rajnoha 2015-12-03 14:27:21 UTC
(In reply to Shivasharan from comment #3)
> Here is lvm2 info:
> 
> LVM version:     2.02.98(2)-RHEL6 (2012-10-15)
>   Library version: 1.02.77-RHEL6 (2012-10-15)
>   Driver version:  4.23.6
> 

So RHEL 6.4. The lvm2 version is quite old there - I think there were a few bugs in vgimportclone which we already fixed in 6.5 and higher. But I need to check what the bugs were about exactly, I don't remember now...

Comment 5 Peter Rajnoha 2015-12-03 15:16:33 UTC
(In reply to Peter Rajnoha from comment #4)
> (In reply to Shivasharan from comment #3)
> > Here is lvm2 info:
> > 
> > LVM version:     2.02.98(2)-RHEL6 (2012-10-15)
> >   Library version: 1.02.77-RHEL6 (2012-10-15)
> >   Driver version:  4.23.6
> > 
> 
> So RHEL 6.4. The lvm2 version is quite old there - I think there were a few
> bugs in vgimportclone which we already fixed in 6.5 and higher.

...nope, that was a bug in 6.6 which was then fixed in 6.6.z. So I'll wait for your debug info.

Comment 6 Shivasharan 2015-12-07 17:01:13 UTC
Created attachment 1103305 [details]
vgimportclone output with -d and -vvvv options

I am attaching the vgimportclone command output (executed with the -d and -vvvv options).

Comment 7 Shivasharan 2015-12-07 17:02:21 UTC
Created attachment 1103307 [details]
lsblk command output

Attached another file containing lsblk output.

Comment 8 Peter Rajnoha 2015-12-08 08:18:17 UTC
Based on the logs, the PV you're referencing on command line (mpathkp1) does not exist. The vgimportclone log shows no pvs command output for /dev/mapper/mpathkp1 as well as lsblk not having mpathkp1 listed at all.

Are you referencing proper mpath device (the PV used in vgimportclone)?

Comment 9 Shivasharan 2015-12-09 11:37:37 UTC
Created attachment 1103831 [details]
output of vgscan command

Yes, mpathk is a valid MPIO device. It is visible on the host.

I have attached the output of the vgscan command, which complains that mpathkp1 has the same PV ID as mpathjp1. Unfortunately, I don't see mpathj in the lsblk output either. They are actually two copies of the same VG (the production VG resides on a different host, and these are array-based backups).

Comment 10 Peter Rajnoha 2015-12-09 12:37:07 UTC
Please, retry the vgscan with "vgscan -vvvv". Then try again with "devices/obtain_device_list_from_udev=0" in /etc/lvm/lvm.conf.

Comment 11 Peter Rajnoha 2015-12-09 13:02:59 UTC
(In reply to Peter Rajnoha from comment #10)
> Please, retry the vgscan with "vgscan -vvvv". Then try again with
> "devices/obtain_device_list_from_udev=0" in /etc/lvm/lvm.conf.

(And attach the output here please.)

Comment 12 Shivasharan 2015-12-17 13:08:06 UTC
Created attachment 1106718 [details]
vgscan -vvvv output

Attaching vgscan -vvvv output. Sorry for the delay.

I am also going to attach new lsblk and vgimportclone output as the error occurred with new PV this time.

Comment 13 Shivasharan 2015-12-17 13:08:47 UTC
Created attachment 1106719 [details]
vgimportclone output for mpathj

mpathj is the new device on which the error is seen.

Comment 14 Shivasharan 2015-12-17 13:09:39 UTC
Created attachment 1106720 [details]
lsblk output when issue was seen on mpathj

Attaching lsblk output when the issue was seen for mpathj device.

Comment 15 Shivasharan 2015-12-17 13:15:20 UTC
Note: devices/obtain_device_list_from_udev=0 was already present in /etc/lvm/lvm.conf file.
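The "devices/obtain_device_list_from_udev" notation refers to the obtain_device_list_from_udev setting inside the devices section of lvm.conf. As a rough way to confirm what a given file sets, the sketch below searches for the last uncommented assignment. The function name udev_list_setting is hypothetical, and this is a plain text search rather than a real config parser: it assumes the setting appears once, on its own line, and it does not check which section the line belongs to.

```shell
#!/bin/sh
# udev_list_setting FILE: print the last uncommented value (0 or 1) of
# obtain_device_list_from_udev found in FILE; prints nothing if it is unset.
# Hypothetical helper; a text search only, not an lvm.conf parser.
udev_list_setting() {
    sed -n 's/^[[:space:]]*obtain_device_list_from_udev[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$1" | tail -n 1
}

# Check the default configuration path when it is readable.
if [ -r /etc/lvm/lvm.conf ]; then
    udev_list_setting /etc/lvm/lvm.conf
fi
```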

Comment 16 David Teigland 2015-12-17 16:11:00 UTC
I think that these commits fixed the vgimportclone regression caused by the process_each_pv rework:

b64da4d8b521 toollib: search for duplicate PVs only when needed
57d74a45a05e toollib: override the PV device with duplicates

Also, given the large number of duplicates reported by vgscan, I suspect that lvm is scanning mpath subdevices, which is probably a different issue.

Comment 17 Shivasharan 2015-12-18 10:19:09 UTC
Okay. Do you need any additional info?

Comment 18 Shivasharan 2015-12-18 13:13:06 UTC
I have raised the Severity. Can this issue be resolved during next week? Let me know if more info is required.

Comment 19 Shivasharan 2015-12-18 14:05:31 UTC
I have a small query. Can I make one device (whose PVID is preceded by its duplicate) take precedence over its duplicate by executing some command?

The goal is to be able to make either of the two duplicate PVs the active one, on demand, with a command.

Comment 20 David Teigland 2015-12-18 15:12:26 UTC
You should always be able to use filters to work around issues like this.  Either accept only the devices you want to use, or reject the devices you don't want to use.

Comment 21 Shivasharan 2015-12-21 07:12:26 UTC
Is there a command that I can run with a custom filter? Something analogous to:

#pvs --config 'devices {filter =[...]}'

I want control to shift from one PV to its duplicate. I tried pvscan with the --config option, but it does not seem to work.

# pvscan --config 'devices { filter=["r|/dev/mapper/mpathbd|", "a|.*|" ]}'
Found duplicate PV G13Sc8dGbKy3RDA06cdaJ9nYuJkXYhTJ: using /dev/mapper/mpathbd not /dev/mapper/mpathbc
  PV /dev/mapper/mpathbd   VG sharan_vg       lvm2 [4.00 GiB / 4.00 GiB free]
  PV /dev/sda2             VG vg_lrmg054      lvm2 [67.88 GiB / 0    free]
  PV /dev/mapper/mpathd                       lvm2 [4.00 GiB]
  PV /dev/mapper/mpathc                       lvm2 [4.00 GiB]
  PV /dev/mapper/mpathe                       lvm2 [4.00 GiB]
  Total: 5 [83.87 GiB] / in use: 2 [71.87 GiB] / in no VG: 3 [12.00 GiB]

I still see mpathbd on the host even after filtering while scanning.

Thanks,
Sharan

Comment 22 David Teigland 2015-12-21 15:41:46 UTC
Try removing the final "a|.*|" entry in the filter line.
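As an illustration of that suggestion, the sketch below builds a reject-only filter string and prints the resulting pvscan command without running it. The helper name build_reject_filter is hypothetical; the device path is the one from comment 21. Filter patterns are regular expressions and are unanchored here, so a pattern such as "r|/dev/mapper/mpathbd|" also matches longer names that contain it (for example a partition device on top of mpathbd).

```shell
#!/bin/sh
# build_reject_filter DEV...: print an LVM --config string that rejects each
# given device path. No trailing "a|.*|" entry is added, following the
# suggestion in this comment. Hypothetical helper for illustration.
build_reject_filter() {
    entries=""
    for dev in "$@"; do
        # Separate entries with ", " after the first one.
        if [ -n "$entries" ]; then
            entries="$entries, "
        fi
        entries="$entries\"r|$dev|\""
    done
    printf 'devices { filter = [ %s ] }\n' "$entries"
}

# Print (do not run) a pvscan invocation that hides one of the duplicates.
cfg=$(build_reject_filter /dev/mapper/mpathbd)
echo "pvscan --config '$cfg'"
```

A --config override only applies to the command it is passed to. To make the filter apply to every LVM command, the same filter line would go in the devices section of /etc/lvm/lvm.conf.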

Comment 23 Shivasharan 2015-12-23 09:20:30 UTC
I have to use the filter in every command. Only then does it work.

Do we know why vgimportclone fails from previous logs? Any lead?

Thanks,
Sharan

Comment 24 Peter Rajnoha 2016-01-07 14:29:03 UTC
(In reply to Shivasharan from comment #23)
> I have to use the filter in every command. Only then does it work.
> 
> Do we know why vgimportclone fails from previous logs? Any lead?
> 

There are inconsistencies in the logs:

  - lsblk doesn't display lots of mpath devices (including mpathjp1 used for vgimportclone)
  - vgimportclone is called with a device that it doesn't see either (just like lsblk)
  - vgscan sees devices which lsblk and vgimportclone don't see

Note: there's no filtering in lsblk, so lsblk should be able to list all devices.
Based on this, it's hard to decide what's the exact setup then.

Also, looking at the vgscan log, it seems multipath component detection is not working correctly, for example:

(grep "Ignoring duplicate PV" vgscan.log)
#cache/lvmcache.c:1497       Ignoring duplicate PV PEdluaeBpyNmz7OQ6CE1A5lu7OLvQb21 on /dev/sdb1 
        - using dm /dev/mapper/mpathbp1 

#cache/lvmcache.c:1497       Ignoring duplicate PV S0Ld0yAFsNZvMpVkiFnvQvMwJv1wXd9L on /dev/sdah1
         - using dm /dev/mapper/mpathep1 

..and lots of others which are similar.

In this case, it seems the /dev/sd* devices are multipath components while the /dev/mapper/mpath* devices are the multipath devices (with the same content, of course).

At the same time, some of the devices are identified as multipath components correctly, for example:

(grep "Skipping mpath component" vgscan.log)
#filters/filter-mpath.c:163         /dev/sda: Skipping mpath component device 
#filters/filter-mpath.c:163         /dev/sdaw: Skipping mpath component device 
#filters/filter-mpath.c:163         /dev/sdr: Skipping mpath component device 
...
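The two grep checks above can be combined into a small script that lists, side by side, the devices LVM treated as duplicates and the devices the mpath filter skipped. This is only a sketch based on the message formats quoted in this comment; the function names are hypothetical, and it assumes the "using dm" part appears either on the same line as "Ignoring duplicate PV" or on the line right after it.

```shell
#!/bin/sh
# ignored_duplicates LOG: print "PVID ignored-device used-device" for each
# "Ignoring duplicate PV ... on DEV - using dm DEV" message in LOG.
ignored_duplicates() {
    awk '
        /Ignoring duplicate PV/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "PV") pvid = $(i + 1)
                if ($i == "on") ignored = $(i + 1)
            }
        }
        /using dm/ {
            for (i = 1; i <= NF; i++)
                if ($i == "dm") used = $(i + 1)
            if (pvid != "") print pvid, ignored, used
            pvid = ""
        }
    ' "$1"
}

# skipped_components LOG: print each device the mpath filter skipped.
skipped_components() {
    awk '/Skipping mpath component device/ { sub(/:$/, "", $2); print $2 }' "$1"
}

# Usage: pass the vgscan -vvvv log as the first argument.
if [ -n "${1:-}" ] && [ -r "$1" ]; then
    echo "Duplicates LVM ignored (PVID, ignored device, used device):"
    ignored_duplicates "$1"
    echo "Devices skipped as mpath components:"
    skipped_components "$1"
fi
```

A /dev/sd* device that shows up in the first list but not in the second is one that was not recognized as a multipath component.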

Were all the logs grabbed from exactly the same system run? If not, it seems the devices are changing very dynamically underneath...

Can you please try collecting the udev event log:

  udevadm monitor --udev --env (and saving the log to a file)

and then, while the udevadm monitor is running, call:

  lsblk
  vgscan -vvvv
  vgimportclone with -d and -vvvv
  lvmdump -u

Also, please make sure that nothing else that could work with those devices is running in parallel.
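The collection steps requested above could be scripted roughly as follows. This is a sketch under stated assumptions: the PV path, the new VG name (clonevg), the output directory, and the use of "-n" to pass the new VG name to vgimportclone are placeholders, not the reporter's actual invocation. By default the script only prints what it would run (DRY_RUN=1), since vgimportclone modifies the clone when run for real.

```shell
#!/bin/sh
# Sketch: capture udev events while the requested commands run.
# PV, NEWVG and OUT are placeholder values (assumptions, not from the report).
PV=${PV:-/dev/mapper/mpathjp1}
NEWVG=${NEWVG:-clonevg}
OUT=${OUT:-./lvm-debug}
DRY_RUN=${DRY_RUN:-1}

# log_cmd NAME CMD...: run CMD, saving stdout and stderr to $OUT/NAME.log.
# In dry-run mode the command is only printed.
log_cmd() {
    name=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $* > $OUT/$name.log 2>&1"
    else
        "$@" > "$OUT/$name.log" 2>&1
    fi
}

collect() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: udevadm monitor --udev --env > $OUT/udev.log 2>&1 &"
    else
        mkdir -p "$OUT"
        udevadm monitor --udev --env > "$OUT/udev.log" 2>&1 &
        monitor_pid=$!
        sleep 1   # give the monitor a moment to start listening
    fi
    log_cmd lsblk lsblk
    log_cmd vgscan vgscan -vvvv
    log_cmd vgimportclone vgimportclone -d -vvvv -n "$NEWVG" "$PV"
    log_cmd lvmdump lvmdump -u
    # Stop the udev monitor once all commands have finished.
    [ "$DRY_RUN" = 1 ] || kill "$monitor_pid"
}

collect
```

Running it with DRY_RUN=0 as root would execute the commands and leave one log file per command in the output directory, next to udev.log.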

Comment 26 Jan Kurik 2017-12-06 10:59:58 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/