Bug 697959

Summary: vgimportclone fails because lvm query commands (pvs, pvdisplay) fail to report vgname when given cloned PV
Product: Red Hat Enterprise Linux 5 Reporter: Dave Wysochanski <dwysocha>
Component: lvm2    Assignee: Zdenek Kabelac <zkabelac>
Status: CLOSED ERRATA QA Contact: Corey Marthaler <cmarthal>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.6    CC: agk, amote, dwysocha, fnadge, gservat, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc    Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: lvm2-2.02.84-3.el5 Doc Type: Bug Fix
Doc Text:
The vgimportclone script triggered a code path in the lvm command that accessed already-released memory when a duplicate PV was found. The bug has been fixed by saving the problematic strings to a temporary buffer.
Story Points: ---
Clone Of:
: 712147 1012161 (view as bug list) Environment:
Last Closed: 2011-07-21 10:51:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 707606, 712147, 1020442    
Attachments:
Description Flags
vgimportclone test to be put in upstream
none
Simpler test that reproduces the problem with a few commands none

Description Dave Wysochanski 2011-04-19 18:54:03 UTC
This appears to be a regression from RHEL5.5 (lvm2-2.02.56-8.el5) in RHEL5.6.z (lvm2-2.02.74-5.el5_6.1).

The initial failure was seen when vgimportclone failed like this:
# vgimportclone --basevgname BackupApp /dev/mpath/mpath16p1
Fatal: /dev/mpath/mpath16p1 is not in a VG.

The cloned PV was of course /dev/mapper/mpath16p1, and it was part of a volume group with a single PV, so this was an extremely simple setup.

Once this failed, we see that the 'pvs' command fails, reporting a blank vgname:
# pvs --noheadings -o vg_name /dev/mpath/mpath16p1
  Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath15p1 not /dev/mpath/mpath16p1
  get_pv_from_vg_by_id: vg_read_internal failed to read VG 
      
However a 'pvs' command with a filter applied shows the correct VG:
# pvs --config 'devices { filter = [ "a|mpath16|", "r|.*|" ] }' --noheadings -o vg_name /dev/mpath/mpath16p1
  DTI_apps


Steps to Reproduce:
1. Create a volume group with a single PV
2. Clone the PV and map to host
3. Run vgimportclone and/or pvs on the cloned PV.

Comment 1 Dave Wysochanski 2011-04-19 19:03:41 UTC
Note that upstream (2.02.84+) does not seem to have this problem:
# git log | head
commit 35c174671ba40252cc531f3d14400dba30c88592
Author: Petr Rockai <prokai>
Date:   Tue Apr 19 15:28:17 2011 +0000

    Always copy all tests to builddir regardless of T/S, since subsequent runs
    (possibly with different T or S) skip that step.

commit 5a2322eb7dd8c2db65cf7e119725f6406049d649
Author: Mike Snitzer <snitzer>
Date:   Wed Apr 13 18:26:39 2011 +0000
# tools/lvm pvs --noheadings -o vg_name /dev/mpath/mpath16p1
  Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath15p1 not /dev/mpath/mpath16p1
  DTI_apps

Comment 2 Dave Wysochanski 2011-04-19 19:11:05 UTC
I just tested the 5.7 candidates and they still seem to have this problem, so perhaps it has been fixed in some of the latest upstream updates.

I tested lvm2-2.02.84-2.el5.x86_64.rpm + device-mapper-1.02.63-2.el5.x86_64.rpm by extracting via rpm2cpio and setting LD_LIBRARY_PATH, rather than doing a full update of the machine.


# ./usr/sbin/lvm pvs --noheadings -o vg_name /dev/mapper/mpath16p1 
  Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath15p1 not /dev/mpath/mpath16p1
  get_pv_from_vg_by_id: vg_read_internal failed to read VG DTI_apps
      
[root@rhel5 lvm-dm-rhel5.7]# ./usr/sbin/lvm version
  LVM version:     2.02.84(2)-RHEL5 (2011-03-22)
  Library version: 1.02.63-RHEL5 (2011-03-04)
  Driver version:  4.11.5
[root@rhel5 lvm-dm-rhel5.7]# ldd ./usr/sbin/lvm 
	linux-vdso.so.1 =>  (0x00007fff6dffc000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003b67e00000)
	libdevmapper-event.so.1.02 => /root/updates/lvm-dm-rhel5.7/lib64/libdevmapper-event.so.1.02 (0x00002b8e1e829000)
	libdevmapper.so.1.02 => /root/updates/lvm-dm-rhel5.7/lib64/libdevmapper.so.1.02 (0x00002b8e1ea2f000)
	libreadline.so.5 => /usr/lib64/libreadline.so.5 (0x0000003b69600000)
	libncurses.so.5 => /usr/lib64/libncurses.so.5 (0x0000003b7a000000)
	libc.so.6 => /lib64/libc.so.6 (0x0000003b67600000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003b67200000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003b68e00000)
	libsepol.so.1 => /lib64/libsepol.so.1 (0x0000003b69200000)

Comment 3 Dave Wysochanski 2011-04-19 19:34:03 UTC
A simple reproducer is to use loopback files and run losetup on the same file twice.  I reproduced this easily with that method.

Comment 4 Dave Wysochanski 2011-04-19 19:45:00 UTC
Actually, I am not sure it has been fixed upstream.  It may depend on what is in the cache / how many devices are duplicates.  When using loop files with the upstream version, the problem reproduced:

# ldd tools/lvm
	linux-vdso.so.1 =>  (0x00007fffdbb34000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003b67e00000)
	libdevmapper-event.so.1.02 => /root/lvmtest/code/lvm2/daemons/dmeventd/libdevmapper-event.so.1.02 (0x00002b3e3cab9000)
	libdevmapper.so.1.02 => /root/lvmtest/code/lvm2/libdm/libdevmapper.so.1.02 (0x00002b3e3ccbf000)
	libreadline.so.5 => /usr/lib64/libreadline.so.5 (0x0000003b69600000)
	libncurses.so.5 => /usr/lib64/libncurses.so.5 (0x0000003b7a000000)
	libc.so.6 => /lib64/libc.so.6 (0x0000003b67600000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003b67200000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003b68e00000)
	libsepol.so.1 => /lib64/libsepol.so.1 (0x0000003b69200000)
[root@rhel5 lvm2]# tools/lvm pvs --noheadings -o vg_name /dev/loop2
  Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/sda1 not /dev/loop2
  Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/loop3 not /dev/sda1
  Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath16p1 not /dev/mpath/mpath15p1
  get_pv_from_vg_by_id: vg_read_internal failed to read VG DTI_apps
      
  You have a memory leak (not released memory pool):
   [0x1ba51ca0] format_instance
  You have a memory leak (not released memory pool):
   [0x1ba51ca0] format_instance

This also shows a bug in the memory-freeing code - I traced it to the _vg_read() code - and that one is fairly obvious (look for "fid = fmt->ops->create_instance").

However when I use a different duplicate, it succeeds:

# tools/lvm pvs --noheadings -o vg_name /dev/mapper/mpath16p1
  Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath15p1 not /dev/mpath/mpath16p1
  DTI_apps

Comment 5 Dave Wysochanski 2011-04-19 22:04:53 UTC
Created attachment 493299 [details]
vgimportclone test to be put in upstream

Attached is a testcase for this bz.  It should be able to run in the upstream nightly lvm2 test.

I tested it on lvm git 2.02.56, and it passed.  Then I switched branches to 2.02.74 and it failed on that version.

Comment 6 Alasdair Kergon 2011-04-20 01:50:24 UTC
vgimportclone is just a script.

So can someone please determine a minimum necessary sequence of LVM commands to reproduce this problem?

IOW expand out the script inside your test case and then try deleting as much as you can to find out what the *underlying* problem is.

Comment 7 Dave Wysochanski 2011-04-20 14:43:57 UTC
Oh, sorry, I wasn't clear.  The pvs command in the test script will fail to print the vgname.  I'll attach another script that is not in the nightly test and does not include the vgimportclone command.

Comment 8 Zdenek Kabelac 2011-04-20 14:47:27 UTC
I assume it might be related to my proposed patch:

http://www.redhat.com/archives/lvm-devel/2011-April/msg00039.html
(with minor change in http://www.redhat.com/archives/lvm-devel/2011-April/msg00040.html)

It is awaiting review.

Comment 9 Dave Wysochanski 2011-04-20 15:00:28 UTC
Created attachment 493510 [details]
Simpler test that reproduces the problem with a few commands

New test script that shows the failure of 'pvs' to find the vgname when a duplicate PV is attached to the system.

On RHEL5.5, pvs works correctly, showing the vgname every time (though get_pv_from_vg_by_id still fails with an odd output):

# rpm -q lvm2
lvm2-2.02.56-8.el5
# grep filter /etc/lvm/lvm.conf | grep -v \#
    filter = [ "a/loop/", "r/.*/" ]
# source /tmp/test.sh 
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.009567 seconds, 110 MB/s
  No physical volume label read from /dev/loop0
  Physical volume "/dev/loop0" successfully created
  Volume group "vgtest" successfully created
Querying /dev/loop0 before duplicate PV is added
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop0 vgtest lvm2 a-   1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
  Found duplicate PV REwmWCulCPNjShq6XHqqGGg7hN06o2ci: using /dev/loop1 not /dev/loop0
  get_pv_from_vg_by_id: vg_read_internal failed to read VG fy95
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop1 vgtest lvm2 a-   1.00M 1.00M
Querying /dev/loop1 after duplicate PV is added
  Found duplicate PV REwmWCulCPNjShq6XHqqGGg7hN06o2ci: using /dev/loop0 not /dev/loop1
  get_pv_from_vg_by_id: vg_read_internal failed to read VG fy95
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop0 vgtest lvm2 a-   1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
  Found duplicate PV REwmWCulCPNjShq6XHqqGGg7hN06o2ci: using /dev/loop1 not /dev/loop0
  get_pv_from_vg_by_id: vg_read_internal failed to read VG fy95
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop1 vgtest lvm2 a-   1.00M 1.00M


Now on RHEL5.6 some of the queries fail.  And strangely, pvs returns the vgname on the first query of /dev/loop0 after adding the duplicate PV /dev/loop1, but on the second query of /dev/loop0 it fails.
 
# rpm -q lvm2
lvm2-2.02.74-5.el5_6.1
# grep filter /etc/lvm/lvm.conf | grep -v \#
    filter = [ "a/loop/", "r/.*/" ]
# source test.sh 
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.005961 seconds, 176 MB/s
  No physical volume label read from /dev/loop0
  Physical volume "/dev/loop0" successfully created
  Volume group "vgtest" successfully created
Querying /dev/loop0 before duplicate PV is added
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop0 vgtest lvm2 a-   1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop0 vgtest lvm2 a-   1.00M 1.00M
Querying /dev/loop1 after duplicate PV is added
  Found duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop0 not /dev/loop1
  get_pv_from_vg_by_id: vg_read_internal failed to read VG pC
  PV         VG   Fmt  Attr PSize PFree
  /dev/loop0      lvm2 a-   1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
  Found duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop1 not /dev/loop0
  get_pv_from_vg_by_id: vg_read_internal failed to read VG -ROM_QM00003
  PV         VG   Fmt  Attr PSize PFree
  /dev/loop1      lvm2 a-   1.00M 1.00M

Comment 10 Zdenek Kabelac 2011-04-20 15:10:26 UTC
> Found duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop0 not
> /dev/loop1
>   get_pv_from_vg_by_id: vg_read_internal failed to read VG pC
>   PV         VG   Fmt  Attr PSize PFree
>   /dev/loop0      lvm2 a-   1.00M 1.00M
> Querying /dev/loop0 after duplicate PV is added
>   Found duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop1 not
> /dev/loop0
>   get_pv_from_vg_by_id: vg_read_internal failed to read VG -ROM_QM00003
>   PV         VG   Fmt  Attr PSize PFree
>   /dev/loop1      lvm2 a-   1.00M 1.00M

Indeed - that should be addressed by my patch from comment 8.

Comment 11 Dave Wysochanski 2011-04-21 16:02:08 UTC
Thanks Zdenek for the quick fix!

Looks like this has fixed the problem.  I've submitted a patch to lvm-devel for the nightly test.  It now passes, but before your patches it failed.

Can we get a new RHEL5.7 build I can give to the customer?

Comment 12 Milan Broz 2011-04-21 17:01:43 UTC
Fixed in upstream CVS, setting POST for now.

Comment 13 Milan Broz 2011-04-27 08:40:34 UTC
Fixed in lvm2-2.02.84-3.el5.

Comment 16 Dave Wysochanski 2011-04-27 11:32:09 UTC
I have tested this fix using a script that included vgimportclone and verified it.  I will give this to the customer to test as well.

Comment 17 Dave Wysochanski 2011-05-09 20:23:05 UTC
Fix has been verified by customer.

Comment 18 Corey Marthaler 2011-05-11 15:53:25 UTC
Marking verified based on comments #16 and #17.

Comment 20 Florian Nadge 2011-05-26 15:32:20 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This field is the basis of the errata or release note for this bug. It can also be used for change logs.

The Technical Note template, known as CCFR, is as follows:

Cause
    What actions or circumstances cause this bug to present.
Consequence
    What happens when the bug presents.
Fix
    What was done to fix the bug.
Result
    What now happens when the actions or circumstances above occur.
    Note: this is not the same as the bug doesn’t present anymore.

Comment 21 Zdenek Kabelac 2011-06-17 12:23:32 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,13 +1 @@
-This field is the basis of the errata or release note for this bug. It can also be used for change logs.
+The vgimportclone script triggered a code path in the lvm command that accessed already-released memory when a duplicate PV was found. The bug has been fixed by saving the problematic strings to a temporary buffer.
-
-The Technical Note template, known as CCFR, is as follows:
-
-Cause
-    What actions or circumstances cause this bug to present.
-Consequence
-    What happens when the bug presents.
-Fix
-    What was done to fix the bug.
-Result
-    What now happens when the actions or circumstances above occur.
-    Note: this is not the same as the bug doesn’t present anymore.

Comment 22 errata-xmlrpc 2011-07-21 10:51:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html

Comment 23 errata-xmlrpc 2011-07-21 12:29:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html

Comment 24 Gonzalo Servat 2011-08-03 07:49:50 UTC
When will this fix make it into RHEL 6.1? I just ran into the bug.