Bug 697959
Summary: | vgimportclone fails because lvm query commands (pvs, pvdisplay) fail to report vgname when given cloned PV | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Dave Wysochanski <dwysocha> | ||||||
Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 5.6 | CC: | agk, amote, dwysocha, fnadge, gservat, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac | ||||||
Target Milestone: | rc | Keywords: | Regression | ||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | lvm2-2.02.84-3.el5 | Doc Type: | Bug Fix | ||||||
Doc Text: |
vgimportclone script triggered a code path in lvm command which has been accessing already released memory when duplicated PV was found. By saving problematic strings to a temporal buffer this bug has been fixed.
|
Story Points: | --- | ||||||
Clone Of: | |||||||||
: | 712147 1012161 (view as bug list) | Environment: | |||||||
Last Closed: | 2011-07-21 10:51:56 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 707606, 712147, 1020442 | ||||||||
Attachments: |
|
Description
Dave Wysochanski
2011-04-19 18:54:03 UTC
Note that upstream (2.02.84+) does not seem to have this problem: # git log | head commit 35c174671ba40252cc531f3d14400dba30c88592 Author: Petr Rockai <prokai> Date: Tue Apr 19 15:28:17 2011 +0000 Always copy all tests to builddir regardless of T/S, since subsequent runs (possibly with different T or S) skip that step. commit 5a2322eb7dd8c2db65cf7e119725f6406049d649 Author: Mike Snitzer <snitzer> Date: Wed Apr 13 18:26:39 2011 +0000 # tools/lvm pvs --noheadings -o vg_name /dev/mpath/mpath16p1 Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath15p1 not /dev/mpath/mpath16p1 DTI_apps I just tested the 5.7 candidates and they seem to still have this problem so maybe it has been fixed in some of the latest updates. I tested lvm2-2.02.84-2.el5.x86_64.rpm + device-mapper-1.02.63-2.el5.x86_64.rpm by extracting via rpm2cpio and setting LD_LIBRARY_PATH, but not a full update of the machine. # ./usr/sbin/lvm pvs --noheadings -o vg_name /dev/mapper/mpath16p1 Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath15p1 not /dev/mpath/mpath16p1 get_pv_from_vg_by_id: vg_read_internal failed to read VG DTI_apps [root@rhel5 lvm-dm-rhel5.7]# ./usr/sbin/lvm version LVM version: 2.02.84(2)-RHEL5 (2011-03-22) Library version: 1.02.63-RHEL5 (2011-03-04) Driver version: 4.11.5 [root@rhel5 lvm-dm-rhel5.7]# ldd ./usr/sbin/lvm linux-vdso.so.1 => (0x00007fff6dffc000) libdl.so.2 => /lib64/libdl.so.2 (0x0000003b67e00000) libdevmapper-event.so.1.02 => /root/updates/lvm-dm-rhel5.7/lib64/libdevmapper-event.so.1.02 (0x00002b8e1e829000) libdevmapper.so.1.02 => /root/updates/lvm-dm-rhel5.7/lib64/libdevmapper.so.1.02 (0x00002b8e1ea2f000) libreadline.so.5 => /usr/lib64/libreadline.so.5 (0x0000003b69600000) libncurses.so.5 => /usr/lib64/libncurses.so.5 (0x0000003b7a000000) libc.so.6 => /lib64/libc.so.6 (0x0000003b67600000) /lib64/ld-linux-x86-64.so.2 (0x0000003b67200000) libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003b68e00000) libsepol.so.1 => /lib64/libsepol.so.1 (0x0000003b69200000) Simple reproducer is to use loopback files and losetup on the same file twice. I reproduced this easily with that method. Actually I am not sure it has been fixed upstream. It may depend on what is in the cache / how many devices are duplicates. When using loop files on upstream version it seems to have reproduced the problem: # ldd tools/lvm linux-vdso.so.1 => (0x00007fffdbb34000) libdl.so.2 => /lib64/libdl.so.2 (0x0000003b67e00000) libdevmapper-event.so.1.02 => /root/lvmtest/code/lvm2/daemons/dmeventd/libdevmapper-event.so.1.02 (0x00002b3e3cab9000) libdevmapper.so.1.02 => /root/lvmtest/code/lvm2/libdm/libdevmapper.so.1.02 (0x00002b3e3ccbf000) libreadline.so.5 => /usr/lib64/libreadline.so.5 (0x0000003b69600000) libncurses.so.5 => /usr/lib64/libncurses.so.5 (0x0000003b7a000000) libc.so.6 => /lib64/libc.so.6 (0x0000003b67600000) /lib64/ld-linux-x86-64.so.2 (0x0000003b67200000) libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003b68e00000) libsepol.so.1 => /lib64/libsepol.so.1 (0x0000003b69200000) [root@rhel5 lvm2]# tools/lvm pvs --noheadings -o vg_name /dev/loop2 Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/sda1 not /dev/loop2 Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/loop3 not /dev/sda1 Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath16p1 not /dev/mpath/mpath15p1 get_pv_from_vg_by_id: vg_read_internal failed to read VG DTI_apps You have a memory leak (not released memory pool): [0x1ba51ca0] format_instance You have a memory leak (not released memory pool): [0x1ba51ca0] format_instance This also shows a free bug - I traced this to _vg_read() code - that one is fairly obvious (look for "fid = fmt->ops->create_instance"). However when I use a different duplicate, it succeeds: # tools/lvm pvs --noheadings -o vg_name /dev/mapper/mpath16p1 Found duplicate PV nkeZjQANvjyrSLD2Nq5EAXXjYP83c2q0: using /dev/mpath/mpath15p1 not /dev/mpath/mpath16p1 DTI_apps Created attachment 493299 [details]
vgimportclone test to be put in upstream
Attached a testcase for this bz. This should be able to run in upstream nightly lvm2 test.
I tested it on lvm git 2.02.56, and it passed. Then I switched branches to 2.02.74 and it failed on that version.
vgimportclone is just a script. So can someone please determine a minimum necessary sequence of LVM commands to reproduce this problem? IOW expand out the script inside your test case and then try deleting as much as you can to find out what the *underlying* problem is. Oh, sorry I wasn't clear. The pvs command in the test script will fail to print the vgname. I'll attach another script that is not in the nightly test and does not include the vgimportclone command. Assuming it might be related to my proposal patch: http://www.redhat.com/archives/lvm-devel/2011-April/msg00039.html (with minor change in http://www.redhat.com/archives/lvm-devel/2011-April/msg00040.html) Waits for review. Created attachment 493510 [details]
Simpler test that reproduces the problem with a few commands
New test script which shows the failure of 'pvs' to find the vgname, when duplicate PV is attached to the system.
On RHEL5.5, pvs works correctly, showing the vgname every time (though get_pv_from_vg_by_id still fails with an odd output):
# rpm -q lvm2
lvm2-2.02.56-8.el5
# grep filter /etc/lvm/lvm.conf | grep -v \#
filter = [ "a/loop/", "r/.*/" ]
# source /tmp/test.sh
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.009567 seconds, 110 MB/s
No physical volume label read from /dev/loop0
Physical volume "/dev/loop0" successfully created
Volume group "vgtest" successfully created
Querying /dev/loop0 before duplicate PV is added
PV VG Fmt Attr PSize PFree
/dev/loop0 vgtest lvm2 a- 1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
Found duplicate PV REwmWCulCPNjShq6XHqqGGg7hN06o2ci: using /dev/loop1 not /dev/loop0
get_pv_from_vg_by_id: vg_read_internal failed to read VG fy95
PV VG Fmt Attr PSize PFree
/dev/loop1 vgtest lvm2 a- 1.00M 1.00M
Querying /dev/loop1 after duplicate PV is added
Found duplicate PV REwmWCulCPNjShq6XHqqGGg7hN06o2ci: using /dev/loop0 not /dev/loop1
get_pv_from_vg_by_id: vg_read_internal failed to read VG fy95
PV VG Fmt Attr PSize PFree
/dev/loop0 vgtest lvm2 a- 1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
Found duplicate PV REwmWCulCPNjShq6XHqqGGg7hN06o2ci: using /dev/loop1 not /dev/loop0
get_pv_from_vg_by_id: vg_read_internal failed to read VG fy95
PV VG Fmt Attr PSize PFree
/dev/loop1 vgtest lvm2 a- 1.00M 1.00M
Now on RHEL5.6 some of the queries fail. And strangely, pvs returns the vgname on the first query of /dev/loop0 after adding the duplicate PV /dev/loop1, but on the second query of /dev/loop0 it fails.
# rpm -q lvm2
lvm2-2.02.74-5.el5_6.1
# grep filter /etc/lvm/lvm.conf | grep -v \#
filter = [ "a/loop/", "r/.*/" ]
# source test.sh
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.005961 seconds, 176 MB/s
No physical volume label read from /dev/loop0
Physical volume "/dev/loop0" successfully created
Volume group "vgtest" successfully created
Querying /dev/loop0 before duplicate PV is added
PV VG Fmt Attr PSize PFree
/dev/loop0 vgtest lvm2 a- 1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
PV VG Fmt Attr PSize PFree
/dev/loop0 vgtest lvm2 a- 1.00M 1.00M
Querying /dev/loop1 after duplicate PV is added
Found duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop0 not /dev/loop1
get_pv_from_vg_by_id: vg_read_internal failed to read VG pC
PV VG Fmt Attr PSize PFree
/dev/loop0 lvm2 a- 1.00M 1.00M
Querying /dev/loop0 after duplicate PV is added
Found duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop1 not /dev/loop0
get_pv_from_vg_by_id: vg_read_internal failed to read VG -ROM_QM00003
PV VG Fmt Attr PSize PFree
/dev/loop1 lvm2 a- 1.00M 1.00M
ound duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop0 not > /dev/loop1 > get_pv_from_vg_by_id: vg_read_internal failed to read VG pC > PV VG Fmt Attr PSize PFree > /dev/loop0 lvm2 a- 1.00M 1.00M > Querying /dev/loop0 after duplicate PV is added > Found duplicate PV 3bgfGvNjOOWT4oJK8fY40SkNc8lSp5np: using /dev/loop1 not > /dev/loop0 > get_pv_from_vg_by_id: vg_read_internal failed to read VG -ROM_QM00003 > PV VG Fmt Attr PSize PFree > /dev/loop1 lvm2 a- 1.00M 1.00M Indeed - that should be addressed by my patch from comment 8. Thanks Zdenek for the quick fix! Looks like this has fixed the problem. I've submitted a patch to lvm-devel for the nightly test. It now passes, but before your patches it failed. Can we get a new RHEL5.7 build I can give to the customer? Fixed in upstream cvs, setting POST fo now. Fixed in lvm2-2.02.84-3.el5. I have tested this fix using a script that included vgimportclone and verified it. Will give this to customer to test as well. Fix has been verified by customer. Marking verified based on comments #16 and #17. Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: This field is the basis of the errata or release note for this bug. It can also be used for change logs. The Technical Note template, known as CCFR, is as follows: Cause What actions or circumstances cause this bug to present. Consequence What happens when the bug presents. Fix What was done to fix the bug. Result What now happens when the actions or circumstances above occur. Note: this is not the same as the bug doesn’t present anymore. Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,13 +1 @@ -This field is the basis of the errata or release note for this bug. It can also be used for change logs. +vgimportclone script triggered a code path in lvm command which has been accessing already released memory when duplicated PV was found. By saving problematic strings to a temporal buffer this bug has been fixed.- -The Technical Note template, known as CCFR, is as follows: - -Cause - What actions or circumstances cause this bug to present. -Consequence - What happens when the bug presents. -Fix - What was done to fix the bug. -Result - What now happens when the actions or circumstances above occur. - Note: this is not the same as the bug doesn’t present anymore. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-1071.html An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-1071.html When will this fix make it into RHEL 6.1? I just ran into the bug. |