Bug 1663795

Summary: pvs crashes when vgrename runs at the same time as pvs
Product: Red Hat Enterprise Linux 7
Reporter: nikhil kshirsagar <nkshirsa>
Component: lvm2
Assignee: David Teigland <teigland>
lvm2 sub component: Displaying and Reporting
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: agk, cmarthal, heinzm, jbrassow, loberman, mcsontos, msnitzer, prajnoha, rbednar, revers, rhandlin, teigland, yzheng, zkabelac
Version: 7.4
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: lvm2-2.02.186-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-31 20:04:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1711360

Description nikhil kshirsagar 2019-01-07 05:04:20 UTC
Description of problem:
The same issue occurs on RHEL 7 as in https://bugzilla.redhat.com/show_bug.cgi?id=1369741

Version-Release number of selected component (if applicable):
[root@nkshirsa sosreport-E-BRVNPD-IBK01-20181111023124]# cat uname 
Linux E-BRVNPD-IBK01 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@nkshirsa sosreport-E-BRVNPD-IBK01-20181111023124]# cat installed-rpms | grep lvm
lvm2-2.02.171-8.el7.x86_64                                  Wed Aug 29 18:04:21 2018
lvm2-libs-2.02.171-8.el7.x86_64                             Wed Aug 29 18:04:20 2018
mesa-private-llvm-3.9.1-3.el7.x86_64                        Wed Aug 29 18:02:15 2018


How reproducible:
Run pvs in a loop while repeatedly renaming a VG with vgrename at the same time.
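A minimal reproducer sketch along those lines (the VG names, loop counts, and the assumption of a scratch VG that can safely be renamed back and forth are hypothetical, not taken from the report):

#!/bin/bash
# Sketch of the race: rename a scratch VG back and forth while pvs loops.
# VG names and loop counts are illustrative only.
VG=testvg
VG2=testvg_renamed

# Background job: rename the VG back and forth continuously.
( while true; do
    vgrename "$VG" "$VG2" >/dev/null 2>&1
    vgrename "$VG2" "$VG" >/dev/null 2>&1
  done ) &
rename_pid=$!

# Foreground: run pvs repeatedly; the reported crash was a SIGSEGV in 'pvs -a'.
for i in $(seq 1 10000); do
    pvs -a >/dev/null 2>&1 || echo "pvs failed (exit $?) on iteration $i"
done

kill "$rename_pid"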


Additional info:

A core is available if needed. This is the same issue as SFDC case 01689391, for which we had opened https://bugzilla.redhat.com/show_bug.cgi?id=1369741 (marked WONTFIX since it was a one-off crash for that customer).

This core matches that trace and looks exactly the same to me, since here too we see the vgrenames happening at the same time.

Nov 11 02:31:23 E-BRVNPD-IBK01 kernel: [kern.info] traps: pvs[18786] general protection ip:56491a7ad4bd sp:7ffe1e031738 error:0 in lvm[56491a71c000+1cb000]
Nov 11 02:31:23 E-BRVNPD-IBK01 abrt-hook-ccpp: [user.err] Process 18786 (lvm) of user 0 killed by SIGSEGV - dumping core

Core was generated by `pvs -a'.
Program terminated with signal 11, Segmentation fault.
#0  0x000056491a7ad4bd in dev_name (dev=0x56491ba8b320) at device/dev-cache.c:1539
1539		return (dev && dev->aliases.n) ? dm_list_item(dev->aliases.n, struct dm_str_list)->str :
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.168-8.el7.x86_64 elfutils-libs-0.168-8.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-9.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libsepol-2.5-6.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x000056491a7ad4bd in dev_name (dev=0x56491ba8b320) at device/dev-cache.c:1539
#1  0x000056491a7f25c5 in _vg_read_orphan_pv (info=0x56491ba882a0, baton=0x7ffe1e031870) at metadata/metadata.c:3808
#2  0x000056491a79ffb1 in lvmcache_foreach_pv (vginfo=vginfo@entry=0x56491ba62820, fun=fun@entry=0x56491a7f2580 <_vg_read_orphan_pv>, 
    baton=baton@entry=0x7ffe1e031870) at cache/lvmcache.c:2556
#3  0x000056491a7f6b42 in _vg_read_orphans (consistent=0x7ffe1e031a50, orphan_vgname=0x0, warn_flags=1, cmd=0x56491ba29030) at metadata/metadata.c:3928
#4  _vg_read (cmd=cmd@entry=0x56491ba29030, vgname=vgname@entry=0x56491ba70650 "#orphans_lvm2", vgid=vgid@entry=0x56491ba70640 "#orphans_lvm2", 
    warn_flags=warn_flags@entry=1, consistent=consistent@entry=0x7ffe1e031a50, precommitted=precommitted@entry=0) at metadata/metadata.c:4264
#5  0x000056491a7f822a in vg_read_internal (cmd=cmd@entry=0x56491ba29030, vgname=vgname@entry=0x56491ba70650 "#orphans_lvm2", 
    vgid=vgid@entry=0x56491ba70640 "#orphans_lvm2", warn_flags=warn_flags@entry=1, consistent=consistent@entry=0x7ffe1e031a50) at metadata/metadata.c:4892
#6  0x000056491a7f9e7c in _vg_lock_and_read (lockd_state=0, read_flags=262144, status_flags=0, lock_flags=33, vgid=0x56491ba70640 "#orphans_lvm2", 
    vg_name=0x56491ba70650 "#orphans_lvm2", cmd=0x56491ba29030) at metadata/metadata.c:5914
#7  vg_read (cmd=cmd@entry=0x56491ba29030, vg_name=vg_name@entry=0x56491ba70650 "#orphans_lvm2", vgid=vgid@entry=0x56491ba70640 "#orphans_lvm2", 
    read_flags=read_flags@entry=262144, lockd_state=0) at metadata/metadata.c:6021
#8  0x000056491a781ad7 in _process_pvs_in_vgs (cmd=cmd@entry=0x56491ba29030, read_flags=read_flags@entry=262144, 
    all_vgnameids=all_vgnameids@entry=0x7ffe1e031de0, all_devices=all_devices@entry=0x7ffe1e031df0, arg_devices=arg_devices@entry=0x7ffe1e031dc0, 
    arg_tags=arg_tags@entry=0x7ffe1e031da0, process_all_pvs=process_all_pvs@entry=1, handle=handle@entry=0x56491ba6e8b8, 
    process_single_pv=process_single_pv@entry=0x56491a778160 <_pvs_single>, process_all_devices=1) at toollib.c:4305
#9  0x000056491a7865fc in process_each_pv (cmd=cmd@entry=0x56491ba29030, argc=<optimized out>, argv=<optimized out>, 
    only_this_vgname=only_this_vgname@entry=0x0, all_is_set=<optimized out>, read_flags=262144, read_flags@entry=0, handle=handle@entry=0x56491ba6e8b8, 
    process_single_pv=0x56491a778160 <_pvs_single>) at toollib.c:4462
#10 0x000056491a77a9b3 in _do_report (cmd=cmd@entry=0x56491ba29030, handle=handle@entry=0x56491ba6e8b8, args=args@entry=0x7ffe1e032060, 
    single_args=single_args@entry=0x7ffe1e0320a8) at reporter.c:1175
#11 0x000056491a77ac2f in _report (cmd=0x56491ba29030, argc=0, argv=0x7ffe1e0326f8, report_type=<optimized out>) at reporter.c:1428
#12 0x000056491a76d595 in lvm_run_command (cmd=cmd@entry=0x56491ba29030, argc=0, argc@entry=2, argv=0x7ffe1e0326f8, argv@entry=0x7ffe1e0326e8)
    at lvmcmdline.c:2954
#13 0x000056491a76e4d3 in lvm2_main (argc=2, argv=0x7ffe1e0326e8) at lvmcmdline.c:3485
#14 0x00007f86484aec05 in __libc_start_main (main=0x56491a74cf80 <main>, argc=2, ubp_av=0x7ffe1e0326e8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffe1e0326d8) at ../csu/libc-start.c:274
#15 0x000056491a74cfae in _start ()




Here too, around the time of the core dump (Nov 11 02:31:23), the /etc/lvm backup folder in the sosreport shows that something was renaming volume groups at the same time:

[root@nkshirsa backup]# grep vgrename -A3 * | grep "Nov 11" -B3
BCVBK_ACNTIFQA_vg_acntifq1_oradata_pr:description = "Created *after* executing 'vgrename vg_acntifq1_oradata_pr BCVBK_ACNTIFQA_vg_acntifq1_oradata_pr'"
BCVBK_ACNTIFQA_vg_acntifq1_oradata_pr-
BCVBK_ACNTIFQA_vg_acntifq1_oradata_pr-creation_host = "E-BRVNPD-IBK01"	# Linux E-BRVNPD-IBK01 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64
BCVBK_ACNTIFQA_vg_acntifq1_oradata_pr-creation_time = 1541871082	# Sun Nov 11 02:31:22 2018
--
BCVBK_CGRMASQB_vg_cgrmasq2_oradata_pr:description = "Created *after* executing 'vgrename vg_cgrmasq2_oradata_pr BCVBK_CGRMASQB_vg_cgrmasq2_oradata_pr'"
BCVBK_CGRMASQB_vg_cgrmasq2_oradata_pr-
BCVBK_CGRMASQB_vg_cgrmasq2_oradata_pr-creation_host = "E-BRVNPD-IBK01"	# Linux E-BRVNPD-IBK01 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64
BCVBK_CGRMASQB_vg_cgrmasq2_oradata_pr-creation_time = 1541871082	# Sun Nov 11 02:31:22 2018
--
BCVBK_KRNETLD1_vg_krnetld1_oradata_pr:description = "Created *after* executing 'vgrename vg_krnetld1_oradata_pr BCVBK_KRNETLD1_vg_krnetld1_oradata_pr'"
BCVBK_KRNETLD1_vg_krnetld1_oradata_pr-
BCVBK_KRNETLD1_vg_krnetld1_oradata_pr-creation_host = "E-BRVNPD-IBK01"	# Linux E-BRVNPD-IBK01 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64
BCVBK_KRNETLD1_vg_krnetld1_oradata_pr-creation_time = 1541871083	# Sun Nov 11 02:31:23 2018


And in the archive folder:

vg_cgrmasq2_oradata_pr_00092-1454531420.vg:description = "Created *before* executing 'vgrename vg_cgrmasq2_oradata_pr BCVBK_CGRMASQB_vg_cgrmasq2_oradata_pr'"
vg_cgrmasq2_oradata_pr_00092-1454531420.vg-
vg_cgrmasq2_oradata_pr_00092-1454531420.vg-creation_host = "E-BRVNPD-IBK01"	# Linux E-BRVNPD-IBK01 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64
vg_cgrmasq2_oradata_pr_00092-1454531420.vg-creation_time = 1541871082	# Sun Nov 11 02:31:22 2018
--
vg_krnetld1_oradata_pr_00094-1011380638.vg:description = "Created *before* executing 'vgrename vg_krnetld1_oradata_pr BCVBK_KRNETLD1_vg_krnetld1_oradata_pr'"
vg_krnetld1_oradata_pr_00094-1011380638.vg-
vg_krnetld1_oradata_pr_00094-1011380638.vg-creation_host = "E-BRVNPD-IBK01"	# Linux E-BRVNPD-IBK01 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64
vg_krnetld1_oradata_pr_00094-1011380638.vg-creation_time = 1541871083	# Sun Nov 11 02:31:23 2018
[root@nkshirsa archive]#

Comment 6 David Teigland 2019-07-22 17:24:58 UTC
pushed to stable-2.02
https://sourceware.org/git/?p=lvm2.git;a=commit;h=3d980172b076547865efe5ca0839cabedc05a7b9

Continuous concurrent vgrename and pvs runs did not hit any problem, although I never reproduced the crash without the fix either.
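For reference, the "Fixed In Version" field above lists the fix as shipping in lvm2-2.02.186-1.el7; a quick way to check whether a host already carries a fixed build (a trivial sketch, nothing lvm2-specific):

# The fix is in lvm2-2.02.186-1.el7 and later per the bug header above.
rpm -q lvm2
# Compare the reported version-release against 2.02.186-1.el7, e.g.:
#   lvm2-2.02.186-3.el7.x86_64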

Comment 8 Corey Marthaler 2019-11-11 20:37:01 UTC
Marking verified in the latest rpms. 

Ran tens of thousands of vgrename operations across 8 different VGs with pvs looping continuously and saw no crashes (both with and without lvmetad running).
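A sketch of that kind of stress loop (VG names, iteration counts, and device setup are hypothetical, not from the actual verification run):

#!/bin/bash
# Stress sketch: cycle vgrename across 8 scratch VGs while pvs runs in a tight loop.
# VG names (stress_vg1..stress_vg8) and counts are illustrative only.
# The verification above exercised both configurations: lvmetad running and not
# running (use_lvmetad in /etc/lvm/lvm.conf); toggle that separately if desired.
VGS=(stress_vg1 stress_vg2 stress_vg3 stress_vg4 stress_vg5 stress_vg6 stress_vg7 stress_vg8)

# Background reporter loop.
( while true; do pvs -a >/dev/null 2>&1; done ) &
pvs_pid=$!

# Rename every VG to a temporary name and back, many times over.
for i in $(seq 1 10000); do
    for vg in "${VGS[@]}"; do
        vgrename "$vg" "${vg}_tmp" >/dev/null 2>&1
        vgrename "${vg}_tmp" "$vg" >/dev/null 2>&1
    done
done

kill "$pvs_pid"
echo "rename loop finished; check the journal and abrt for any pvs core dumps"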

3.10.0-1109.el7.x86_64

lvm2-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
lvm2-libs-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
lvm2-cluster-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
lvm2-lockd-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-libs-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-event-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-event-libs-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-persistent-data-0.8.5-1.el7    BUILT: Mon Jun 10 03:58:20 CDT 2019

Comment 10 errata-xmlrpc 2020-03-31 20:04:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1129