Bug 1035044

Summary: pvs -a segfault during cluster mirror device failure testing
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Jonathan Earl Brassow <jbrassow>
lvm2 sub component: Default / Unclassified QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WORKSFORME Docs Contact:
Severity: high    
Priority: unspecified CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-04-10 21:59:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1075262    

Description Corey Marthaler 2013-11-26 23:23:53 UTC
Description of problem:
================================================================================
Iteration 0.4 started at Tue Nov 26 16:04:26 CST 2013
================================================================================
Scenario kill_primary_synced_core_log_4_legs: Kill primary leg of synced core log 4 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_primary_core_4legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sda1 /dev/sde1 /dev/sdd1 /dev/sdb1
* log devices:        
* no MDA devices:     
* failpv(s):          /dev/sda1
* failnode(s):        virt-123.cluster-qe.lab.eng.brq.redhat.com virt-124.cluster-qe.lab.eng.brq.redhat.com virt-125.cluster-qe.lab.eng.brq.redhat.com
* lvmetad:            0
* leg fault policy:   remove
* log fault policy:   allocate
******************************************************

Creating mirror(s) on virt-123.cluster-qe.lab.eng.brq.redhat.com...
virt-123.cluster-qe.lab.eng.brq.redhat.com: lvcreate --mirrorlog core -m 3 -n syncd_primary_core_4legs_1 -L 500M helter_skelter /dev/sda1:0-1000 /dev/sde1:0-1000 /dev/sdd1:0-1000 /dev/sdb1:0-1000

Current mirror/raid device structure(s):
  LV                                    Attr       LSize   Cpy%Sync Devices
   syncd_primary_core_4legs_1            mwi-a-m--- 500.00m     6.40 syncd_primary_core_4legs_1_mimage_0(0),syncd_primary_core_4legs_1_mimage_1(0),syncd_primary_core_4legs_1_mimage_2(0),syncd_primary_core_4legs_1_mimage_3(0)
   [syncd_primary_core_4legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sda1(0)
   [syncd_primary_core_4legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sde1(0)
   [syncd_primary_core_4legs_1_mimage_2] Iwi-aom--- 500.00m          /dev/sdd1(0)
   [syncd_primary_core_4legs_1_mimage_3] Iwi-aom--- 500.00m          /dev/sdb1(0)

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 92.60% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating gfs2 on top of mirror(s) on virt-123.cluster-qe.lab.eng.brq.redhat.com...
mkfs.gfs2 -J 32M -j 3 -p lock_dlm -t STSRHTS22168:gfs1 /dev/helter_skelter/syncd_primary_core_4legs_1 -O
Mounting mirrored gfs2 filesystems on virt-123.cluster-qe.lab.eng.brq.redhat.com...
Mounting mirrored gfs2 filesystems on virt-124.cluster-qe.lab.eng.brq.redhat.com...
Mounting mirrored gfs2 filesystems on virt-125.cluster-qe.lab.eng.brq.redhat.com...

PV=/dev/sda1
        syncd_primary_core_4legs_1_mimage_0: 6
PV=/dev/sda1
        syncd_primary_core_4legs_1_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- virt-123.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-124.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-125.cluster-qe.lab.eng.brq.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- virt-123.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-124.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-125.cluster-qe.lab.eng.brq.redhat.com ----

Disabling device sda on virt-123.cluster-qe.lab.eng.brq.redhat.com
Disabling device sde on virt-124.cluster-qe.lab.eng.brq.redhat.com
Disabling device sdc on virt-125.cluster-qe.lab.eng.brq.redhat.com

sh: line 1: 32516 Segmentation fault      (core dumped) qarsh root.lab.eng.brq.redhat.com pvs -a > /dev/null 2>&1
simple pvs failed/segfaulted possible regression of BZ 571963
FI_engine: inject() method failed


Nov 26 23:06:53 virt-123 qarshd[24087]: Running cmdline: echo offline > /sys/block/sda/device/state &
Nov 26 23:06:54 virt-123 kernel: [82926.110209] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.112492] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.114695] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.116896] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.119071] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:55 virt-123 lvm[21428]: Primary mirror device 253:2 has failed (D).
Nov 26 23:06:55 virt-123 lvm[21428]: Device failure in helter_skelter-syncd_primary_core_4legs_1.
[...]
Nov 26 23:06:56 virt-123 cmirrord[13471]: cpg_dispatch failed: CS_ERR_BAD_HANDLE
Nov 26 23:06:56 virt-123 lvm[21428]: Monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:06:58 virt-123 cmirrord[13471]: cpg_dispatch failed: CS_ERR_BAD_HANDLE
Nov 26 23:07:01 virt-123 lvm[21428]: Monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:01 virt-123 systemd: Starting qarsh Per-Connection Server...
Nov 26 23:07:01 virt-123 systemd: Started qarsh Per-Connection Server.
Nov 26 23:07:01 virt-123 qarshd[24141]: Talking to peer ::ffff:10.15.80.47:43496 (IPv6)
Nov 26 23:07:01 virt-123 qarshd[24141]: Running cmdline: pvs -a
Nov 26 23:07:02 virt-123 lvm[21428]: Mirror status: 1 of 4 images failed.
Nov 26 23:07:02 virt-123 lvm[21428]: Trying to up-convert to 3 images, 0 logs.
Nov 26 23:07:13 virt-123 kernel: [82944.896173] device-mapper: dm-log-userspace: [dyiCP3yC] Request timed out: [9/296085] - retrying
Nov 26 23:07:15 virt-123 cmirrord[13471]: cpg_dispatch failed: CS_ERR_BAD_HANDLE
Nov 26 23:07:15 virt-123 kernel: [82946.642100] pvs[24142]: segfault at 3 ip 00000000004a897c sp 00007fff31ef0ed0 error 4 in lvm[400000+102000]
Nov 26 23:07:15 virt-123 abrt[24154]: Saved core dump of pid 24142 (/usr/sbin/lvm) to /var/tmp/abrt/ccpp-2013-11-26-23:07:15-24142 (1441792 bytes)
Nov 26 23:07:15 virt-123 abrtd: New client connected
Nov 26 23:07:18 virt-123 lvm[21428]: Monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:19 virt-123 lvm[21428]: Repair of mirrored device helter_skelter-syncd_primary_core_4legs_1 finished successfully.
Nov 26 23:07:20 virt-123 lvm[21428]: No longer monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:22 virt-123 lvm[21428]: No longer monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:24 virt-123 lvm[21428]: No longer monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:54 virt-123 lvm[21428]: helter_skelter-syncd_primary_core_4legs_1 is now in-sync.

Core was generated by `pvs -a'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004a897c in _parse_mirror_status (mirror_status_str=<optimized out>, images_health=images_health@entry=0x7fff31ef1090, 
    log_dev=log_dev@entry=0x7fff31ef1088, log_health=log_health@entry=0x7fff31ef1098) at activate/dev_manager.c:184
184             if (!strcmp(log_args[0], "disk")) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-36.el7.x86_64 libgcc-4.8.2-3.el7.x86_64 libselinux-2.1.13-21.el7.x86_64 libsepol-2.1.9-1.el7.x86_64 ncurses-libs-5.9-11.20130511.el7.x86_64 pcre-8.32-8.el7.x86_64 readline-6.2-6.el7.x86_64 systemd-libs-207-8.el7.x86_64
(gdb) bt
#0  0x00000000004a897c in _parse_mirror_status (mirror_status_str=<optimized out>, images_health=images_health@entry=0x7fff31ef1090, 
    log_dev=log_dev@entry=0x7fff31ef1088, log_health=log_health@entry=0x7fff31ef1098) at activate/dev_manager.c:184
#1  0x00000000004aaadd in _ignore_blocked_mirror_devices (mirror_status_str=<optimized out>, length=1024000, start=0, dev=0xfe3518)
    at activate/dev_manager.c:250
#2  _device_is_usable (dev=dev@entry=0xfe3518, check_lv_names=check_lv_names@entry=1) at activate/dev_manager.c:401
#3  0x00000000004ab34a in device_is_usable (dev=dev@entry=0xfe3518) at activate/dev_manager.c:460
#4  0x000000000045632d in _lookup_p (f=<optimized out>, dev=0xfe3518) at filters/filter-persistent.c:291
#5  0x000000000045036a in dev_iter_get (iter=iter@entry=0xfdc100) at device/dev-cache.c:1011
#6  0x00000000004344fe in _process_all_devs (process_single_pv=0x431480 <_pvs_single>, handle=0xfcddf0, cmd=0xf78060) at toollib.c:763
#7  process_each_pv (cmd=cmd@entry=0xf78060, argc=argc@entry=0, argv=argv@entry=0x7fff31ef16e8, vg=vg@entry=0x0, flags=0, 
    scan_label_only=<optimized out>, handle=0xfcddf0, process_single_pv=0x431480 <_pvs_single>) at toollib.c:934
#8  0x0000000000431e95 in _report (cmd=0xf78060, argc=0, argv=0x7fff31ef16e8, report_type=PVS) at reporter.c:408
#9  0x0000000000429b08 in lvm_run_command (cmd=cmd@entry=0xf78060, argc=0, argc@entry=2, argv=0x7fff31ef16e8, argv@entry=0x7fff31ef16d8)
    at lvmcmdline.c:1179
#10 0x000000000042a049 in lvm2_main (argc=2, argv=0x7fff31ef16d8) at lvmcmdline.c:1614
#11 0x00000036aba21af5 in __libc_start_main () from /lib64/libc.so.6
#12 0x0000000000416771 in _start ()


Version-Release number of selected component (if applicable):
3.10.0-54.el7.x86_64

lvm2-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
lvm2-libs-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
lvm2-cluster-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-libs-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-event-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-event-libs-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-persistent-data-0.2.8-2.el7    BUILT: Wed Oct 30 16:20:48 CET 2013
cmirror-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013

Comment 1 Jonathan Earl Brassow 2014-03-10 19:22:05 UTC
This issue doesn't seem reproducible.  If it is, we can fix in one of two ways:
1) check for out-of-bounds parameters in '_parse_mirror_status' to prevent the segfault
2) find out the reason why this code path is being traversed at all in a way that leads to an error condition and fix that.

If it isn't reproducible, the bug can be closed.  I'm not closing the bug now because I think the bug needs more time in testing.  However, it shouldn't be considered a blocker unless it can be reproduced.