RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1035044 - pvs -a segfault during cluster mirror device failure testing
Summary: pvs -a segfault during cluster mirror device failure testing
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: Jonathan Earl Brassow
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: BrassowRHEL7Bugs
TreeView+ depends on / blocked
 
Reported: 2013-11-26 23:23 UTC by Corey Marthaler
Modified: 2021-09-08 20:20 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-10 21:59:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Corey Marthaler 2013-11-26 23:23:53 UTC
Description of problem:
================================================================================
Iteration 0.4 started at Tue Nov 26 16:04:26 CST 2013
================================================================================
Scenario kill_primary_synced_core_log_4_legs: Kill primary leg of synced core log 4 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_primary_core_4legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sda1 /dev/sde1 /dev/sdd1 /dev/sdb1
* log devices:        
* no MDA devices:     
* failpv(s):          /dev/sda1
* failnode(s):        virt-123.cluster-qe.lab.eng.brq.redhat.com virt-124.cluster-qe.lab.eng.brq.redhat.com virt-125.cluster-qe.lab.eng.brq.redhat.com
* lvmetad:            0
* leg fault policy:   remove
* log fault policy:   allocate
******************************************************

Creating mirror(s) on virt-123.cluster-qe.lab.eng.brq.redhat.com...
virt-123.cluster-qe.lab.eng.brq.redhat.com: lvcreate --mirrorlog core -m 3 -n syncd_primary_core_4legs_1 -L 500M helter_skelter /dev/sda1:0-1000 /dev/sde1:0-1000 /dev/sdd1:0-1000 /dev/sdb1:0-1000

Current mirror/raid device structure(s):
  LV                                    Attr       LSize   Cpy%Sync Devices
   syncd_primary_core_4legs_1            mwi-a-m--- 500.00m     6.40 syncd_primary_core_4legs_1_mimage_0(0),syncd_primary_core_4legs_1_mimage_1(0),syncd_primary_core_4legs_1_mimage_2(0),syncd_primary_core_4legs_1_mimage_3(0)
   [syncd_primary_core_4legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sda1(0)
   [syncd_primary_core_4legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sde1(0)
   [syncd_primary_core_4legs_1_mimage_2] Iwi-aom--- 500.00m          /dev/sdd1(0)
   [syncd_primary_core_4legs_1_mimage_3] Iwi-aom--- 500.00m          /dev/sdb1(0)

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 92.60% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating gfs2 on top of mirror(s) on virt-123.cluster-qe.lab.eng.brq.redhat.com...
mkfs.gfs2 -J 32M -j 3 -p lock_dlm -t STSRHTS22168:gfs1 /dev/helter_skelter/syncd_primary_core_4legs_1 -O
Mounting mirrored gfs2 filesystems on virt-123.cluster-qe.lab.eng.brq.redhat.com...
Mounting mirrored gfs2 filesystems on virt-124.cluster-qe.lab.eng.brq.redhat.com...
Mounting mirrored gfs2 filesystems on virt-125.cluster-qe.lab.eng.brq.redhat.com...

PV=/dev/sda1
        syncd_primary_core_4legs_1_mimage_0: 6
PV=/dev/sda1
        syncd_primary_core_4legs_1_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- virt-123.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-124.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-125.cluster-qe.lab.eng.brq.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- virt-123.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-124.cluster-qe.lab.eng.brq.redhat.com ----
        ---- virt-125.cluster-qe.lab.eng.brq.redhat.com ----

Disabling device sda on virt-123.cluster-qe.lab.eng.brq.redhat.com
Disabling device sde on virt-124.cluster-qe.lab.eng.brq.redhat.com
Disabling device sdc on virt-125.cluster-qe.lab.eng.brq.redhat.com

sh: line 1: 32516 Segmentation fault      (core dumped) qarsh root.lab.eng.brq.redhat.com pvs -a > /dev/null 2>&1
simple pvs failed/segfaulted possible regression of BZ 571963
FI_engine: inject() method failed


Nov 26 23:06:53 virt-123 qarshd[24087]: Running cmdline: echo offline > /sys/block/sda/device/state &
Nov 26 23:06:54 virt-123 kernel: [82926.110209] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.112492] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.114695] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.116896] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:54 virt-123 kernel: [82926.119071] sd 3:0:0:1: rejecting I/O to offline device
Nov 26 23:06:55 virt-123 lvm[21428]: Primary mirror device 253:2 has failed (D).
Nov 26 23:06:55 virt-123 lvm[21428]: Device failure in helter_skelter-syncd_primary_core_4legs_1.
[...]
Nov 26 23:06:56 virt-123 cmirrord[13471]: cpg_dispatch failed: CS_ERR_BAD_HANDLE
Nov 26 23:06:56 virt-123 lvm[21428]: Monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:06:58 virt-123 cmirrord[13471]: cpg_dispatch failed: CS_ERR_BAD_HANDLE
Nov 26 23:07:01 virt-123 lvm[21428]: Monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:01 virt-123 systemd: Starting qarsh Per-Connection Server...
Nov 26 23:07:01 virt-123 systemd: Started qarsh Per-Connection Server.
Nov 26 23:07:01 virt-123 qarshd[24141]: Talking to peer ::ffff:10.15.80.47:43496 (IPv6)
Nov 26 23:07:01 virt-123 qarshd[24141]: Running cmdline: pvs -a
Nov 26 23:07:02 virt-123 lvm[21428]: Mirror status: 1 of 4 images failed.
Nov 26 23:07:02 virt-123 lvm[21428]: Trying to up-convert to 3 images, 0 logs.
Nov 26 23:07:13 virt-123 kernel: [82944.896173] device-mapper: dm-log-userspace: [dyiCP3yC] Request timed out: [9/296085] - retrying
Nov 26 23:07:15 virt-123 cmirrord[13471]: cpg_dispatch failed: CS_ERR_BAD_HANDLE
Nov 26 23:07:15 virt-123 kernel: [82946.642100] pvs[24142]: segfault at 3 ip 00000000004a897c sp 00007fff31ef0ed0 error 4 in lvm[400000+102000]
Nov 26 23:07:15 virt-123 abrt[24154]: Saved core dump of pid 24142 (/usr/sbin/lvm) to /var/tmp/abrt/ccpp-2013-11-26-23:07:15-24142 (1441792 bytes)
Nov 26 23:07:15 virt-123 abrtd: New client connected
Nov 26 23:07:18 virt-123 lvm[21428]: Monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:19 virt-123 lvm[21428]: Repair of mirrored device helter_skelter-syncd_primary_core_4legs_1 finished successfully.
Nov 26 23:07:20 virt-123 lvm[21428]: No longer monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:22 virt-123 lvm[21428]: No longer monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:24 virt-123 lvm[21428]: No longer monitoring mirror device helter_skelter-syncd_primary_core_4legs_1 for events.
Nov 26 23:07:54 virt-123 lvm[21428]: helter_skelter-syncd_primary_core_4legs_1 is now in-sync.

Core was generated by `pvs -a'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004a897c in _parse_mirror_status (mirror_status_str=<optimized out>, images_health=images_health@entry=0x7fff31ef1090, 
    log_dev=log_dev@entry=0x7fff31ef1088, log_health=log_health@entry=0x7fff31ef1098) at activate/dev_manager.c:184
184             if (!strcmp(log_args[0], "disk")) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-36.el7.x86_64 libgcc-4.8.2-3.el7.x86_64 libselinux-2.1.13-21.el7.x86_64 libsepol-2.1.9-1.el7.x86_64 ncurses-libs-5.9-11.20130511.el7.x86_64 pcre-8.32-8.el7.x86_64 readline-6.2-6.el7.x86_64 systemd-libs-207-8.el7.x86_64
(gdb) bt
#0  0x00000000004a897c in _parse_mirror_status (mirror_status_str=<optimized out>, images_health=images_health@entry=0x7fff31ef1090, 
    log_dev=log_dev@entry=0x7fff31ef1088, log_health=log_health@entry=0x7fff31ef1098) at activate/dev_manager.c:184
#1  0x00000000004aaadd in _ignore_blocked_mirror_devices (mirror_status_str=<optimized out>, length=1024000, start=0, dev=0xfe3518)
    at activate/dev_manager.c:250
#2  _device_is_usable (dev=dev@entry=0xfe3518, check_lv_names=check_lv_names@entry=1) at activate/dev_manager.c:401
#3  0x00000000004ab34a in device_is_usable (dev=dev@entry=0xfe3518) at activate/dev_manager.c:460
#4  0x000000000045632d in _lookup_p (f=<optimized out>, dev=0xfe3518) at filters/filter-persistent.c:291
#5  0x000000000045036a in dev_iter_get (iter=iter@entry=0xfdc100) at device/dev-cache.c:1011
#6  0x00000000004344fe in _process_all_devs (process_single_pv=0x431480 <_pvs_single>, handle=0xfcddf0, cmd=0xf78060) at toollib.c:763
#7  process_each_pv (cmd=cmd@entry=0xf78060, argc=argc@entry=0, argv=argv@entry=0x7fff31ef16e8, vg=vg@entry=0x0, flags=0, 
    scan_label_only=<optimized out>, handle=0xfcddf0, process_single_pv=0x431480 <_pvs_single>) at toollib.c:934
#8  0x0000000000431e95 in _report (cmd=0xf78060, argc=0, argv=0x7fff31ef16e8, report_type=PVS) at reporter.c:408
#9  0x0000000000429b08 in lvm_run_command (cmd=cmd@entry=0xf78060, argc=0, argc@entry=2, argv=0x7fff31ef16e8, argv@entry=0x7fff31ef16d8)
    at lvmcmdline.c:1179
#10 0x000000000042a049 in lvm2_main (argc=2, argv=0x7fff31ef16d8) at lvmcmdline.c:1614
#11 0x00000036aba21af5 in __libc_start_main () from /lib64/libc.so.6
#12 0x0000000000416771 in _start ()


Version-Release number of selected component (if applicable):
3.10.0-54.el7.x86_64

lvm2-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
lvm2-libs-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
lvm2-cluster-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-libs-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-event-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-event-libs-1.02.82-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013
device-mapper-persistent-data-0.2.8-2.el7    BUILT: Wed Oct 30 16:20:48 CET 2013
cmirror-2.02.103-5.el7    BUILT: Tue Nov 12 13:36:17 CET 2013

Comment 1 Jonathan Earl Brassow 2014-03-10 19:22:05 UTC
This issue doesn't seem reproducible.  If it is, we can fix in one of two ways:
1) check for out-of-bounds parameters in '_parse_mirror_status' to prevent the segfault
2) find out the reason why this code path is being traversed at all in a way that leads to an error condition and fix that.

If it isn't reproducible, the bug can be closed.  I'm not closing the bug now because I think the bug needs more time in testing.  However, it shouldn't be considered a blocker unless it can be reproduced.


Note You need to log in before you can comment on or make changes to this bug.