Bug 1125849 - dmeventd[10531]: segfault at 50 ip 00007f1f2e1660e7 sp 00007f1f2f0fc398 error 4 in libc-2.12.so[7f1f2e0d8000+18b000] - dev is NULL
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Petr Rockai
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-08-01 09:02 UTC by Marian Csontos
Modified: 2014-10-14 08:25 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Not applicable; the bug arose during development and was fixed before a release.
Clone Of:
Environment:
Last Closed: 2014-10-14 08:25:40 UTC


Attachments
bt full (11.53 KB, text/plain)
2014-08-01 12:38 UTC, Marian Csontos


Links
Red Hat Product Errata RHBA-2014:1387: lvm2 bug fix and enhancement update (priority: normal; status: SHIPPED_LIVE; last updated 2014-10-14 01:39:47 UTC)

Description Marian Csontos 2014-08-01 09:02:30 UTC
Description of problem:
Segfaulting dmeventd.

Version-Release number of selected component (if applicable):
lvm2-2.02.108-1.el6.x86_64
kernel-2.6.32-494.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
/dev/sda1 is a leg of a RAID LV. Though likely irrelevant, raid_fault_policy is set to warn.
Running these two commands:
    echo offline > /sys/block/sda/device/state
    pvscan --cache /dev/sda1
results in a segfault in dmeventd.

Messages:

Jul 31 17:41:44 barb-02c1-node01 kernel: sd 2:0:0:0: rejecting I/O to offline device
Jul 31 17:41:44 barb-02c1-node01 kernel: end_request: I/O error, dev sda, sector 0
Jul 31 17:41:44 barb-02c1-node01 kernel: sd 2:0:0:0: rejecting I/O to offline device
Jul 31 17:41:44 barb-02c1-node01 kernel: end_request: I/O error, dev sda, sector 0
Jul 31 17:41:44 barb-02c1-node01 kernel: sd 2:0:0:0: rejecting I/O to offline device
Jul 31 17:41:44 barb-02c1-node01 kernel: md: super_written gets error=-5, uptodate=0
Jul 31 17:41:44 barb-02c1-node01 kernel: md/raid10:mdX: Disk failure on dm-9, disabling device.
Jul 31 17:41:44 barb-02c1-node01 kernel: md/raid10:mdX: Operation continuing on 5 devices.
Jul 31 17:41:44 barb-02c1-node01 lvm[10528]: Device #4 of raid10 array, black_bird-synced_random_raid10_3legs_1, has failed.
Jul 31 17:41:44 barb-02c1-node01 kernel: sd 2:0:0:0: rejecting I/O to offline device
Jul 31 17:41:44 barb-02c1-node01 kernel: md: super_written gets error=-5, uptodate=0
Jul 31 17:41:44 barb-02c1-node01 lvm[10528]: PV mkHEgE-rMpc-w6eY-3y0r-Qce7-Cy1G-COvwAi not recognised. Is the device missing?
Jul 31 17:41:44 barb-02c1-node01 kernel: dmeventd[10531]: segfault at 50 ip 00007f1f2e1660e7 sp 00007f1f2f0fc398 error 4 in libc-2.12.so[7f1f2e0d8000+18b000]
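The kernel "segfault at ..." line above can be decoded mechanically. Below is a minimal bash sketch, assuming the x86 page-fault error-code layout (bit 0: not-present page vs. protection violation, bit 1: write, bit 2: user mode) and the message format shown in this report; the function name decode_segfault is hypothetical. For this report it reports a user-mode read of the unmapped address 0x50, at offset 0x8e0e7 into the libc-2.12.so mapping. That is consistent with, though it does not prove, libc being handed a NULL struct pointer plus a small field offset, matching the "dev is NULL" in the title.

```bash
#!/bin/bash
# decode_segfault: hypothetical helper that explains a kernel
# "segfault at ADDR ip IP sp SP error CODE in LIB[BASE+SIZE]" line.
# Assumes the x86 page-fault error-code bits:
#   bit 0: 0 = page not present, 1 = protection violation
#   bit 1: 0 = read,             1 = write
#   bit 2: 0 = kernel mode,      1 = user mode
decode_segfault() {
    local line=$1
    local re='segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp [0-9a-f]+ error ([0-9a-f]+) in .*\[([0-9a-f]+)\+([0-9a-f]+)]'
    if [[ ! $line =~ $re ]]; then
        echo "unrecognised segfault line" >&2
        return 1
    fi

    # The kernel prints all of these fields in hex.
    local addr=$((16#${BASH_REMATCH[1]}))
    local ip=$((16#${BASH_REMATCH[2]}))
    local err=$((16#${BASH_REMATCH[3]}))
    local base=$((16#${BASH_REMATCH[4]}))
    local size=$((16#${BASH_REMATCH[5]}))

    # Interpret the low three bits of the page-fault error code.
    local cause=not-present access=read mode=kernel
    (( err & 1 )) && cause=protection
    (( err & 2 )) && access=write
    (( err & 4 )) && mode=user

    # Offset of the faulting instruction inside the named mapping,
    # which could then be looked up against the library's debug symbols.
    local off=outside-mapping
    if (( ip >= base && ip < base + size )); then
        off=$(printf '0x%x' $((ip - base)))
    fi

    printf 'fault_addr=0x%x access=%s mode=%s cause=%s lib_offset=%s\n' \
        "$addr" "$access" "$mode" "$cause" "$off"
}
```

Usage: `decode_segfault "$(grep -m1 'segfault at' /var/log/messages)"`. For the dmeventd line in this report it prints `fault_addr=0x50 access=read mode=user cause=not-present lib_offset=0x8e0e7`.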

Comment 2 Marian Csontos 2014-08-01 12:38:02 UTC
Created attachment 923286 [details]
bt full

Comment 3 Marian Csontos 2014-08-01 13:13:39 UTC
Nenad just verified that this happens only when the two commands run in close succession, like this:

    echo offline > /sys/block/sda/device/state && pvscan --cache /dev/sda1

This may not be very likely to happen in practice. Could it be inherent to ANY parallel scans?
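The reproduction steps above can be wrapped in a small script that also checks whether dmeventd survived. This is a minimal bash sketch, not the QE test: the function names reproduce and run are hypothetical, the disk and partition (sda and /dev/sda1 in this report) are passed as arguments, DRY_RUN=1 is a variable invented here that only prints the commands, and the 2-second wait before the check is an arbitrary assumption.

```bash
#!/bin/bash
# Hypothetical wrapper around the reproducer in this report.
# DRY_RUN=1 (an assumption of this sketch) prints commands instead of running them.

run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# reproduce DISK PARTITION, e.g. "reproduce sda /dev/sda1" (must run as root)
reproduce() {
    local disk=$1 part=$2
    local state=/sys/block/$disk/device/state

    # Offline the disk backing the RAID leg, then rescan the PV immediately:
    # the crash was only seen when the two commands ran back to back.
    run sh -c "echo offline > $state" || return 1
    run pvscan --cache "$part"   # errors for the now-offline PV are expected

    [[ ${DRY_RUN:-0} == 1 ]] && return 0

    sleep 2   # assumed settle time for dmeventd to handle the failure event
    # pgrep -x matches the exact process name, so it never matches itself.
    if pgrep -x dmeventd >/dev/null; then
        echo "dmeventd still running"
    else
        echo "dmeventd not running (crashed?)" >&2
        return 1
    fi
}
```

Afterwards the disk can usually be brought back with `echo running > /sys/block/sda/device/state`.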

Comment 4 Petr Rockai 2014-08-04 12:38:30 UTC
No, other parallel scans are unaffected; only lvscan is susceptible, because lvscan relies on device information from lvmetad. An lvmetad debug log (with -l wire) would be very useful here, but as far as I can tell the problem is not parallel scans at all: dmeventd should crash the same way whenever an LV that is already partial is rescanned. I'll write a test case for that and see whether that is so.

The problem is that lvscan --cache uses device information from lvmetad, which may already be partial if pvscan --cache finishes before lvscan --cache reads that information. In this case, lvscan --cache should skip devices that are already known to be missing instead of trying to scan them again (and hitting a NULL device pointer).

Comment 6 Nenad Peric 2014-08-08 13:50:34 UTC

[root@tardis-01 ~]# echo offline > /sys/block/sdb/device/state && pvscan --cache /dev/sdb1
  /dev/sdb1: open failed: No such device or address
  No PV label found on /dev/sdb1.
[root@tardis-01 ~]# lvs
  PV lkU8Mh-SKZb-zhGM-pf7T-vKR6-Wnn1-JsHjy1 not recognised. Is the device missing?
  PV 8ozawT-pdg3-6sPH-e9xv-UXyc-5kz4-hqPOa0 not recognised. Is the device missing?
  PV lkU8Mh-SKZb-zhGM-pf7T-vKR6-Wnn1-JsHjy1 not recognised. Is the device missing?
  PV 8ozawT-pdg3-6sPH-e9xv-UXyc-5kz4-hqPOa0 not recognised. Is the device missing?
  LV      VG          Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  d_lv    vg          -wi-a---p-   2.00g                                                    
  raid1   vg          rwi---r-p-   1.00g                                                    
  lv_home vg_tardis01 -wi-ao---- 224.88g                                                    
  lv_root vg_tardis01 -wi-ao----  50.00g                                                    
  lv_swap vg_tardis01 -wi-ao----   4.00g            

[root@tardis-01 ~]# ps -ef | grep dmevent
root      3824     1  0 15:45 ?        00:00:00 /sbin/dmeventd
root      3907  3655  0 15:50 pts/0    00:00:00 grep dmevent

Marking VERIFIED with:
kernel-2.6.32-495.el6.x86_64

lvm2-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-libs-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-cluster-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
udev-147-2.57.el6    BUILT: Thu Jul 24 15:48:47 CEST 2014
device-mapper-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-libs-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-libs-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 15:43:06 CEST 2014
cmirror-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014

Comment 7 errata-xmlrpc 2014-10-14 08:25:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1387.html

