Bug 502648 - dmeventd segfaults during single node mirror device failure testing
Summary: dmeventd segfaults during single node mirror device failure testing
Keywords:
Status: CLOSED DUPLICATE of bug 502899
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Petr Rockai
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-05-26 18:37 UTC by Corey Marthaler
Modified: 2010-01-12 03:57 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-01 12:35:59 UTC
Target Upstream Version:
Embargoed:



Description Corey Marthaler 2009-05-26 18:37:45 UTC
Description of problem:
I was running the standard regression tests on the latest LVM build (lvm2-2.02.46-2.el5) and saw this during helter_skelter (mirror device failure testing).


Scenario: Kill primary leg of synced 2 leg mirror(s)                            

****** Mirror hash info for this scenario ******
* name:         syncd_primary_2legs             
* sync:         1                               
* num mirrors:  1                               
* disklog:      /dev/sdd1                       
* failpv(s):    /dev/sde1                       
* leg devices:  /dev/sde1 /dev/sdg1             
************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 1 -n syncd_primary_2legs_1 -L 600M helter_skelter /dev/sde1:0-1000 /dev/sdg1:0-1000 /dev/sdd1:0-150

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=6.75% )
        0/1 mirror(s) are fully synced: ( 1=73.00% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )

Creating ext on top of mirror(s) on taft-04...
mke2fs 1.39 (29-May-2006)                     
Mounting mirrored ext filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-04 ----                              

<start name="taft-04_1" pid="6195" time="Tue May 26 12:52:31 2009" type="cmd" />
Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-04 ----

Disabling device sde on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-04
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.175171 seconds, 239 MB/s
Verifying the down conversion of the failed mirror(s)
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 629080064: Input/output error
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 629137408: Input/output error
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 0: Input/output error
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 4096: Input/output error
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sde1: open failed: No such device or address
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sde1: open failed: No such device or address
Verifying LOG device /dev/sdd1 is *NOT* in the linear(s)
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sde1: open failed: No such device or address
log device /dev/sdd1 should no longer be present on taft-04



May 26 12:52:56 taft-04 qarshd[30826]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_primary_2legs_1/ddfile count=10 bs=4M
May 26 12:52:56 taft-04 kernel: dmeventd[30726]: segfault at 0000000000000010 rip 00002aaaae2c0faa rsp 000000004093d520 error 4


[root@taft-04 ~]# lvs -a -o +devices
  /dev/helter_skelter/syncd_primary_2legs_1_mimage_1: read failed after 0 of 4096 at 0: Input/output error
  LV                             VG             Attr   LSize   Origin Snap%  Move Log Copy%  Convert Devices        
  LogVol00                       VolGroup00     -wi-ao  58.38G                                       /dev/sda2(0)   
  LogVol01                       VolGroup00     -wi-ao   9.75G                                       /dev/sda2(1868)
  syncd_primary_2legs_1          helter_skelter -wi-ao 600.00M                                       /dev/sdg1(0)   
  syncd_primary_2legs_1_mimage_1 helter_skelter vwi-a- 600.00M                                                      
  syncd_primary_2legs_1_mlog     helter_skelter -wi-a-   4.00M                                       /dev/sdd1(0)


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Corey Marthaler 2009-05-26 19:41:56 UTC
This is reproducible.

2.6.18-149.el5
lvm2-2.02.46-2.el5
device-mapper-1.02.32-1.el5

Comment 2 Petr Rockai 2009-05-27 06:12:38 UTC
Could you please provide a slightly longer excerpt from syslog, in case there are earlier dmeventd messages in there? (There should at least be the one saying that monitoring for a mirror has started.)

I tested dmeventd upstream just before the release, although with lvconvert rather than vgreduce, which could be the crucial difference (the lvconvert patch was reverted at the last minute). It would also be great if you could substitute the dd (which writes to the mirror to trigger the down-conversion) with vgreduce --removemissing --force helter_skelter and check whether that fails as well. That would definitely help with debugging. (I'll try to set up a reproduction myself later, but it may be possible to fix the problem in the meantime, and it may also turn out that I cannot replicate it.)
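A minimal sketch of the suggested substitution, assuming it is run as root on the test node after the primary leg has been disabled. The function name `vg_removemissing_check` is hypothetical; the VG name defaults to the one from this scenario, and the `vgreduce` and `lvs` invocations are the ones already quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: instead of writing to the mirror with dd, remove the
# missing PV from the VG directly, then show the resulting layout.
# Assumes root privileges; VG name defaults to the scenario's helter_skelter.
vg_removemissing_check() {
    vg=${1:-helter_skelter}
    # Remove missing PVs from the VG, using the flags suggested above.
    vgreduce --removemissing --force "$vg" || return 1
    # List all LVs (including hidden mimage/mlog volumes) with their devices.
    lvs -a -o +devices "$vg"
}
```

Calling `vg_removemissing_check` after disabling the device (in place of the dd step) should show whether dmeventd crashes on this path too.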

Thanks!

Comment 3 Petr Rockai 2009-05-28 18:47:32 UTC
Judging from the backtrace in bug 502899, this bug may in fact be a manifestation of the same problem. It would be great if you could catch the crashing dmeventd in gdb (by using gdb . <pid of dmeventd>, then c to continue and waiting for the crash), or make it dump core. The output of "bt full" (or in this case "thread apply all bt full", since dmeventd is a multi-threaded program) would be very useful.
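One way to collect that output non-interactively is sketched below, assuming gdb is installed, the script runs as root, and a single dmeventd process is running. The function name `capture_dmeventd_bt` and the default output path are hypothetical; the gdb commands are the ones requested above.

```shell
#!/bin/sh
# Hypothetical helper: attach gdb to the running dmeventd, let it run until
# it stops (e.g. on SIGSEGV), then dump full backtraces for every thread.
# Assumes gdb is installed and this runs as root.
capture_dmeventd_bt() {
    pid=$(pidof dmeventd) || { echo "dmeventd is not running" >&2; return 1; }
    out=${1:-/tmp/dmeventd-bt.$pid.txt}
    # -batch runs the -ex commands in order, then exits:
    #   "continue" resumes dmeventd and blocks until it stops;
    #   "thread apply all bt full" prints each thread's stack with locals.
    gdb -batch -p "$pid" \
        -ex "continue" \
        -ex "thread apply all bt full" \
        >"$out" 2>&1
    echo "backtrace written to $out"
}
```

Start `capture_dmeventd_bt` in one shell, then rerun the failure scenario from another; the backtrace file is written once dmeventd faults.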

Thanks again!

Comment 4 Milan Broz 2009-06-01 12:35:59 UTC
I think the problem is the same as in bug 502899 (the readahead calculation has an apparent bug when operating during a mirror conversion with an error segment), so I am marking this as a duplicate.

*** This bug has been marked as a duplicate of bug 502899 ***

