Bug 688241

Summary: log device failure of mirror containing a snapshot device may lead to a deadlock
Product: Red Hat Enterprise Linux 6 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED WORKSFORME QA Contact: Corey Marthaler <cmarthal>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.1CC: agk, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-04-18 18:15:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 756082    
Attachments:
Description Flags
log from taft-03 none

Description Corey Marthaler 2011-03-16 16:15:41 UTC
Description of problem:
This may be related or even the same as bugs 613829 and 678645. In fact it's the exact same test case that was executed in those bugs. However, unlike bug 678645, dmeventd didn't die.

This testcase appears to fail less then 10% of the time. In fact, it had passed 10 times before it failed here.

Scenario: Kill disk log of non synced 3 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              nonsyncd_log_3legs_1
* sync:               0
* leg devices:        /dev/sdd1 /dev/sdg1 /dev/sdf1
* log devices:        /dev/sdb1
* no MDA devices:     
* failpv(s):          /dev/sdb1
* failnode(s):        taft-03
* additional snap:    /dev/sdd1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-03...
taft-03: lvcreate -m 2 -n nonsyncd_log_3legs_1 -L 600M helter_skelter /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdf1:0-1000 /dev/sdb1:0-150
Creating a snapshot volume of each of the mirrors

PV=/dev/sdb1
        nonsyncd_log_3legs_1_mlog: 1.2
PV=/dev/sdb1
        nonsyncd_log_3legs_1_mlog: 1.2

Continuing on without fully syncd mirrors, currently at...
        ( 42.92% )

Creating ext on top of mirror(s) on taft-03...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-03...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-03 ----

Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-03 ----

Disabling device sdb on taft-03

[DEADLOCK]


[root@taft-03 ~]# ps -ef | grep dmeventd
root      2318     1  0 Mar11 ?        00:24:55 /sbin/dmeventd
root      9579  8802  0 10:47 pts/1    00:00:00 grep dmeventd

[root@taft-03 ~]# ps -ef | grep udev
root       570     1  0 Mar11 ?        00:03:34 /sbin/udevd -d
root      8863   570  0 06:33 ?        00:00:00 /sbin/udevd -d
root      9075   570  0 06:33 ?        00:00:00 /sbin/udevd -d
root      9583  8802  0 10:47 pts/1    00:00:00 grep udev


# /var/log/messages:

Mar 16 06:33:19 taft-03 lvm[2318]: Mirror log status: 1 of 1 images failed - switching to core
Mar 16 06:33:20 taft-03 lvm[2318]: device-mapper: waitevent ioctl failed: Interrupted system call
Mar 16 06:33:20 taft-03 lvm[2318]: Another thread is handling an event. Waiting...
Mar 16 06:36:20 taft-03 kernel: INFO: task kjournald:8809 blocked for more than 120 seconds.
Mar 16 06:36:20 taft-03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 16 06:36:20 taft-03 kernel: kjournald     D ffff88021fc24500     0  8809      2 0x00000080
Mar 16 06:36:20 taft-03 kernel: ffff880202e23c40 0000000000000046 ffff880202e23c08 ffff880202e23c04
Mar 16 06:36:20 taft-03 kernel: ffff880202e23bb0 ffff88021fc24500 ffff880028216a00 00000001178a14c1
Mar 16 06:36:20 taft-03 kernel: ffff8802124f9a98 ffff880202e23fd8 0000000000010558 ffff8802124f9a98
Mar 16 06:36:20 taft-03 kernel: Call Trace:
Mar 16 06:36:20 taft-03 kernel: [<ffffffff811974a0>] ? sync_buffer+0x0/0x50
Mar 16 06:36:20 taft-03 kernel: [<ffffffff814c4a53>] io_schedule+0x73/0xc0
Mar 16 06:36:20 taft-03 kernel: [<ffffffff811974e0>] sync_buffer+0x40/0x50
Mar 16 06:36:20 taft-03 kernel: [<ffffffff814c52cf>] __wait_on_bit+0x5f/0x90
Mar 16 06:36:20 taft-03 kernel: [<ffffffff811974a0>] ? sync_buffer+0x0/0x50
Mar 16 06:36:20 taft-03 kernel: [<ffffffff814c5378>] out_of_line_wait_on_bit+0x78/0x90
Mar 16 06:36:20 taft-03 kernel: [<ffffffff8108a130>] ? wake_bit_function+0x0/0x50
Mar 16 06:36:20 taft-03 kernel: [<ffffffff81197496>] __wait_on_buffer+0x26/0x30
Mar 16 06:36:20 taft-03 kernel: [<ffffffffa02fdfb1>] journal_commit_transaction+0x9c1/0x1140 [jbd]
Mar 16 06:36:20 taft-03 kernel: [<ffffffff81052002>] ? finish_task_switch+0x42/0xd0
Mar 16 06:36:20 taft-03 kernel: [<ffffffff8108a0f0>] ? autoremove_wake_function+0x0/0x40
Mar 16 06:36:20 taft-03 kernel: [<ffffffff8107640b>] ? try_to_del_timer_sync+0x7b/0xe0
Mar 16 06:36:20 taft-03 kernel: [<ffffffffa03010f8>] kjournald+0xe8/0x250 [jbd]
Mar 16 06:36:20 taft-03 kernel: [<ffffffff8108a0f0>] ? autoremove_wake_function+0x0/0x40
Mar 16 06:36:20 taft-03 kernel: [<ffffffffa0301010>] ? kjournald+0x0/0x250 [jbd]
Mar 16 06:36:20 taft-03 kernel: [<ffffffff81089d86>] kthread+0x96/0xa0
Mar 16 06:36:20 taft-03 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Mar 16 06:36:20 taft-03 kernel: [<ffffffff81089cf0>] ? kthread+0x0/0xa0
Mar 16 06:36:20 taft-03 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Mar 16 06:36:20 taft-03 kernel: INFO: task xdoio:8819 blocked for more than 120 seconds.



Version-Release number of selected component (if applicable):
2.6.32-94.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.31.el6    BUILT: Wed Jan 26 05:39:15 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011

Comment 1 Corey Marthaler 2011-03-16 16:18:53 UTC
Created attachment 485782 [details]
log from taft-03

Comment 2 Jonathan Earl Brassow 2011-09-16 15:16:14 UTC
Does this bug occur without the snapshot?

Comment 5 Corey Marthaler 2012-04-18 18:15:40 UTC
I'm unable to reproduce this issue with the latest rhel6.3 rpms. I'll reopen if this bug is seen again.

2.6.32-262.el6.x86_64
lvm2-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
lvm2-libs-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
lvm2-cluster-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
device-mapper-libs-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
device-mapper-event-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
device-mapper-event-libs-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
cmirror-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012


[...]
================================================================================
Iteration 10.1 started at Wed Apr 18 13:07:18 CDT 2012
================================================================================
Scenario kill_log_non_synced_3_legs: Kill disk log of non synced 3 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              nonsyncd_log_3legs_1
* sync:               0
* striped:            0
* leg devices:        /dev/sdb1 /dev/sdg1 /dev/sdc1
* log devices:        /dev/sdh1
* no MDA devices:     
* failpv(s):          /dev/sdh1
* failnode(s):        taft-01
* additional snap:    /dev/sdb1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate -m 2 -n nonsyncd_log_3legs_1 -L 500M helter_skelter /dev/sdb1:0-1000 /dev/sdg1:0-1000 /dev/sdc1:0-1000 /dev/sdh1:0-150
Creating a snapshot volume of each of the mirrors
[...]