Hide Forgot
Description of problem: This may be related or even the same as bugs 613829 and 678645. In fact it's the exact same test case that was executed in those bugs. However, unlike bug 678645, dmeventd didn't die. This testcase appears to fail less then 10% of the time. In fact, it had passed 10 times before it failed here. Scenario: Kill disk log of non synced 3 leg mirror(s) ********* Mirror hash info for this scenario ********* * names: nonsyncd_log_3legs_1 * sync: 0 * leg devices: /dev/sdd1 /dev/sdg1 /dev/sdf1 * log devices: /dev/sdb1 * no MDA devices: * failpv(s): /dev/sdb1 * failnode(s): taft-03 * additional snap: /dev/sdd1 * leg fault policy: allocate * log fault policy: allocate ****************************************************** Creating mirror(s) on taft-03... taft-03: lvcreate -m 2 -n nonsyncd_log_3legs_1 -L 600M helter_skelter /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdf1:0-1000 /dev/sdb1:0-150 Creating a snapshot volume of each of the mirrors PV=/dev/sdb1 nonsyncd_log_3legs_1_mlog: 1.2 PV=/dev/sdb1 nonsyncd_log_3legs_1_mlog: 1.2 Continuing on without fully syncd mirrors, currently at... ( 42.92% ) Creating ext on top of mirror(s) on taft-03... mke2fs 1.41.12 (17-May-2010) Mounting mirrored ext filesystems on taft-03... Writing verification files (checkit) to mirror(s) on... ---- taft-03 ---- Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- taft-03 ---- Disabling device sdb on taft-03 [DEADLOCK] [root@taft-03 ~]# ps -ef | grep dmeventd root 2318 1 0 Mar11 ? 00:24:55 /sbin/dmeventd root 9579 8802 0 10:47 pts/1 00:00:00 grep dmeventd [root@taft-03 ~]# ps -ef | grep udev root 570 1 0 Mar11 ? 00:03:34 /sbin/udevd -d root 8863 570 0 06:33 ? 00:00:00 /sbin/udevd -d root 9075 570 0 06:33 ? 00:00:00 /sbin/udevd -d root 9583 8802 0 10:47 pts/1 00:00:00 grep udev # /var/log/messages: Mar 16 06:33:19 taft-03 lvm[2318]: Mirror log status: 1 of 1 images failed - switching to core Mar 16 06:33:20 taft-03 lvm[2318]: device-mapper: waitevent ioctl failed: Interrupted system call Mar 16 06:33:20 taft-03 lvm[2318]: Another thread is handling an event. Waiting... Mar 16 06:36:20 taft-03 kernel: INFO: task kjournald:8809 blocked for more than 120 seconds. Mar 16 06:36:20 taft-03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 16 06:36:20 taft-03 kernel: kjournald D ffff88021fc24500 0 8809 2 0x00000080 Mar 16 06:36:20 taft-03 kernel: ffff880202e23c40 0000000000000046 ffff880202e23c08 ffff880202e23c04 Mar 16 06:36:20 taft-03 kernel: ffff880202e23bb0 ffff88021fc24500 ffff880028216a00 00000001178a14c1 Mar 16 06:36:20 taft-03 kernel: ffff8802124f9a98 ffff880202e23fd8 0000000000010558 ffff8802124f9a98 Mar 16 06:36:20 taft-03 kernel: Call Trace: Mar 16 06:36:20 taft-03 kernel: [<ffffffff811974a0>] ? sync_buffer+0x0/0x50 Mar 16 06:36:20 taft-03 kernel: [<ffffffff814c4a53>] io_schedule+0x73/0xc0 Mar 16 06:36:20 taft-03 kernel: [<ffffffff811974e0>] sync_buffer+0x40/0x50 Mar 16 06:36:20 taft-03 kernel: [<ffffffff814c52cf>] __wait_on_bit+0x5f/0x90 Mar 16 06:36:20 taft-03 kernel: [<ffffffff811974a0>] ? sync_buffer+0x0/0x50 Mar 16 06:36:20 taft-03 kernel: [<ffffffff814c5378>] out_of_line_wait_on_bit+0x78/0x90 Mar 16 06:36:20 taft-03 kernel: [<ffffffff8108a130>] ? wake_bit_function+0x0/0x50 Mar 16 06:36:20 taft-03 kernel: [<ffffffff81197496>] __wait_on_buffer+0x26/0x30 Mar 16 06:36:20 taft-03 kernel: [<ffffffffa02fdfb1>] journal_commit_transaction+0x9c1/0x1140 [jbd] Mar 16 06:36:20 taft-03 kernel: [<ffffffff81052002>] ? finish_task_switch+0x42/0xd0 Mar 16 06:36:20 taft-03 kernel: [<ffffffff8108a0f0>] ? autoremove_wake_function+0x0/0x40 Mar 16 06:36:20 taft-03 kernel: [<ffffffff8107640b>] ? try_to_del_timer_sync+0x7b/0xe0 Mar 16 06:36:20 taft-03 kernel: [<ffffffffa03010f8>] kjournald+0xe8/0x250 [jbd] Mar 16 06:36:20 taft-03 kernel: [<ffffffff8108a0f0>] ? autoremove_wake_function+0x0/0x40 Mar 16 06:36:20 taft-03 kernel: [<ffffffffa0301010>] ? kjournald+0x0/0x250 [jbd] Mar 16 06:36:20 taft-03 kernel: [<ffffffff81089d86>] kthread+0x96/0xa0 Mar 16 06:36:20 taft-03 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20 Mar 16 06:36:20 taft-03 kernel: [<ffffffff81089cf0>] ? kthread+0x0/0xa0 Mar 16 06:36:20 taft-03 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 Mar 16 06:36:20 taft-03 kernel: INFO: task xdoio:8819 blocked for more than 120 seconds. Version-Release number of selected component (if applicable): 2.6.32-94.el6.x86_64 lvm2-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 lvm2-libs-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 lvm2-cluster-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 udev-147-2.31.el6 BUILT: Wed Jan 26 05:39:15 CST 2011 device-mapper-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 device-mapper-libs-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 device-mapper-event-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 device-mapper-event-libs-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 cmirror-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
Created attachment 485782 [details] log from taft-03
Does this bug occur without the snapshot?
I'm unable to reproduce this issue with the latest rhel6.3 rpms. I'll reopen if this bug is seen again. 2.6.32-262.el6.x86_64 lvm2-2.02.95-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 lvm2-libs-2.02.95-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 lvm2-cluster-2.02.95-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 udev-147-2.41.el6 BUILT: Thu Mar 1 13:01:08 CST 2012 device-mapper-1.02.74-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 device-mapper-libs-1.02.74-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 device-mapper-event-1.02.74-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 device-mapper-event-libs-1.02.74-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 cmirror-2.02.95-4.el6 BUILT: Wed Apr 11 09:03:19 CDT 2012 [...] ================================================================================ Iteration 10.1 started at Wed Apr 18 13:07:18 CDT 2012 ================================================================================ Scenario kill_log_non_synced_3_legs: Kill disk log of non synced 3 leg mirror(s) ********* Mirror hash info for this scenario ********* * names: nonsyncd_log_3legs_1 * sync: 0 * striped: 0 * leg devices: /dev/sdb1 /dev/sdg1 /dev/sdc1 * log devices: /dev/sdh1 * no MDA devices: * failpv(s): /dev/sdh1 * failnode(s): taft-01 * additional snap: /dev/sdb1 * leg fault policy: allocate * log fault policy: allocate ****************************************************** Creating mirror(s) on taft-01... taft-01: lvcreate -m 2 -n nonsyncd_log_3legs_1 -L 500M helter_skelter /dev/sdb1:0-1000 /dev/sdg1:0-1000 /dev/sdc1:0-1000 /dev/sdh1:0-150 Creating a snapshot volume of each of the mirrors [...]