Bug 682309

Summary: stuck kmirrord processes during mkfs
Product: Red Hat Enterprise Linux 6 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED WONTFIX QA Contact: Corey Marthaler <cmarthal>
Severity: high Docs Contact:
Priority: high    
Version: 6.1CC: agk, cmarthal, coughlan, djansa, dwysocha, heinzm, jbrassow, mike.miller, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-08 16:08:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 756082    

Description Corey Marthaler 2011-03-04 20:01:05 UTC
Description of problem:
Hit this issue while attempting to mkfs a HA LVM mirror.

Test Output:
Creating PVs out of /dev/etherd/e1.1p1 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3 /dev/etherd/e1.1p4 /dev/etherd/e1.1p5
Creating VG out of /dev/etherd/e1.1p1 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3 /dev/etherd/e1.1p4 /dev/etherd/e1.1p5
Creating LV(s) on VG HAYES (with hayes-01 tag for activation)
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!

Creating ext3 filesystems on LV(s)
mke2fs 1.41.12 (17-May-2010)

[root@hayes-01 ~]# lvs -a -o +devices
[appears stuck]

[root@hayes-02 ~]# lvs -a -o +devices
  LV             VG         Attr   LSize  Log      Copy%  Devices
  ha1            HAYES      Mwi--- 25.00g ha1_mlog        ha1_mimage_0(0),ha1_mimage_1(0)
  [ha1_mimage_0] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p1(0)
  [ha1_mimage_1] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p2(0)
  [ha1_mlog]     HAYES      lwi---  4.00m                 /dev/etherd/e1.1p5(0)
  ha2            HAYES      Mwi--- 25.00g ha2_mlog        ha2_mimage_0(0),ha2_mimage_1(0)
  [ha2_mimage_0] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p1(6400)
  [ha2_mimage_1] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p2(6400)
  [ha2_mlog]     HAYES      lwi---  4.00m                 /dev/etherd/e1.1p5(1)


aoe: 0030488d63d2 e1.1 v4467 has 19046937725 sectors
 etherd/e1.1: p1 p2 p3 p4 p5
DLM (built Feb 22 2011 11:17:55) installed
INFO: task kmirrord:2436 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kmirrord      D 0000000000000003     0  2436      2 0x00000080
 ffff880219f1fb40 0000000000000046 0000000000000000 ffff8801e3728e00
 ffff880219f1fb60 ffffffffa0009413 0000000019f1fb70 00000000ffff3f2a
 ffff880219198638 ffff880219f1ffd8 000000000000f558 ffff880219198638
Call Trace:
 [<ffffffffa0009413>] ? dispatch_io+0x233/0x260 [dm_mod]
 [<ffffffff814d9663>] io_schedule+0x73/0xc0
 [<ffffffffa00094d5>] sync_io+0x95/0x110 [dm_mod]
 [<ffffffffa0009777>] ? dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa0008fa0>] ? bvec_next_page+0x0/0x10 [dm_mod]
 [<ffffffffa0322209>] ? mirror_flush+0xd9/0x150 [dm_mirror]
 [<ffffffffa0009777>] dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa0009040>] ? vm_get_page+0x0/0x70 [dm_mod]
 [<ffffffffa0008fb0>] ? vm_next_page+0x0/0x30 [dm_mod]
 [<ffffffffa02b1781>] disk_flush+0x91/0x170 [dm_log]
 [<ffffffff8110f34e>] ? mempool_kfree+0xe/0x10
 [<ffffffff8110f4b5>] ? mempool_free+0x95/0xa0
 [<ffffffffa02dbe63>] dm_rh_update_states+0x2d3/0x2f0 [dm_region_hash]
 [<ffffffffa03228d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffffa032297d>] do_mirror+0xad/0x6e0 [dm_mirror]
 [<ffffffffa03228d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffff810883b0>] worker_thread+0x170/0x2a0
 [<ffffffff8108dce0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088240>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108d976>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108d8e0>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
INFO: task kmirrord:2436 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kmirrord      D 0000000000000003     0  2436      2 0x00000080
 ffff880219f1fb40 0000000000000046 0000000000000000 ffff8801df137c80
 ffff880219f1fb60 ffffffffa0009413 0000000019f1fb70 000000010004031b
 ffff880219198638 ffff880219f1ffd8 000000000000f558 ffff880219198638
Call Trace:
 [<ffffffffa0009413>] ? dispatch_io+0x233/0x260 [dm_mod]
 [<ffffffff814d9663>] io_schedule+0x73/0xc0
 [<ffffffffa00094d5>] sync_io+0x95/0x110 [dm_mod]
 [<ffffffffa0009777>] ? dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa0008fa0>] ? bvec_next_page+0x0/0x10 [dm_mod]
 [<ffffffffa0322209>] ? mirror_flush+0xd9/0x150 [dm_mirror]
 [<ffffffffa0009777>] dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa0009040>] ? vm_get_page+0x0/0x70 [dm_mod]
 [<ffffffffa0008fb0>] ? vm_next_page+0x0/0x30 [dm_mod]
 [<ffffffffa02b1781>] disk_flush+0x91/0x170 [dm_log]
 [<ffffffff8110f34e>] ? mempool_kfree+0xe/0x10
 [<ffffffff8110f4b5>] ? mempool_free+0x95/0xa0
 [<ffffffffa02dbe63>] dm_rh_update_states+0x2d3/0x2f0 [dm_region_hash]
 [<ffffffffa03228d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffffa032297d>] do_mirror+0xad/0x6e0 [dm_mirror]
 [<ffffffffa03228d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffff810883b0>] worker_thread+0x170/0x2a0
 [<ffffffff8108dce0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088240>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108d976>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108d8e0>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
INFO: task lvs:2559 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lvs           D 0000000000000002     0  2559   2510 0x00000080
 ffff880117ba9b88 0000000000000086 0000000000000000 00000000000000d0
 ffff880117ba9b38 000000000d2a122a ffff880117ba9b78 000000010004034b
 ffff880119285078 ffff880117ba9fd8 000000000000f558 ffff880119285078
Call Trace:
 [<ffffffff814d9663>] io_schedule+0x73/0xc0
 [<ffffffff811aa49e>] __blockdev_direct_IO+0x70e/0xc40
 [<ffffffff811a80e7>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811a72d0>] ? blkdev_get_blocks+0x0/0xc0
 [<ffffffff8110eb8b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff8120ae21>] ? avc_has_perm+0x71/0x90
 [<ffffffff8120494f>] ? security_inode_permission+0x1f/0x30
 [<ffffffff81170a9a>] do_sync_read+0xfa/0x140
 [<ffffffff8108dce0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811a769c>] ? block_ioctl+0x3c/0x40
 [<ffffffff81183412>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff8121037b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff812037e6>] ? security_file_permission+0x16/0x20
 [<ffffffff811714c5>] vfs_read+0xb5/0x1a0
 [<ffffffff810d1652>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81171601>] sys_read+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b


Version-Release number of selected component (if applicable):
2.6.32-118.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.34.el6    BUILT: Wed Feb 16 08:09:21 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011

Comment 1 Zdenek Kabelac 2011-03-07 16:26:14 UTC
Could you please attach output of: dmsetup info.

Looks like some node stayed suspended.

Comment 5 Corey Marthaler 2011-07-26 21:01:31 UTC
Just reproduced this again.

Jul 26 15:15:47 hayes-01 qarshd[3295]: Running cmdline: mkfs.ext3 /dev/HAYES/ha1
Jul 26 15:19:32 hayes-01 kernel: INFO: task kmirrord:2431 blocked for more than 120 seconds.
Jul 26 15:19:32 hayes-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 15:19:32 hayes-01 kernel: kmirrord      D 0000000000000002     0  2431      2 0x00000080
Jul 26 15:19:32 hayes-01 kernel: ffff88011a6c3b40 0000000000000046 0000000000000000 ffff8800be8fd6c0
Jul 26 15:19:32 hayes-01 kernel: ffff88011a6c3b60 ffffffffa00094c3 000000001a6c3b70 000000010004dc5f
Jul 26 15:19:32 hayes-01 kernel: ffff8801197a5078 ffff88011a6c3fd8 000000000000f598 ffff8801197a5078
Jul 26 15:19:32 hayes-01 kernel: Call Trace:
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa00094c3>] ? dispatch_io+0x233/0x260 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff814dbad3>] io_schedule+0x73/0xc0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009585>] sync_io+0x95/0x110 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009827>] ? dm_io+0x1b7/0x1c0 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009050>] ? bvec_next_page+0x0/0x10 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0324209>] ? mirror_flush+0xd9/0x150 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009827>] dm_io+0x1b7/0x1c0 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa00090f0>] ? vm_get_page+0x0/0x70 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009060>] ? vm_next_page+0x0/0x30 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa02b3781>] disk_flush+0x91/0x170 [dm_log]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8110f95e>] ? mempool_kfree+0xe/0x10
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8110fac5>] ? mempool_free+0x95/0xa0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa02dde63>] dm_rh_update_states+0x2d3/0x2f0 [dm_region_hash]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa03248d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa032497d>] do_mirror+0xad/0x6e0 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa03248d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff81088830>] worker_thread+0x170/0x2a0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8108ddf6>] kthread+0x96/0xa0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8108dd60>] ? kthread+0x0/0xa0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jul 26 15:21:32 hayes-01 kernel: INFO: task kmirrord:2431 blocked for more than 120 seconds.



Looks like all the devices are "ACTIVE":
[root@hayes-01 ~]# dmsetup info | grep State
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE
State:             ACTIVE

Comment 6 Corey Marthaler 2011-08-03 16:31:35 UTC
Bumping priority now that this is being seen more often.

Creating LV(s) on VG HAYES (with hayes-01 tag for activation)
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Creating ext3 filesystems on LV(s)
mke2fs 1.41.12 (17-May-2010)
mke2fs 1.41.12 (17-May-2010)


INFO: task kmirrord:2725 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kmirrord      D 0000000000000002     0  2725      2 0x00000080
 ffff88021af79b40 0000000000000046 0000000000000000 ffff8800d9d80800
 ffff88021af79b60 ffffffffa00095a3 000000001af79b70 ffffffffa00091d0
 ffff88021a8f3078 ffff88021af79fd8 000000000000f4e8 ffff88021a8f3078
Call Trace:
 [<ffffffffa00095a3>] ? dispatch_io+0x233/0x260 [dm_mod]
 [<ffffffffa00091d0>] ? vm_get_page+0x0/0x70 [dm_mod]
 [<ffffffff814e0943>] io_schedule+0x73/0xc0
 [<ffffffffa0009665>] sync_io+0x95/0x110 [dm_mod]
 [<ffffffffa0009907>] ? dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa033b209>] ? mirror_flush+0xd9/0x150 [dm_mirror]
 [<ffffffffa0009907>] dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa00091d0>] ? vm_get_page+0x0/0x70 [dm_mod]
 [<ffffffffa0009140>] ? vm_next_page+0x0/0x30 [dm_mod]
 [<ffffffffa02e9781>] disk_flush+0x91/0x170 [dm_log]
 [<ffffffff8111002e>] ? mempool_kfree+0xe/0x10
 [<ffffffff81110195>] ? mempool_free+0x95/0xa0
 [<ffffffffa0331e63>] dm_rh_update_states+0x2d3/0x2f0 [dm_region_hash]
 [<ffffffffa033b8d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffffa033b97d>] do_mirror+0xad/0x6e0 [dm_mirror]
 [<ffffffffa033b8d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffff81088e50>] worker_thread+0x170/0x2a0
 [<ffffffff8108e780>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088ce0>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108e416>] kthread+0x96/0xa0
 [<ffffffff8100c10a>] child_rip+0xa/0x20
 [<ffffffff8108e380>] ? kthread+0x0/0xa0
 [<ffffffff8100c100>] ? child_rip+0x0/0x20


2.6.32-174.el6.x86_64

lvm2-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-libs-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-cluster-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
udev-147-2.35.el6    BUILT: Wed Mar 30 07:32:05 CDT 2011
device-mapper-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
cmirror-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011

Comment 9 Mike Miller (OS Dev) 2012-01-04 17:35:50 UTC
What is the underlying hardware?

Comment 14 Alasdair Kergon 2013-10-08 12:48:03 UTC
Is this still a problem after 2.5 years (and if so can we get fresh diagnostics from a current release), or can we close it?

Comment 18 Jonathan Earl Brassow 2013-11-08 16:08:17 UTC
No reports from the field.  No reproducers in > 2yrs.  I am closing this bug.  Please re-open if you can reproduce.