
Bug 682309

Summary: stuck kmirrord processes during mkfs
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED WONTFIX
QA Contact: Corey Marthaler <cmarthal>
Severity: high
Docs Contact:
Priority: high
Version: 6.1
CC: agk, cmarthal, coughlan, djansa, dwysocha, heinzm, jbrassow, mike.miller, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-08 16:08:17 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 756082

Description Corey Marthaler 2011-03-04 20:01:05 UTC
Description of problem:
Hit this issue while attempting to run mkfs on an HA LVM mirror.

Test Output:
Creating PVs out of /dev/etherd/e1.1p1 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3 /dev/etherd/e1.1p4 /dev/etherd/e1.1p5
Creating VG out of /dev/etherd/e1.1p1 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3 /dev/etherd/e1.1p4 /dev/etherd/e1.1p5
Creating LV(s) on VG HAYES (with hayes-01 tag for activation)
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!

Creating ext3 filesystems on LV(s)
mke2fs 1.41.12 (17-May-2010)

[root@hayes-01 ~]# lvs -a -o +devices
[appears stuck]

[root@hayes-02 ~]# lvs -a -o +devices
  LV             VG         Attr   LSize  Log      Copy%  Devices
  ha1            HAYES      Mwi--- 25.00g ha1_mlog        ha1_mimage_0(0),ha1_mimage_1(0)
  [ha1_mimage_0] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p1(0)
  [ha1_mimage_1] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p2(0)
  [ha1_mlog]     HAYES      lwi---  4.00m                 /dev/etherd/e1.1p5(0)
  ha2            HAYES      Mwi--- 25.00g ha2_mlog        ha2_mimage_0(0),ha2_mimage_1(0)
  [ha2_mimage_0] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p1(6400)
  [ha2_mimage_1] HAYES      Iwi--- 25.00g                 /dev/etherd/e1.1p2(6400)
  [ha2_mlog]     HAYES      lwi---  4.00m                 /dev/etherd/e1.1p5(1)
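For readers decoding the `lvs` Attr column in the hayes-02 output above (e.g. `Mwi---`, `Iwi---`), here is a rough decoder for the six-character attribute string used by lvm2 2.02.x of this era. The character mappings are paraphrased from the lvs(8) man page of that generation and should be treated as a sketch, not an authoritative reference:

```python
# Decode the 6-character lvs "Attr" string from lvm2 2.02.x (RHEL 6
# era). Positions: volume type, permissions, allocation policy,
# fixed minor, state, device open. Mappings are a best-effort sketch.

TYPE = {"m": "mirror", "M": "mirror (no initial sync)",
        "i": "mirror image", "I": "mirror image (out-of-sync)",
        "l": "mirror log", "o": "origin", "s": "snapshot",
        "v": "virtual", "-": "linear/striped"}
PERM = {"w": "writable", "r": "read-only", "-": "unknown"}
ALLOC = {"i": "inherited", "c": "contiguous", "n": "normal",
         "a": "anywhere", "l": "cling", "-": "unknown"}
STATE = {"a": "active", "s": "suspended", "i": "invalid snapshot",
         "-": "inactive/unknown"}

def decode_attr(attr):
    """Return a dict describing a 6-character lvs Attr string."""
    if len(attr) != 6:
        raise ValueError("expected 6 characters, got %r" % attr)
    t, p, al, minor, st, op = attr
    return {
        "type": TYPE.get(t, t),
        "permissions": PERM.get(p, p),
        "allocation": ALLOC.get(al, al),
        "fixed_minor": minor == "m",
        "state": STATE.get(st, st),
        "open": op == "o",
    }

# Example: the top-level mirror LV from the hayes-02 output above.
info = decode_attr("Mwi---")
print(info["type"])  # the LV was created as a mirror without initial sync
```

Note how the `M` matches the "New mirror won't be synchronised" warning in the test output, and the `I` on the mimage sub-LVs marks them as not-in-sync images.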


aoe: 0030488d63d2 e1.1 v4467 has 19046937725 sectors
 etherd/e1.1: p1 p2 p3 p4 p5
DLM (built Feb 22 2011 11:17:55) installed
INFO: task kmirrord:2436 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kmirrord      D 0000000000000003     0  2436      2 0x00000080
 ffff880219f1fb40 0000000000000046 0000000000000000 ffff8801e3728e00
 ffff880219f1fb60 ffffffffa0009413 0000000019f1fb70 00000000ffff3f2a
 ffff880219198638 ffff880219f1ffd8 000000000000f558 ffff880219198638
Call Trace:
 [<ffffffffa0009413>] ? dispatch_io+0x233/0x260 [dm_mod]
 [<ffffffff814d9663>] io_schedule+0x73/0xc0
 [<ffffffffa00094d5>] sync_io+0x95/0x110 [dm_mod]
 [<ffffffffa0009777>] ? dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa0008fa0>] ? bvec_next_page+0x0/0x10 [dm_mod]
 [<ffffffffa0322209>] ? mirror_flush+0xd9/0x150 [dm_mirror]
 [<ffffffffa0009777>] dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa0009040>] ? vm_get_page+0x0/0x70 [dm_mod]
 [<ffffffffa0008fb0>] ? vm_next_page+0x0/0x30 [dm_mod]
 [<ffffffffa02b1781>] disk_flush+0x91/0x170 [dm_log]
 [<ffffffff8110f34e>] ? mempool_kfree+0xe/0x10
 [<ffffffff8110f4b5>] ? mempool_free+0x95/0xa0
 [<ffffffffa02dbe63>] dm_rh_update_states+0x2d3/0x2f0 [dm_region_hash]
 [<ffffffffa03228d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffffa032297d>] do_mirror+0xad/0x6e0 [dm_mirror]
 [<ffffffffa03228d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffff810883b0>] worker_thread+0x170/0x2a0
 [<ffffffff8108dce0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088240>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108d976>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108d8e0>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
INFO: task kmirrord:2436 blocked for more than 120 seconds.
[the same kmirrord:2436 report repeats at the next 120-second interval with an identical call trace]
INFO: task lvs:2559 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lvs           D 0000000000000002     0  2559   2510 0x00000080
 ffff880117ba9b88 0000000000000086 0000000000000000 00000000000000d0
 ffff880117ba9b38 000000000d2a122a ffff880117ba9b78 000000010004034b
 ffff880119285078 ffff880117ba9fd8 000000000000f558 ffff880119285078
Call Trace:
 [<ffffffff814d9663>] io_schedule+0x73/0xc0
 [<ffffffff811aa49e>] __blockdev_direct_IO+0x70e/0xc40
 [<ffffffff811a80e7>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811a72d0>] ? blkdev_get_blocks+0x0/0xc0
 [<ffffffff8110eb8b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff8120ae21>] ? avc_has_perm+0x71/0x90
 [<ffffffff8120494f>] ? security_inode_permission+0x1f/0x30
 [<ffffffff81170a9a>] do_sync_read+0xfa/0x140
 [<ffffffff8108dce0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811a769c>] ? block_ioctl+0x3c/0x40
 [<ffffffff81183412>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff8121037b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff812037e6>] ? security_file_permission+0x16/0x20
 [<ffffffff811714c5>] vfs_read+0xb5/0x1a0
 [<ffffffff810d1652>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81171601>] sys_read+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
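When comparing hung-task dumps like the ones above across reproductions, it helps to reduce them to (task, pid, reliable frames). The following is a small, hypothetical helper (not part of any tool mentioned in this bug) that extracts that summary from hung-task log text, skipping the unreliable `?` frames:

```python
import re

# Parse kernel hung-task reports: collect (task, pid) and the
# non-'?' call-trace frames following each "INFO: task" header.
INFO_RE = re.compile(r"INFO: task (\S+):(\d+) blocked")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)")

def parse_hung_tasks(text):
    reports = []
    current = None
    for line in text.splitlines():
        m = INFO_RE.search(line)
        if m:
            current = {"task": m.group(1), "pid": int(m.group(2)),
                       "frames": []}
            reports.append(current)
            continue
        if current is None:
            continue
        f = FRAME_RE.search(line)
        if f and "?" not in line:   # skip unreliable '?' frames
            current["frames"].append(f.group(1))
    return reports

# A few lines borrowed from the kmirrord trace above.
sample = """\
INFO: task kmirrord:2436 blocked for more than 120 seconds.
 [<ffffffff814d9663>] io_schedule+0x73/0xc0
 [<ffffffffa00094d5>] sync_io+0x95/0x110 [dm_mod]
 [<ffffffffa0009777>] ? dm_io+0x1b7/0x1c0 [dm_mod]
"""
print(parse_hung_tasks(sample))
```

Run over this bug's logs, it makes the common signature visible at a glance: every kmirrord report bottoms out in sync_io via disk_flush/dm_rh_update_states.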


Version-Release number of selected component (if applicable):
2.6.32-118.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.34.el6    BUILT: Wed Feb 16 08:09:21 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011

Comment 1 Zdenek Kabelac 2011-03-07 16:26:14 UTC
Could you please attach output of: dmsetup info.

Looks like some node stayed suspended.

Comment 5 Corey Marthaler 2011-07-26 21:01:31 UTC
Just reproduced this again.

Jul 26 15:15:47 hayes-01 qarshd[3295]: Running cmdline: mkfs.ext3 /dev/HAYES/ha1
Jul 26 15:19:32 hayes-01 kernel: INFO: task kmirrord:2431 blocked for more than 120 seconds.
Jul 26 15:19:32 hayes-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 15:19:32 hayes-01 kernel: kmirrord      D 0000000000000002     0  2431      2 0x00000080
Jul 26 15:19:32 hayes-01 kernel: ffff88011a6c3b40 0000000000000046 0000000000000000 ffff8800be8fd6c0
Jul 26 15:19:32 hayes-01 kernel: ffff88011a6c3b60 ffffffffa00094c3 000000001a6c3b70 000000010004dc5f
Jul 26 15:19:32 hayes-01 kernel: ffff8801197a5078 ffff88011a6c3fd8 000000000000f598 ffff8801197a5078
Jul 26 15:19:32 hayes-01 kernel: Call Trace:
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa00094c3>] ? dispatch_io+0x233/0x260 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff814dbad3>] io_schedule+0x73/0xc0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009585>] sync_io+0x95/0x110 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009827>] ? dm_io+0x1b7/0x1c0 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009050>] ? bvec_next_page+0x0/0x10 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0324209>] ? mirror_flush+0xd9/0x150 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009827>] dm_io+0x1b7/0x1c0 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa00090f0>] ? vm_get_page+0x0/0x70 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa0009060>] ? vm_next_page+0x0/0x30 [dm_mod]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa02b3781>] disk_flush+0x91/0x170 [dm_log]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8110f95e>] ? mempool_kfree+0xe/0x10
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8110fac5>] ? mempool_free+0x95/0xa0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa02dde63>] dm_rh_update_states+0x2d3/0x2f0 [dm_region_hash]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa03248d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa032497d>] do_mirror+0xad/0x6e0 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffffa03248d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff81088830>] worker_thread+0x170/0x2a0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8108ddf6>] kthread+0x96/0xa0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8108dd60>] ? kthread+0x0/0xa0
Jul 26 15:19:32 hayes-01 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jul 26 15:21:32 hayes-01 kernel: INFO: task kmirrord:2431 blocked for more than 120 seconds.



Looks like all the devices are "ACTIVE":
[root@hayes-01 ~]# dmsetup info | grep State
State:             ACTIVE
[the same "State:             ACTIVE" line repeats 43 times in total; every device reports ACTIVE]
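Grepping only the `State:` lines, as above, shows no device SUSPENDED but discards the device names, so if one device ever were suspended (comment 1's suspicion) it could not be identified. A sketch that pairs each `Name:` with its `State:` from full `dmsetup info` output (the sample input below is synthetic, for illustration only):

```python
# Pair device names with their states from `dmsetup info` output so a
# suspended device can be identified, not just counted.
def device_states(text):
    states, name = {}, None
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and name is not None:
            states[name] = line.split(":", 1)[1].strip()
    return states

# Synthetic sample; feed real `dmsetup info` text instead.
sample = """\
Name:              HAYES-ha1
State:             ACTIVE
Name:              HAYES-ha2
State:             SUSPENDED
"""
st = device_states(sample)
suspended = [n for n, s in st.items() if s != "ACTIVE"]
print(suspended)  # devices worth a closer look
```

In this bug's actual output all 43 devices were ACTIVE, which is what made the suspended-node theory from comment 1 look unlikely.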

Comment 6 Corey Marthaler 2011-08-03 16:31:35 UTC
Bumping priority now that this is being seen more often.

Creating LV(s) on VG HAYES (with hayes-01 tag for activation)
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Creating ext3 filesystems on LV(s)
mke2fs 1.41.12 (17-May-2010)
mke2fs 1.41.12 (17-May-2010)


INFO: task kmirrord:2725 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kmirrord      D 0000000000000002     0  2725      2 0x00000080
 ffff88021af79b40 0000000000000046 0000000000000000 ffff8800d9d80800
 ffff88021af79b60 ffffffffa00095a3 000000001af79b70 ffffffffa00091d0
 ffff88021a8f3078 ffff88021af79fd8 000000000000f4e8 ffff88021a8f3078
Call Trace:
 [<ffffffffa00095a3>] ? dispatch_io+0x233/0x260 [dm_mod]
 [<ffffffffa00091d0>] ? vm_get_page+0x0/0x70 [dm_mod]
 [<ffffffff814e0943>] io_schedule+0x73/0xc0
 [<ffffffffa0009665>] sync_io+0x95/0x110 [dm_mod]
 [<ffffffffa0009907>] ? dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa033b209>] ? mirror_flush+0xd9/0x150 [dm_mirror]
 [<ffffffffa0009907>] dm_io+0x1b7/0x1c0 [dm_mod]
 [<ffffffffa00091d0>] ? vm_get_page+0x0/0x70 [dm_mod]
 [<ffffffffa0009140>] ? vm_next_page+0x0/0x30 [dm_mod]
 [<ffffffffa02e9781>] disk_flush+0x91/0x170 [dm_log]
 [<ffffffff8111002e>] ? mempool_kfree+0xe/0x10
 [<ffffffff81110195>] ? mempool_free+0x95/0xa0
 [<ffffffffa0331e63>] dm_rh_update_states+0x2d3/0x2f0 [dm_region_hash]
 [<ffffffffa033b8d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffffa033b97d>] do_mirror+0xad/0x6e0 [dm_mirror]
 [<ffffffffa033b8d0>] ? do_mirror+0x0/0x6e0 [dm_mirror]
 [<ffffffff81088e50>] worker_thread+0x170/0x2a0
 [<ffffffff8108e780>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088ce0>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108e416>] kthread+0x96/0xa0
 [<ffffffff8100c10a>] child_rip+0xa/0x20
 [<ffffffff8108e380>] ? kthread+0x0/0xa0
 [<ffffffff8100c100>] ? child_rip+0x0/0x20


2.6.32-174.el6.x86_64

lvm2-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-libs-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-cluster-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
udev-147-2.35.el6    BUILT: Wed Mar 30 07:32:05 CDT 2011
device-mapper-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
cmirror-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011

Comment 9 Mike Miller (OS Dev) 2012-01-04 17:35:50 UTC
What is the underlying hardware?

Comment 14 Alasdair Kergon 2013-10-08 12:48:03 UTC
Is this still a problem after 2.5 years (and if so can we get fresh diagnostics from a current release), or can we close it?

Comment 18 Jonathan Earl Brassow 2013-11-08 16:08:17 UTC
No reports from the field.  No reproducers in > 2yrs.  I am closing this bug.  Please re-open if you can reproduce.