RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 595883 - kdmflush hung task timeout warnings
Summary: kdmflush hung task timeout warnings
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: ppc64
OS: Linux
high
medium
Target Milestone: rc
: ---
Assignee: Mikuláš Patočka
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 846704
TreeView+ depends on / blocked
 
Reported: 2010-05-25 20:32 UTC by Mike Gahagan
Modified: 2012-11-26 13:00 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-11-26 13:00:34 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Mike Gahagan 2010-05-25 20:32:18 UTC
Description of problem:
During run of the rhts test /kernel/distribution/ctcs/1.3.0 I'm seeing a number of hung task warnings for kdmflush while the system is under heavy load. I don't recall seeing them on anything other than ppc.


INFO: task kdmflush:324 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kdmflush      D 0000000000000000     0   324      2 0x00008000
Call Trace:
[c00000008448b700] [c00000000006dbe0] .default_scale_smt_power+0x20/0x50 (unreliable)
[c00000008448b8d0] [c000000000014120] .__switch_to+0x100/0x1d0
[c00000008448b960] [c000000000572fe8] .schedule+0x438/0xd50
[c00000008448baf0] [c000000000573990] .io_schedule+0x90/0x110
[c00000008448bb80] [d000000000ae2c78] .dm_wait_for_completion+0xb8/0x160 [dm_mod]
[c00000008448bc60] [d000000000ae4474] .dm_flush+0x24/0x70 [dm_mod]
[c00000008448bcf0] [d000000000ae45b4] .dm_wq_work+0xf4/0x270 [dm_mod]
[c00000008448bdb0] [c0000000000aa4bc] .worker_thread+0x1dc/0x3c0
[c00000008448bed0] [c0000000000b05bc] .kthread+0xbc/0xd0
[c00000008448bf90] [c00000000002f104] .kernel_thread+0x54/0x70
INFO: task crashme:9395 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
crashme       D 00000fffafe1bc08     0  9395   5163 0x00008080
Call Trace:
[c0000000843cf1d0] [c00000000010bb18] .acct_update_integrals+0x108/0x130 (unreliable)
[c0000000843cf3a0] [c000000000014120] .__switch_to+0x100/0x1d0
[c0000000843cf430] [c000000000572fe8] .schedule+0x438/0xd50
[c0000000843cf5c0] [c0000000005740a8] .schedule_timeout+0x248/0x350
[c0000000843cf6e0] [c000000000573cb8] .wait_for_common+0x198/0x210
[c0000000843cf7c0] [c0000000002ae69c] .blkdev_issue_flush+0xcc/0x140
[c0000000843cf880] [d000000001c92d4c] .ext4_sync_file+0x19c/0x250 [ext4]
[c0000000843cf930] [c0000000001d5ee4] .vfs_fsync_range+0xd4/0x130
[c0000000843cf9e0] [c00000000013d1cc] .generic_file_aio_write+0xdc/0x100
[c0000000843cfaa0] [d000000001c92b14] .ext4_file_write+0x44/0xe0 [ext4]
[c0000000843cfb40] [c0000000001a3724] .do_sync_write+0xd4/0x160
[c0000000843cfce0] [c0000000001a466c] .vfs_write+0xec/0x1f0
[c0000000843cfd80] [c0000000001a4898] .SyS_write+0x58/0xb0
[c0000000843cfe30] [c00000000000852c] syscall_exit+0x0/0x40
INFO: task cat:9415 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cat           D 00000fff9eab3674     0  9415   9414 0x00008080
Call Trace:
[c000000086ce32f0] [c00000000010bb18] .acct_update_integrals+0x108/0x130 (unreliable)
[c000000086ce34c0] [c000000000014120] .__switch_to+0x100/0x1d0
[c000000086ce3550] [c000000000572fe8] .schedule+0x438/0xd50
[c000000086ce36e0] [c000000000573990] .io_schedule+0x90/0x110
[c000000086ce3770] [c00000000013b1f0] .sync_page+0x70/0xa0
[c000000086ce37f0] [c0000000005742b4] .__wait_on_bit_lock+0xd4/0x1b0
[c000000086ce38b0] [c00000000013b134] .__lock_page+0x54/0x70
[c000000086ce3970] [c00000000013b3f8] .find_lock_page+0x98/0xc0
[c000000086ce3a10] [c00000000013bef0] .filemap_fault+0x1e0/0x500
[c000000086ce3ae0] [c000000000167b88] .__do_fault+0xa8/0x640
[c000000086ce3be0] [c000000000168ffc] .handle_pte_fault+0x14c/0xa90
[c000000086ce3cf0] [c000000000578f90] .do_page_fault+0x390/0x720
[c000000086ce3e30] [c00000000000518c] handle_page_fault+0x20/0x74
INFO: task awk:9416 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
awk           D 00000fff84df3674     0  9416   9414 0x00008080
Call Trace:
[c0000000848c7390] [c00000000010bb18] .acct_update_integrals+0x108/0x130 (unreliable)
[c0000000848c7560] [c000000000014120] .__switch_to+0x100/0x1d0
[c0000000848c75f0] [c000000000572fe8] .schedule+0x438/0xd50
[c0000000848c7780] [c000000000573990] .io_schedule+0x90/0x110
[c0000000848c7810] [c00000000013b1f0] .sync_page+0x70/0xa0
[c0000000848c7890] [c0000000005742b4] .__wait_on_bit_lock+0xd4/0x1b0
[c0000000848c7950] [c00000000013b134] .__lock_page+0x54/0x70
[c0000000848c7a10] [c00000000013c0b0] .filemap_fault+0x3a0/0x500
[c0000000848c7ae0] [c000000000167b88] .__do_fault+0xa8/0x640
[c0000000848c7be0] [c000000000168ffc] .handle_pte_fault+0x14c/0xa90
[c0000000848c7cf0] [c000000000578f90] .do_page_fault+0x390/0x720
[c0000000848c7e30] [c00000000000518c] handle_page_fault+0x20/0x74
INFO: task kdmflush:324 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kdmflush      D 0000000000000000     0   324      2 0x00008000
Call Trace:
[c00000008448b700] [c00000000006dbe0] .default_scale_smt_power+0x20/0x50 (unreliable)
[c00000008448b8d0] [c000000000014120] .__switch_to+0x100/0x1d0
[c00000008448b960] [c000000000572fe8] .schedule+0x438/0xd50
[c00000008448baf0] [c000000000573990] .io_schedule+0x90/0x110
[c00000008448bb80] [d000000000ae2c78] .dm_wait_for_completion+0xb8/0x160 [dm_mod]
[c00000008448bc60] [d000000000ae4474] .dm_flush+0x24/0x70 [dm_mod]
[c00000008448bcf0] [d000000000ae45b4] .dm_wq_work+0xf4/0x270 [dm_mod]
[c00000008448bdb0] [c0000000000aa4bc] .worker_thread+0x1dc/0x3c0
[c00000008448bed0] [c0000000000b05bc] .kthread+0xbc/0xd0
[c00000008448bf90] [c00000000002f104] .kernel_thread+0x54/0x70
INFO: task jbd2/dm-0-8:346 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-0-8   D 0000000000000000     0   346      2 0x00008000
Call Trace:
[c0000000848ff440] [c00000000006dbe0] .default_scale_smt_power+0x20/0x50 (unreliable)
[c0000000848ff610] [c000000000014120] .__switch_to+0x100/0x1d0
[c0000000848ff6a0] [c000000000572fe8] .schedule+0x438/0xd50
[c0000000848ff830] [c000000000573990] .io_schedule+0x90/0x110
[c0000000848ff8c0] [c00000000013b1f0] .sync_page+0x70/0xa0
[c0000000848ff940] [c0000000005744d4] .__wait_on_bit+0xb4/0x130
[c0000000848ff9f0] [c00000000013b50c] .wait_on_page_bit+0x7c/0xa0
[c0000000848ffab0] [c00000000013c4cc] .wait_on_page_writeback_range+0x10c/0x1d0
[c0000000848ffc10] [d000000001b44710] .jbd2_journal_commit_transaction+0x940/0x1ba0 [jbd2]
[c0000000848ffdd0] [d000000001b4e870] .kjournald2+0x110/0x470 [jbd2]
[c0000000848ffed0] [c0000000000b05bc] .kthread+0xbc/0xd0
[c0000000848fff90] [c00000000002f104] .kernel_thread+0x54/0x70
INFO: task rs:main Q:Reg:1112 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rs:main Q:Reg D 00000fff961c0d34     0  1112      1 0x00008080
Call Trace:
[c0000000869832f0] [c00000000010bb18] .acct_update_integrals+0x108/0x130 (unreliable)
[c0000000869834c0] [c000000000014120] .__switch_to+0x100/0x1d0
[c000000086983550] [c000000000572fe8] .schedule+0x438/0xd50
[c0000000869836e0] [c000000000573990] .io_schedule+0x90/0x110
[c000000086983770] [c00000000013b1f0] .sync_page+0x70/0xa0
[c0000000869837f0] [c0000000005742b4] .__wait_on_bit_lock+0xd4/0x1b0
[c0000000869838b0] [c00000000013b134] .__lock_page+0x54/0x70
[c000000086983970] [c00000000013b3f8] .find_lock_page+0x98/0xc0
[c000000086983a10] [c00000000013bef0] .filemap_fault+0x1e0/0x500
[c000000086983ae0] [c000000000167b88] .__do_fault+0xa8/0x640
[c000000086983be0] [c000000000168ffc] .handle_pte_fault+0x14c/0xa90
[c000000086983cf0] [c000000000578f90] .do_page_fault+0x390/0x720
[c000000086983e30] [c00000000000518c] handle_page_fault+0x20/0x74
INFO: task flush-253:0:1124 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-253:0   D 0000000000000000     0  1124      2 0x00008080
Call Trace:
[c000000085fee990] [c000000085feea30] 0xc000000085feea30 (unreliable)
[c000000085feeb60] [c000000000014120] .__switch_to+0x100/0x1d0
[c000000085feebf0] [c000000000572fe8] .schedule+0x438/0xd50
[c000000085feed80] [c000000000573990] .io_schedule+0x90/0x110
[c000000085feee10] [c0000000001da2b8] .sync_buffer+0x68/0x80
[c000000085feee80] [c0000000005744d4] .__wait_on_bit+0xb4/0x130
[c000000085feef30] [c0000000005745c0] .out_of_line_wait_on_bit+0x70/0x90
[c000000085fef010] [c0000000001da170] .__wait_on_buffer+0x30/0x50
[c000000085fef090] [d000000001cc8674] .ext4_mb_init_cache+0x2d4/0x960 [ext4]
[c000000085fef1c0] [d000000001ccb0d0] .ext4_mb_load_buddy+0x330/0x400 [ext4]
[c000000085fef270] [d000000001cce64c] .ext4_mb_regular_allocator+0x27c/0x620 [ext4]
[c000000085fef3c0] [d000000001cceea0] .ext4_mb_new_blocks+0x4b0/0x670 [ext4]
[c000000085fef4a0] [d000000001cc437c] .ext4_ext_get_blocks+0x4fc/0x890 [ext4]
[c000000085fef5f0] [d000000001c9b5fc] .ext4_get_blocks+0x2dc/0x570 [ext4]
[c000000085fef6c0] [d000000001c9bec0] .mpage_da_map_blocks+0xb0/0x3e0 [ext4]
[c000000085fef7f0] [d000000001c9c504] .ext4_da_writepages+0x314/0x630 [ext4]
[c000000085fef970] [c000000000153154] .do_writepages+0x44/0x70
[c000000085fef9e0] [c0000000001cfc34] .writeback_single_inode+0xc4/0x2d0
[c000000085fefa90] [c0000000001d0dc0] .writeback_inodes_wb+0x390/0x5b0
[c000000085fefba0] [c0000000001d1148] .wb_writeback+0x168/0x290
[c000000085fefcd0] [c0000000001d1654] .wb_do_writeback+0x274/0x310
[c000000085fefda0] [c0000000001d174c] .bdi_writeback_task+0x5c/0x140
[c000000085fefe40] [c000000000165824] .bdi_start_fn+0xa4/0x150
[c000000085fefed0] [c0000000000b05bc] .kthread+0xbc/0xd0
[c000000085feff90] [c00000000002f104] .kernel_thread+0x54/0x70
INFO: task ntpd:3793 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ntpd          D 00000fff8d00b7b0     0  3793      1 0x00008080
Call Trace:
[c0000000870532f0] [c00000000010bb18] .acct_update_integrals+0x108/0x130 (unreliable)
[c0000000870534c0] [c000000000014120] .__switch_to+0x100/0x1d0
[c000000087053550] [c000000000572fe8] .schedule+0x438/0xd50
[c0000000870536e0] [c000000000573990] .io_schedule+0x90/0x110
[c000000087053770] [c00000000013b1f0] .sync_page+0x70/0xa0
[c0000000870537f0] [c0000000005742b4] .__wait_on_bit_lock+0xd4/0x1b0
[c0000000870538b0] [c00000000013b134] .__lock_page+0x54/0x70
[c000000087053970] [c00000000013b3f8] .find_lock_page+0x98/0xc0
[c000000087053a10] [c00000000013bef0] .filemap_fault+0x1e0/0x500
[c000000087053ae0] [c000000000167b88] .__do_fault+0xa8/0x640
[c000000087053be0] [c000000000168ffc] .handle_pte_fault+0x14c/0xa90
[c000000087053cf0] [c000000000578f90] .do_page_fault+0x390/0x720
[c000000087053e30] [c00000000000518c] handle_page_fault+0x20/0x74
INFO: task master:3871 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
master        D 00000fff794bbc08     0  3871      1 0x00008080
Call Trace:
[c000000085b1f080] [c00000000010bb18] .acct_update_integrals+0x108/0x130 (unreliable)
[c000000085b1f250] [c000000000014120] .__switch_to+0x100/0x1d0
[c000000085b1f2e0] [c000000000572fe8] .schedule+0x438/0xd50
[c000000085b1f470] [d000000001b42454] .do_get_write_access+0x4a4/0x650 [jbd2]
[c000000085b1f5a0] [d000000001b42638] .jbd2_journal_get_write_access+0x38/0x70 [jbd2]
[c000000085b1f630] [d000000001cc4ae4] .__ext4_journal_get_write_access+0x44/0xc0 [ext4]
[c000000085b1f6d0] [d000000001c97f98] .ext4_reserve_inode_write+0xa8/0xe0 [ext4]
[c000000085b1f770] [d000000001c9802c] .ext4_mark_inode_dirty+0x5c/0x210 [ext4]
[c000000085b1f850] [d000000001c98388] .ext4_dirty_inode+0x48/0x80 [ext4]
[c000000085b1f8e0] [c0000000001d050c] .__mark_inode_dirty+0x5c/0x1b0
[c000000085b1f970] [c0000000001c0d5c] .file_update_time+0x10c/0x1c0
[c000000085b1fa30] [c0000000001af348] .pipe_write+0x248/0x5c0
[c000000085b1fb40] [c0000000001a3724] .do_sync_write+0xd4/0x160
[c000000085b1fce0] [c0000000001a466c] .vfs_write+0xec/0x1f0
[c000000085b1fd80] [c0000000001a4898] .SyS_write+0x58/0xb0
[c000000085b1fe30] [c00000000000852c] syscall_exit+0x0/0x40

Version-Release number of selected component (if applicable):
0520 tree, 2.6.32-27.el6 kernel

How reproducible:
frequent

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:
no warnings as on other arch's

Additional info:

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14310730

Comment 1 RHEL Program Management 2010-05-25 20:46:27 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Mikuláš Patočka 2010-05-26 15:59:57 UTC
I suspect either a bug in device mapper barrier support or a bug in the underlying device driver (it can't process barrier requests). Another possibility is broken disk firmware (some disks crash their firmware if too many/too long commands are being sent to them).

Questions:
* What kind of device do you run the tests on? What md, dm and other layers are stacked and how? What is the physical device under this?

* Are the tasks hung forever? Or do they eventually unwedge?

You say that it is frequently reproducible, so try the same test using different host adapters and different disks and try to pinpoint conditions of the bug.

Comment 5 Mike Gahagan 2010-06-01 20:42:20 UTC
The storage device should have been provided by IBM's virtual scsi driver, I am not 100% sure of what the underlying storage was. The install was using LVM per anaconda's/Beaker's defaults.

The tasks did appear to recover eventually.

I'll try and grab some differently configured ppc's this week and do some more testing.

Comment 6 Mikuláš Patočka 2010-07-08 20:12:35 UTC
If the tasks recover, it suggests that it is not a device mapper problem (there are no timers in device mapper). I would look at the underlying device, why is it causing i/o delays.

Comment 7 RHEL Program Management 2010-07-15 15:15:20 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 9 RHEL Program Management 2011-01-07 04:48:31 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 10 Suzanne Logcher 2011-01-07 16:19:51 UTC
This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 11 RHEL Program Management 2011-02-01 06:18:51 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 12 RHEL Program Management 2011-02-01 18:22:54 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 13 RHEL Program Management 2011-04-04 02:41:58 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 14 RHEL Program Management 2011-10-07 15:04:35 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 15 Alasdair Kergon 2012-11-26 13:00:34 UTC
No update for over 2 years so I'm closing this one.

If further data is found that points to something devel needs to investigate, then please reopen it.


Note You need to log in before you can comment on or make changes to this bug.