Bug 689008 - Occasional mini-freezes
Summary: Occasional mini-freezes
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-03-18 19:18 UTC by Daniel Hokka Zakrisson
Modified: 2013-11-12 13:58 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-12 13:58:56 UTC
Target Upstream Version:



Description Daniel Hokka Zakrisson 2011-03-18 19:18:02 UTC
Description of problem:
Every now and then(tm), the system appears to hang. Sometimes it hangs for several seconds, but usually it just feels a bit laggy, with delays of under 1 second. Input is queued and is all processed at once when the system resumes normal operation. It is, in my opinion, odd that even something as simple as an ssh session freezes.

Setting /proc/sys/kernel/hung_task_timeout_secs to 1, I was able to capture the following traces:
INFO: task jbd2/dm-0-8:420 blocked for more than 1 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-0-8   D ef44be54     0   420      2 0x00000000
 f29b8030 00000046 00000002 ef44be54 c1f84b34 00000000 f0244130 c05c74b0
 efab0980 ddc48040 0000990f c0ac0060 28d4b08b 0000990f c0ac0060 f29b82d8
 c0ac0060 c0abbb34 c0ac0060 f29b82d8 c1f84b34 c0479e40 0a044e6d f29b8030
Call Trace:
 [<c05c74b0>] ? blk_unplug+0x20/0x50
 [<c0479e40>] ? ktime_get_ts+0xd0/0x100
 [<c080bab9>] ? io_schedule+0x59/0xa0
 [<c0547540>] ? sync_buffer+0x30/0x40
 [<c080c1a5>] ? __wait_on_bit+0x45/0x70
 [<c0547510>] ? sync_buffer+0x0/0x40
 [<c0547510>] ? sync_buffer+0x0/0x40
 [<c080c238>] ? out_of_line_wait_on_bit+0x68/0x80
 [<c0470d50>] ? wake_bit_function+0x0/0x60
 [<c05474fe>] ? __wait_on_buffer+0x1e/0x30
 [<f38ef0b9>] ? jbd2_journal_commit_transaction+0xeb9/0x1230 [jbd2]
 [<c0408237>] ? __switch_to+0xd7/0x1a0
 [<c045fee7>] ? lock_timer_base+0x27/0x50
 [<f38f44ad>] ? kjournald2+0x8d/0x1d0 [jbd2]
 [<c0470d10>] ? autoremove_wake_function+0x0/0x40
 [<f38f4420>] ? kjournald2+0x0/0x1d0 [jbd2]
 [<c0470ad4>] ? kthread+0x74/0x80
 [<c0470a60>] ? kthread+0x0/0x80
 [<c040a647>] ? kernel_thread_helper+0x7/0x10
INFO: task kdmflush:401 blocked for more than 1 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kdmflush      D efa7def4     0   401      2 0x00000000
 efa10030 00000046 00000002 efa7def4 c1e84b34 00000000 c1e89060 c0abd380
 c1e89060 f0330200 0000991c c0ac0060 6cdfccb0 0000991c c0ac0060 efa102d8
 c0ac0060 c0abbb34 c0ac0060 efa102d8 c1e84b34 c0479e40 0a052d0e efa10030
Call Trace:
 [<c0479e40>] ? ktime_get_ts+0xd0/0x100
 [<c080bab9>] ? io_schedule+0x59/0xa0
 [<f368838a>] ? dm_wait_for_completion+0x7a/0xd0 [dm_mod]
 [<c0445670>] ? default_wake_function+0x0/0x10
 [<f36892dc>] ? dm_flush+0x1c/0x60 [dm_mod]
 [<f3689359>] ? dm_wq_work+0x39/0x190 [dm_mod]
 [<f3689320>] ? dm_wq_work+0x0/0x190 [dm_mod]
 [<c046c83b>] ? worker_thread+0x11b/0x230
 [<c0470d10>] ? autoremove_wake_function+0x0/0x40
 [<c046c720>] ? worker_thread+0x0/0x230
 [<c0470ad4>] ? kthread+0x74/0x80
 [<c0470a60>] ? kthread+0x0/0x80
 [<c040a647>] ? kernel_thread_helper+0x7/0x10
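
To compare reports like the two above without reading them frame by frame, the traces can be parsed and intersected. The following is a minimal Python sketch, not a tool used in this report; the function names parse_hung_tasks and common_frames and the two regular expressions are illustrative assumptions based only on the log layout shown here.

```python
import re

# Assumed layout of a frame line, e.g. " [<c080bab9>] ? io_schedule+0x59/0xa0"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Assumed layout of the report header, e.g. "INFO: task kdmflush:401 blocked ..."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def parse_hung_tasks(log_text):
    """Return {task_name: [function, ...]} for each hung-task report."""
    tasks = {}
    current = None
    for line in log_text.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = header.group(1)
            tasks[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            tasks[current].append(frame.group(1))
    return tasks


def common_frames(tasks):
    """Return the set of functions present in every task's trace."""
    sets = [set(frames) for frames in tasks.values()]
    return set.intersection(*sets) if sets else set()
```

Run against the two traces above, the shared frames include io_schedule and ktime_get_ts along with the generic kthread entry frames. Every frame here is prefixed with "?", which the kernel prints for addresses found on the stack that the unwinder could not verify, so a shared frame is a hint rather than proof.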

It seems like ktime_get_ts is the culprit. Running with nodelayacct does appear to improve the situation, although I have been unable to determine (yet) if it completely alleviates it, or just gets rid of the most common offenders (or it just hasn't been seen on those machines yet).

This could of course just be a red herring too, and they're all really waiting on something else that pre-empted them in that location.


Version-Release number of selected component (if applicable):
2.6.32-71.18.2.el6
2.6.32-71.el6 (the above traces were made on this kernel in isolation)


How reproducible:
Not very. Like most things, it doesn't happen when you're looking too hard at it. Even trying to produce heavy I/O doesn't always cause a problem.
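
Because the stalls are intermittent, one option is to leave a small probe running that records when a short sleep wakes up much later than requested. The sketch below is an illustration in Python, not something used to produce this report: find_stalls and sample_wakeups are hypothetical helper names, and the interval and threshold values are arbitrary starting points.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (index, overshoot) pairs where the gap between consecutive
    timestamps exceeded `interval` by more than `threshold` seconds."""
    stalls = []
    for i in range(1, len(timestamps)):
        overshoot = (timestamps[i] - timestamps[i - 1]) - interval
        if overshoot > threshold:
            stalls.append((i, overshoot))
    return stalls


def sample_wakeups(count, interval):
    """Sleep `interval` seconds `count` times and record each wake-up
    using the monotonic clock, which is unaffected by wall-clock changes."""
    stamps = [time.monotonic()]
    for _ in range(count):
        time.sleep(interval)
        stamps.append(time.monotonic())
    return stamps
```

For example, find_stalls(sample_wakeups(1200, 0.05), 0.05, 0.25) samples for roughly a minute and lists wake-ups that were more than 250 ms late. A late wake-up only shows that the probe process was delayed; it does not by itself say why.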


Steps to Reproduce:
1. Try using an editor in a terminal, either over ssh to the box, or locally.


Actual results:
Laggy experience.


Expected results:
Beautifully smooth experience.

Comment 2 RHEL Program Management 2011-04-04 02:47:24 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 Prarit Bhargava 2011-08-23 12:19:26 UTC
Hi, did the nodelayacct workaround you described here

>It seems like ktime_get_ts is the culprit. Running with nodelayacct does appear
>to improve the situation, although I have been unable to determine (yet) if it
>completely alleviates it, or just gets rid of the most common offenders (or it
>just hasn't been seen on those machines yet).

work for you?

Thanks,

P.

Comment 4 RHEL Program Management 2011-10-07 15:27:19 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 5 Daniel Hokka Zakrisson 2012-03-12 12:10:34 UTC
Sorry, I hadn't noticed that there had been a comment here. Using nodelayacct appears to make it a little bit better, decreasing the frequency. However, it does still occur.

Comment 6 Mika Ilmaranta 2012-03-12 12:17:55 UTC
(In reply to comment #5)
> Sorry, I hadn't noticed that there had been a comment here. Using nodelayacct
> appears to make it a little bit better, decreasing the frequency. However, it
> does still occur.

You could also check https://bugzilla.redhat.com/show_bug.cgi?id=723516

Comment 7 Daniel Hokka Zakrisson 2012-03-12 12:41:55 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Sorry, I hadn't noticed that there had been a comment here. Using nodelayacct
> > appears to make it a little bit better, decreasing the frequency. However, it
> > does still occur.
> 
> You could also check https://bugzilla.redhat.com/show_bug.cgi?id=723516

I can't, as I'm not authorized.

Comment 8 Mika Ilmaranta 2012-03-12 13:21:59 UTC
(In reply to comment #7)
> I can't, as I'm not authorized.

Simply adding the kernel parameters

"processor.max_cstate=1 intel_idle.max_cstate=0"

fixed the "system-freezing" for me. However, I'm not sure this is exactly the same problem, because nodelayacct didn't help on my system at all.
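
To confirm which of the parameters discussed in this bug are actually in effect on a running system, the boot command line can be read from /proc/cmdline and compared against the wanted settings. A minimal Python sketch follows, with hypothetical helper names (parse_cmdline, missing_params, read_cmdline) and a deliberately naive whitespace split that ignores quoted values:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.
    Assumption: no quoted values containing spaces."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, wanted):
    """Return the entries of `wanted` that are absent from `cmdline`
    or present with a different value."""
    params = parse_cmdline(cmdline)
    missing = []
    for token in wanted:
        name, sep, value = token.partition("=")
        expected = value if sep else None
        if name not in params or params[name] != expected:
            missing.append(token)
    return missing


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

Calling missing_params(read_cmdline(), ["processor.max_cstate=1", "intel_idle.max_cstate=0", "nodelayacct"]) returns whichever of those settings the running kernel was not booted with.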

Comment 10 Prarit Bhargava 2013-11-12 13:58:56 UTC
WORKSFORME AFAICT. 

P.

