Bug 735946 - khugepaged stalls system
Summary: khugepaged stalls system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: i686
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-06 09:00 UTC by Slawomir Czarko
Modified: 2011-11-17 23:28 UTC
CC List: 6 users

Fixed In Version: kernel-2.6.41.1-1.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-17 23:28:55 UTC
Type: ---
Embargoed:


Description Slawomir Czarko 2011-09-06 09:00:33 UTC
Description of problem:

System becomes unresponsive from time to time. Mouse stops moving, keyboard doesn't react either. Sometimes keyboard and mouse are responsive but window refresh is delayed (5-30 sec) for some applications.

Usually it happens when switching to a web browser (happens with any of Firefox, Opera, Chrome, Midori) or email client (only tried with Thunderbird). Also when opening a new tab in a web browser. This unresponsiveness lasts about 15-30 seconds and then stops. Then it will repeat again and again.

I'm not sure what exactly triggers it, but I can reproduce it reliably by running a Fedora 9 VM in VMware Player and running a compilation there.

It doesn't happen immediately but after a few minutes. You can see high CPU load from khugepaged and from some of the commands executed during compilation.

The problem seems to go away after executing this:

echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

The default value was "always"
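For anyone who wants to check the setting before changing it: the sysfs file lists every available mode and marks the active one with square brackets. The following is only a minimal sketch of reading that value. The helper name thp_active_mode is made up for illustration, and the write is left commented out because it needs root and does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print the bracketed, i.e. active,
# value from a transparent hugepage sysfs line such as "[always] madvise never".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Path taken from the report above.
defrag=/sys/kernel/mm/transparent_hugepage/defrag

if [ -r "$defrag" ]; then
    echo "active defrag mode: $(thp_active_mode "$(cat "$defrag")")"
    # Workaround from the report; run as root. Commented out on purpose.
    # echo madvise > "$defrag"
fi
```

On a system without transparent hugepage support the file does not exist and the script prints nothing.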


Version-Release number of selected component (if applicable):

kernel-PAE-2.6.40.3-0.fc15.i686


How reproducible:

100%


Steps to Reproduce:
1. Start a Fedora 9 VM in VMware Player.
2. Run software compilation in the VM.
3.
  
Actual results:

System becomes unresponsive.

Expected results:

System stays responsive.

Additional info:

Comment 1 Slawomir Czarko 2011-09-13 14:43:44 UTC
Unfortunately after executing:

echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

I get OOM errors and applications being killed even though the system has lots of free RAM. The OOM errors are triggered by, for example, running rsync on a large directory.

Comment 2 Slawomir Czarko 2011-10-04 07:09:25 UTC
This patch fixed the problem for me:

https://lkml.org/lkml/2011/7/26/103

Comment 3 Josh Boyer 2011-10-04 14:23:12 UTC
(In reply to comment #2)
> This patch fixed the problem for me:
> 
> https://lkml.org/lkml/2011/7/26/103

Thanks for the URL.

That thread seems to have stalled with no alternative solution.  I've contacted the two developers involved and hopefully we'll get some kind of resolution.

Comment 4 Josh Boyer 2011-10-10 13:29:46 UTC
A new set of patches has been posted for this issue.  You can find the thread here:

http://thread.gmane.org/gmane.linux.kernel/1200542

Would it be possible for you to test those two patches and let us know if they resolve the issues you were seeing?

Comment 5 Slawomir Czarko 2011-10-11 18:31:28 UTC
I'm building a kernel with these patches now. I will let you know in a few days whether they work.

Comment 6 Slawomir Czarko 2011-10-16 13:02:07 UTC
The new patches work.

Comment 7 Slawomir Czarko 2011-10-16 13:03:08 UTC
Btw, I was unable to reproduce the problem with VMware Player 4.0 and unpatched kernel. I can reproduce it with VMware Player 3.1.4.

Comment 8 Josh Boyer 2011-10-17 17:35:27 UTC
I've added these patches to the f15 kernel (as well as rawhide/f16).  They will be in the next build.

Comment 9 Fedora Update System 2011-10-17 21:53:50 UTC
kernel-2.6.40.7-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.7-0.fc15

Comment 10 Fedora Update System 2011-10-18 22:06:57 UTC
Package kernel-2.6.40.7-0.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.40.7-0.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14513
then log in and leave karma (feedback).

Comment 11 Fedora Update System 2011-10-24 15:12:54 UTC
kernel-2.6.40.7-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.7-3.fc15

Comment 12 Slawomir Czarko 2011-10-26 13:03:43 UTC
I'm running a kernel patched with the patches from comment #4, and today I was getting some stalls again when working with a Windows VM.

It's not as bad as before but noticeable.

Output of cat /proc/`pgrep khugepaged`/io shows:

rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 0
write_bytes: 8192
cancelled_write_bytes: 0

(this command was mentioned here https://lkml.org/lkml/2011/9/20/261)
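The check above can be repeated over time to see whether a counter grows during a stall. This is only a rough sketch: the helper io_field is a made-up name, the script assumes pgrep is installed, and reading /proc/<pid>/io for a kernel thread normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/<pid>/io-style text on stdin and print the
# value of the one field named in $1 (for example write_bytes).
io_field() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# khugepaged is a kernel thread; -x asks pgrep for an exact name match.
pid=$(pgrep -x khugepaged 2>/dev/null || true)

if [ -n "$pid" ] && [ -r "/proc/$pid/io" ]; then
    echo "khugepaged write_bytes: $(io_field write_bytes < "/proc/$pid/io")"
fi
```

Running it a few times while the stall is happening and comparing the numbers gives a quick picture of whether the thread is doing I/O.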

Comment 13 Fedora Update System 2011-10-27 14:13:07 UTC
kernel-2.6.40.8-2.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.8-2.fc15

Comment 14 Fedora Update System 2011-11-01 19:59:11 UTC
kernel-2.6.40.8-4.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.8-4.fc15

Comment 15 Andy 2011-11-04 11:34:52 UTC
Kernel 2.6.40.6-0.fc15.i686 on Toshiba NB305

After about an hour and a half of leaving the laptop untouched with at least one application open (e.g., Firefox, LibreOffice), the computer becomes unresponsive: the mouse and keyboard stop working, and the screen is black if it was left on the X desktop.
The last time it happened I had left it on tty2. Once an hour or so had passed, everything was unresponsive again, but this time the screen kept displaying the same bug message over and over. It was something like:

 BUG: soft lockup CPU #1 khugepaged: 28

Sorry for not being able to be more specific.
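If the exact text is needed, the messages can usually be pulled from the kernel log instead of being copied off the screen. A minimal sketch follows, assuming the message contains the words "soft lockup" and "khugepaged" as remembered above; the helper name khugepaged_lockups is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines on stdin that mention both a soft
# lockup and khugepaged (case-insensitive).
khugepaged_lockups() {
    grep -i 'soft lockup' | grep -i 'khugepaged'
}

# Current kernel ring buffer; this may need root on some systems.
dmesg 2>/dev/null | khugepaged_lockups || true
```

After a hard hang and a reboot the ring buffer is empty, so the same filter would have to be fed from a log file that survived the reboot, for example khugepaged_lockups < /var/log/messages, assuming the system writes kernel messages there.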

Comment 16 Andy 2011-11-04 13:04:47 UTC
(In reply to comment #15)
Update to kernel-2.6.40.8-4.fc15 seems to have fixed the BUG.

Comment 17 Andy 2011-11-05 14:59:50 UTC
(In reply to comment #16)
> (In reply to comment #15)
> Update to kernel-2.6.40.8-4.fc15 seems to have fixed the BUG.
It didn't. It made it worse. Now it just stalls after a few minutes. I can't even switch to tty2 to see if there are any debugging messages.

Comment 18 Fedora Update System 2011-11-12 00:08:13 UTC
kernel-2.6.41.1-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.41.1-1.fc15

Comment 19 Fedora Update System 2011-11-17 23:28:55 UTC
kernel-2.6.41.1-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

