Red Hat Bugzilla – Bug 856144
Very long stalls when using commands that need user information (e.g. w, ps axu)
Last modified: 2016-09-05 16:35:15 EDT
Created attachment 611716
Description of problem:
The "w" or "ps axuwww" commands can sometimes take 40 minutes to complete.
Version-Release number of selected component (if applicable):
It happens quite often, but I don't know which type of load triggers it. The servers are multipurpose application servers. Sometimes khugepaged takes 100% CPU time during a stall, but not always. Enabling or disabling zone_reclaim_mode makes no difference.
HW: HP DL580 G7, 1 TB RAM, 4x Xeon X7560 8-core (32 cores total, HT disabled)
Created attachment 611718
zoneinfo fetched during the stall
Created attachment 611719
zoneinfo captured while khugepaged is taking 100% CPU time
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
The problem persists on the same server.
We are currently running the following OS version:
Red Hat Enterprise Linux Server release 6.4 (Santiago)
The "ps -ef" command stalls in the middle of execution and usually continues (and completes) after a couple of minutes of no activity. The same applies to commands such as "valgrind --help" and "java -version".
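Since "ps -ef" prints full command lines, it reads /proc/<pid>/cmdline for every process, so a stall can be narrowed down by timing those reads individually. The following is a minimal diagnostic sketch, not something used in this report; the function names are hypothetical, and it assumes the stall shows up as a slow read of one or more cmdline files.

```python
import os
import sys
import time
from typing import Dict, Optional


def time_cmdline_reads(proc_root: str = "/proc") -> Dict[int, float]:
    """Return {pid: seconds spent reading <proc_root>/<pid>/cmdline}."""
    timings: Dict[int, float] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as meminfo or self
        path = os.path.join(proc_root, entry, "cmdline")
        start = time.monotonic()
        try:
            with open(path, "rb") as f:
                f.read()
        except OSError:
            continue  # process exited meanwhile, or permission denied
        timings[int(entry)] = time.monotonic() - start
    return timings


def slowest(timings: Dict[int, float]) -> Optional[int]:
    """Return the pid whose read took longest, or None if nothing was read."""
    return max(timings, key=timings.get) if timings else None


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/proc"
    results = time_cmdline_reads(root)
    # Print the five slowest reads, longest first.
    for pid in sorted(results, key=results.get, reverse=True)[:5]:
        print(f"{pid}\t{results[pid]:.3f}s")
```

Running it during a stall should show which pids take seconds or minutes to read, which can then be compared with the strace output.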
Created attachment 732679
strace output of various commands that hang
Created attachment 736818
perf top view
The culprit is THP (transparent huge pages): disabling it cures the problem within a couple of seconds.
When THP is enabled, _spin_lock_irqsave is at the top of the perf output:
15.79% [kernel] [k] _spin_lock_irqsave
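Disabling THP as described above amounts to writing "never" to the THP control file in sysfs. The sketch below is hedged. It assumes that RHEL 6 kernels expose that file as /sys/kernel/mm/redhat_transparent_hugepage/enabled, and that upstream kernels expose it as /sys/kernel/mm/transparent_hugepage/enabled. The active mode is the one shown in brackets, for example "[always] madvise never". The helper names are hypothetical, and writing the file requires root.

```python
import os
import sys
from typing import Optional

# Assumed locations: RHEL 6 uses the redhat_ prefix, upstream kernels do not.
THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
    "/sys/kernel/mm/transparent_hugepage/enabled",
)


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode, e.g. 'never' from 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def find_thp_file() -> Optional[str]:
    """Return the first existing THP control file, or None."""
    for path in THP_PATHS:
        if os.path.exists(path):
            return path
    return None


def disable_thp(path: str) -> None:
    """Select the 'never' mode by writing it to the control file (root only)."""
    with open(path, "w") as f:
        f.write("never")


if __name__ == "__main__":
    thp = find_thp_file()
    if thp is None:
        sys.exit("no THP control file found")
    with open(thp) as f:
        print(f"{thp}: current mode = {parse_thp_mode(f.read())}")
    if "--disable" in sys.argv:
        disable_thp(thp)
```

Run without arguments, it only reports the current mode. Run with --disable as root, it switches THP off until the next reboot.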