Bug 856144 - Very long stalls when using commands that need user information (e.g. w, ps axu)
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Larry Woodman
QA Contact: Wang Shu
Depends On:
Blocks: 1366045 1270638 1359574
 
Reported: 2012-09-11 06:02 EDT by Tommi Tervo
Modified: 2016-09-05 16:35 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-02 11:06:29 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
sysrq-t dumps (135.21 KB, application/x-bzip)
2012-09-11 06:02 EDT, Tommi Tervo
zoneinfo fetched during the stall (1.81 KB, application/x-bzip)
2012-09-11 06:03 EDT, Tommi Tervo
zoneinfo when khugepaged is taking 100% cputime (1.83 KB, application/x-bzip)
2012-09-11 06:04 EDT, Tommi Tervo
Strace of various commands which hang (315.52 KB, text/plain)
2013-04-08 09:52 EDT, Marco Passerini
perf top view (4.11 KB, text/plain)
2013-04-17 08:03 EDT, Tommi Tervo

Description Tommi Tervo 2012-09-11 06:02:03 EDT
Created attachment 611716
sysrq-t dumps

Description of problem:
w or ps axuwww can sometimes take 40 minutes to complete.

Version-Release number of selected component (if applicable):
2.6.32-279.2.1.el6.x86_64

How reproducible:
Happens quite often, but I don't know which type of load triggers the issue. The servers are multipurpose application servers. Sometimes khugepaged takes 100% CPU time, but not every time the stall occurs. Enabling or disabling zone_reclaim_mode doesn't make any difference.


Additional info:
HW: HP DL580 G7, 1 TB RAM, 4x Xeon X7560 8-core (32 cores total, HT disabled)
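
Editor's note: the khugepaged CPU usage mentioned in the description can be measured by sampling the thread's CPU counters twice. The following is only a minimal sketch, not something taken from this report. It assumes the standard /proc/<pid>/stat layout, where fields 14 and 15 are utime and stime in clock ticks; the helper names parse_cpu_ticks and cpu_percent are hypothetical.

```python
def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' instead of splitting on whitespace alone.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def cpu_percent(before, after, interval_s, ticks_per_s):
    """CPU usage in percent of one core between two (utime, stime) samples."""
    used = (after[0] + after[1]) - (before[0] + before[1])
    return 100.0 * used / (ticks_per_s * interval_s)
```

To use it, read /proc/<pid>/stat for the khugepaged PID twice, a known interval apart, and pass os.sysconf("SC_CLK_TCK") as ticks_per_s. A result near 100 matches the "100% CPU time" observation.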
Comment 1 Tommi Tervo 2012-09-11 06:03:31 EDT
Created attachment 611718
zoneinfo fetched during the stall
Comment 2 Tommi Tervo 2012-09-11 06:04:13 EDT
Created attachment 611719
zoneinfo when khugepaged is taking 100% cputime
Comment 4 RHEL Product and Program Management 2012-12-14 03:25:13 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 5 Marco Passerini 2013-04-08 09:40:55 EDT
The problem persists, on the same server.
We are currently running the following OS version:

Red Hat Enterprise Linux Server release 6.4 (Santiago)
2.6.32-358.0.1.el6.x86_64

The "ps -ef" command stalls in the middle of execution and usually continues (and completes) after a couple of minutes of no activity. The same applies to commands such as "valgrind --help" and "java -version".
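
Editor's note: a mid-execution stall like the one described for "ps -ef" can be quantified by timing the gaps between output lines. This is a rough sketch under stated assumptions, not a tool used in this report; the function name longest_output_gap is hypothetical. Most commands block-buffer their output when it goes to a pipe, so the gaps are measured per flush rather than per line.

```python
import subprocess
import time


def longest_output_gap(argv):
    """Run argv and return (longest gap in seconds, number of output lines)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    last = time.monotonic()
    longest = 0.0
    lines = 0
    for _ in proc.stdout:
        now = time.monotonic()
        longest = max(longest, now - last)  # silence before this line arrived
        last = now
        lines += 1
    proc.wait()
    # Count the silence between the last line and process exit as well.
    longest = max(longest, time.monotonic() - last)
    return longest, lines
```

Calling longest_output_gap(["ps", "-ef"]) on an affected host would be expected to report a gap of minutes rather than milliseconds.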
Comment 6 Marco Passerini 2013-04-08 09:52:46 EDT
Created attachment 732679
Strace of various commands which hang
Comment 7 Tommi Tervo 2013-04-17 08:03:52 EDT
Created attachment 736818
perf top view

The culprit is THP (transparent huge pages); disabling it cures the problem within a couple of seconds.

When THP is enabled, _spin_lock_irqsave is at the top of the perf top output:
15.79%  [kernel]  [k] _spin_lock_irqsave
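
Editor's note: the comment does not say how THP was disabled. As a minimal sketch, the active THP mode can be read from sysfs, where the selected value is shown in square brackets. RHEL 6 kernels expose the setting under redhat_transparent_hugepage, upstream kernels under transparent_hugepage. The names THP_PATHS, parse_thp_mode and read_thp_mode are hypothetical.

```python
import re
from pathlib import Path

# Candidate locations of the THP switch; the first readable one is used.
THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",  # RHEL 6 kernels
    "/sys/kernel/mm/transparent_hugepage/enabled",         # upstream kernels
)


def parse_thp_mode(text):
    """Return the bracketed (active) mode, e.g. 'always', or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(paths=THP_PATHS):
    """Return the active THP mode on this host, or None if unavailable."""
    for path in paths:
        try:
            return parse_thp_mode(Path(path).read_text())
        except OSError:
            continue  # path missing or unreadable; try the next candidate
    return None
```

The usual runtime switch is to write never into the same file as root; that is presumably what "disabling it" refers to here, but this is an assumption.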
