Bug 103780 - LTC4188-Up to 6 Second Latencies in some cpu bound transactions
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i386 Linux
Priority: medium, Severity: medium
Assigned To: Jim Paradis
QA Contact: Brian Brock
Reported: 2003-09-04 16:34 EDT by IBM Bug Proxy
Modified: 2013-08-05 21:02 EDT
Doc Type: Bug Fix
Last Closed: 2006-06-08 17:42:07 EDT


Description IBM Bug Proxy 2003-09-04 16:34:20 EDT
The following has been reported by IBM LTC:
Up to 6 Second Latencies in some cpu bound transactions
Hardware Environment: Egenera Blades

Software Environment: RH 2.4.9-e25

Steps to Reproduce: Can only be done at CSFB's production environment.
1.  Run CSFB's AMM application (stock quoting system)
2.  Observe 6 second maximum logged transaction time (happens every 15 minutes)

Additional Information:

CSFB's stock quoting application has some cpu bound transactions that take up to
6 seconds, while average time is 24ms.  Kernel 2.4.7-10 did not experience this,
but kernel 2.4.9-e25 does. 
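
A quick way to confirm the 15-minute pattern from the application's own logs is to pull out the slow transactions and look at the spacing between them. The sketch below is illustrative only: the `(timestamp, duration)` record layout, the function name `outlier_intervals`, and the 1-second threshold are assumptions, not part of CSFB's AMM logging.

```python
def outlier_intervals(records, threshold_s=1.0):
    """Find slow transactions and the spacing between them.

    records: iterable of (timestamp_s, duration_s) pairs (hypothetical layout).
    Returns (slow_timestamps, gaps), where gaps[i] is the time in seconds
    between consecutive slow transactions.
    """
    # Keep only the timestamps of transactions at or above the threshold.
    slow = sorted(t for t, d in records if d >= threshold_s)
    # Spacing between each slow transaction and the next one.
    gaps = [later - earlier for earlier, later in zip(slow, slow[1:])]
    return slow, gaps
```

Gaps that cluster near 900 seconds would point at a periodic trigger (a timer, cron job, or housekeeping thread) rather than random scheduler noise.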

We have instrumented try_to_free_pages to see whether we were stuck in a long
quest for memory allocation.  None of the threads called try_to_free_pages
during the test (the 6 second transactions were still there).  Kswapd also did
not call try_to_free_pages.

We have instrumented task timeslice assignments (min, max) per task to look for
scheduler starvation.  All timeslices were 15ms.  All tasks had the same priority.
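
Because per-task timeslices looked normal, a userspace probe can help tell whether a CPU-bound thread is being kept off the CPU for long stretches. The following is a minimal sketch, not something that was run for this report: a busy loop reads a monotonic clock and records any gap between consecutive reads that exceeds a threshold. The function name, the 15 ms default threshold (taken from the timeslice above), and the injectable `clock` parameter are assumptions for illustration.

```python
import time


def spin_and_record_stalls(duration_s, threshold_s=0.015, clock=time.monotonic):
    """Busy-loop for duration_s seconds and report gaps longer than threshold_s.

    Returns a list of (offset_s, gap_s) pairs: offset_s is how far into the
    run the last good clock read happened, gap_s is how long the loop then
    went without observing the clock advance in small steps.
    """
    stalls = []
    start = prev = clock()
    while prev - start < duration_s:
        now = clock()
        gap = now - prev
        if gap > threshold_s:
            # The loop did not get to run for `gap` seconds.
            stalls.append((prev - start, gap))
        prev = now
    return stalls


if __name__ == "__main__":
    for offset, gap in spin_and_record_stalls(0.2):
        print(f"stall of {gap * 1000:.1f} ms at +{offset:.3f} s")
```

Running one such probe per CPU alongside the application and matching stall offsets against the slow transactions would show whether the 6 second delays coincide with time spent off the CPU.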

We are preparing an A/B test on a "whitebox" system to compare against
Egenera's blade systems.


We would like Red Hat's help with this problem:

1) Have there been any situations like this, a multi-second latency for cpu
bound operations?

2) What else can we instrument to better identify the problem?

Glen/Greg - this is a performance bug against a Red Hat errata kernel.
Per Andrew's request, please submit this to Red Hat. Thanks.
Comment 1 Arjan van de Ven 2003-09-04 16:37:11 EDT
Is this our kernel or the egenera recompiled-kernel-with-changes-and-hooks kernel ?
Comment 2 Andrew Theurer 2003-09-04 17:31:31 EDT
egenera kernel.  Whenever I need to patch for instrumentation, test, etc, I 
send to them, they rebuild, then send to customer.

FYI, they had also been experiencing significant average performance drop 
compared to a 2.4.7-10 kernel (at least 40%).  Application of Ingo's aggressive 
idle steal (add "idle ||" to CAN_MIGRATE) to 2.4.9-e25 brought average 
performance slightly better than 2.4.7-10, and worst case latency from 6 
seconds to about 3 seconds.  Worst case latency on 2.4.7-10 is 500 ms.
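
For readers unfamiliar with the change, "aggressive idle steal" relaxes the cache-affinity check in the load balancer so that an idle CPU may pull a task even if that task ran recently elsewhere. The model below is a simplified sketch of that predicate, not the kernel source: the `Task` fields, parameter names, and the `aggressive_idle_steal` flag are hypothetical stand-ins for the `CAN_MIGRATE` macro and its inputs.

```python
from dataclasses import dataclass


@dataclass
class Task:
    last_ran_tick: int        # tick at which the task last ran (hypothetical field)
    is_running: bool          # True if it is executing on the source CPU right now
    allowed_cpus: frozenset   # CPUs the task is permitted to run on


def can_migrate(task, now_tick, dest_cpu, dest_idle,
                cache_decay_ticks, aggressive_idle_steal):
    """Simplified model of the load balancer's migration check."""
    # Baseline rule: only move tasks whose cache footprint has likely decayed.
    cache_cold = (now_tick - task.last_ran_tick) > cache_decay_ticks
    if aggressive_idle_steal:
        # The "idle ||" change: an idle destination CPU skips the cache check.
        cache_cold = dest_idle or cache_cold
    return (cache_cold
            and not task.is_running
            and dest_cpu in task.allowed_cpus)
```

In this model a recently run task stays put under the baseline rule while another CPU sits idle. With the flag set, the idle CPU can take it immediately, which is consistent with the improved average performance reported above.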
Comment 3 Arjan van de Ven 2003-09-08 04:31:57 EDT
Please report back if this also shows on an actual supported Red Hat kernel
Comment 4 IBM Bug Proxy 2003-09-26 00:04:01 EDT
------ Additional Comments From khoa@us.ibm.com  2003-25-09 23:23 -------
Andrew - Red Hat has refused to look at this problem if it does not
happen on their kernel.  I had thought about this when I first screened
this bug, but I thought Red Hat would answer your two questions above.
As it turned out, they would not.  So I'd like to assign this bug back
to you for more analysis.  Thanks. 
Comment 5 IBM Bug Proxy 2004-04-21 15:16:01 EDT
----- Additional Comments From atheurer@us.ibm.com(prefers email via habanero@us.ibm.com)  2004-04-21 15:18 -------
Changes to the scheduler resolved this bug.  The changes are already in RHEL3.
Comment 6 Jim Paradis 2006-06-08 17:42:07 EDT
RHEL2.1 is currently accepting only critical security fixes.  This issue is
outside the current scope of support.
