Bug 103780 - LTC4188-Up to 6 Second Latencies in some cpu bound transactions
Summary: LTC4188-Up to 6 Second Latencies in some cpu bound transactions
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-09-04 20:34 UTC by IBM Bug Proxy
Modified: 2013-08-06 01:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-06-08 21:42:07 UTC
Target Upstream Version:
Embargoed:



Description IBM Bug Proxy 2003-09-04 20:34:20 UTC
The following has been reported by IBM LTC:
Up to 6 Second Latencies in some cpu bound transactions
Hardware Environment: Egenera Blades

Software Environment: RH 2.4.9-e25

Steps to Reproduce: Can only be reproduced in CSFB's production environment.
1.  Run CSFB's AMM application (stock quoting system)
2.  Observe 6 second maximum logged transaction time (happens every 15 minutes)

Additional Information:

CSFB's stock quoting application has some CPU-bound transactions that take up to
6 seconds, while the average time is 24 ms.  Kernel 2.4.7-10 did not show this
behavior, but kernel 2.4.9-e25 does.

We have instrumented try_to_free_pages to see whether we were stuck in a long
search for memory during allocation.  None of the threads called
try_to_free_pages during the test, yet the 6-second transactions still
occurred.  kswapd did not call try_to_free_pages either.

We have instrumented task timeslice assignments (min, max) per task to look for
scheduler starvation.  All timeslices were 15ms.  All tasks had the same priority.
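
For readers following the timeslice check, the per-task (min, max) bookkeeping
could be sketched in user-space C as below.  This is only an illustration: the
struct and function names are hypothetical, and the actual kernel patch used in
the test is not shown in this report.

```c
#include <limits.h>

/* Hypothetical per-task record. The real instrumentation hooked the
 * kernel's timeslice assignment path, which this report does not show. */
struct slice_stats {
    unsigned int min_ms;    /* smallest timeslice seen so far */
    unsigned int max_ms;    /* largest timeslice seen so far */
    unsigned long samples;  /* number of assignments recorded */
};

static void slice_stats_init(struct slice_stats *s)
{
    s->min_ms = UINT_MAX;
    s->max_ms = 0;
    s->samples = 0;
}

/* Call once each time a task is handed a new timeslice. */
static void slice_stats_record(struct slice_stats *s, unsigned int slice_ms)
{
    if (slice_ms < s->min_ms)
        s->min_ms = slice_ms;
    if (slice_ms > s->max_ms)
        s->max_ms = slice_ms;
    s->samples++;
}

/* Nonzero if at least one slice was recorded and all had the same length. */
static int slice_stats_uniform(const struct slice_stats *s)
{
    return s->samples > 0 && s->min_ms == s->max_ms;
}
```

If min and max agree for every task, as reported above (15 ms everywhere),
uneven timeslice assignment is unlikely to explain the stalls.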

We are preparing for an A/B test with a "whitebox" system to compare to Egenera's
blade systems.


We would like Red Hat's help with this problem:

1) Have there been any situations like this, a multi-second latency for cpu
bound operations?

2) What else can we instrument to better identify the problem?
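
One user-space measurement that might complement the kernel instrumentation is
a spin probe: a CPU-bound loop that reads the clock continuously and records the
longest interval in which it made no progress.  The sketch below is an
assumption about how such a probe could look, not something that was run in
this investigation.  All function names are hypothetical, and on a 2.4-era
system without CLOCK_MONOTONIC, gettimeofday() would be the likely substitute.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Largest difference between consecutive timestamps in a recorded,
 * non-decreasing series; returns 0 if there are fewer than two samples. */
static uint64_t max_gap_ns(const uint64_t *ts, size_t n)
{
    uint64_t worst = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = ts[i] - ts[i - 1];
        if (gap > worst)
            worst = gap;
    }
    return worst;
}

/* Spin for roughly duration_ns and return the longest interval between
 * two consecutive clock reads, i.e. the longest time this thread was
 * not making progress. */
static uint64_t probe_stall_ns(uint64_t duration_ns)
{
    uint64_t start = now_ns();
    uint64_t prev = start;
    uint64_t worst = 0;

    for (;;) {
        uint64_t t = now_ns();
        if (t - prev > worst)
            worst = t - prev;
        prev = t;
        if (t - start >= duration_ns)
            break;
    }
    return worst;
}
```

Running probe_stall_ns() next to the application, one instance per CPU, could
show whether a plain CPU-bound thread also sees multi-second gaps, which would
point toward the scheduler rather than the application.  max_gap_ns() applies
the same check to timestamps that have already been logged.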

Glen/Greg - this is a performance bug against a Red Hat errata kernel.
Per Andrew's request, please submit this to Red Hat. Thanks.

Comment 1 Arjan van de Ven 2003-09-04 20:37:11 UTC
Is this our kernel or the Egenera recompiled-kernel-with-changes-and-hooks kernel?

Comment 2 Andrew Theurer 2003-09-04 21:31:31 UTC
The Egenera kernel.  Whenever I need a patch for instrumentation, testing, etc.,
I send it to them; they rebuild the kernel and then send it to the customer.

FYI, they had also been experiencing a significant drop in average performance
compared to a 2.4.7-10 kernel (at least 40%).  Applying Ingo's aggressive
idle steal (add "idle ||" to CAN_MIGRATE) to 2.4.9-e25 brought average
performance to slightly better than 2.4.7-10 and reduced worst-case latency
from 6 seconds to about 3 seconds.  Worst-case latency on 2.4.7-10 is 500 ms.
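
The effect of adding "idle ||" can be illustrated with a small user-space model
of the migration test.  This is a simplified sketch, assuming the check has the
general shape of the O(1) scheduler's CAN_MIGRATE_TASK macro (cache-cold, not
running, allowed on the target CPU).  The struct, field, and function names are
hypothetical and are not copied from the 2.4.9-e25 source.

```c
#include <stdbool.h>

/* Simplified stand-in for the scheduler's per-task state. Field names
 * are illustrative only. */
struct task_model {
    unsigned long last_run;      /* tick at which the task last ran */
    bool running;                /* currently executing on its own CPU */
    unsigned long cpus_allowed;  /* bitmask of CPUs the task may use */
};

/* Baseline rule: only a cache-cold, non-running task that is allowed on
 * this_cpu may be pulled over by the load balancer. */
static bool can_migrate_baseline(const struct task_model *p,
                                 unsigned long now,
                                 unsigned long cache_decay_ticks,
                                 int this_cpu)
{
    return (now - p->last_run > cache_decay_ticks) &&
           !p->running &&
           (p->cpus_allowed & (1UL << this_cpu)) != 0;
}

/* With "idle ||" added: an idle CPU skips the cache-cold requirement and
 * may steal a recently run task. The other two conditions still apply. */
static bool can_migrate_idle_steal(const struct task_model *p,
                                   unsigned long now,
                                   unsigned long cache_decay_ticks,
                                   int this_cpu,
                                   bool idle)
{
    return (idle || (now - p->last_run > cache_decay_ticks)) &&
           !p->running &&
           (p->cpus_allowed & (1UL << this_cpu)) != 0;
}
```

In this model a task that ran one tick ago is rejected by the baseline rule but
accepted when the pulling CPU is idle, so runnable work stops waiting behind a
busy CPU while another CPU sits idle.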

Comment 3 Arjan van de Ven 2003-09-08 08:31:57 UTC
Please report back if this also shows up on an actual supported Red Hat kernel.

Comment 4 IBM Bug Proxy 2003-09-26 04:04:01 UTC
------ Additional Comments From khoa.com  2003-25-09 23:23 -------
Andrew - Red Hat has refused to look at this problem if it does not
happen on their kernel.  I had thought about this when I first screened
this bug, but I thought Red Hat would answer your two questions above.
As it turned out, they would not.  So I'd like to assign this bug back
to you for more analysis.  Thanks. 

Comment 5 IBM Bug Proxy 2004-04-21 19:16:01 UTC
----- Additional Comments From atheurer.com(prefers email via habanero.com)  2004-04-21 15:18 -------
Changes to the scheduler resolved this bug.  The changes are already in RHEL3.

Comment 6 Jim Paradis 2006-06-08 21:42:07 UTC
RHEL2.1 is currently accepting only critical security fixes.  This issue is
outside the current scope of support.

