Red Hat Bugzilla – Bug 103780
LTC4188-Up to 6 Second Latencies in some cpu bound transactions
Last modified: 2013-08-05 21:02:09 EDT
The following has been reported by IBM LTC:
Up to 6 Second Latencies in some cpu bound transactions
Hardware Environment: Egenera Blades
Software Environment: RH 2.4.9-e25
Steps to Reproduce: Can only be done at CSFB's production environment.
1. Run CSFB's AMM application (stock quoting system)
2. Observe 6 second maximum logged transaction time (happens every 15 minutes)
CSFB's stock quoting application has some cpu bound transactions that take up to
6 seconds, while average time is 24ms. Kernel 2.4.7-10 did not experience this,
but kernel 2.4.9-e25 does.
We have instrumented try_to_free_pages to see if we were stuck in a long quest
for memory allocation. None of the threads called try_to_free_pages during the
test (6 second transactions were still there). Kswapd also did not call
We have instrumented task timeslice assignments (min, max) per task to look for
scheduler starvation. All timeslices were 15ms. All tasks had the same priority.
We are preparing for an A/B test with a "whitebox" system to compare to Egenera's
We would like RedHat's help with this problem:
1) Have there been any situations like this, a multi-second latency for cpu
2) What else can we instrument to better identify the problem
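One user-space option for question 2, offered only as a rough sketch: run a CPU-bound probe next to the application and log every interval in which the probe itself made no progress. A multi-second gap in the probe would suggest the stall comes from scheduling rather than from the application's own logic. The function name, the 5 ms threshold, and the injectable clock are assumptions made for illustration; nothing like this was part of the original instrumentation.

```python
import time


def measure_gaps(duration_s=1.0, threshold_s=0.005, clock=time.monotonic):
    """Spin in a CPU-bound loop and record stalls.

    Returns a list of (offset_from_start, gap_length) tuples, one for each
    pair of consecutive clock reads that were at least threshold_s apart.
    The clock parameter is injectable only so the logic can be tested.
    """
    gaps = []
    start = prev = clock()
    while prev - start < duration_s:
        now = clock()
        if now - prev >= threshold_s:
            # The loop made no progress for (now - prev) seconds.
            gaps.append((prev - start, now - prev))
        prev = now
    return gaps


if __name__ == "__main__":
    for offset, gap in measure_gaps(duration_s=2.0):
        print(f"stall of {gap * 1000:.1f} ms at +{offset:.3f} s")
```

Running one probe per CPU at the same priority as the application threads, and comparing the stall times with the application's own 6 second log entries, would show whether the two line up.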
Glen/Greg - this is a performance bug against a Red Hat errata kernel.
Per Andrew's request, please submit this to Red Hat. Thanks.
Is this our kernel or the Egenera recompiled-kernel-with-changes-and-hooks kernel?
Egenera kernel. Whenever I need to patch for instrumentation, testing, etc., I
send the patch to them, they rebuild, then send the kernel to the customer.
FYI, they had also been experiencing a significant drop in average performance
compared to a 2.4.7-10 kernel (at least 40%). Applying Ingo's aggressive
idle steal (add "idle ||" to CAN_MIGRATE) to 2.4.9-e25 brought average
performance to slightly better than 2.4.7-10 and reduced worst-case latency
from 6 seconds to about 3 seconds. Worst-case latency on 2.4.7-10 is 500 ms.
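For readers unfamiliar with the idle-steal change, here is a small model of what adding "idle ||" to a migration test could do. It is not the kernel macro. It assumes the baseline test refuses to move a task that ran recently (cache-hot), is currently running, or is not allowed on the stealing CPU, and that "idle ||" lets an idle CPU skip the cache-hot check. The Task fields and the can_migrate function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    last_ran_tick: int   # hypothetical: tick at which the task last ran
    running: bool        # True if the task is executing right now
    cpus_allowed: int    # bitmask of CPUs the task may run on


def can_migrate(task, this_cpu, now, cache_decay_ticks, idle,
                aggressive_idle_steal):
    """Model of a load-balancer migration test (assumed shape, see lead-in)."""
    cache_cold = now - task.last_ran_tick > cache_decay_ticks
    if aggressive_idle_steal:
        # The "idle ||" change: an idle CPU may take cache-hot tasks too.
        cache_cold = idle or cache_cold
    allowed_here = bool(task.cpus_allowed & (1 << this_cpu))
    return cache_cold and not task.running and allowed_here


if __name__ == "__main__":
    hot = Task(last_ran_tick=99, running=False, cpus_allowed=0b11)
    for steal in (False, True):
        ok = can_migrate(hot, this_cpu=1, now=100, cache_decay_ticks=10,
                         idle=True, aggressive_idle_steal=steal)
        print(f"aggressive_idle_steal={steal}: can migrate = {ok}")
```

Under these assumptions, an idle CPU leaves a recently run task queued behind other work unless the aggressive variant is enabled, which is consistent with the throughput and latency improvement reported above.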
Please report back if this also shows on an actual supported Red Hat kernel
------ Additional Comments From email@example.com 2003-25-09 23:23 -------
Andrew - Red Hat has refused to look at this problem if it does not
happen on their kernel. I had thought about this when I first screened
this bug, but I thought Red Hat would answer your two questions above.
As it turned out, they would not. So I'd like to assign this bug back
to you for more analysis. Thanks.
----- Additional Comments From firstname.lastname@example.org(prefers email via email@example.com) 2004-04-21 15:18 -------
Changes to the scheduler resolved this bug. The changes are already in RHEL3.
RHEL2.1 is currently accepting only critical security fixes. This issue is
outside the current scope of support.