Description of problem:
Red Hat engineers told VERITAS that in the RHEL 4 release on the x86
architecture the kernel stack size would be reduced to 4 Kbyte, but
all interrupts would be handled on a separate stack.
it turns out this is only partially true. on typical SMP x86
implementations the APIC timer interrupt is handled on the standard
thread stack and can consume a significant amount of stack space
(around 376 bytes). here's an example stack that we collected using a
special kernel we developed that tracks stack consumption:
[kernel] smp_apic_timer_interrupt (+0x84 = 0x000e0)
[kernel] smp_local_timer_interrupt (+0x14 = 0x000f4)
[kernel] update_process_times (+0x10 = 0x00104)
[kernel] scheduler_tick (+0x3c = 0x00140)
[kernel] rebalance_tick (+0x24 = 0x00164)
[kernel] load_balance (+0x24 = 0x00188)
[kernel] wake_up_process (+0xc = 0x00194)
[kernel] try_to_wake_up (+0x48 = 0x001dc)
[kernel] activate_task (+0x1c = 0x001f8)
[kernel] sched_clock (+0xc = 0x00204)
[kernel] cycles_2_ns (+0x20 = 0x00224)
where the format is (+stack_frame_size = cumulative_stack_depth).
ignoring the recorded stack frame size for smp_apic_timer_interrupt()
(0x84), which is incorrect, the stack depth here is
0x224 - 0xe0 + 0x34 = 0x178, or 376 bytes
(where 0x34 is the actual amount of stack used by
smp_apic_timer_interrupt(), including the interrupt stack frame).
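
The arithmetic above can be re-checked mechanically. The following is only a sketch: the helper names (parse_trace, check_cumulative, timer_stack_cost) are hypothetical and not part of the instrumented kernel, the first trace line is written with its 0x prefix, and 0x34 is taken from the formula above as the real cost of smp_apic_timer_interrupt().

```python
import re

# Trace copied from the report; format is (+frame_size = cumulative_depth).
TRACE = """\
[kernel] smp_apic_timer_interrupt (+0x84 = 0x000e0)
[kernel] smp_local_timer_interrupt (+0x14 = 0x000f4)
[kernel] update_process_times (+0x10 = 0x00104)
[kernel] scheduler_tick (+0x3c = 0x00140)
[kernel] rebalance_tick (+0x24 = 0x00164)
[kernel] load_balance (+0x24 = 0x00188)
[kernel] wake_up_process (+0xc = 0x00194)
[kernel] try_to_wake_up (+0x48 = 0x001dc)
[kernel] activate_task (+0x1c = 0x001f8)
[kernel] sched_clock (+0xc = 0x00204)
[kernel] cycles_2_ns (+0x20 = 0x00224)
"""

FRAME_RE = re.compile(
    r"\[kernel\]\s+(\w+)\s+\(\+0x([0-9a-f]+)\s*=\s*0x([0-9a-f]+)\)"
)


def parse_trace(text):
    """Return a list of (function, frame_size, cumulative_depth) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return frames


def check_cumulative(frames):
    """True if every depth equals the previous depth plus this frame's size."""
    return all(prev[2] + cur[1] == cur[2] for prev, cur in zip(frames, frames[1:]))


def timer_stack_cost(frames, real_first_frame=0x34):
    """Depth added below the first frame, plus the assumed real size of that frame."""
    first_depth = frames[0][2]
    last_depth = frames[-1][2]
    return last_depth - first_depth + real_first_frame
```

For the trace above, timer_stack_cost(parse_trace(TRACE)) gives 0x178, i.e. 376 bytes, and check_cumulative() confirms that the per-frame sizes after the first entry add up to the recorded depths.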
VERITAS is quite short of stack space on 32 bit Intel and would like
to have as much available as possible. we've been restructuring our
code to decrease our stack consumption, but still find that our stacks
can be quite deep. as you've probably guessed from the above trace,
we've developed code to track kernel stack usage and find *all*
instances of deep stack consumption (using a gcc compiler option to
insert code in each function entry and exit point).
while we currently believe that we don't have any situations where our
stack is within 376 bytes of overflow (such that an interrupt would
take us over the limit), we're still testing our software stack and
are worried that something might come up in a code path we haven't
adequately exercised yet. with that in mind, we're making this
request against the possibility/probability that we'll need the
additional stack space.
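
The "within 376 bytes of overflow" condition can be written down as a simple check. This is a rough sketch under stated assumptions: it treats the full 4 Kbyte stack as usable (ignoring any per-thread bookkeeping the kernel keeps in the same allocation), and the names headroom and survives_timer_irq are hypothetical.

```python
STACK_SIZE = 4096        # 4 Kbyte kernel stack described in the report
TIMER_IRQ_COST = 0x178   # 376 bytes measured for the APIC timer path


def headroom(depth, stack_size=STACK_SIZE):
    """Bytes of stack left when a thread is `depth` bytes deep."""
    return stack_size - depth


def survives_timer_irq(depth, stack_size=STACK_SIZE, irq_cost=TIMER_IRQ_COST):
    """True if an APIC timer interrupt arriving at `depth` still fits on the stack."""
    return headroom(depth, stack_size) >= irq_cost
```

Under these assumptions any code path deeper than 4096 - 376 = 3720 bytes would be at risk if a timer interrupt arrived at its deepest point; the real limit is lower once other data sharing the stack allocation is accounted for.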
it's relatively easy to make timer interrupts execute on a separate
stack and the change should have zero measurable impact on the
kernel performance, so we'd like Red Hat to consider making this
change to benefit us and other subsystems that may have deep stacks
(one example being the NFS client code).
Note: timer interrupts seem to be handled differently for x86 UP
kernels, so we don't see this problem there. the IA64 kernel
already has a 32 Kbyte stack, so it doesn't have separate stacks for
interrupts and doesn't need them. x86_64 uses the thread stack for
timer interrupts but has a larger stack size, so this isn't really an
issue for us on x86_64.
Version-Release number of selected component (if applicable):
run a kernel that instruments stack consumption and look for deep
stacks; typically you'll find 300-odd bytes of stack consumed by
smp_apic_timer_interrupt at the bottom.
we can supply you with a deep stack measuring kernel if you like
(along with the source patches).
Steps to Reproduce:
some additional comments from mark hemment:
For older IA-32 systems, those without a local APIC, the timer
interrupt will use the 'standard' interrupt handler (do_IRQ()) so will
(as far as I can tell) use a separate stack. Modern server systems won't
be using this interrupt, so it can be ignored in this discussion.
we don't see an issue here for UP kernels: timer interrupt stacks
don't go as deep there because update_process_times() isn't called.
Created attachment 112043
patch to handle timer interrupts on the interrupt stack
here's a patch mark developed to switch to the interrupt stack for handling
APIC timer interrupts. Note that this has been tested, but not very heavily.
The patch looks OK to me in principle, and i've submitted it for inclusion.
Could you also send it to Andrew Morton & Linus? It makes sense and saves 10%
off the worst-case process-stack footprint. We indeed call quite deep into the
scheduler from the APIC timer interrupt, which makes it special (and different
from the other SMP IPI interrupt routines).
sure. mark or i will submit it. thanks!
PM ACK for U2.
Hi Craig - A beta kernel which we believe resolves this issue is available on
the Red Hat partners FTP site (partners.redhat.com) and the Red Hat Network
(rhn.redhat.com). Can you download one of the new kernels and see if it
resolves your problem? Thanks.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.