Description of problem:
Red Hat engineers told VERITAS that in the RHEL 4 release on the x86 architecture the kernel stack size would be reduced to 4 Kbyte, but all interrupts would be handled on a separate stack. it turns out this is only partially true. on typical SMP x86 implementations the APIC timer interrupt is handled on the standard thread stack and can consume a significant amount of stack space (around 376 bytes). here's an example stack that we collected using a special kernel we developed that tracks stack consumption:

  [kernel] smp_apic_timer_interrupt (+0x84 = 0x000e0)
  [kernel] smp_local_timer_interrupt (+0x14 = 0x000f4)
  [kernel] update_process_times (+0x10 = 0x00104)
  [kernel] scheduler_tick (+0x3c = 0x00140)
  [kernel] rebalance_tick (+0x24 = 0x00164)
  [kernel] load_balance (+0x24 = 0x00188)
  [kernel] wake_up_process (+0xc = 0x00194)
  [kernel] try_to_wake_up (+0x48 = 0x001dc)
  [kernel] activate_task (+0x1c = 0x001f8)
  [kernel] sched_clock (+0xc = 0x00204)
  [kernel] cycles_2_ns (+0x20 = 0x00224)

where the format is (+stack_frame_size = cumulative_stack_depth). ignoring the size of the stack frame reported for smp_apic_timer_interrupt(), which is incorrect, the stack depth here is 0x224 - 0xe0 + 0x34 = 0x178, or 376 bytes (where 0x34 is the actual amount of stack used by smp_apic_timer_interrupt(), including the interrupt stack frame).

VERITAS is quite short of stack space on 32 bit Intel and would like to have as much available as possible. we've been restructuring our code to decrease our stack consumption, but still find that our stacks can be quite deep. as you've probably guessed from the above trace, we've developed code to track kernel stack usage and find *all* instances of deep stack consumption (using a gcc compiler option to insert code at each function entry and exit point).
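The depth arithmetic for the trace above can be checked mechanically. The following is only a sketch: the frame sizes and cumulative depths are copied from the trace, the 0x34 entry cost is the term used in the report's equation, and the names `timer_frames`, `APIC_ENTRY_BYTES` and `timer_stack_depth` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the trace: function, frame size, cumulative depth as reported. */
struct frame {
    const char *fn;
    unsigned size;
    unsigned cumulative;
};

/* Callees of smp_apic_timer_interrupt(), values copied from the trace. */
static const struct frame timer_frames[] = {
    {"smp_local_timer_interrupt", 0x14, 0x0f4},
    {"update_process_times",      0x10, 0x104},
    {"scheduler_tick",            0x3c, 0x140},
    {"rebalance_tick",            0x24, 0x164},
    {"load_balance",              0x24, 0x188},
    {"wake_up_process",           0x0c, 0x194},
    {"try_to_wake_up",            0x48, 0x1dc},
    {"activate_task",             0x1c, 0x1f8},
    {"sched_clock",               0x0c, 0x204},
    {"cycles_2_ns",               0x20, 0x224},
};

/* Stack used by smp_apic_timer_interrupt() itself, per the report's
 * equation (includes the interrupt stack frame). */
#define APIC_ENTRY_BYTES 0x34u

/* Returns the thread-stack bytes consumed by the timer interrupt path. */
unsigned timer_stack_depth(void)
{
    unsigned running = 0x0e0; /* cumulative depth at smp_apic_timer_interrupt */
    unsigned callees = 0;

    for (size_t i = 0; i < sizeof timer_frames / sizeof timer_frames[0]; i++) {
        running += timer_frames[i].size;
        /* The cumulative column should agree with the running sum. */
        assert(running == timer_frames[i].cumulative);
        callees += timer_frames[i].size;
    }
    return callees + APIC_ENTRY_BYTES;
}
```

Calling `timer_stack_depth()` sums the ten callee frames (0x144 bytes, the same as 0x224 - 0xe0) and adds the 0x34 entry cost, giving 0x178, i.e. the 376 bytes quoted in the report.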
while we currently believe that we don't have any situations where our stack is within 376 bytes of overflow (such that an interrupt would take us over the limit), we're still testing our software stack and are worried that something might come up in a code path we haven't adequately exercised yet. with that in mind, we're making this request against the possibility/probability that we'll need the additional stack space. it's relatively easy to make timer interrupts execute on a separate stack and the change should have zero measurable impact on kernel performance, so we'd like Red Hat to consider making this change to benefit us and other subsystems that may have deep stacks (one example being the NFS client code).

Note: timer interrupts seem to be handled differently for x86 UP kernels, so we don't see this problem there. the IA64 kernel already has a 32 Kbyte stack, so it doesn't have separate stacks for interrupts and doesn't need them. x86_64 uses the thread stack for timer interrupts but has a larger stack size, so this isn't really an issue for us on x86_64.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.5-7.109.12.EMP

How reproducible:
run a kernel that instruments stack consumption and look for deep stacks; typically you'll find 300-odd bytes of stack consumed by smp_apic_timer_interrupt at the bottom. we can supply you with a deep stack measuring kernel if you like (along with the source patches).

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
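For readers who want to see what this kind of instrumentation can look like, here is a minimal user-space sketch. It is not the VERITAS kernel code. The report does not name the gcc option it used; `-finstrument-functions` is one gcc option that inserts calls at function entry and exit, and its use here is an assumption. The `stack_tracker` type and the `tracker_*` helpers are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep gcc from instrumenting the tracker itself (avoids recursion). */
#define NO_INSTRUMENT __attribute__((no_instrument_function))

/* Hypothetical tracker state: stacks grow downward on x86. */
struct stack_tracker {
    uintptr_t base;      /* stack address seen on the first recorded call */
    size_t    max_depth; /* deepest consumption below base, in bytes */
};

/* Record one observed stack address and update the high-water mark. */
NO_INSTRUMENT void tracker_note(struct stack_tracker *t, uintptr_t sp)
{
    assert(t != NULL);
    if (t->base == 0)
        t->base = sp; /* first observation becomes the reference point */
    if (sp < t->base && t->base - sp > t->max_depth)
        t->max_depth = t->base - sp;
}

/* True if the deepest stack seen still leaves `reserve` bytes free. */
NO_INSTRUMENT int tracker_headroom_ok(const struct stack_tracker *t,
                                      size_t stack_size, size_t reserve)
{
    assert(t != NULL);
    return t->max_depth + reserve <= stack_size;
}

static struct stack_tracker global_tracker;

/* Hooks that gcc calls when code is built with -finstrument-functions. */
NO_INSTRUMENT void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    tracker_note(&global_tracker, (uintptr_t)__builtin_frame_address(0));
}

NO_INSTRUMENT void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}
```

With a 4096-byte stack and a 376-byte reserve for the timer interrupt, `tracker_headroom_ok(&t, 4096, 376)` reports whether the deepest stack recorded so far would survive an interrupt arriving at that point. It only covers code paths that were actually exercised, which is the limitation described above.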
some additional comments from mark hemment:

For older IA-32 systems, those without a local APIC, the timer interrupt will use the 'standard' interrupt handler (do_IRQ()) so will (as far as I can tell) use a separate stack. Modern server systems won't be using this interrupt, so it can be ignored in this discussion.

we don't see an issue here for UP kernels because timer interrupt stacks don't go as deep there: update_process_times() isn't called.

arch/i386/kernel/apic.c:

    smp_local_timer_interrupt()
    {
            ...
    #ifdef CONFIG_SMP
            update_process_times(user_mode(regs));
    #endif
Created attachment 112043
patch to handle timer interrupts on the interrupt stack

here's a patch mark developed to switch to the interrupt stack for handling APIC timer interrupts. Note that this has been tested, but not very heavily.
The patch looks OK to me in principle, and i've submitted it for inclusion. Could you also send it to Andrew Morton & Linus? It makes sense and saves 10% off the worst-case process-stack footprint. We indeed call quite deep into the scheduler from the APIC timer interrupt, which makes it special (and different from the other SMP IPI interrupt routines).
sure. mark or i will submit it. thanks!
PM ACK for U2.
Hi Craig - A beta kernel which we believe resolves this issue is available on the Red Hat partners FTP site (partners.redhat.com) and the Red Hat Network (rhn.redhat.com). Can you download one of the new kernels and see if it resolves your problem? Thanks.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-514.html