Bug 162548
| Summary: | interrupt handlers run on thread's kernel stack | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | craig harmer <craig> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.0 | CC: | linux26port |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHSA-2005-514 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2005-10-05 13:39:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 156322 | ||
Hi Craig,

Thanks for the bug report. This was simply an oversight, and we have corrected the issue by re-enabling 4k irq stacks for U2. The bug noting the issue is 162257, so I'm closing this one as a duplicate of that.

Thanks.
-Jason

*** This bug has been marked as a duplicate of 162257 ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html
Description of problem:

In several discussions, Red Hat engineers told us (Veritas) that Red Hat EL 4.0 would be based on the 2.6 kernel and would move to a 4 Kbyte stack size, but would process hardware interrupts on a separate stack. It turns out that's not true! Interrupts are still being processed on the thread's kernel stack. This is a huge problem for Veritas.

Here's an example of an interrupt handler running on the thread stack, caught by our deep stack tracking kernel:

    Comm: find (0xea4)
    [kernel] sys_getdents64 (+0x64 = 0x00064)
    [kernel] vfs_readdir (+0x24 = 0x00088)
    [nfs] nfs_readdir (+0x1a4 = 0x0022c)
    [nfs] readdir_search_pagecache (+0x18 = 0x00244)
    [nfs] find_dirent_page (+0x14 = 0x00258)
    [kernel] read_cache_page (+0x24 = 0x0027c)
    [kernel] __read_cache_page (+0x24 = 0x002a0)
    [nfs] nfs_readdir_filler (+0x28 = 0x002c8)
    [nfs] nfs3_proc_readdir (+0x104 = 0x003cc)
    [nfs] nfs3_rpc_wrapper (+0x28 = 0x003f4)
    [sunrpc] rpc_call_sync (+0x28 = 0x0041c)
    [sunrpc] rpc_execute (+0x14 = 0x00430)
    [sunrpc] __rpc_execute (+0x64 = 0x00494)
    [sunrpc] call_transmit (+0x10 = 0x004a4)
    [sunrpc] xprt_transmit (+0x24 = 0x004c8)
    [sunrpc] xprt_sendmsg (+0x28 = 0x004f0)
    [sunrpc] xdr_sendpages (+0x94 = 0x00584)
    [kernel] kernel_sendmsg (+0x24 = 0x005a8)
    [kernel] sock_sendmsg (+0xec = 0x00694)
    [kernel] __sock_sendmsg (+0x24 = 0x006b8)
    [kernel] inet_sendmsg (+0x20 = 0x006d8)
    [kernel] tcp_sendmsg (+0x58 = 0x00730)
    [kernel] tcp_push (+0x28 = 0x00758)
    [kernel] __tcp_push_pending_frames (+0x30 = 0x00788)
    [kernel] tcp_write_xmit (+0x24 = 0x007ac)
    [kernel] tcp_transmit_skb (+0x2c = 0x007d8)
    [kernel] ip_queue_xmit (+0xc4 = 0x0089c)
    [kernel] dst_output (+0x10 = 0x008ac)
    [kernel] ip_output (+0x14 = 0x008c0)
    [kernel] ip_finish_output (+0x14 = 0x008d4)
    [kernel] nf_hook_slow (+0x38 = 0x0090c)
    [kernel] nf_iterate (+0x34 = 0x00940)
    [kernel] selinux_ipv4_postroute_last (+0x20 = 0x00960)
    [kernel] selinux_ip_postroute_last (+0x94 = 0x009f4)
    [kernel] avc_has_perm (+0x48 = 0x00a3c)
    [kernel] avc_has_perm_noaudit (+0x5c = 0x00a98)
    [kernel] avc_lookup (+0x24 = 0x00abc)
    [kernel] avc_search_node (+0x28 = 0x00ae4)
    [kernel] avc_hash (+0x1c = 0x00b00)
    ====> CDROM interrupt occurs here with ~1,200 bytes remaining <===
    [kernel] do_IRQ (+0x74 = 0x00b74)
    [kernel] handle_IRQ_event (+0x20 = 0x00b94)
    [kernel] ide_intr (+0x28 = 0x00bbc)
    [kernel] cdrom_read_intr (+0x20 = 0x00bdc)
    [kernel] ide_end_request (+0x24 = 0x00c00)
    [kernel] __ide_end_request (+0x28 = 0x00c28)
    [kernel] end_that_request_first (+0x18 = 0x00c40)
    [kernel] __end_that_request_first (+0x2c = 0x00c6c)
    [kernel] bio_endio (+0x20 = 0x00c8c)
    [kernel] bounce_end_io_read (+0x1c = 0x00ca8)
    [kernel] __bounce_end_io_read (+0x18 = 0x00cc0)
    [kernel] bounce_end_io (+0x24 = *0x00ce4)
    [kernel] bio_endio (+0x20 = *0x00d04)
    [kernel] end_bio_bh_io_sync (+0x20 = *0x00d24)
    [kernel] end_buffer_async_read (+0x24 = *0x00d48)
    [kernel] unlock_page (+0xc = *0x00d54)
    [kernel] wake_up_page (+0x14 = *0x00d68)
    [kernel] __wake_up (+0x1c = *0x00d84)
    [kernel] __wake_up_common (+0x28 = *0x00dac)
    [kernel] page_wake_function (+0x1c = *0x00dc8)
    [kernel] autoremove_wake_function (+0x20 = *0x00de8)
    [kernel] default_wake_function (+0x1c = *0x00e04)
    [kernel] try_to_wake_up (+0x48 = *0x00e4c)
    [kernel] wake_idle (+0x20 = *0x00e6c)
    [kernel] find_next_bit (+0x38 = *0x00ea4)

(CDROM interrupt consumes 0xea4 - 0xb00 + 0x74 = 1,048 bytes of stack.)

Note that the frame sizes and stack depths shown here are roughly 10% larger than on a production Red Hat kernel, since our deep stack tracking kernel is compiled with frame pointers and with "-mregparm=0" to make our debugging easier.

Also note that interrupts on Linux can nest. The ~1,000 bytes consumed by the CDROM interrupt could easily have had another ~500 bytes added to it by the ethernet driver and another ~500 bytes added to it by the QLogic FC driver. So, under the right confluence of events, this could have been a stack overflow involving only kernel code shipped by Red Hat.
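The arithmetic in the trace can be rechecked with a short script. This is a minimal sketch and not part of the original report: the helper names `parse_frames` and `interrupt_usage` are hypothetical, the line format is inferred from the trace itself, and only four of the frames are reproduced in the sample. The 4 Kbyte stack size, the extra 0x74, and the ~500-byte driver estimates are the report's own figures.

```python
import re

# One frame per line: "[module] function (+0xFRAME = 0xDEPTH)". The depth may
# carry a "*" prefix, as it does in the later frames of the trace.
FRAME_RE = re.compile(
    r"\[(\w+)\]\s+(\w+)\s+\(\+0x([0-9a-fA-F]+)\s*=\s*\*?0x([0-9a-fA-F]+)\)"
)

# Four frames copied from the trace: the last frame before the interrupt,
# the interrupt entry, one handler frame, and the deepest frame.
SAMPLE = """
[kernel] avc_hash (+0x1c = 0x00b00)
[kernel] do_IRQ (+0x74 = 0x00b74)
[kernel] handle_IRQ_event (+0x20 = 0x00b94)
[kernel] find_next_bit (+0x38 = *0x00ea4)
"""


def parse_frames(text):
    """Return (module, function, frame_size, cumulative_depth) tuples."""
    return [
        (mod, func, int(size, 16), int(depth, 16))
        for mod, func, size, depth in FRAME_RE.findall(text)
    ]


def interrupt_usage(frames, entry="do_IRQ"):
    """Return (depth before the interrupt, bytes from there to the deepest frame)."""
    idx = next(i for i, frame in enumerate(frames) if frame[1] == entry)
    if idx == 0:
        raise ValueError("no frame precedes the interrupt entry")
    depth_before = frames[idx - 1][3]
    deepest = max(frame[3] for frame in frames)
    return depth_before, deepest - depth_before


STACK_SIZE = 4096  # 4 Kbyte thread stack, per the report

frames = parse_frames(SAMPLE)
depth_before, irq_bytes = interrupt_usage(frames)
headroom = STACK_SIZE - depth_before   # nominal bytes left when the IRQ arrives
reported = irq_bytes + 0x74            # the report's formula adds 0x74 on top
nested = reported + 500 + 500          # report's estimate for two nested IRQs

print(f"in use at interrupt: {depth_before} bytes, headroom: {headroom} bytes")
print(f"interrupt usage (report's formula): {reported} bytes")
print(f"nested estimate: {nested} bytes, exceeds headroom: {nested > headroom}")
```

With these four frames the script gives 2,816 bytes in use when the interrupt arrives, 1,280 bytes of nominal headroom (the "~1,200" in the trace), 1,048 bytes for the CDROM interrupt, and 2,048 bytes for the nested case, which is more than the remaining headroom.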
You're probably wondering why we're only reporting this problem now. The problem is that Veritas does most of its testing using custom kernels built with kdb, frame pointers, "-mregparm=0", and an 8 Kbyte stack. Because we have larger stack frames due to passing arguments on the stack, extra debugging code, and kdb, we need additional stack space (it really sucks when dropping into kdb causes a stack overrun; in addition, we used to have problems with deep stacks in our production code, although we believe they've all been resolved).

When we built our custom kernels, we used "#define CONFIG_4KSTACKS" because it enables the interrupt stack switching code and because we *assumed* that's what Red Hat was doing to get 4 Kbyte kernel stacks. That was a mistake. It turns out Red Hat builds their kernels with a custom patch that enables 4 Kbyte stacks but disables interrupt stack switching. That patch is:

linux-2.6.5-x86-nostack.patch

It strips out every "#ifdef CONFIG_4KSTACKS" in the kernel *except* for the #ifdef around the interrupt stack switching code in do_IRQ() (in arch/i386/kernel/irq.c), which explains why we're in this situation. I'd really like to know why that patch was added.

Veritas has done some limited testing on Red Hat production kernels (most recently the RHEL4 Update 1 RC 1 drop) and hasn't seen any actual stack overflows, or even any stack overflow warning messages. But our stack depth tracking kernels were being built using CONFIG_4KSTACKS, so we weren't exploring this issue with most of our testing.

At this point it's difficult to know what the actual risk is, but currently we don't think we can release our products for the i386 (or i686) with this increased risk of stack overflow (since we do know overflow *might* occur if the conditions were right). So we're urgently looking for Red Hat to make kernels available that actually perform hardware interrupt handling on a different stack.
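One way to check which "#ifdef CONFIG_4KSTACKS" guards survive in a patched file is to count them in the source text. The following is a rough sketch under stated assumptions: `count_4kstacks_guards` is a hypothetical helper, the `EXAMPLE` text (including `CONFIG_4KSTACKS_OTHER`) is made up for illustration and is not taken from any kernel tree, and only the plain `#ifdef`, `#ifndef`, and `#if defined(...)` forms are recognised, so compound `#if` expressions would be missed.

```python
import re

# Matches preprocessor guards that test CONFIG_4KSTACKS directly:
#   #ifdef CONFIG_4KSTACKS, #ifndef CONFIG_4KSTACKS, #if defined(CONFIG_4KSTACKS)
# The trailing \b keeps longer symbol names from matching.
GUARD_RE = re.compile(
    r"^[ \t]*#[ \t]*(?:ifdef|ifndef|if[ \t]+!?[ \t]*defined[ \t]*\(?)"
    r"[ \t]*CONFIG_4KSTACKS\b",
    re.MULTILINE,
)


def count_4kstacks_guards(source):
    """Count preprocessor guards on CONFIG_4KSTACKS in one file's text."""
    return len(GUARD_RE.findall(source))


# Made-up input for illustration only; not real kernel source.
EXAMPLE = """\
#ifdef CONFIG_4KSTACKS
/* placeholder for guarded code */
#endif
#if defined(CONFIG_4KSTACKS)
#endif
#ifdef CONFIG_4KSTACKS_OTHER
#endif
"""

print(count_4kstacks_guards(EXAMPLE))
```

Running the same count over a file's text before and after applying a patch (for example, the contents of arch/i386/kernel/irq.c read with `open(...).read()`) would show how many guards the patch removed from that file.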
Version-Release number of selected component (if applicable):
kernel-2.6.9-11.EL

How reproducible:
every time

Steps to Reproduce:
1. build a kernel with stack depth tracking
2. run an i/o intensive test like SpecSFS
3.

Actual results:
interrupts are handled on thread's kernel stack, not interrupt stack

Expected results:
interrupts handled on dedicated interrupt stack

Additional info: