RHN:
Customer: BTI - Australia
Platform: ia64 (rx2600) with RHAS2.1 and 2.4.9-e.31smp / 2.4.18-e.43smp
Host prod1 product id is 94ad-5abf-40b5-1119
Host prod2 product id is 40db-6f7b-b4b1-a274
RHN username is Worldcare
- Quantity 2: MCT0219US RHEL 2.1 for IPF Level 3 (9-9 Monday-Friday)

Problem:
Context: two rx2600, RH EL AS 2.1, 2.4.9-e.31smp, cciss, in production for the past two or three weeks. Systems hang under heavy system load: no ping, no console response, no console output, management processor event log inconclusive, SysRq and serial console enabled.

Regards,
Dilip Daya.

----------
Action by: ddaya

Hi Chris,

Latest update:

1./ HP-Australia has involved Red Hat Support in Australia. Please check that you and your Australian Red Hat engineers are in sync on this issue. I was not given an issue or a contact person within Red Hat/Australia.

2./ The customer's rx2600 system kernel hangs were occurring on 2.4.18-e.31smp and on 2.4.18-e.43 (as shown in the gmiller.2380 sysreport), but since then the customer has been willing to try an HP kernel change provided by their CCISS driver engineer. As of 9:15am (Australian time) Tuesday, the customer reports that the systems have not hung with the 2.4.18-e.31.hp kernel and Oracle combination.

The kernel modification and explanation from the CCISS driver developer within HP are as follows:

---
The following is also related to Red Hat Issues: 37897 _and_ 34923 and this issue: 42071
----
Even though this is an unsupported configuration, it is important that we try this workaround. If it proves to help as it did in the db lab, then we can go back to RH for a fix.

So, I reviewed the source code:
/usr/src/linux-2.4.18-e.41/arch/ia64/hp/common/sba_iommu.c
---
Change line #43:
43 #define ENABLE_MARK_CLEAN
To:
43 #undef ENABLE_MARK_CLEAN
---
...and recompile the kernel and test...
---
=> HP-Australia made the kernel modification and provided the customer with "2.4.18-e.31.hp". Since running this kernel, the customer's systems have been up and stable, without any kernel panics, hangs, or crashes.

So, could RH Support Engineering review the above code modification and reply with any implications or other workarounds?
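For reviewers who want to see what the one-line change does mechanically, here is a minimal user-space sketch. It is not the sba_iommu.c code: `toy_unmap_single`, `toy_mark_clean`, and `toy_mark_clean_calls` are hypothetical names, and the sketch only assumes that the driver wraps its mark_clean() call in an `#ifdef ENABLE_MARK_CLEAN` block on the DMA unmap path.

```c
#include <assert.h>
#include <stddef.h>

/* The workaround: with the macro undefined, the guarded call below is
 * removed by the preprocessor. Change to #define to restore it. */
#undef ENABLE_MARK_CLEAN

/* Hypothetical counter, only here so the effect can be observed. */
static int toy_mark_clean_calls;

#ifdef ENABLE_MARK_CLEAN
/* Hypothetical stand-in for the driver's mark_clean(). */
static void toy_mark_clean(void *addr, size_t size)
{
    (void)addr;
    (void)size;
    toy_mark_clean_calls++;
}
#endif

/* Hypothetical stand-in for the DMA unmap path. Returns the size
 * that was unmapped so callers have something to check. */
static size_t toy_unmap_single(void *addr, size_t size, int from_device)
{
    assert(addr != NULL);
#ifdef ENABLE_MARK_CLEAN
    if (from_device)
        toy_mark_clean(addr, size);
#else
    (void)from_device;
#endif
    return size;
}
```

With `#undef` in place, calling `toy_unmap_single()` never reaches `toy_mark_clean()`, so the counter stays at zero. Nothing else on the unmap path changes, which is why the modification is cheap to test.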
<0>Kernel panic: not continuing In interrupt handler - not syncing <6>Syncing device 68:04 ... kernel BUG at sched.c:834! Unable to handle kernel NULL pointer dereferencecp[2095]: Oops 11003706212352 --> schedule [kernel] 0x81 <-- Pid: 2095, comm: cp psr : 0000121008026018 ifs : 8000000000000813 ip : [<e000000004470781>] Not tainted unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003 rnat: 0000000000001000 bsps: e0000040dadf8000 pr : 80000000ff615565 ldrs: 0000000000000000 ccv : 000000007fffffff fpsr: 0009804c8a70033f b0 : e000000004470780 b6 : e0000000045e8d40 b7 : e00000000440e2b0 f6 : 0fffbccccccccc8c00000 f7 : 0ffdca200000000000000 f8 : 100028000000000000000 f9 : 10002a000000000000000 r1 : e000000004bb2310 r2 : 00000000000051d7 r3 : e00000000485d2d5 r8 : 000000000000001b r9 : 0000000000000000 r10 : 0000000000000000 r11 : 80000000ff611a65 r12 : e0000040d674f970 r13 : e0000040d6748000 r14 : 0000000000000000 r15 : e00000000495da20 r16 : e00000000495da08 r17 : 0000000000000000 r18 : 0000000000000001 r19 : e000000004a13750 r20 : e000000004a13748 r21 : e0000000049bdc58 r22 : 000000000000ffff r23 : 0000000000000000 r24 : 0000000000000058 r25 : 0000000000000059 r26 : 000000000000005a r27 : 00000000000000e0 r28 : 0000000000000000 r29 : 0000000000000001 r30 : 0000000000000005 r31 : 0000000000000894 Call Trace: [<e000000004412d90>] sp=0xe0000040d674f560 bsp=0xe0000040d67498d0 decoded to show_stack [kernel] 0x50 [<e0000000044135c0>] sp=0xe0000040d674f720 bsp=0xe0000040d6749878 decoded to show_regs [kernel] 0x7c0 [<e00000000442c7e0>] sp=0xe0000040d674f740 bsp=0xe0000040d6749850 decoded to die [kernel] 0x120 [<e00000000444bc20>] sp=0xe0000040d674f740 bsp=0xe0000040d67497e8 decoded to ia64_do_page_fault [kernel] 0x780 [<e00000000440dce0>] sp=0xe0000040d674f7d0 bsp=0xe0000040d67497e8 decoded to ia64_leave_kernel [kernel] 0x0 [<e000000004470780>] sp=0xe0000040d674f970 bsp=0xe0000040d6749750 decoded to schedule [kernel] 0x80 [<e0000000044d9750>] 
sp=0xe0000040d674f980 bsp=0xe0000040d6749710 decoded to __wait_on_buffer [kernel] 0xf0 [<e0000000044dc800>] sp=0xe0000040d674f9b0 bsp=0xe0000040d67496e8 decoded to bread [kernel] 0xe0 [<e000000004540540>] sp=0xe0000040d674f9c0 bsp=0xe0000040d6749688 decoded to ext2_update_inode [kernel] 0x2e0 [<e000000004540cb0>] sp=0xe0000040d674f9d0 bsp=0xe0000040d6749668 decoded to ext2_write_inode [kernel] 0x30 [<e000000004507d60>] sp=0xe0000040d674f9d0 bsp=0xe0000040d6749600 decoded to sync_inodes_sb [kernel] 0x2c0 [<e000000004508740>] sp=0xe0000040d674f9d0 bsp=0xe0000040d67495e0 decoded to sync_inodes [kernel] 0x60 [<e0000000044da260>] sp=0xe0000040d674f9d0 bsp=0xe0000040d67495c8 decoded to fsync_dev [kernel] 0x40 [<e0000000045ef270>] sp=0xe0000040d674f9d0 bsp=0xe0000040d6749598 decoded to go_sync [kernel] 0x370 [<e0000000045ef460>] sp=0xe0000040d674f9e0 bsp=0xe0000040d6749568 decoded to do_emergency_sync [kernel] 0x180 [<e000000004478320>] sp=0xe0000040d674f9e0 bsp=0xe0000040d6749510 decoded to panic [kernel] 0x300 [<e00000000442c870>] sp=0xe0000040d674fa20 bsp=0xe0000040d67494e8 decoded to die [kernel] 0x1b0 [<e00000404764b580>] sp=0xe0000040d674fa20 bsp=0xe0000040d67494c0 decoded to vlan_ioctl_hook_R570c4b11 [] 0x42c8dae0 [<e00000404764b580>] sp=0xe0000040d674fa20 bsp=0xe0000040d6749498 decoded to vlan_ioctl_hook_R570c4b11 [] 0x42c8dae0 [<e00000404764b580>] sp=0xe0000040d674fa20 bsp=0xe0000040d6749470 decoded to vlan_ioctl_hook_R570c4b11 [] 0x42c8dae0 ...
Unable to handle kernel paging request at virtual address 3030203030303030 kswapd[5]: Oops 8813272891392 --> kmem_cache_reap [kernel] 0x570 <-- Pid: 5, comm: kswapd psr : 0000101008022038 ifs : 8000000000000c1a ip : [<e0000000044d0b50>] Not tainted unat: 0000000000000000 pfs : 0000000000000c1a rsc : 0000000000000003 rnat: 80000000ff602939 bsps: e0000000044141c0 pr : 80000000ff602979 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a74433f b0 : e0000000044d0970 b6 : e0000000044141c0 b7 : e00000000440d990 f6 : 0fff6fffffffff0000000 f7 : 0ffe7b800000000000000 f8 : 1000bb800000000000000 f9 : 100078000000000000000 r1 : e000000004cf5760 r2 : 0000000000000000 r3 : e000004046a37d98 r8 : 0000000000000017 r9 : ffffffffffffffff r10 : 0000000000000000 r11 : 0000000000000a98 r12 : e000004046a37e20 r13 : e000004046a30000 r14 : 3030203030303030 r15 : e00000404722b210 r16 : 0000000000000032 r17 : 000000000000242b r18 : e0000040d3608008 r19 : 0000000000000000 r20 : 0000000000000000 r21 : 0000000066666667 r22 : 0000000000000000 r23 : 0000000000000000 r24 : ffffffffffff781a r25 : e0000040fef68058 r26 : 0000000000000000 r27 : e000004046a37e30 r28 : e000004046a37e38 r29 : 0000000000000001 r30 : 0000000000000000 r31 : 0000000000000000 Call Trace: [<e000000004414910>] sp=0xe000004046a37a10 bsp=0xe000004046a313a0 decoded to show_stack [kernel] 0x50 [<e000000004415140>] sp=0xe000004046a37bd0 bsp=0xe000004046a31348 decoded to show_regs [kernel] 0x7c0 [<e00000000442fad0>] sp=0xe000004046a37bf0 bsp=0xe000004046a31320 decoded to die [kernel] 0x190 [<e000000004452580>] sp=0xe000004046a37bf0 bsp=0xe000004046a312c0 decoded to ia64_do_page_fault [kernel] 0x780 [<e00000000440df20>] sp=0xe000004046a37c80 bsp=0xe000004046a312c0 decoded to ia64_leave_kernel [kernel] 0x0 [<e0000000044d0b50>] sp=0xe000004046a37e20 bsp=0xe000004046a311e8 decoded to kmem_cache_reap [kernel] 0x570 [<e0000000044d8820>] sp=0xe000004046a37e30 bsp=0xe000004046a311c8 decoded to do_try_to_free_pages [kernel] 0xa0 
[<e0000000044d9150>] sp=0xe000004046a37e30 bsp=0xe000004046a311a0 decoded to kswapd [kernel] 0x330 [<e000000004415f30>] sp=0xe000004046a37e50 bsp=0xe000004046a31168 decoded to arch_kernel_thread [kernel] 0x70 [<e000000004484010>] sp=0xe000004046a37e50 bsp=0xe000004046a31138 decoded to kernel_thread [kernel] 0xd0 [<e0000000048caf30>] sp=0xe000004046a37e50 bsp=0xe000004046a31128 decoded to kswapd_init [kernel] 0x50 ...
Opening bug to HP per Summer to request help from HP Engineering.
Reference Issue Trackers 42071 (HP L3 escalation), 44090 (HP-IPF)
ia64 i-caches are not coherent with respect to processor stores. So in general, when mapping an executable page, we have to flush the i-cache to avoid executing stale instructions. This flush normally happens in update_mmu_cache().

However, the i-cache IS coherent with respect to DMA. So if we DMA over an entire page and subsequently map it as executable, we can skip the flush. mark_clean() performs this optimization by setting the PG_arch_1 bit. update_mmu_cache() skips the i-cache flush if PG_arch_1 is set.

I expect that the effectiveness of this optimization depends on the percentage of DMA-read pages that are subsequently mapped executable. If very few of them are ever executed (as is probably the case for Oracle), the time spent doing mark_clean() is wasted. On the other hand, if we're often reading executable pages from the disk, we can do a lot of mark_clean()s for the cost of a cache flush, so it's probably a win overall.

The system should operate correctly either with or without mark_clean(). For general-purpose use, I think we want to keep it, but it might be worthwhile to consider a tunable for systems where almost all DMA reads are for non-executable data.

While looking at the code, I noticed that RHEL3 U3 calls mark_clean() while holding the ioc->res_lock(), which is not needed (this is fixed in 2.6 already). Before adding a tunable, I'd propose moving the mark_clean() outside the critical section to make sure it's not just a lock contention problem they're seeing.

Hope this helps.
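The interaction between mark_clean() and update_mmu_cache() described above can be modeled in user space. This is a toy sketch, not kernel code: every `toy_`/`TOY_` name is hypothetical, the page size and page count are arbitrary, and the flag bit is only a stand-in for PG_arch_1. It illustrates the one point that matters here: only pages completely covered by a DMA transfer are marked, and only marked pages skip the flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL        /* arbitrary page size for the model */
#define TOY_PG_ARCH_1 (1UL << 0)    /* stand-in for the PG_arch_1 bit */
#define TOY_NPAGES    8

static unsigned long toy_page_flags[TOY_NPAGES];
static int toy_icache_flushes;      /* counts simulated i-cache flushes */

/* Mark every page that lies entirely inside [offset, offset + size).
 * Partially covered pages are left unmarked. */
static void toy_mark_clean(unsigned long offset, unsigned long size)
{
    /* Round the start up to the next page boundary. */
    unsigned long pg = (offset + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
    unsigned long end = offset + size;

    while (pg + TOY_PAGE_SIZE <= end) {
        assert(pg / TOY_PAGE_SIZE < TOY_NPAGES);
        toy_page_flags[pg / TOY_PAGE_SIZE] |= TOY_PG_ARCH_1;
        pg += TOY_PAGE_SIZE;
    }
}

/* Called when a page is mapped. An executable page needs an i-cache
 * flush unless a full-page DMA already marked it clean. */
static void toy_update_mmu_cache(size_t pfn, bool executable)
{
    assert(pfn < TOY_NPAGES);
    if (!executable)
        return;                     /* data pages never need the flush */
    if (toy_page_flags[pfn] & TOY_PG_ARCH_1)
        return;                     /* filled entirely by DMA: skip flush */
    toy_icache_flushes++;
}
```

For example, a simulated DMA of two pages plus 100 bytes starting at offset 0 marks pages 0 and 1 but not page 2, so mapping pages 0 and 1 as executable costs no flush while mapping page 2 does. If mark_clean() is compiled out, every executable mapping takes the flush, which is slower but still correct.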
Do we have any preliminary patches for us to build test kernels with, for the customer to test, apart from Don's untested patch?
Created attachment 108021
Disable mark_clean in sba_iommu.c to avoid memory corruption.
I am currently working to get this included in the U7 update.
This seems interesting: derry isn't specifying GFP_DMA when allocating pages for just that: DMA. I would expect I/O failures rather than the hangs and oopses that are described, so I'm unsure whether it relates here.

[snipped from derry vs taroon diff of sba_iommu.c]
@@ -947,7 +972,7 @@ sba_alloc_consistent(struct pci_dev *hwd
 		return 0;
 	}
 
-	ret = (void *) __get_free_pages(GFP_ATOMIC, get_order(size));
+	ret = (void *) __get_free_pages(GFP_ATOMIC|GFP_DMA, get_order(size));
 
 	if (ret) {
 		memset(ret, 0, size);
@
Larry Woodman pointed out to me that omission of GFP_DMA could potentially cause trouble on ia64 machines that lack an iommu, but most (all?) of the reports I've seen of this are on hp hardware that has an iommu.
Is this IN or OUT of U7?
The patch that disables mark_clean() is in U7.
A fix for this problem has just been committed to the RHEL2.1 U7 patch pool this evening (in kernel version 2.4.18-54.1).
Make that kernel version 2.4.18-55...
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-284.html