From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4 Description of problem: This debug message was seen while running a proprietary HP virtual memory suite on an rx7620 on RHEL4U3 with the 2.6.9-24 kernel. There was only the debug message, the system did not panic. This is the first I have had access to this system so it is unknown if this is a new issue. I have run this on various smaller HP ia64 systems and have not seen this. I will try to reproduce. Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 in_atomic():0[expected: 0], irqs_disabled():1 Call Trace: [<a000000100016b20>] show_stack+0x80/0xa0 sp=e000000593e479c0 bsp=e000000593e41360 [<a000000100016b70>] dump_stack+0x30/0x60 sp=e000000593e47b90 bsp=e000000593e41348 [<a000000100068150>] __might_sleep+0x190/0x260 sp=e000000593e47b90 bsp=e000000593e41320 [<a0000001000616d0>] ia64_do_page_fault+0x110/0xbc0 sp=e000000593e47ba0 bsp=e000000593e412b8 [<a00000010000f540>] ia64_leave_kernel+0x0/0x260 sp=e000000593e47c50 bsp=e000000593e412b8 [<a00000010024eaf0>] __copy_user+0xb0/0x940 sp=e000000593e47e20 bsp=e000000593e411f0 [<a0000001001735f0>] simple_read_from_buffer+0xb0/0x100 sp=e000000593e47e20 bsp=e000000593e411a8 [<a0000001001a0510>] proc_info_read+0xf0/0x160 sp=e000000593e47e20 bsp=e000000593e41160 [<a00000010011fd30>] vfs_read+0x290/0x360 sp=e000000593e47e20 bsp=e000000593e41110 [<a000000100120370>] sys_read+0x70/0xe0 sp=e000000593e47e20 bsp=e000000593e41098 [<a00000010000f3e0>] ia64_ret_from_syscall+0x0/0x20 sp=e000000593e47e30 bsp=e000000593e41098 [<a000000000010640>] 0xa000000000010640 sp=e000000593e48000 bsp=e000000593e41098 Version-Release number of selected component (if applicable): kernel-2.6.9-24.EL How reproducible: Didn't try Steps to Reproduce: 1. proprietary HP test, took about 30 minutes to hit 2. 3. Additional info:
FYI, I was able to reproduce this after another 10-15 minutes of testing.
None of the routines in the trace would set the Interrupt bit in the processor PSR, which is what's being seen, and I don't understand the ia64 processor as to how the hardware might set it during fault handling? This would need an ia64 expert to determine whether it's possible/legitimate.
The only thing I can suggest, now that you can reproduce it, is to perhaps try to find the last RHEL4 kernel that it did *not* happen on (see porkchop.devel.redhat.com:/mnt/redhat/dist/4E-kernel/kernel for all the post-RHEL4-U2 kernels) and try to narrow it down to the precise version that introduced it. There's been a bit of activity in the ia64 area in the proposed RHEL4-U3, and once the suspect kernel is determined, the changelog can be checked for potential patches. Probably the first thing to do would be to run a RHEL4-U2 kernel.
Unfortunatly now I cannot seem to reproduce it. I saw it twice originall however I now realized I did not reboot between those test runs. Now that I have rebooted I have not seen the error. I will keep my eyes open for it and see if I can find a better way to reproduce it.
The strange thing is that the "might sleep" warning is more for the caller's health vs. the system's health. If the semaphore is held and the system blocks, schedule() will do a spin_lock_irq() on the cpu's runqueue lock -- blocking interrupts -- and then schedule a new task to run. The new task will resume executing in schedule(), and the last thing done there is a spin_unlock_irq() on the runqueue lock, enabling interrupts. Therefore when this process is eventually rescheduled, it will come back into life with IRQ's enabled. So while the system can seemingly handle schedule() being called with IRQ's disabled -- and will forcibly disable them anyway while making the task-switch -- it would only be a problem if the original task were expecting to run with IRQs disabled. And that's not the case here. I *think* the warning is for kernel code that *does* programatically disable interrupts, and then tries to take a semaphore. That is bad programming, because if it blocked, was scheduled out, and then rescheduled, the code would resume with IRQs disabled. But in this case, IRQs were *not* disabled by the code trace shown, but rather by some oddity with the ia64 processor. So it appears to be harmless, and would appear to have something to do with ia64 nested TLB miss handling, where the processor can disable interrupts somehow. (Check with Larry Woodman -- he understands this stuff...) My only question is whether we did some recent patch that made that happen. In fact, there is a RHEL4-U3 nested dtlb patch from Intel and upstream, but it dealt with recognizing hugepage nested dtlb misses, whereas the original code presumed that all nested dtlb misses were the standard page size.
FYI, just saw this again on a recent RHEL4 kernel - 2.6.9-34.24. So, not easily reproduced but certianly still there.
seems I can hit this a little more easily on my 64CPU ia64 system. I just saw it with 2.6.9-42.EL. Here is the stack trace. Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 in_atomic():0[expected: 0], irqs_disabled():1 Call Trace: [<a000000100016da0>] show_stack+0x80/0xa0 sp=e00007009cba79c0 bsp=e00007009cba1360 [<a000000100016df0>] dump_stack+0x30/0x60 sp=e00007009cba7b90 bsp=e00007009cba1348 [<a00000010006a630>] __might_sleep+0x190/0x260 sp=e00007009cba7b90 bsp=e00007009cba1320 [<a000000100063cf0>] ia64_do_page_fault+0x110/0xbc0 sp=e00007009cba7ba0 bsp=e00007009cba12b8 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260 sp=e00007009cba7c50 bsp=e00007009cba12b8 [<a0000001002530d0>] __copy_user+0xb0/0x940 sp=e00007009cba7e20 bsp=e00007009cba11f0 [<a000000100177c70>] simple_read_from_buffer+0xb0/0x100 sp=e00007009cba7e20 bsp=e00007009cba11a8 [<a0000001001a4db0>] proc_info_read+0xf0/0x160 sp=e00007009cba7e20 bsp=e00007009cba1160 [<a000000100123b70>] vfs_read+0x290/0x360 sp=e00007009cba7e20 bsp=e00007009cba1110 [<a0000001001241b0>] sys_read+0x70/0xe0 sp=e00007009cba7e20 bsp=e00007009cba1098 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20 sp=e00007009cba7e30 bsp=e00007009cba1098 [<a000000000010640>] 0xa000000000010640 sp=e00007009cba8000 bsp=e00007009cba1098
We continue to hit this from time to time so I did a little more investigating. I modified the 2.6.9-42 source so that it would panic instead of just giving the debug message and reproduced the issue. I found something that seems a bit suspicous however between the inline functions and the ia64 asm code it is a bit hard to follow so I may be misinterpreting. What is happening is the user process is reading from /proc. The kernel stack shows we get into proc_info_read which calls simple_read_from_buffer which does a copy_to_user. Then some inline code and ia64 asm magic happens and we end up here: #9 [BSP:e000086fe36f91f0] __copy_user at a000000100253110 include/asm/atomic.h: 45 looking at atomic.h: 45 we see that it is in ia64_atomic_add (so I assume __copy_user was releasing a semaphore? 35 static __inline__ int 36 ia64_atomic_add (int i, atomic_t *v) 37 { 38 __s32 old, new; 39 CMPXCHG_BUGCHECK_DECL 40 41 do { 42 CMPXCHG_BUGCHECK(v); 43 old = atomic_read(v); 44 new = old + i; 45 } while (ia64_cmpxchg(acq, v, old, new, sizeof(atomic_t)) != old); 46 return new; 47 } at this point we hit a page fault which gets to the suspicious bit I noticed. In ia64_do_page_fault the first thing it does is this check: if (in_atomic() || !mm) goto no_context; where "no_context" is basically just some bail out code. Since we hit the page fault while in ia64_atomic_add shouldn't in_atomic() return true? I will post the full bt -fl output of the stack from "crash" as an attachment. I also have a vmcore if anybody wants to take a look at it.
Created attachment 135588 [details] bt -fl crash tool output This is the output from crash bt -fl on a vmcore generated by modifiying the kernel source to panic instead of just printing the debug output. I have the vmunix/vmcore if anybody wants to look at it.
Hi, Doug-san, We(Fujitsu) met this bug in these days. But it happens on the customer's host and we don't have enough information. Do you have the vmcore (in #9) now ? If you still have crash-dump, can we access it ? or please confirm what file your test program "proc_reader" was reading. you can use "files" command for that. From your bt -f output, interrupt-enable frag (14bit of PSR reg.) is dropped. So, some code disabled interrupt before calling copy_user().
I no longer have this dump. I have not seen the probelm recently however I have spent most of my time testing RHEL5. I should start testing RHEL4 again within a couple weeks so perhaps I will see this again. The proc_reader test reads random files from the /proc filesystem so without the core I don't have a way to know which file it was reading when it hit this problem.
Thank you for quick responce. I'll do the same kind of test.
Any news on this issue? Is it reproducible with recent rhel4 kernel, or rhel 5 kernel , or upstream?
I can see this with 2.6.9-55.EL too. Not sure with RHEL5 kernel and upstream kernel.
Could you please share a simple test case to me. I can try to re-produce it on ia64 boxes that I can accesss.
Please share the test case.
Hi, Attaching an IT that shows the same signature twice on different boxes. RHEL4U3(IPF) - kernel-2.6.9-34.EL RHEL4U4(IPF) - kernel-2.6.9-42.0.3.EL each system is running more than 2 year. No reproducer. Flavio (sync'ed sev/prio with IT, marked 4.8.0)
Maybe upgrading to RHEL5 is a lazy solution for this weird problm.. We need a reproducer that can trigger the problem in predicable and reliable manner in bearable long time to make the issue debug-able. So far I don't have. If anyone have a test case that can reproduce the problem in less than one hour (at most 2 hours), please let me know. This is the key to fix the problem for RHEL 4. Another approach is to add IRQ flag trace support to record IRQ flag modification history and related IP that probably can help us find out the exact scenario (or context) from which we could find clue.. But the reproducer is still the key to success..
Updating PM score.
Without a reproducer this report is going to be closed out as wontfix against R4.8.
I believe I have found the bad code that remains in RHEL 4 that causes the problem.. Thanks to a reproducer from NEC, (Although it takes irregular times to get the problem triggered (various, from less than one hour to nearly 24hours), but I always can see it in 24hours on my 4-socket tiger machine. in script_new (arch/ia64/kernel/unwind.c), there is broken code like this: script_new(..) { .... spin_lock_irqsave(&unw.lock, flags); {... ... } spin_unlock(&unw.lock); if(!write_trylock(&script->lock)) return NULL; <-------------It's here where _irqrestore is missed.. Double check with upstream and RHEL5 kernel, the spin_lock_irqsave(restore) only appears in STAT() in unwind.c, and STAT is actually defined as blank in RHEL 4, 5 and upstream... I have clean up the use of the spin_lock_irqsave(restore) in unwind.c, and is testing the kernel , and report back results soon. If it works, I will post patch..
The upstream fix: http://www.gelato.unsw.edu.au/archives/linux-ia64/0410/11488.html http://lia64.bkbits.net:8080/linux-ia64-release-2.6.10/?PAGE=cset&REV=4176083fc7oirb3qCEFcLGnDatNa_A
Committed in 78.24.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html
We _seem_ to be affected by the same problem on a different architecture (i386), the following error: Feb 3 04:53:51 ebsdb kernel: Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 Feb 3 04:53:51 ebsdb kernel: in_atomic():0[expected: 0], irqs_disabled():1 Feb 3 04:53:51 ebsdb kernel: [<02120c1d>] __might_sleep+0x7d/0x88 Feb 3 04:53:51 ebsdb kernel: [<0215796c>] rw_vm+0xe4/0x29c Feb 3 04:53:51 ebsdb kernel: [<02131675>] find_pid+0x26/0x3a Feb 3 04:53:51 ebsdb kernel: [<02131675>] find_pid+0x26/0x3a Feb 3 04:53:51 ebsdb kernel: [<02157de3>] get_user_size+0x30/0x57 Feb 3 04:53:51 ebsdb kernel: [<02131675>] find_pid+0x26/0x3a Feb 3 04:53:51 ebsdb kernel: [<0211b5c4>] __is_prefetch+0x1d5/0x2ba Feb 3 04:53:51 ebsdb kernel: [<02138ba8>] search_module_extables+0x5d/0x64 Feb 3 04:53:51 ebsdb kernel: [<02131675>] find_pid+0x26/0x3a Feb 3 04:53:51 ebsdb kernel: [<0211b9f9>] do_page_fault+0x350/0x5f7 Feb 3 04:53:51 ebsdb kernel: [<022d43d9>] __cond_resched+0x14/0x39 Feb 3 04:53:51 ebsdb kernel: [<021442b9>] rmqueue_bulk+0x5b/0x65 Feb 3 04:53:51 ebsdb kernel: [<02144648>] buffered_rmqueue+0x17d/0x1a5 Feb 3 04:53:51 ebsdb kernel: [<0211b6a9>] do_page_fault+0x0/0x5f7 Feb 3 04:53:51 ebsdb kernel: [<02131675>] find_pid+0x26/0x3a Feb 3 04:53:51 ebsdb kernel: [<02131803>] find_task_by_pid_type+0x8/0x1d Feb 3 04:53:51 ebsdb kernel: [<0211e04c>] sched_exit+0x1d/0xbc Feb 3 04:53:51 ebsdb kernel: [<021241ca>] release_task+0xb6/0xfa Feb 3 04:53:51 ebsdb kernel: [<02125d5c>] wait_task_zombie+0x475/0x48b Feb 3 04:53:51 ebsdb kernel: [<021262fd>] do_wait+0x183/0x3b8 Feb 3 04:53:51 ebsdb kernel: [<0211f28b>] default_wake_function+0x0/0xc Feb 3 04:53:51 ebsdb kernel: [<0212dfb9>] sys_rt_sigaction+0x73/0x88 Feb 3 04:53:51 ebsdb kernel: [<0211f28b>] default_wake_function+0x0/0xc Feb 3 04:53:51 ebsdb kernel: [<021265c5>] sys_wait4+0x27/0x2a Feb 3 04:53:51 ebsdb kernel: [<021265db>] sys_waitpid+0x13/0x17 Feb 3 04:53:51 ebsdb kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Feb 3 04:53:51 ebsdb kernel: printing eip: Feb 3 04:53:51 ebsdb kernel: 02131675 Feb 3 04:53:51 ebsdb kernel: *pde = 00004001 Feb 3 04:53:51 ebsdb kernel: Oops: 0000 [#1] Feb 3 04:53:51 ebsdb kernel: SMP Feb 3 04:53:51 ebsdb kernel: Modules linked in: mptctl mptbase hpilo(U) nfsd exportfs autofs4 nfs lockd nfs_acl sunrpc 8021q dm_mirror dm_round_robin dm_multipath button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 bonding(U) floppy sg ext3 jbd dm_mod cciss sd_mod qla2xxx(U) scsi_mod qla2xxx_conf(U) Feb 3 04:53:51 ebsdb kernel: CPU: 0 Feb 3 04:53:51 ebsdb kernel: EIP: 0060:[<02131675>] Not tainted VLI Feb 3 04:53:51 ebsdb kernel: EFLAGS: 00010086 (2.6.9-89.0.18.ELhugemem) Feb 3 04:53:51 ebsdb kernel: EIP is at find_pid+0x26/0x3a Feb 3 04:53:51 ebsdb kernel: eax: 0f1e1000 ebx: 00002fcf ecx: 00000000 edx: c1e7586c Feb 3 04:53:51 ebsdb kernel: esi: f3581430 edi: 00000000 ebp: c1259ed0 esp: c1259eac Feb 3 04:53:51 ebsdb kernel: ds: 007b es: 007b ss: 0068 Feb 3 04:53:51 ebsdb kernel: Process hpetfe (pid: 12239, threadinfo=c1259000 task=c1e757b0) Feb 3 04:53:51 ebsdb kernel: Stack: 00000000 02131803 f3581430 0211e04c f3581430 f3581430 f3581430 f3581430 Feb 3 04:53:51 ebsdb kernel: 00000000 00000000 021241ca f3581430 00002fd3 00000000 00000000 02125d5c Feb 3 04:53:51 ebsdb kernel: 03000000 00000000 00000003 00000000 a0ff8080 0011a6e2 39e805b0 c1e757b0 Feb 3 04:53:51 ebsdb kernel: Call Trace: Feb 3 04:53:51 ebsdb kernel: [<02131803>] find_task_by_pid_type+0x8/0x1d Feb 3 04:53:51 ebsdb kernel: [<0211e04c>] sched_exit+0x1d/0xbc Feb 3 04:53:51 ebsdb kernel: [<021241ca>] release_task+0xb6/0xfa Feb 3 04:53:51 ebsdb kernel: [<02125d5c>] wait_task_zombie+0x475/0x48b Feb 3 04:53:51 ebsdb kernel: [<021262fd>] do_wait+0x183/0x3b8 Feb 3 04:53:51 ebsdb kernel: [<0211f28b>] default_wake_function+0x0/0xc Feb 3 04:53:51 ebsdb kernel: [<0212dfb9>] sys_rt_sigaction+0x73/0x88 Feb 3 04:53:51 ebsdb kernel: [<0211f28b>] default_wake_function+0x0/0xc Feb 3 04:53:51 ebsdb kernel: [<021265c5>] sys_wait4+0x27/0x2a Feb 3 04:53:51 ebsdb kernel: [<021265db>] sys_waitpid+0x13/0x17 Feb 3 04:53:51 ebsdb kernel: Code: c8 ff 5b 5e c3 53 b9 20 00 00 00 8b 04 85 84 fe 43 02 2b 0d 94 fe 43 02 89 d3 69 d2 01 00 37 9e d3 ea 8b 14 90 85 d2 74 12 8b 0a <0f> 18 01 90 39 5a fc 8d 42 fc 74 06 89 ca eb ea 31 c0 5b c3 55 Feb 3 04:53:51 ebsdb kernel: <0>Fatal exception: panic in 5 seconds This issue has occurred several times in the past months (since the server was re-installed with RHEL4 instead of RHEL3). This is a HP Proliant DL585 G1 server (RHEL 4 update 9) with the following kernel: Linux ebsdb 2.6.9-89.0.18.ELhugemem #1 SMP Wed Nov 25 06:13:02 EST 2009 i686 athlon i386 GNU/Linux Further specs: 2 dual-core AMD Opteron 848 processors, 24GB memory (24 * 1GB dimms, ECC).