175189 – Debug: sleeping function called from invalid context at include/linux/rwsem.h:43

Bug 175189 - Debug: sleeping function called from invalid context at include/linux/rwsem.h:43

Summary: Debug: sleeping function called from invalid context at include/linux/rwsem.h:43

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	ia64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Luming Yu
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	391511 456483 461297
TreeView+	depends on / blocked

Reported:	2005-12-07 15:29 UTC by Doug Chapman
Modified:	2018-10-20 02:52 UTC (History)
CC List:	12 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-05-18 19:20:07 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
bt -fl crash tool output (7.20 KB, text/plain) 2006-09-05 16:33 UTC, Doug Chapman	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1024	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update	2009-05-18 14:57:26 UTC

Description Doug Chapman 2005-12-07 15:29:45 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
This debug message was seen while running a proprietary HP virtual memory suite on an rx7620 on RHEL4U3 with the 2.6.9-24 kernel.  There was only the debug message, the system did not panic.  This is the first I have had access to this system so it is unknown if this is a new issue.  I have run this on various smaller HP ia64 systems and have not seen this.

I will try to reproduce.

Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0[expected: 0], irqs_disabled():1

Call Trace:
 [<a000000100016b20>] show_stack+0x80/0xa0
                                sp=e000000593e479c0 bsp=e000000593e41360
 [<a000000100016b70>] dump_stack+0x30/0x60
                                sp=e000000593e47b90 bsp=e000000593e41348
 [<a000000100068150>] __might_sleep+0x190/0x260
                                sp=e000000593e47b90 bsp=e000000593e41320
 [<a0000001000616d0>] ia64_do_page_fault+0x110/0xbc0
                                sp=e000000593e47ba0 bsp=e000000593e412b8
 [<a00000010000f540>] ia64_leave_kernel+0x0/0x260
                                sp=e000000593e47c50 bsp=e000000593e412b8
 [<a00000010024eaf0>] __copy_user+0xb0/0x940
                                sp=e000000593e47e20 bsp=e000000593e411f0
 [<a0000001001735f0>] simple_read_from_buffer+0xb0/0x100
                                sp=e000000593e47e20 bsp=e000000593e411a8
 [<a0000001001a0510>] proc_info_read+0xf0/0x160
                                sp=e000000593e47e20 bsp=e000000593e41160
 [<a00000010011fd30>] vfs_read+0x290/0x360
                                sp=e000000593e47e20 bsp=e000000593e41110
 [<a000000100120370>] sys_read+0x70/0xe0
                                sp=e000000593e47e20 bsp=e000000593e41098
 [<a00000010000f3e0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e000000593e47e30 bsp=e000000593e41098
 [<a000000000010640>] 0xa000000000010640
                                sp=e000000593e48000 bsp=e000000593e41098


Version-Release number of selected component (if applicable):
kernel-2.6.9-24.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. proprietary HP test, took about 30 minutes to hit 
2.
3.
  

Additional info:

Comment 1 Doug Chapman 2005-12-07 15:47:35 UTC

FYI, I was able to reproduce this after another 10-15 minutes of testing.

Comment 2 Dave Anderson 2005-12-07 20:12:38 UTC

None of the routines in the trace would set the Interrupt bit in the
processor PSR, which is what's being seen, and I don't understand the
ia64 processor as to how the hardware might set it during fault handling?
This would need an ia64 expert to determine whether it's possible/legitimate.

Comment 3 Dave Anderson 2005-12-07 20:36:28 UTC

The only thing I can suggest, now that you can reproduce it, is to
perhaps try to find the last RHEL4 kernel that it did *not* happen
on (see porkchop.devel.redhat.com:/mnt/redhat/dist/4E-kernel/kernel
for all the post-RHEL4-U2 kernels) and try to narrow it down to
the precise version that introduced it.  There's been a bit of
activity in the ia64 area in the proposed RHEL4-U3, and once the
suspect kernel is determined, the changelog can be checked for
potential patches.

Probably the first thing to do would be to run a RHEL4-U2 kernel.

Comment 4 Doug Chapman 2005-12-08 16:38:53 UTC

Unfortunatly now I cannot seem to reproduce it.  I saw it twice originall
however I now realized I did not reboot between those test runs.  Now that I
have rebooted I have not seen the error.  I will keep my eyes open for it and
see if I can find a better way to reproduce it.

Comment 5 Dave Anderson 2005-12-08 17:05:03 UTC

The strange thing is that the "might sleep" warning is more for the
caller's health vs. the system's health.  

If the semaphore is held and the system blocks, schedule() will
do a spin_lock_irq() on the cpu's runqueue lock -- blocking interrupts -- and
then schedule a new task to run.  The new task will resume executing in
schedule(), and the last thing done there is a spin_unlock_irq() on the
runqueue lock, enabling interrupts.  Therefore when this process is eventually
rescheduled, it will come back into life with IRQ's enabled.  So while the
system can seemingly handle schedule() being called with IRQ's disabled -- and
will forcibly disable them anyway while making the task-switch -- it would only
be a problem if the original task were expecting to run with IRQs disabled.
And that's not the case here.

I *think* the warning is for kernel code that *does* programatically disable
interrupts, and then tries to take a semaphore.  That is bad programming,
because if it blocked, was scheduled out, and then rescheduled, the code would
resume with IRQs disabled.  But in this case, IRQs were *not* disabled by
the code trace shown, but rather by some oddity with the ia64 processor.

So it appears to be harmless, and would appear to have something to do with
ia64 nested TLB miss handling, where the processor can disable interrupts
somehow.  (Check with Larry Woodman -- he understands this stuff...)  

My only question is whether we did some recent patch that made that happen.
In fact, there is a RHEL4-U3 nested dtlb patch from Intel and upstream, but it
dealt with recognizing hugepage nested dtlb misses, whereas the original code
presumed that all nested dtlb misses were the standard page size.

Comment 6 Doug Chapman 2006-05-02 20:46:24 UTC

FYI, just saw this again on a recent RHEL4 kernel - 2.6.9-34.24.  So, not easily
reproduced but certianly still there.

Comment 7 Doug Chapman 2006-07-19 16:15:28 UTC

seems I can hit this a little more easily on my 64CPU ia64 system.  I just saw
it with 2.6.9-42.EL.  Here is the stack trace.

 Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0[expected: 0], irqs_disabled():1

Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0
                                sp=e00007009cba79c0 bsp=e00007009cba1360
 [<a000000100016df0>] dump_stack+0x30/0x60
                                sp=e00007009cba7b90 bsp=e00007009cba1348
 [<a00000010006a630>] __might_sleep+0x190/0x260
                                sp=e00007009cba7b90 bsp=e00007009cba1320
 [<a000000100063cf0>] ia64_do_page_fault+0x110/0xbc0
                                sp=e00007009cba7ba0 bsp=e00007009cba12b8
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e00007009cba7c50 bsp=e00007009cba12b8
 [<a0000001002530d0>] __copy_user+0xb0/0x940
                                sp=e00007009cba7e20 bsp=e00007009cba11f0
 [<a000000100177c70>] simple_read_from_buffer+0xb0/0x100
                                sp=e00007009cba7e20 bsp=e00007009cba11a8
 [<a0000001001a4db0>] proc_info_read+0xf0/0x160
                                sp=e00007009cba7e20 bsp=e00007009cba1160
 [<a000000100123b70>] vfs_read+0x290/0x360
                                sp=e00007009cba7e20 bsp=e00007009cba1110
 [<a0000001001241b0>] sys_read+0x70/0xe0
                                sp=e00007009cba7e20 bsp=e00007009cba1098
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00007009cba7e30 bsp=e00007009cba1098
 [<a000000000010640>] 0xa000000000010640
                                sp=e00007009cba8000 bsp=e00007009cba1098

Comment 8 Doug Chapman 2006-09-05 16:31:30 UTC

We continue to hit this from time to time so I did a little more investigating.
 I modified the 2.6.9-42 source so that it would panic instead of just giving
the debug message and reproduced the issue.

I found something that seems a bit suspicous however between the inline
functions and the ia64 asm code it is a bit hard to follow so I may be
misinterpreting.

What is happening is the user process is reading from /proc.  The kernel stack
shows we get into proc_info_read which calls simple_read_from_buffer which does
a copy_to_user.  Then some inline code and ia64 asm magic happens and we end up
here:

 #9 [BSP:e000086fe36f91f0] __copy_user at a000000100253110
    include/asm/atomic.h: 45

looking at atomic.h: 45 we see that it is in ia64_atomic_add (so I assume
__copy_user was releasing a semaphore?

     35 static __inline__ int
     36 ia64_atomic_add (int i, atomic_t *v)
     37 {
     38         __s32 old, new;
     39         CMPXCHG_BUGCHECK_DECL
     40
     41         do {
     42                 CMPXCHG_BUGCHECK(v);
     43                 old = atomic_read(v);
     44                 new = old + i;
     45         } while (ia64_cmpxchg(acq, v, old, new, sizeof(atomic_t)) != old);
     46         return new;
     47 }

at this point we hit a page fault which gets to the suspicious bit I noticed. 
In ia64_do_page_fault the first thing it does is this check:

        if (in_atomic() || !mm)
                goto no_context;

where "no_context" is basically just some bail out code.  Since we hit the page
fault while in ia64_atomic_add shouldn't in_atomic() return true?

I will post the full bt -fl output of the stack from "crash" as an attachment. 
I also have a vmcore if anybody wants to take a look at it.

Comment 9 Doug Chapman 2006-09-05 16:33:51 UTC

Created attachment 135588 [details]
bt -fl crash tool output

This is the output from crash bt -fl on a vmcore generated by modifiying the
kernel source to panic instead of just printing the debug output.  I have the
vmunix/vmcore if anybody wants to look at it.

Comment 10 KAMEZAWA Hiroyuki 2007-01-19 04:55:41 UTC

Hi, Doug-san,

We(Fujitsu) met this bug in these days. But it happens on the customer's host
and we don't have enough information.
Do you have the vmcore (in #9) now ?

If you still have crash-dump, can we access it ?
or
please confirm what file your test program "proc_reader" was reading.
you can use "files" command for that.

From your bt -f output, interrupt-enable frag (14bit of PSR reg.) is dropped.
So, some code disabled interrupt before calling copy_user().

Comment 11 Doug Chapman 2007-01-19 16:09:55 UTC

I no longer have this dump.  I have not seen the probelm recently however I have
spent most of my time testing RHEL5.  I should start testing RHEL4 again within
a couple weeks so perhaps I will see this again.

The proc_reader test reads random files from the /proc filesystem so without the
core I don't have a way to know which file it was reading when it hit this problem.

Comment 12 KAMEZAWA Hiroyuki 2007-01-20 02:30:38 UTC

Thank you for quick responce.
I'll do the same kind of test.

Comment 13 Luming Yu 2007-07-27 16:07:20 UTC

Any news on this issue? Is it reproducible with recent rhel4 kernel, or rhel 5
kernel , or upstream?

Comment 14 Kiyoshi Ueda 2007-07-31 15:57:41 UTC

I can see this with 2.6.9-55.EL too.
Not sure with RHEL5 kernel and upstream kernel.

Comment 15 Luming Yu 2007-08-13 06:20:52 UTC

Could you please share a simple test case to me. I can try to re-produce it on
ia64 boxes that I can accesss.

Comment 16 Luming Yu 2007-08-27 13:40:25 UTC

Please share the test case.

Comment 17 Flavio Leitner 2008-06-17 14:23:23 UTC

Hi,

Attaching an IT that shows the same signature twice on different boxes.
RHEL4U3(IPF) - kernel-2.6.9-34.EL
RHEL4U4(IPF) - kernel-2.6.9-42.0.3.EL
each system is running more than 2 year.

No reproducer.

Flavio
(sync'ed sev/prio with IT, marked 4.8.0)

Comment 18 Luming Yu 2008-06-17 15:04:20 UTC

Maybe upgrading to RHEL5 is a lazy solution for this weird problm..
We need a reproducer that can trigger the problem in predicable and reliable
manner in bearable long time to make the issue debug-able. 
So far I don't have. If anyone have a test case that can reproduce the problem
in less than one hour (at most 2 hours), please let me know. This is the key to
fix the problem for RHEL 4. 
Another approach is to add IRQ flag trace support to record IRQ flag
modification history and related IP that probably can help us find out the exact
scenario (or context) from which we could find clue.. But the reproducer is
still the key to  success..

Comment 19 RHEL Program Management 2008-09-03 12:55:37 UTC

Updating PM score.

Comment 20 Peter Martuccelli 2008-09-22 17:52:29 UTC

Without a reproducer this report is going to be closed out as wontfix against R4.8.

Comment 21 Luming Yu 2008-09-25 13:05:07 UTC

I believe I have found the bad code that remains in RHEL 4 that causes the problem.. Thanks to a reproducer from NEC, (Although it takes irregular times to get the problem triggered (various, from less than one hour to nearly 24hours), but I always can see it in 24hours on my 4-socket tiger machine. 

in script_new (arch/ia64/kernel/unwind.c), there is broken code like this:

script_new(..)
{
....

spin_lock_irqsave(&unw.lock, flags);
{...
...
}
spin_unlock(&unw.lock);
if(!write_trylock(&script->lock))
   return NULL;    <-------------It's here where _irqrestore is missed..

Double check with upstream and  RHEL5 kernel, the spin_lock_irqsave(restore) only appears in STAT() in unwind.c, and STAT is actually defined as blank in RHEL 4, 5 and upstream... 

I have clean up the use of the spin_lock_irqsave(restore) in unwind.c, and is testing the kernel , and report back results soon. If it works, I will post patch..

Comment 22 Luming Yu 2008-09-26 01:47:00 UTC

The upstream fix:

http://www.gelato.unsw.edu.au/archives/linux-ia64/0410/11488.html

http://lia64.bkbits.net:8080/linux-ia64-release-2.6.10/?PAGE=cset&REV=4176083fc7oirb3qCEFcLGnDatNa_A

Comment 24 Vivek Goyal 2009-01-07 14:36:19 UTC

Committed in 78.24.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 30 errata-xmlrpc 2009-05-18 19:20:07 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

Comment 32 Sander 2010-02-03 12:43:13 UTC

We _seem_ to be affected by the same problem on a different architecture
(i386), the following error:

Feb  3 04:53:51 ebsdb kernel: Debug: sleeping function called from invalid
context at include/linux/rwsem.h:43
Feb  3 04:53:51 ebsdb kernel: in_atomic():0[expected: 0], irqs_disabled():1
Feb  3 04:53:51 ebsdb kernel:  [<02120c1d>] __might_sleep+0x7d/0x88
Feb  3 04:53:51 ebsdb kernel:  [<0215796c>] rw_vm+0xe4/0x29c
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<02157de3>] get_user_size+0x30/0x57
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<0211b5c4>] __is_prefetch+0x1d5/0x2ba
Feb  3 04:53:51 ebsdb kernel:  [<02138ba8>] search_module_extables+0x5d/0x64
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<0211b9f9>] do_page_fault+0x350/0x5f7
Feb  3 04:53:51 ebsdb kernel:  [<022d43d9>] __cond_resched+0x14/0x39
Feb  3 04:53:51 ebsdb kernel:  [<021442b9>] rmqueue_bulk+0x5b/0x65
Feb  3 04:53:51 ebsdb kernel:  [<02144648>] buffered_rmqueue+0x17d/0x1a5
Feb  3 04:53:51 ebsdb kernel:  [<0211b6a9>] do_page_fault+0x0/0x5f7
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<02131803>] find_task_by_pid_type+0x8/0x1d
Feb  3 04:53:51 ebsdb kernel:  [<0211e04c>] sched_exit+0x1d/0xbc
Feb  3 04:53:51 ebsdb kernel:  [<021241ca>] release_task+0xb6/0xfa
Feb  3 04:53:51 ebsdb kernel:  [<02125d5c>] wait_task_zombie+0x475/0x48b
Feb  3 04:53:51 ebsdb kernel:  [<021262fd>] do_wait+0x183/0x3b8
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<0212dfb9>] sys_rt_sigaction+0x73/0x88
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<021265c5>] sys_wait4+0x27/0x2a
Feb  3 04:53:51 ebsdb kernel:  [<021265db>] sys_waitpid+0x13/0x17
Feb  3 04:53:51 ebsdb kernel: Unable to handle kernel NULL pointer dereference
at virtual address 00000000
Feb  3 04:53:51 ebsdb kernel:  printing eip:
Feb  3 04:53:51 ebsdb kernel: 02131675
Feb  3 04:53:51 ebsdb kernel: *pde = 00004001
Feb  3 04:53:51 ebsdb kernel: Oops: 0000 [#1]
Feb  3 04:53:51 ebsdb kernel: SMP 
Feb  3 04:53:51 ebsdb kernel: Modules linked in: mptctl mptbase hpilo(U) nfsd
exportfs autofs4 nfs lockd nfs_acl sunrpc 8021q dm_mirror dm_round_robin
dm_multipath button battery ac ohci_hcd hw_random k8_edac edac_mc tg3
bonding(U) floppy sg ext3 jbd dm_mod cciss sd_mod qla2xxx(U) scsi_mod
qla2xxx_conf(U)
Feb  3 04:53:51 ebsdb kernel: CPU:    0
Feb  3 04:53:51 ebsdb kernel: EIP:    0060:[<02131675>]    Not tainted VLI
Feb  3 04:53:51 ebsdb kernel: EFLAGS: 00010086   (2.6.9-89.0.18.ELhugemem) 
Feb  3 04:53:51 ebsdb kernel: EIP is at find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel: eax: 0f1e1000   ebx: 00002fcf   ecx: 00000000  
edx: c1e7586c
Feb  3 04:53:51 ebsdb kernel: esi: f3581430   edi: 00000000   ebp: c1259ed0  
esp: c1259eac
Feb  3 04:53:51 ebsdb kernel: ds: 007b   es: 007b   ss: 0068
Feb  3 04:53:51 ebsdb kernel: Process hpetfe (pid: 12239, threadinfo=c1259000
task=c1e757b0)
Feb  3 04:53:51 ebsdb kernel: Stack: 00000000 02131803 f3581430 0211e04c
f3581430 f3581430 f3581430 f3581430 
Feb  3 04:53:51 ebsdb kernel:        00000000 00000000 021241ca f3581430
00002fd3 00000000 00000000 02125d5c 
Feb  3 04:53:51 ebsdb kernel:        03000000 00000000 00000003 00000000
a0ff8080 0011a6e2 39e805b0 c1e757b0 
Feb  3 04:53:51 ebsdb kernel: Call Trace:
Feb  3 04:53:51 ebsdb kernel:  [<02131803>] find_task_by_pid_type+0x8/0x1d
Feb  3 04:53:51 ebsdb kernel:  [<0211e04c>] sched_exit+0x1d/0xbc
Feb  3 04:53:51 ebsdb kernel:  [<021241ca>] release_task+0xb6/0xfa
Feb  3 04:53:51 ebsdb kernel:  [<02125d5c>] wait_task_zombie+0x475/0x48b
Feb  3 04:53:51 ebsdb kernel:  [<021262fd>] do_wait+0x183/0x3b8
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<0212dfb9>] sys_rt_sigaction+0x73/0x88
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<021265c5>] sys_wait4+0x27/0x2a
Feb  3 04:53:51 ebsdb kernel:  [<021265db>] sys_waitpid+0x13/0x17
Feb  3 04:53:51 ebsdb kernel: Code: c8 ff 5b 5e c3 53 b9 20 00 00 00 8b 04 85
84 fe 43 02 2b 0d 94 fe 43 02 89 d3 69 d2 01 00 37 9e d3 ea 8b 14 90 85 d2 74
12 8b 0a <0f> 18 01 90 39 5a fc 8d 42 fc 74 06 89 ca eb ea 31 c0 5b c3 55 
Feb  3 04:53:51 ebsdb kernel:  <0>Fatal exception: panic in 5 seconds

This issue has occurred several times in the past months (since the server was
re-installed with RHEL4 instead of RHEL3).

This is a HP Proliant DL585 G1 server (RHEL 4 update 9) with the following
kernel:

Linux ebsdb 2.6.9-89.0.18.ELhugemem #1 SMP Wed Nov 25 06:13:02 EST 2009 i686
athlon i386 GNU/Linux

Further specs:

2 dual-core AMD Opteron 848 processors, 24GB memory (24 * 1GB dimms, ECC).

Note You need to log in before you can comment on or make changes to this bug.