From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
While running the stress-kernel test suite, an error is printed to the /var/log/messages file.

Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0[expected: 0], irqs_disabled():1

Call Trace:<ffffffff80132297>{__might_sleep+173} <ffffffff80138041>{profile_task_exit+33}
       <ffffffff8013928a>{do_exit+34}

Version-Release number of selected component (if applicable):
kernel-2.6.9-25.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Run the stress-kernel RPM on the x86_64 smp.
2. Monitor the /var/log/messages file.

Actual Results:
System continues to run normally. The test passes. Trying to figure out if this is an actual problem or it is benign based on the testing.

Expected Results:
Unknown at this time

Additional info:
The following text is from several emails from Dave Anderson:

I note also that RHEL3 doesn't make the irqs_disabled() check in its down_read(). But it still seems unusual at best...

Since profile_task_exit() is unconditionally called upon every do_exit() regardless whether profiling is being done, this does seem a bit strange that interrupts are disabled:

void profile_task_exit(struct task_struct * task)
{
        down_read(&profile_rwsem);
        notifier_call_chain(&task_exit_notifier, 0, task);
        up_read(&profile_rwsem);
}

For that matter, *any* time *any* semaphore is invoked for reading, the might_sleep() check is invoked:

static inline void down_read(struct rw_semaphore *sem)
{
        might_sleep();
        rwsemtrace(sem,"Entering down_read");
        __down_read(sem);
        rwsemtrace(sem,"Leaving down_read");
}

So that implies that semaphores should never be taken when IRQs are disabled -- although I'm not really clear on what the ramifications would be if the process does sleep and then goes on to schedule() with IRQ's disabled.
It appears that upon the process switch, it will unconditionally do a spin_unlock_irq(rq->lock) upon rescheduling the next task. And when it eventually wakes up, it will also do an unconditional spin_unlock_irq(rq->lock), so when it returns back to do_exit() it will be with IRQ's enabled again.

But, again, entering do_exit() with IRQs disabled is, at a minimum, extremely rare. AFAICT it's harmless, but it would be interesting to know why that exiting process has IRQ's disabled, and perhaps what the process is. I don't understand why the back trace only goes back as far as do_exit() unless it was called from the x86_64 entry.S code, which is hard to understand but seems to have something to do with kernel threads.

For that, the kernel will have to be instrumented, and the symptom reproduced. So I don't know. Jason should probably make the call as to whether it's worth the effort to pursue.
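For anyone following the might_sleep() discussion above, here is a minimal user-space sketch in C of the condition that appears to produce the warning. The names struct cpu_state and model_might_sleep() are made up for illustration and are not kernel APIs. The real __might_sleep() does more (it rate-limits the message and dumps the stack, for example), which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU state the kernel check looks at. */
struct cpu_state {
    bool in_atomic;      /* models in_atomic() */
    bool irqs_disabled;  /* models irqs_disabled() */
};

/*
 * Simplified model of the might_sleep() check: warn when a function that
 * may sleep is entered from atomic context or with interrupts disabled.
 * Returns true if a warning was printed.
 */
bool model_might_sleep(const struct cpu_state *s, const char *file, int line)
{
    assert(s != NULL && file != NULL);

    if (s->in_atomic || s->irqs_disabled) {
        printf("Debug: sleeping function called from invalid context at %s:%d\n",
               file, line);
        printf("in_atomic():%d, irqs_disabled():%d\n",
               (int)s->in_atomic, (int)s->irqs_disabled);
        return true;
    }
    return false;
}
```

With in_atomic clear and irqs_disabled set, as in the log from this report, the model warns. With both clear it stays silent, which matches the observation that the message only shows up when do_exit() is entered with IRQs disabled.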
> I don't understand why the back trace only goes
> back as far as do_exit() unless it was called from the x86_64
> entry.S code, which is hard to understand but seems to have
> something to do with kernel threads

As it turns out, this is not related to the kernel-threads-related call to do_exit() in the x86_64 entry.S, but rather the other call to do_exit() in entry.S. This is the key point from the debug kernel:

> do_exit: task: crashme-308 parent: crashme-32539 -> exit code -9999 from
> entry.S

The call to do_exit() came from here in entry.S:

iret_label:
        iretq

        .section __ex_table,"a"
        .quad iret_label,bad_iret
        .previous
        .section .fixup,"ax"
        /* force a signal here? this matches i386 behaviour */
        /* running with kernel gs */
bad_iret:
        movq $-9999,%rdi        /* better code? */
        jmp do_exit
        .previous

which triggered this debug code at the top of do_exit():

--- linux-2.6.9/kernel/exit.c.orig
+++ linux-2.6.9/kernel/exit.c
@@ -789,6 +789,12 @@ asmlinkage NORET_TYPE void do_exit(long
        struct task_struct *tsk = current;
        int group_dead;
 
+       if (code == -9999) {
+               printk("do_exit: task: %s-%d parent: %s-%d -> exit code -9999 from entry.S\n",
+                       current->comm, current->pid,
+                       current->parent->comm, current->parent->pid);
+       }
+
        profile_task_exit(tsk);
 
        if (unlikely(in_interrupt()))

My guess is that the crashme program had a bogus user-space return address location on its stack, which was subsequently loaded and attempted from the iret instruction with IRQs disabled, faulted, went back through "bad_iret", printed the warning message, and killed the process?

Given that it's a crashme-generated issue, which does a good job of modifying its own address space in order to wreak havoc, it wouldn't seem to be a problem.
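To make the __ex_table mechanism above a little more concrete, here is a rough user-space sketch in C of how a faulting instruction address could be mapped to its fixup address. The struct ex_entry and model_search_fixup() names are invented for this illustration, and the linear scan is a simplification; as far as I know the kernel keeps its table sorted and searches it differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one exception-table entry: (faulting insn, fixup). */
struct ex_entry {
    uintptr_t insn;   /* address of the instruction that may fault */
    uintptr_t fixup;  /* address to resume at if it does fault */
};

/*
 * Look up the fixup for a faulting instruction pointer.
 * Returns the fixup address, or 0 if the fault has no table entry
 * (in which case the fault would be handled some other way).
 */
uintptr_t model_search_fixup(const struct ex_entry *table, size_t n,
                             uintptr_t fault_ip)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (table[i].insn == fault_ip)
            return table[i].fixup;
    }
    return 0;
}
```

In terms of the entry.S snippet, the pair (iret_label, bad_iret) would be one such entry: a fault at the iretq resolves to bad_iret, which loads -9999 into %rdi and jumps to do_exit().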
Following up on his RHEL3 fix, Ernie is working on a RHEL4 patch to prevent this from occurring.
The patch in bug 183489 comment #9 will also address this bug (although the underlying problem is somewhat different). It would be better to keep these two bug reports separate (i.e., don't dup).
committed in stream U4 build 34.12. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/