Red Hat Bugzilla – Bug 430452
kernel hangs when hitting sysrq-w repeatedly
Last modified: 2008-09-04 12:31:57 EDT
Description of problem:
The kernel hangs when sysrq-w is hit repeatedly. It appears to be a deadlock
between sysrq-w and haldaemon's CD-ROM drive polling.
/* IRQ disabled */
/* call_lock taken */
while(atomic_read(&data.started) != cpus)
/* waiting for IPI response */
/* waiting for call_lock */
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Check whether haldaemon is running
3. Hit alt-sysrq-w repeatedly
Shows result of sysrq-w normally.
It also occurs on Red Hat Enterprise Linux 5.1 (e.g. by hitting sysrq-w
repeatedly while mount/umount is being run repeatedly).
It can be avoided by changing spin_lock(&call_lock) to spin_trylock(&call_lock)
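A minimal user-space sketch of the spin_trylock() idea, assuming C11 atomics in place of the kernel spinlock. The names model_lock, model_trylock, model_unlock, model_call_function and count_call are hypothetical; this is not the kernel's API and not the attached patch. It only illustrates that a caller running with IRQs disabled can return a "busy" error instead of spinning on a lock whose holder is waiting for that same caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's call_lock. */
typedef struct {
    atomic_flag held;
} model_lock;

#define MODEL_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true on success and never spins. */
static bool model_trylock(model_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void model_unlock(model_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Model of a cross-CPU call that backs off instead of waiting.
 * Returns 0 if func ran, -1 if the lock was busy.
 */
static int model_call_function(model_lock *l, void (*func)(void *), void *info)
{
    assert(func != NULL);
    if (!model_trylock(l))
        return -1;      /* busy: give up rather than spin with IRQs off */
    func(info);         /* stands in for sending the IPI and waiting */
    model_unlock(l);
    return 0;
}

/* Example callback: counts how many times it was invoked. */
static void count_call(void *info)
{
    ++*(int *)info;
}
```

With the lock free, model_call_function(&lock, count_call, &n) returns 0 and increments n; while another holder owns the lock it returns -1 and leaves n unchanged. The trade-off in this model is that the requested dump is skipped when the lock is busy.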
Could you please try a kernel package and report your test results? The
packages are available at: http://people.redhat.com/ivecera/rhel-4-ivtest/
I can test it, but is the corresponding patch or SRPM available? I would like
to understand how the issue is handled before running the test.
Created attachment 307045
OK, I'm attaching a patch that solves this issue. I was able to reproduce the
bug on kernel-smp-2.6.9-67.EL. The problem is that smp_call_function is called
while IRQs are disabled.
Upstream introduced similar functionality for SysRq+L (see
h=5045bcae0fb466a1dbb6af0036e56901fd7aafb7), but it does not call
smp_call_function directly; it uses schedule_work instead. The same approach I used in
I ran some tests myself and the problem seems to be solved, but I would like
to ask you to test it as well. The patched kernels (for i686 and x86_64) are
located at: http://people.redhat.com/ivecera/rhel-4-ivtest/
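The deferral idea mentioned in this comment (record the request in interrupt context, do the real work later) can be sketched in user space roughly as follows. This is a single-threaded model under stated assumptions: model_workqueue, model_schedule_work, model_run_pending and model_show_state are hypothetical names, not the kernel's schedule_work API and not the code from the patch, and the model omits the locking a real work queue needs.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_WORK 8

typedef void (*work_fn)(void *);

struct model_work {
    work_fn fn;
    void *data;
};

/* Hypothetical fixed-size ring buffer of pending work items. */
struct model_workqueue {
    struct model_work items[MODEL_MAX_WORK];
    size_t head, tail, count;
};

/*
 * "Interrupt context" side: only record the request and return at once.
 * Returns 0 on success, -1 if the queue is full.
 */
static int model_schedule_work(struct model_workqueue *q, work_fn fn, void *data)
{
    assert(fn != NULL);
    if (q->count == MODEL_MAX_WORK)
        return -1;
    q->items[q->tail] = (struct model_work){ fn, data };
    q->tail = (q->tail + 1) % MODEL_MAX_WORK;
    q->count++;
    return 0;
}

/*
 * "Process context" side: run everything queued so far. Waiting on other
 * CPUs would be safe here because IRQs are enabled in this context.
 * Returns the number of items that were run.
 */
static size_t model_run_pending(struct model_workqueue *q)
{
    size_t ran = 0;
    while (q->count > 0) {
        struct model_work w = q->items[q->head];
        q->head = (q->head + 1) % MODEL_MAX_WORK;
        q->count--;
        w.fn(w.data);
        ran++;
    }
    return ran;
}

/* Example work item standing in for the state dump: counts invocations. */
static void model_show_state(void *data)
{
    ++*(int *)data;
}
```

In this model the SysRq handler would call only model_schedule_work(), so nothing that can block on another CPU runs while IRQs are disabled; the dump happens later, when model_run_pending() is called from a context that is allowed to wait.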
I've tested kernel-smp-2.6.9-70.EL.ivtest.3.i686.rpm, and also applied your
patch to my kernel manually, and confirmed that the problem no longer
reproduces. I think this issue is solved now. Thanks for your reply.
Unfortunately, the proposed patch was rejected by other engineers. The reason is
that sysrq-w was designed to run in interrupt context and has always been a "use
in case of an emergency" option. It should only be used by an
administrator/service personnel with console access if the system is already
frozen in some manner. It was never meant to be beaten on continuously as you
are doing; if you do that, eventually you will catch another CPU at just the
This feature will be removed in RHEL-6.
Created attachment 310406
sysrq-w deadlock fix patch
OK, then how about just changing spin_lock() to spin_trylock() in
smp_call_function()? This still runs in interrupt context. (FYI: the attached
patch shows how I avoided this problem on x86/x86_64 that way.) This problem can
occur even when sysrq-w is hit only once, if the timing is unlucky. So I think
it should be fixed somehow until the feature is removed...
Your patch probably solves this issue, but it diverges completely from upstream. Our engineers don't want to fix it this way. The sysrq-w functionality was mainly used as a Red Hat debugging tool in the past and will be removed.