Red Hat Bugzilla – Bug 143875
need noapic, or else keventd spins wildly
Last modified: 2007-11-30 17:07:05 EST
Description of problem:
Something regressed in the 2.4.21-27(.0.1).EL kernel. Somewhere between
2.4.21-15.EL and that release, keventd started spinning on a CPU.
Going back to 2.4.21-15.EL makes the problem go away.
If I specify "noapic" as a boot parameter, then keventd behaves again,
but that really shouldn't be necessary.
Version-Release number of selected component (if applicable):
always, at least on the amd64 machine I've got
Steps to Reproduce:
1. See above
keventd shouldn't spin on a cpu, even w/o "noapic"
I am seeing similar behavior on several 2.4.21-32.EL x86_64 systems. keventd
will start running out of control. I can't reproduce it on call, but I will try
to get more info when it comes up again. Any requests for info to gather?
My U5 systems were seeing this consistently; however, after passing the noapic
option to the kernel as suggested the issue disappeared. Any explanation for
Created attachment 119972
Alt+SysRq+T when keventd spins
The above messages were generated with a SUN w2100z machine, with the latest BIOS.
Jim, let me know whether you need more information.
This is interesting... according to the attached logfiles, the reporter
attempted to turn off APIC by specifying "apic=off". The correct parameter to
use for this purpose is "noapic". In theory, "apic=off" should have no effect
whatsoever, yet the reporter claims that it makes a difference.
I really could use a sysreport on the affected system for more data. I could
also use some indications as to how I can reproduce. What platforms does this
problem occur on? Does it need a certain system load or application mix to kick
this off? What are the BIOS revs for the affected platforms?
I don't see any evidence of the string apic=off in any files that I can see
attached to this issue. Everyone is saying they used noapic to make things
better. Or perhaps there are non-public files?
Putting into NEEDINFO state. Please answer questions in comment #14.
Looking over the sysreport, the sheer number of ACPI interrupts suggests that
someone is "holding the button down" as it were. I suspect it's a polarity or
edge-versus-level triggering issue. Continuing to investigate.
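One rough way to put a number on "sheer number of ACPI interrupts" is to diff two snapshots of /proc/interrupts taken a known interval apart. The sketch below is illustrative only: it assumes the SCI line carries the label "acpi", and the two sample snapshots are made-up values, not data from the sysreport in this bug.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        n = 0                                # number of leading per-CPU counters
        while n < len(fields) and n < ncpus and fields[n].isdigit():
            n += 1
        total = sum(int(f) for f in fields[:n])
        table[irq.strip()] = (total, " ".join(fields[n:]))
    return table


def acpi_rate(before, after, seconds):
    """Interrupts per second on the line labelled 'acpi', or None if absent."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    for irq, (count, desc) in new.items():
        if "acpi" in desc.split():
            return (count - old.get(irq, (0, ""))[0]) / seconds
    return None


# Made-up snapshots for illustration (assumption: taken 10 seconds apart).
BEFORE = """\
           CPU0       CPU1
  0:     100000          0    IO-APIC-edge  timer
  9:       5000          0   IO-APIC-level  acpi
NMI:          0          0
"""
AFTER = """\
           CPU0       CPU1
  0:     101000          0    IO-APIC-edge  timer
  9:     905000          0   IO-APIC-level  acpi
NMI:          0          0
"""

if __name__ == "__main__":
    print(acpi_rate(BEFORE, AFTER, 10))
```

A rate in the tens of thousands per second on an otherwise idle machine would be consistent with a line that is stuck asserted rather than with genuine ACPI events.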
I'm still waiting for access to hardware on which this problem can be
reproduced. If anyone has sent us a w2100z I haven't seen it...
Hardware needed to reproduce this problem is on a FedEx truck for delivery to
the Raleigh office today. We will provide an update when it is installed and
A W2100z has been procured and is available for you. Please see Comment #36 for
instructions on how to access it.
Do any of the reporters use machines other than SUN x86-64 machines, and more
precisely, "W2100z" workstations?
One of the other support engineers and I were talking about this issue, and one
of the items that came up was that we thought perhaps the nvidia module might be
registering a callback for a custom ACPI event, which was why our local system
here hasn't seen the problem yet. Above and beyond that we thought it would be
nice to know what exactly the acpi event was that was taking up so much of
keventd's time. To this end, I think we could answer these questions if we had
a sysrq-c generated vmcore from one of the systems
experiencing high keventd utilization. With a vmcore, we could walk the
tq_structs on the keventd task list to see which structs call
acpi_os_execute_deferred. From there we can see what data pointer they pass to
the function, which should represent (among other items) the callback function
registered for that event. This should give us some idea of which ACPI event is
being generated so often, and what part of the kernel is handling it (nvidia
module, other module, kernel proper, etc.).
A sysrq-c is preferred here, but if need be I can build an instrumented kernel to
print out the callback pointer when triggered.
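The queue walk described above can be sketched as a small model. This is not crash-utility code and does not read a vmcore: the TqStruct and DpcData records, their field names, and the sample entries are all hypothetical stand-ins for whatever the real structures contain. It only shows the tallying step, that is, filtering on the routine and grouping by the callback found behind the data pointer.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for the queued task and the record its data pointer refers to.
TqStruct = namedtuple("TqStruct", "routine data")
DpcData = namedtuple("DpcData", "function context")


def summarize_queue(queue, routine="acpi_os_execute_deferred"):
    """Count queued entries per (callback, context) for the given routine."""
    hits = Counter()
    for tq in queue:
        if tq.routine == routine:
            hits[(tq.data.function, tq.data.context)] += 1
    return hits.most_common()        # most frequent callback first


# Made-up queue contents for illustration only.
sample_queue = [
    TqStruct("acpi_os_execute_deferred", DpcData("callback_a", "ctx_1")),
    TqStruct("some_other_routine", DpcData("unrelated", "ctx_x")),
    TqStruct("acpi_os_execute_deferred", DpcData("callback_a", "ctx_1")),
    TqStruct("acpi_os_execute_deferred", DpcData("callback_b", "ctx_2")),
]

if __name__ == "__main__":
    for (function, context), count in summarize_queue(sample_queue):
        print(count, function, context)
```

If one callback dominates the tally, that callback and its context identify which event is being generated so often and which code is handling it.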
Based on comments in IT89256, I'm beginning to suspect it has something to do
with the ACPI method that handles the SMBALERT interrupt. The output of the
instrumented kernel shows a giant flood of calls to the _L26 method for GPE 0.
This suggests that the interrupt is not getting handled and the status is not
The folks at sunsupport produced an instrumented kernel which shows the flood of
calls to this method. It *doesn't* show any error returns from the method
invocation, so I believe that the method *is* getting called, but is returning
success without doing the right thing. This means that either the OS is
invoking the method incorrectly, or the method itself has a bug.
Is there sufficient information in the meantime to
a) confirm this is an OS issue
b) confirm this is a w2100z issue ?
If we have the hardware in-house I will do a quick binary search to find the
problem. I'll take the kernels between 2.4.21-15 and 2.4.21-27 and determine
which one causes this problem. There are only a handful of x86_64 changes
between those 2 kernels that could have caused this problem.
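The binary search over kernel builds might look like the sketch below. It assumes the oldest build in the list is known good and the newest is known bad. The list of intermediate builds and the is_bad predicate are placeholders: in practice is_bad means "boot this kernel and check whether keventd spins", and which intermediate builds actually exist is not stated in this report.

```python
def first_bad(builds, is_bad):
    """Return the first bad build.

    builds is ordered oldest to newest; builds[0] is assumed good and
    builds[-1] is assumed bad, so neither endpoint is probed again.
    """
    lo, hi = 0, len(builds) - 1      # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


# Hypothetical build list; only the two endpoints come from the report.
builds = ["2.4.21-%d" % n for n in range(15, 28)]


def stand_in_is_bad(build, threshold=18):
    """Arbitrary stand-in for the real boot-and-observe test."""
    return int(build.rsplit("-", 1)[1]) >= threshold


if __name__ == "__main__":
    print(first_bad(builds, stand_in_is_bad))
```

With 13 candidate builds this needs at most four boots, which is why the approach is attractive once hardware that reproduces the problem is at hand.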
I have been poking at this issue remotely from Westford to the system in RDU,
and have determined that the problem cropped up between 2.4.21-15 and 2.4.21-19.
We are arranging to have a W2100z sent directly to Westford so I can chase this
problem down more efficiently.
To facilitate debugging, I just received a w2100z from Sun. I started digging
in. There appear to have been significant changes in ACPI SCI setup and
handling between U2 (.15) and U3 (.19), which is where this problem starts to
occur. I noticed that using overrides to set up the SCI interrupt as
edge-triggered active-high made the problem go away, but this is exactly the
opposite of what the ACPI spec says the interrupt should be. Continuing to
I've done a bunch of digging around in response to what is described in Comment
55, and this is what I came up with:
When I generate a thermal event by disconnecting the case fan, an ACPI event is
generated on GPE 22 and handled via acpi_irq(). Further down the call chain,
acpi_ev_gpe_dispatch() is called which queues the control method to be executed
on behalf of GPE 22. It then calls acpi_hw_clear_gpe() to clear the
Once the interrupt handler is done the queued control method is invoked via
acpi_ev_asynch_execute_gpe_method(). Once it evaluates the method it too calls
acpi_hw_clear_gpe() to clear the level-triggered event.
The question I have is: what *else* needs to be done in order to clear this
event? Read a particular register from a particular chip? I thought ACPI was
supposed to abstract this all away; the handler method *should* take care of this...
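A toy model may help frame the question. It is not ACPICA code, and every name in it is made up. It assumes that a level-triggered status bit latches again immediately whenever the underlying hardware condition is still asserted, while an edge-triggered one fires only once per transition. Under that assumption, clearing the status bit alone cannot stop the flood unless the control method also deasserts the source.

```python
def simulate(level_triggered, method_deasserts_source, max_passes=1000):
    """Count how many times the event is dispatched before it goes quiet.

    Returns max_passes if the event never goes quiet (an interrupt storm).
    """
    source_asserted = True       # hardware condition behind the event
    status_bit = True            # event status as seen by the OS
    dispatches = 0
    while status_bit and dispatches < max_passes:
        dispatches += 1          # handler queues and runs the control method
        if method_deasserts_source:
            source_asserted = False
        status_bit = False       # the "clear the status bit" step
        if level_triggered and source_asserted:
            status_bit = True    # level-triggered: latches again at once
    return dispatches


if __name__ == "__main__":
    print("level, method leaves source asserted:", simulate(True, False))
    print("level, method deasserts source:      ", simulate(True, True))
    print("edge, method leaves source asserted: ", simulate(False, False))
```

In this model the edge-triggered case goes quiet even when the method does nothing useful, which would explain why an edge-triggered override can mask the symptom without fixing the cause.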
I have done some more experimenting, and here is what I have found regarding the
behavior of different RHEL releases:
- 32-bit RHEL3 does not show this issue at all, mainly because it has *very*
limited ACPI support (basically only for a few system-configuration things).
- 64-bit RHEL3 does not show this issue prior to Update 3. In Update 3 we added
code to the ACPI driver to set up SCI handling.
- 64-bit RHEL4 does not show this issue in this way. When I run my usual test
case (boot the system, then pull the case fan to generate a thermal event) the
system takes the event, reports the temperature out of spec (68C) and shuts down.
After some more poking around, I find that RHEL4 builds the ACPI thermal driver
into the kernel (CONFIG_ACPI_THERMAL=y) whereas RHEL3 builds it as a module
(CONFIG_ACPI_THERMAL=m). If I go back and boot up RHEL3 and do a "modprobe
thermal" and *then* pull the case fan, the system shuts down just as RHEL4 does.
I'll do some more investigating to see what an appropriate solution would be.
In the meantime, adding "modprobe thermal" to your system startup files would be
one way to work around the issue.
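To confirm that the workaround has taken effect after boot, one could check /proc/modules for the thermal module. The helper below is a minimal sketch that assumes the module name is the first whitespace-separated field on each line; the sample text is illustrative, not captured from an affected system.

```python
def module_loaded(proc_modules_text, name="thermal"):
    """Return True if a module called `name` appears in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


# Illustrative sample only; sizes and use counts are made up.
SAMPLE = """\
thermal                 12345   0 (unused)
ext3                    98765   2
"""

if __name__ == "__main__":
    # On a live system one would read the real file instead:
    #   with open("/proc/modules") as f: print(module_loaded(f.read()))
    print(module_loaded(SAMPLE))
```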
Steffen, RHEL3 is now closed. Please ask the customer to use the work-around
in comment #68 or upgrade to RHEL4.