232525 – X60s does not boot with nmi_watchdog=1

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Brian Maly
QA Contact:	Martin Jenner

Reported:	2007-03-15 23:22 UTC by Matthew Booth
Modified:	2007-12-11 19:16 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Last Closed:	2007-12-11 19:16:18 UTC

Description Matthew Booth 2007-03-15 23:22:18 UTC

Description of problem:
If I add nmi_watchdog=1 to the kernel command line when booting, it hangs
forever at:

ACPI: Found ECDT

This is *almost* 100% guaranteed. There seems to be a rain dance you can do to
get it to boot. It goes something like:

* Remove the power cord
* Boot the laptop
* Wait for grub
* Re-insert power cord while grub is still running

However once the laptop is up, if you remove the power cord, reinserting it will
cause the laptop to hang immediately and automatically reboot after a few
seconds. Removing nmi_watchdog=1 causes the problem to go away. The xen kernel
boots fine with or without nmi_watchdog=1.

Version-Release number of selected component (if applicable):
kernel-2.6.18-8.1.1.el5

Comment 1 Patrick C. F. Ernzer 2007-03-28 09:33:20 UTC

Same hang on X60 with Core2Duo (type 1706-GMG).
I did not bother trying the rain dance.

Comment 2 Patrick C. F. Ernzer 2007-03-28 09:34:27 UTC

Forgot to add in Comment #1:
This is with the x86_64 version of RHEL5

Comment 3 Jarod Wilson 2007-09-12 16:05:06 UTC

Does this still occur with the latest RHEL5.1 beta kernels? If so, can the
problem be reproduced with a recent Fedora kernel? (i.e., is there a fix
upstream we need to hunt down?)

Comment 4 Patrick C. F. Ernzer 2007-09-12 16:58:39 UTC

On 1706-GMG with Fedora rawhide x86_64 and kernel 2.6.23-0.164.rc5.fc8 I still
cannot boot if I use nmi_watchdog=1. Machine boots fine without this.

Comment 5 Qian Cai 2007-10-03 09:21:36 UTC

I have seen the same problem on my T43 laptop (i386, RHEL5-Client), but only
when booting kernel 2.6.18-8.1.8.el5PAE with both "nmi_watchdog=1" and
"crashkernel=128M@16M" on the command line. With "nmi_watchdog=2", or without
the "crashkernel" parameter, the problem does not occur. I can confirm that
there is no such problem when running kernel 2.6.18-8.el5.

Comment 6 Qian Cai 2007-10-03 09:23:52 UTC

I have tried on the latest released 5.0.z kernel, 2.6.18-8.1.14.el5, and the
problem is still there.

Comment 7 Qian Cai 2007-10-04 01:42:06 UTC

I have observed the same hang even in 2.6.18-8.el5, but only when a USB disk
was attached and "nmi_watchdog=1" was on the command line at boot.

Comment 8 Brian Maly 2007-11-29 19:58:58 UTC

Does this problem occur on the X60 if "nmi_watchdog=2" is used instead?

Comment 9 Patrick C. F. Ernzer 2007-12-03 13:08:26 UTC

FWIW: X60 with Core2Duo (type 1706-GMG) and Fedora 8, kernel x86_64 2.6.23.1-49.fc8
  nmi_watchdog=1 still does not boot
  nmi_watchdog=2 does boot (although
/usr/share/doc/kernel-doc-2.6.23/Documentation/nmi_watchdog.txt tells me this
should not work)

Comment 10 Brian Maly 2007-12-03 19:23:19 UTC

Using nmi_watchdog=2 is fine as long as NMI in /proc/interrupts increments
periodically. The docs on NMI are very vague. nmi_watchdog is very
hardware-specific. Some hardware only works with nmi_watchdog=1 and other
hardware only works with nmi_watchdog=2. Predicting which to use on which
hardware really is more of a guessing game.

Can you see if NMI increments in /proc/interrupts if nmi_watchdog=2 is used?

That being said, the hang at "ACPI: Found ECDT" may be a separate issue. Some
of these ThinkPads had ACPI problems relating to ACPI battery state object
breakage. I see in the original description that removing the power cord
sometimes makes a difference. It seems related, so I figured I would mention it.
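
The check asked for above can be sketched in Python. This is a minimal sketch, assuming the usual /proc/interrupts layout in which the NMI row begins with "NMI:" followed by one counter per CPU. The function names are illustrative and not part of any kernel tooling.

```python
def parse_nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts content."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; any trailing description
            # text on the row is skipped.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []  # no NMI row found


def nmi_is_incrementing(before, after):
    """True if at least one CPU's NMI counter went up between two samples."""
    return any(b < a for b, a in zip(before, after))


def sample_nmi(path="/proc/interrupts"):
    """Read the current NMI counters from the running system."""
    with open(path) as f:
        return parse_nmi_counts(f.read())
```

Calling sample_nmi() twice, a few seconds apart, and passing both results to nmi_is_incrementing() gives a yes/no answer. If the counters stay flat, that suggests the watchdog is not firing in that mode.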

Comment 11 Don Zickus 2007-12-04 15:21:59 UTC

Upstream is trying to deprecate nmi_watchdog=1, as the preferred method is to
use the local APIC (nmi_watchdog=2) as opposed to the IO-APIC (nmi_watchdog=1).
It's no surprise nmi_watchdog=1 doesn't work upstream on a Core2Duo.

Also, on RHEL-5.0 the NMI watchdog won't work on Core2Duo; you will need
RHEL-5.1 because of bz 221671. But I haven't heard any reports of ACPI issues
when using different nmi settings.

Comment 12 Patrick C. F. Ernzer 2007-12-04 15:51:20 UTC

Matthew,
do you still have RHEL 5(.1) on the affected box and can you test? Mine is F8
x86_64, so my testing is only of limited value

Comment 13 Jarod Wilson 2007-12-04 16:08:40 UTC

Note that there's nothing stopping you from installing and booting a RHEL5
kernel on an F8 system for the purposes of this test...

Comment 14 Matthew Booth 2007-12-04 22:03:43 UTC

I'm currently running kernel 2.6.18-53.1.4.el5 (5.1), i386, on the laptop I
originally reported this on. Following previous discussion, I have booted with
the following kernel command line:

ro root=/dev/vg_local/root crashkernel=64M@16M audit=1 nmi_watchdog=2

This worked fine for me the one time I've tried it.
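
To confirm which watchdog mode a system was actually booted with, the command line can be read back from /proc/cmdline. The following is a minimal sketch: the helper names are hypothetical, the mode labels follow Comment 11 (1 = IO-APIC, 2 = local APIC), and it assumes that a later nmi_watchdog= option overrides an earlier one.

```python
# Mode labels as described in Comment 11; other values are reported as unknown.
WATCHDOG_SOURCES = {"1": "IO-APIC", "2": "local APIC"}


def nmi_watchdog_setting(cmdline):
    """Return the nmi_watchdog= value from a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            # Assumption: the last occurrence wins.
            value = token.split("=", 1)[1]
    return value


def describe_mode(value):
    """Map an nmi_watchdog value to a short label for the NMI source."""
    if value is None:
        return "not set"
    return WATCHDOG_SOURCES.get(value, "unknown")


def current_setting(path="/proc/cmdline"):
    """Read the setting from the running kernel's command line."""
    with open(path) as f:
        return nmi_watchdog_setting(f.read())
```

For the command line quoted above, nmi_watchdog_setting() returns "2", which describe_mode() labels "local APIC".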