Bug 132719 (i8xx_tco-autoload)

Summary:

watchdog i8xx_tco causing machine to reboot

Product:

[Fedora] Fedora

Reporter:

Colin Charles <byte>

Component:

kernel

Assignee:

Dave Jones <davej>

Status:

CLOSED UPSTREAM

Severity:

medium

Priority:

medium

Version:

rawhide

CC:

alan, laroche, lawbar, marius.andreiana, nigel, notting, pfrields, pza, reuben-redhatbugzilla, wim, wtogami

Hardware:

i386

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2004-10-21 16:04:07 UTC

Bug Blocks:

123268

Attachments:

Description	Flags
Dmesg output for compaq laptop which reboots after gnome start	none
lspci -v output for compaq laptop which reboots after gnome start	none
i8xx_tco.c debug version	none

Description Colin Charles 2004-09-16 08:45:43 UTC

Description of problem: I installed rawhide (0911 tree) onto an x86 box
with a GIGABYTE '8I845GVM-RZ' i845GV motherboard, and found that it
reboots less than 2 minutes into the startup process.

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.541.i686.rpm

How reproducible:
100%

Steps to Reproduce:
1. Install Fedora
2. Let it boot
  
Actual results:
It reboots during the boot process, less than 2 minutes after starting.

Expected results:
Fedora Core boots up and is stable

Additional info:
So the workaround is to boot into single-user mode, rmmod i8xx_tco
(before it hits the 2-minute mark), press Ctrl+D, and then I get a
usable Fedora system.

Is there a way to blacklist i8xx_tco from loading? 

The motherboard is a cheap, run-of-the-mill one, but it is very popular.

Comment 1 Bill Nottingham 2004-09-22 19:25:27 UTC

*** Bug 133209 has been marked as a duplicate of this bug. ***

Comment 2 Bill Nottingham 2004-09-23 06:14:45 UTC

*** Bug 133321 has been marked as a duplicate of this bug. ***

Comment 3 Florian La Roche 2004-09-23 07:51:20 UTC

Alan/Arjan, can this be fixed by removing the MODULE_DEVICE_TABLE()
line from this kernel driver?

That should still allow the module to work when loaded by hand, but
it would no longer be autoloaded just because certain PCI IDs are
present.

Comment 4 Arjan van de Ven 2004-09-23 07:53:14 UTC

That is the wrong fix.
What we need is a blacklist in kudzu to make sure certain modules are
never autoloaded. We need the same for the firmware-flashing modules,
etc.

Comment 5 Alan Cox 2004-09-23 09:09:34 UTC

Disagree. I had a talk with Rusty a long time ago, and we came to the
conclusion that there is probably a case for

MODULE_NO_AUTOLOAD

hints in the kernel. That is where they belong, so that there is only
one data set.

Comment 6 Florian La Roche 2004-09-23 11:47:39 UTC

I suspect that, for our user-space tools, removing the
MODULE_DEVICE_TABLE() line and adding a MODULE_NO_AUTOLOAD hint would
behave exactly the same.

Given that this was discussed "a long time ago", who is fixing it,
hopefully in time for FC3test3?

I have changed rc.sysinit to remove this module from the "other"
list, but the number of hacks in initscripts is really large.

Comment 7 Bill Nottingham 2004-09-23 15:57:03 UTC

Well, there are two sorts of autoload we want to avoid:

- the detrimental to autoload (i8xx_tco, etc.)
- the 'impolite' to autoload (*fb)

*fb modules are easy to exclude because of their PCI class. But
excluding all of 'other' to get rid of i8xx_tco means you lose the
hwrandom driver and other random assorted things.

Comment 8 Florian La Roche 2004-09-23 16:09:42 UTC

I've done something similar to the following in rc.sysinit:
other=$(echo "$other" | sed 's/i8xx_tco//')

This can probably be done more cleanly, and could then also cover *fb.
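A minimal sketch of a cleaner version, assuming the autoload candidates sit in a space-separated shell variable as in the sed line above. The function name filter_autoload is hypothetical and is not something rc.sysinit provides. It drops i8xx_tco and any module ending in "fb" by whole-word match, so, unlike a sed substring replacement, it will not mangle a module whose name merely contains "i8xx_tco".

```shell
#!/bin/sh
# Hypothetical helper (not part of initscripts): print the module list
# from $1 with modules that should never be autoloaded removed.
filter_autoload() {
    kept=""
    for mod in $1; do
        # i8xx_tco: detrimental to autoload (watchdog can reboot the box).
        # *fb: framebuffer drivers, 'impolite' to autoload.
        case "$mod" in
            i8xx_tco) continue ;;
            *fb) continue ;;
        esac
        kept="${kept:+$kept }$mod"
    done
    printf '%s\n' "$kept"
}

# Usage in the style of rc.sysinit:
#   other=$(filter_autoload "$other")
```

Matching on "*fb" is a name-based stand-in for the PCI class check; it would also drop any non-framebuffer module whose name happens to end in "fb".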

This would fix the problem in the short term, as I suspect the
MODULE_NO_AUTOLOAD discussion will take quite some time before it is
finished and all players agree on a common plan.

On PCI class detection: the one-line removal in the kernel also
removes the PCI IDs, which currently show up in
"modprobe -c | grep i8xx_tco". So maybe the *fb problems are the
same kind of issue as i8xx_tco?

Comment 9 Colin Charles 2004-09-26 09:07:55 UTC

It's worth noting (in case folks want the quick fix) that, as notting
mentioned on IRC, adding the following line to /etc/modprobe.conf is a
temporary workaround:

install i8xx_tco /bin/true
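A small sketch that applies the same workaround idempotently, so running it twice does not add a second line. The function name disable_module_load is hypothetical; it takes the config path as an argument so it can be tried on a scratch file before pointing it at /etc/modprobe.conf as root. With the install line in place, modprobe runs /bin/true instead of inserting the module, so autoload attempts become no-ops.

```shell
#!/bin/sh
# Hypothetical helper: append "install <module> /bin/true" to a modprobe
# config file unless an install line for that module already exists.
disable_module_load() {
    module=$1
    conf=$2
    # Look for an existing "install <module> ..." line; a missing file
    # counts as "not present" and is created by the append below.
    if ! grep -q "^install[[:space:]]\{1,\}$module[[:space:]]" "$conf" 2>/dev/null; then
        printf 'install %s /bin/true\n' "$module" >> "$conf"
    fi
}

# Usage (as root):
#   disable_module_load i8xx_tco /etc/modprobe.conf
```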

Comment 10 Bill Nottingham 2004-09-28 17:45:00 UTC

*** Bug 133606 has been marked as a duplicate of this bug. ***

Comment 11 Bill Nottingham 2004-09-29 01:12:14 UTC

*** Bug 134016 has been marked as a duplicate of this bug. ***

Comment 12 Bill Nottingham 2004-10-01 21:43:12 UTC

initscripts-7.86-1 will read the standard hotplug blacklist file,
/etc/hotplug/blacklist, which as of hwdata-0.140-1 will contain i8xx_tco.
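For anyone scripting around this file, a sketch of a blacklist lookup, assuming the format is one module name per line with "#" starting a comment. The function name is_blacklisted is hypothetical and this is not the initscripts code. Unlike the behaviour reported in comment 23, it treats a commented-out entry as absent.

```shell
#!/bin/sh
# Hypothetical helper: succeed if module $1 is listed in blacklist file $2.
is_blacklisted() {
    module=$1
    list=$2
    # Strip comments and surrounding whitespace, then require an exact
    # whole-line match, so "#i8xx_tco" does not count as an entry.
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$list" 2>/dev/null |
        grep -qx -- "$module"
}

# Usage:
#   is_blacklisted i8xx_tco /etc/hotplug/blacklist && echo "skipping i8xx_tco"
```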

Comment 13 Alan Cox 2004-10-01 22:12:53 UTC

There is a much more serious unanswered question: when the i810_tco
driver was loaded, who opened the device file? Someone has to open it
for the timer to start, and if someone did, every other watchdog is
going to fall foul of this if deployed (e.g. softdog on telco boxes).

Comment 14 Bill Nottingham 2004-10-01 22:33:38 UTC

That I don't understand. I have a box here that loads i8xx_tco, and it
*never* auto-reboots.

Comment 15 Bill Nottingham 2004-10-01 22:42:37 UTC

For those who are seeing the problem, if you:

a) boot without it loaded
b) load it by hand

does it then reboot 30 seconds later?

If so, can you do those steps, and run 'lsof | grep /dev/watchdog'
at some point before it reboots?

Comment 16 Bill Nottingham 2004-10-01 22:44:47 UTC

Booting with 'init s' (while still having it autoloaded) and
performing the same lsof will also work. Assuming it boots to single
user mode fast enough. :)
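The lsof check can also be done without lsof by scanning the /proc/<pid>/fd symlinks, which is what this sketch does. find_openers is a hypothetical helper; run it as root, since descriptors belonging to other users cannot be read and are silently skipped.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs that currently hold file $1 open,
# by comparing every /proc/<pid>/fd/<n> symlink target with $1.
find_openers() {
    target=$1
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}      # "<pid>/fd/<n>"
            echo "${pid%%/*}"     # keep only "<pid>"
        fi
    done | sort -un
}

# Usage (as root), repeated while waiting for the reboot:
#   find_openers /dev/watchdog
```

Empty output means no process has the device open at that moment, which is the same result as an empty 'lsof | grep /dev/watchdog'.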

Comment 17 Reuben Farrelly 2004-10-01 22:53:54 UTC

I booted into single-user mode, and the module hadn't loaded because I
had it aliased in modprobe.conf. I undid this, modprobed the module
(and confirmed with lsmod that it had loaded), and _60_ seconds later
it rebooted spontaneously.
The whole time I was waiting, 'lsof | grep /dev/watchdog' showed no
match.

Comment 18 Nigel Metheringham 2004-10-06 09:19:14 UTC

In my case something in the GNOME session start/login appears to be
tickling the watchdog. The module would load on startup, but the
machine would be stable until you log in (gdm graphical login), or
until you issue startx from a text console login. 30 seconds later the
box reboots.

I have applied current rawhide updates but have not retried this in
the last couple of weeks. I will try to reproduce it later.

Comment 19 Alan Cox 2004-10-06 09:25:53 UTC

Nigel, I would be interested to know whether, in your case, you get
the same reboot behaviour if you do the following before starting the
GNOME session:

rmmod i8xx_tco
modprobe softdog

Comment 20 Nigel Metheringham 2004-10-08 21:24:05 UTC

Followup to Alan's query in comment #19.

Using the then-current kernel (590), my laptop still reboots 30
seconds after starting GNOME if i810_tco is loaded.

If I do not load i810_tco, and load softdog instead, GNOME does not
cause a reboot. The default timeout for softdog is 60 seconds, and for
i810_tco it is 30 seconds, so I repeated this with the softdog timeout
explicitly set to 30 seconds: no reboot.
I cannot identify anything kicking /dev/watchdog.
Is there some other means that could set i810_tco off?

I could try instrumenting the module a little and retrying it, to see
if I can find who or what is kicking it.

Comment 21 Alan Cox 2004-10-08 21:53:39 UTC

Cool, it's not the kernel directly and it's not us opening
/dev/watchdog in error (that was my big concern). It looks like an
i81x X server bug; presumably it is touching something related.

This is one for the X folks (and if the X folks can't see how to fix
it, then we need to work around it).

Comment 22 Reuben Farrelly 2004-10-08 21:58:05 UTC

I'm not running any sort of X (see my posting above), and yet I still
have the problem of spontaneous reboots when i8xx_tco is loaded.

Comment 23 Larry Bartash 2004-10-09 01:06:27 UTC

Further testing on my system goes as follows:

If I load i8xx_tco manually after the system has booted (whether in X
or with init s), no spontaneous reboots occur. There is no output from
lsof | grep /dev/watchdog.

If I comment out the i8xx_tco entry in /etc/hotplug/blacklist with
"#", the watchdog module is still not loaded. Hmm, load_module() in
rc.sysinit ain't that smart. OK, whatever, I delete the entry and
reboot; now the module is loaded during boot.

If the module i8xx_tco is loaded automatically in rc.sysinit during
the boot, (whether in X or with init s), the system WILL spontaneously
reboot. Still no output from lsof | grep /dev/watchdog prior to it
rebooting.

Hmm, if I boot with init s, then rmmod i8xx_tco right away, no
spontaneous rebooting occurs. I can then edit /etc/hotplug/blacklist
to put the i8xx_tco back in.

All of the above was tested with kernel-2.6.8-1.541 and
kernel-2.6.8-1.603; both yield the same results.

What can be gleaned from this?

Comment 24 Nigel Metheringham 2004-10-11 08:49:16 UTC

Created attachment 104997
Dmesg output for compaq laptop which reboots after gnome start

Comment 25 Nigel Metheringham 2004-10-11 08:50:22 UTC

Created attachment 104998
lspci -v output for compaq laptop which reboots after gnome start

Comment 26 Nigel Metheringham 2004-10-11 08:53:43 UTC

Re Alan's Comment #21
The laptop that's showing the reboot-after-GNOME-start problem uses a
Radeon card, so it's not just i8xx-based graphics. The dmesg and lspci
output is attached above.

Comment 27 Wim Van Sebroeck 2004-10-11 19:14:09 UTC

Normally, when a watchdog driver is loaded, it makes sure that the
watchdog timer isn't active. It's only after you open /dev/watchdog
(and thus tickle the watchdog) that you really start/activate/kick the
watchdog. From then on it counts down.
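A sketch of the userspace side described here, for drivers that support the "magic close" feature of the Linux watchdog API: opening the device arms the timer, each write resets it, and writing 'V' just before closing disarms it (unless the driver runs with nowayout=1). pet_watchdog and its arguments are hypothetical; a real watchdog daemon would keep writing to /dev/watchdog indefinitely. Under this model nothing should be counting down before some process opens the device.

```shell
#!/bin/sh
# Hypothetical helper: open watchdog device $1, send $3 keepalives
# $2 seconds apart, then disarm it with the magic close character.
pet_watchdog() {
    dev=$1       # e.g. /dev/watchdog
    interval=$2  # seconds between keepalives; keep well under the heartbeat
    count=$3     # number of keepalives before a clean shutdown
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf 'k' >&3        # any write counts as a keepalive
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3            # magic close: disarm on the close below
    } 3>"$dev"                    # fd 3 opened here (timer starts), closed after the group
}

# Usage (as root, with nowayout=0):
#   pet_watchdog /dev/watchdog 10 6
```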

Also: I checked the attachment from comment #24, and it doesn't show
the watchdog module being loaded (which should be visible in dmesg as,
e.g., "i8xx TCO timer: initialized (0x0460). heartbeat=30 sec
(nowayout=0)").
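To check whether the module actually loaded, the load message quoted above can be pulled out of the kernel log. A sketch, assuming the message format shown here (other kernel versions may word it differently); tco_settings is a hypothetical helper that reads log text on standard input. No output means the load message is not in the log.

```shell
#!/bin/sh
# Hypothetical helper: extract heartbeat and nowayout from the i8xx_tco
# load message, e.g. "dmesg | tco_settings".
tco_settings() {
    # In a basic regex, bare ( and ) are literal; \( \) are capture groups.
    sed -n 's/.*i8xx TCO timer: initialized.*heartbeat=\([0-9]*\) sec (nowayout=\([01]\)).*/heartbeat=\1 nowayout=\2/p'
}
```

For the debug output posted in comment 33 this prints "heartbeat=30 nowayout=0".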

I don't think that it is the watchdog module itself that is causing
the problem. I'll create a debug version/patch tomorrow evening so
that we can easily read the timer's value and see whether or not it is
really the TCO that is causing the reboot.

Comment 31 Wim Van Sebroeck 2004-10-14 20:09:03 UTC

Hmm, the module is indeed not behaving the way I thought it would:
after initialization it calls tco_timer_keepalive() instead of what I
would expect, tco_timer_stop().
I'll do some testing together with Reuben Farrelly.

Comment 32 Wim Van Sebroeck 2004-10-14 20:22:13 UTC

Created attachment 105233
i8xx_tco.c debug version

Attached is the debug version of i8xx_tco.c.
(Note: I patched the initialization so that it stops the watchdog.)

Comment 33 Reuben Farrelly 2004-10-15 22:32:41 UTC

When running the debug version I get this:

tco_timer_set_heartbeat: heartbeat=30
tco_timer_stop: val=8
i8xx TCO timer: initialized (0x1060). heartbeat=30 sec (nowayout=0)

...and the problem has gone away. The module is still loaded and
running, and after 7 minutes of uptime the box has not rebooted, nor
has anything been logged. This looks good.

Comment 34 Reuben Farrelly 2004-10-21 04:53:49 UTC

I see a fix has been committed to the mainline kernel:

ChangeSet 1.1988.69.31, 2004/10/17 20:35:47+02:00, wim
[WATCHDOG] v2.6.9-rc3 i8xx_tco.c-stop_reboot-patch
Fix for Bugzilla Bug 132719: "watchdog i8xx_tco causing machine to
reboot."

Is it safe to close this Bugzilla report as fixed UPSTREAM?

Comment 35 Bill Nottingham 2004-10-21 16:04:07 UTC

Yup. And if it went into 2.6.9-rc3, please test with 2.6.9-1.640.

Comment 36 Reuben Farrelly 2004-10-21 18:49:13 UTC

The patch went in after the 2.6.9 release, so it won't be in there. :(