|Summary:||watchdog i8xx_tco causing machine to reboot|
|Product:||[Fedora] Fedora||Reporter:||Colin Charles <byte>|
|Component:||kernel||Assignee:||Dave Jones <davej>|
|Status:||CLOSED UPSTREAM||QA Contact:|
|Version:||rawhide||CC:||alan, laroche, lawbar, marius.andreiana, nigel, notting, pfrields, pza, reuben-redhatbugzilla, wim, wtogami|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2004-10-21 16:04:07 UTC||Type:||---|
Description Colin Charles 2004-09-16 08:45:43 UTC
Description of problem:
Install rawhide (0911 tree) onto an x86 box with a GIGABYTE '8I845GVM-RZ' i845GV motherboard, and find that it reboots in under 2 minutes during the startup process.

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.541.i686.rpm

How reproducible:
100%

Steps to Reproduce:
1. Install Fedora
2. Let it boot

Actual results:
It reboots during the boot process, which is under 2 minutes.

Expected results:
Fedora Core boots up and is stable.

Additional info:
So the solution is to boot into single user mode, rmmod i8xx_tco (before it hits the 2m mark), Ctrl+d, and then I get a usable Fedora system.

Is there a way to blacklist i8xx_tco from loading? The motherboard is a run-of-the-mill cheapish one, but is very popular.
Comment 1 Bill Nottingham 2004-09-22 19:25:27 UTC
*** Bug 133209 has been marked as a duplicate of this bug. ***
Comment 2 Bill Nottingham 2004-09-23 06:14:45 UTC
*** Bug 133321 has been marked as a duplicate of this bug. ***
Comment 3 Florian La Roche 2004-09-23 07:51:20 UTC
Alan/Arjan, can this be fixed by removing the one line in the kernel that says MODULE_DEVICE_TABLE() in this kernel driver? Should still allow the module to work, but not autoload it due to certain pci ids being present.
Comment 4 Arjan van de Ven 2004-09-23 07:53:14 UTC
that is the wrong fix. What we need is a blacklist in kudzu to make sure certain modules are never autoloaded. We need the same for the firmware flashing modules etc etc etc.
Comment 5 Alan Cox 2004-09-23 09:09:34 UTC
Disagree. I had a talk with rusty a long time ago and we came to the conclusion that there is probably a case for MODULE_NO_AUTOLOAD hints in the kernel. That is where they belong so there is only one data set
Comment 6 Florian La Roche 2004-09-23 11:47:39 UTC
I would suspect the behaviour for our user space tools between removing the MODULE_DEVICE_TABLE() line and adding a MODULE_NO_AUTOLOAD would be exactly the same. Given this was discussed "a long time ago", who is fixing this, hopefully for FC3test3? I have changed rc.sysinit to remove this module for "other", but the number of hacks in initscripts is really big.
Comment 7 Bill Nottingham 2004-09-23 15:57:03 UTC
Well, there's two sorts of autoload we want to avoid:
- the detrimental to autoload (i8xx_tco, etc)
- the 'impolite' to autoload (*fb)
*fb modules are easy to exclude b/c of their PCI class. But excluding all the 'other' to get rid of i8xx_tco means you lose the hwrandom driver and other random assorted things.
Comment 8 Florian La Roche 2004-09-23 16:09:42 UTC
I've done something similar to the following in rc.sysinit:
other=`echo $other | sed 's/i8xx_tco//'`
Can probably be done nicer and could then also cover *fb. This would fix this in the short term, as I suspect the MODULE_NO_AUTOLOAD discussion will take quite some time until it is finished and all players agree on a common plan here?

For PCI class detection: the one line removal in the kernel also removes pci ids. They currently show up in "modprobe -c | grep i8xx_tco", so maybe the *fb problems are the same type of items as the i8xx_tco? (??)
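For illustration only, the rc.sysinit filter could be sketched so that it drops whole module names instead of substrings (a plain sed substitution would also mangle any longer name that happens to contain the string). The helper name drop_module and the sample list are made up for this sketch; this is not the code that went into initscripts.

```shell
#!/bin/sh
# Hypothetical helper: print the module list with exact-name matches removed.
# $1 = space-separated module list, $2 = module name to drop.
drop_module() {
    result=""
    for mod in $1; do
        # Skip only exact matches; names that merely contain $2 are kept.
        [ "$mod" = "$2" ] && continue
        result="${result:+$result }$mod"
    done
    printf '%s\n' "$result"
}

other="hw_random i8xx_tco e100"   # made-up sample list
other=$(drop_module "$other" i8xx_tco)
echo "$other"
```

Because the comparison is per word, the same helper could be called once per unwanted module.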
Comment 9 Colin Charles 2004-09-26 09:07:55 UTC
Worthwhile to note that as notting mentioned on IRC (in case folks want the quick fix), adding the following to /etc/modprobe.conf is a temporary workaround: install i8xx_tco /bin/true
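The "install" directive tells modprobe to run the given command instead of inserting the module, so pointing it at /bin/true turns any load request into a no-op. A minimal sketch of applying the workaround without duplicating the line on repeated runs follows; the function name disable_autoload is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append "install <module> /bin/true" to a modprobe
# config file unless that exact line is already present.
# $1 = config file (e.g. /etc/modprobe.conf), $2 = module name.
disable_autoload() {
    line="install $2 /bin/true"
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}

# Usage (as root): disable_autoload /etc/modprobe.conf i8xx_tco
```

Removing the line again restores normal loading.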
Comment 10 Bill Nottingham 2004-09-28 17:45:00 UTC
*** Bug 133606 has been marked as a duplicate of this bug. ***
Comment 11 Bill Nottingham 2004-09-29 01:12:14 UTC
*** Bug 134016 has been marked as a duplicate of this bug. ***
Comment 12 Bill Nottingham 2004-10-01 21:43:12 UTC
initscripts-7.86-1 will read the normal hotplug /etc/hotplug/blacklist file. (Which, as of hwdata-0.140-1, will have i8xx_tco in it.)
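As a rough sketch of what a blacklist lookup can look like (not the actual initscripts code), the check below assumes the file lists one module name per line and treats everything after "#" as a comment. The function name is_blacklisted is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if module $2 is listed in blacklist file $1.
# Assumes one module name per line; "#" starts a comment.
is_blacklisted() {
    [ -r "$1" ] || return 1
    # Strip comments, put every remaining word on its own line,
    # then look for an exact whole-line match.
    sed -e 's/#.*//' "$1" | tr -s ' \t' '\n\n' | grep -qxF "$2"
}

# Usage: is_blacklisted /etc/hotplug/blacklist i8xx_tco || modprobe i8xx_tco
```

Stripping comments before matching means a commented-out entry no longer counts as blacklisted.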
Comment 13 Alan Cox 2004-10-01 22:12:53 UTC
There is a much more serious unanswered question. When the i810_tco driver was loaded, who opened the file? Someone has to open the file for the timer to start, and if someone did, every other watchdog is going to fall foul of this if deployed (e.g. softdog for telco boxes).
Comment 14 Bill Nottingham 2004-10-01 22:33:38 UTC
That I don't understand. I have a box here that loads i8xx_tco, and it *never* auto-reboots.
Comment 15 Bill Nottingham 2004-10-01 22:42:37 UTC
On those that are seeing the problem, if you: a) boot without it loaded b) load it by hand does it then reboot 30 seconds later? If so, can you do those steps, and run 'lsof | grep /dev/watchdog' at some point before it reboots?
Comment 16 Bill Nottingham 2004-10-01 22:44:47 UTC
Booting with 'init s' (while still having it autoloaded) and performing the same lsof will also work. Assuming it boots to single user mode fast enough. :)
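For anyone running this test, a small polling loop may be easier than retyping lsof before the box goes down. This is only a sketch: the function name log_openers is made up, and it passes the device node to lsof directly instead of grepping the full listing.

```shell
#!/bin/sh
# Hypothetical helper: poll for processes holding a device node open.
# $1 = device node, $2 = number of polls, $3 = seconds between polls.
log_openers() {
    i=0
    while [ "$i" -lt "$2" ]; do
        # lsof prints nothing on stdout when nobody has the file open.
        holders=$(lsof "$1" 2>/dev/null)
        if [ -n "$holders" ]; then
            printf '%s\n' "$holders"
        else
            echo "poll $i: no opener for $1"
        fi
        i=$((i + 1))
        sleep "$3"
    done
}

# Usage (as root, right after "modprobe i8xx_tco"):
#   log_openers /dev/watchdog 60 1
```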
Comment 17 Reuben Farrelly 2004-10-01 22:53:54 UTC
I booted into single user mode and the module hadn't loaded as I had it aliased in modprobe.conf. Undid this, modprobed the module (and confirmed with lsmod that it had now loaded), and _60_ seconds later it rebooted spontaneously. The whole time while waiting, the results of 'lsof | grep /dev/watchdog' showed no match.
Comment 18 Nigel Metheringham 2004-10-06 09:19:14 UTC
In my case something in a gnome session start/login appears to be tickling the watchdog. The module would load on startup, but the machine would be stable until you login (gdm graphic login), or until you issue startx from a text console login. 30 seconds later the box reboots. I have not retried this in the last couple of weeks and have applied current rawhide updates. Will try and reproduce later.
Comment 19 Alan Cox 2004-10-06 09:25:53 UTC
Nigel - would be interested to know in your case if before you start the gnome stuff you do the following
rmmod i810_tco
modprobe softdog
then see if you get the same reboot behaviour.
Comment 20 Nigel Metheringham 2004-10-08 21:24:05 UTC
Followup to Alan's query in comment #19. Using the then current kernel (590), my laptop still reboots 30 seconds after starting gnome if i810_tco is loaded. If I do not load i810_tco, and load softdog instead, gnome does not cause a reboot. The default timeout for softdog is 60 seconds, that for i810_tco is 30 seconds, so I repeated this with the softdog timeout explicitly set to 30 seconds - no reboot. I cannot identify anything kicking /dev/watchdog.

Is there some other means that could set i810_tco off? I guess I could try instrumenting the module a little and retrying it - see if I can find who/what is kicking it.
Comment 21 Alan Cox 2004-10-08 21:53:39 UTC
Cool, it's not the kernel directly and it's not us opening /dev/watchdog in error (that was my big concern). Looks like an i81x X server bug, must be touching something related. X folks? (And if the X folks can't see it to fix it, then we need to work around it.)
Comment 22 Reuben Farrelly 2004-10-08 21:58:05 UTC
I'm not running any sort of X (see my posting above) and yet I still have the problem of spontaneous reboots when i8xx_tco is loaded..
Comment 23 Larry Bartash 2004-10-09 01:06:27 UTC
Further testing on my system goes as follows:

If I load i8xx_tco manually after the system has booted (whether in X or with init s), no spontaneous reboots occur. No output from lsof | grep /dev/watchdog.

If I comment out ("#") the i8xx_tco entry in /etc/hotplug/blacklist, the watchdog module is still not loaded. Hmm, load_module() in rc.sysinit ain't that smart. OK, whatever, I delete the entry. Reboot, now the module is loaded during the boot.

If the module i8xx_tco is loaded automatically in rc.sysinit during the boot (whether in X or with init s), the system WILL spontaneously reboot. Still no output from lsof | grep /dev/watchdog prior to it rebooting.

Hmm, if I boot with init s, then rmmod i8xx_tco right away, no spontaneous rebooting occurs. I can then edit /etc/hotplug/blacklist to put the i8xx_tco back in.

The above was tested with kernel-2.6.8-1.541 and kernel-2.6.8-1.603; both yield the same results.

What can be gleaned from this?
Comment 24 Nigel Metheringham 2004-10-11 08:49:16 UTC
Created attachment 104997 [details] Dmesg output for compaq laptop which reboots after gnome start
Comment 25 Nigel Metheringham 2004-10-11 08:50:22 UTC
Created attachment 104998 [details] lspci -v output for compaq laptop which reboots after gnome start
Comment 26 Nigel Metheringham 2004-10-11 08:53:43 UTC
Re Alan's Comment #21: the laptop that's showing the reboot-after-gnome-start problem uses a radeon card - it's not all 8xx based stuff. The dmesg & lspci stuff is attached above.
Comment 27 Wim Van Sebroeck 2004-10-11 19:14:09 UTC
Normally when a watchdog driver is loaded, it makes sure that the watchdog isn't active. It's only after you open /dev/watchdog (and thus tickle the watchdog) that you really start/activate/kick the watchdog. From then on it counts down.

Also: I checked the attachment of comment #24: it doesn't contain the load of the watchdog module (which should be visible in dmesg as e.g. "i8xx TCO timer: initialized (0x0460). heartbeat=30 sec (nowayout=0)"). I don't think that it is the watchdog module itself that is causing the problem. I'll create a debug version/patch tomorrow evening so that we can easily read the timer's value and see whether or not it is really the TCO that is causing the reboot.
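To make the open/keepalive/close sequence concrete, here is a minimal userspace sketch. It assumes the driver honours the "magic close" character 'V' (and that nowayout=0), which should be checked for the driver in question; the function name pet_watchdog is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: open a watchdog node, write keepalives, then disarm.
# $1 = device node, $2 = number of keepalives, $3 = seconds between them.
pet_watchdog() {
    exec 3> "$1"              # opening the node is what arms the timer
    n=0
    while [ "$n" -lt "$2" ]; do
        printf '.' >&3        # each write acts as a keepalive
        n=$((n + 1))
        sleep "$3"
    done
    printf 'V' >&3            # magic close: ask the driver to stop on close
    exec 3>&-                 # close the node
}

# Usage (as root): pet_watchdog /dev/watchdog 10 5
```

If nothing ever opens the node, a correctly behaving driver should never start counting down, which is why the absence of any opener in lsof points at the driver's own initialization.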
Comment 31 Wim Van Sebroeck 2004-10-14 20:09:03 UTC
Hmm, the module is indeed not behaving like I thought it would be... after initialization it does a tco_timer_keepalive() instead of what I would expect: tco_timer_stop()... I'll do some testing together with Reuben Farrelly...
Comment 32 Wim Van Sebroeck 2004-10-14 20:22:13 UTC
Created attachment 105233 [details] i8xx_tco.c debug version

Attached is the debug version of i8xx_tco.c. (Note: I patched the initialization so that it stops the watchdog.)
Comment 33 Reuben Farrelly 2004-10-15 22:32:41 UTC
When running the debug version I get this:

tco_timer_set_heartbeat: heartbeat=30
tco_timer_stop: val=8
i8xx TCO timer: initialized (0x1060). heartbeat=30 sec (nowayout=0)

... and the problem has gone away. The module is still loaded and running, and after 7 mins uptime the box has not reloaded, nor has anything been logged. This looks good.
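Others who want to confirm that the driver initialized, and with which settings, could filter dmesg for the "initialized" line. The sketch below only assumes the message format quoted in this report; the function name tco_settings is made up.

```shell
#!/bin/sh
# Hypothetical helper: pull heartbeat and nowayout out of the driver's
# "initialized" message, in the format quoted in this report.
# Reads kernel log text on stdin; prints nothing if the line is absent.
tco_settings() {
    sed -n 's/.*i8xx TCO timer: initialized.*heartbeat=\([0-9]*\) sec (nowayout=\([0-9]*\)).*/heartbeat=\1 nowayout=\2/p'
}

# Usage: dmesg | tco_settings
```

No output means the driver did not log an initialization line, as was the case for the dmesg attached in comment #24.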
Comment 34 Reuben Farrelly 2004-10-21 04:53:49 UTC
I see a fix has been committed to the mainline kernel:

ChangeSet 1.1988.69.31, 2004/10/17 20:35:47+02:00, email@example.com
[WATCHDOG] v2.6.9-rc3 i8xx_tco.c-stop_reboot-patch
Fix for Bugzilla Bug 132719: "watchdog i8xx_tco causing machine to reboot."

Is it safe to close this bugzilla report as an UPSTREAM fixed?
Comment 35 Bill Nottingham 2004-10-21 16:04:07 UTC
Yup. And if it went into 2.6.9-rc3, please test with 2.6.9-1.640.
Comment 36 Reuben Farrelly 2004-10-21 18:49:13 UTC
The patch went in post 2.6.9 release so it won't be in there.. :(