Description of problem:
Install rawhide (0911 tree) onto an x86 box, with a GIGABYTE '8I845GVM-RZ' i845GV motherboard, and find that it reboots in under 2 minutes during the startup process.

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.541.i686.rpm

How reproducible:
100%

Steps to Reproduce:
1. Install Fedora
2. Let it boot

Actual results:
It reboots during the boot process, which is under 2 minutes.

Expected results:
Fedora Core boots up and is stable.

Additional info:
So the solution is to boot into single user mode, rmmod i8xx_tco (before it hits the 2m mark), Ctrl+d, and then I get a usable Fedora system.

Is there a way to blacklist i8xx_tco from loading? The motherboard is a run-of-the-mill cheapish one, but is very popular.
*** Bug 133209 has been marked as a duplicate of this bug. ***
*** Bug 133321 has been marked as a duplicate of this bug. ***
Alan/Arjan, can this be fixed by removing the one line in the kernel that says MODULE_DEVICE_TABLE() in this kernel driver? Should still allow the module to work, but not autoload it due to certain pci ids being present.
That is the wrong fix. What we need is a blacklist in kudzu to make sure certain modules are never autoloaded. We need the same for the firmware flashing modules, etc.
Disagree. I had a talk with rusty a long time ago and we came to the conclusion that there is probably a case for MODULE_NO_AUTOLOAD hints in the kernel. That is where they belong so there is only one data set
I would suspect the behaviour of our user space tools would be exactly the same whether we remove the MODULE_DEVICE_TABLE() line or add a MODULE_NO_AUTOLOAD. Given this was discussed "a long time ago", who is fixing this, hopefully for FC3test3? I have changed rc.sysinit to remove this module for "other", but the number of hacks in initscripts is really big.
Well, there are two sorts of autoload we want to avoid:
- the detrimental to autoload (i8xx_tco, etc)
- the 'impolite' to autoload (*fb)

*fb modules are easy to exclude because of their PCI class. But excluding all the 'other' to get rid of i8xx_tco means you lose the hwrandom driver and other random assorted things.
I've done something similar to the following in rc.sysinit:

    other=`echo $other | sed 's/i8xx_tco//'`

This can probably be done more nicely and could then also cover *fb. It would fix this in the short term, as I suspect the MODULE_NO_AUTOLOAD discussion will take quite some time until it is finished and all players agree on a common plan here.

For PCI class detection: the one-line removal in the kernel also removes the PCI IDs. They currently show up in "modprobe -c | grep i8xx_tco", so maybe the *fb problems are the same type of item as the i8xx_tco one? (??)
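A minimal sketch of the filtering idea above, assuming $other is a whitespace-separated list of module names. The helper name drop_module and the example module list are hypothetical and not taken from initscripts. Comparing whole names avoids clipping a longer module name that merely contains the string, which a plain sed substitution could do.

```shell
#!/bin/sh
# Hypothetical helper (not part of initscripts): print the module names
# given as $2.. with every exact occurrence of module $1 removed.
drop_module() {
    unwanted=$1
    shift
    kept=
    for mod in "$@"; do
        # Compare whole names only, so a longer name containing
        # the unwanted string is kept.
        if [ "$mod" != "$unwanted" ]; then
            kept="${kept:+$kept }$mod"
        fi
    done
    printf '%s\n' "$kept"
}

# Illustrative use, with $other deliberately unquoted so it splits on spaces:
# other=$(drop_module i8xx_tco $other)
```

The same helper could be called again for any other module that should not be autoloaded.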
Worthwhile to note that as notting mentioned on IRC (in case folks want the quick fix), adding the following to /etc/modprobe.conf is a temporary workaround:

    install i8xx_tco /bin/true
*** Bug 133606 has been marked as a duplicate of this bug. ***
*** Bug 134016 has been marked as a duplicate of this bug. ***
initscripts-7.86-1 will read the normal hotplug /etc/hotplug/blacklist file. (Which, as of hwdata-0.140-1, will have i8xx_tco in it.)
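A rough sketch of what a blacklist check of this kind might look like. This is not the actual rc.sysinit code; the function name is_blacklisted is hypothetical, and the file format (one module name per line, "#" starting a comment) is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch, not the real rc.sysinit code: return 0 if module $1
# is listed in blacklist file $2 (default /etc/hotplug/blacklist).
# Assumed format: one module name per line, "#" starts a comment.
is_blacklisted() {
    module=$1
    file=${2:-/etc/hotplug/blacklist}
    [ -r "$file" ] || return 1
    # Strip comments and surrounding whitespace, then match whole lines only.
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$file" |
        grep -Fxq -- "$module"
}

# Illustrative use in a boot script:
# is_blacklisted i8xx_tco || modprobe i8xx_tco
```

Stripping comments before matching means a commented-out entry no longer counts as blacklisted, which a plain substring grep would not guarantee.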
There is a much more serious unanswered question: when the i810_tco driver was loaded, who opened the file? Someone has to open the file for the timer to start, and if someone did, every other watchdog is going to fall foul of this if deployed (e.g. softdog for telco boxes).
That I don't understand. I have a box here that loads i8xx_tco, and it *never* auto-reboots.
On those that are seeing the problem, if you:
a) boot without it loaded
b) load it by hand

does it then reboot 30 seconds later? If so, can you do those steps, and run 'lsof | grep /dev/watchdog' at some point before it reboots?
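If lsof is not handy, the check above can be approximated by walking /proc. This is only a sketch: the helper name find_openers is hypothetical, it assumes a Linux /proc and a readlink that supports -f, and it needs root to see other users' processes.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of every process that currently has
# file $1 open, by reading the /proc/<pid>/fd symlinks (Linux only).
find_openers() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the file it refers to.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}
            printf '%s\n' "${pid%%/*}"
        fi
    done | sort -un
}

# Illustrative use: poll once a second until the box reboots.
# while :; do find_openers /dev/watchdog; sleep 1; done
```

An empty result only shows that nothing held the device open at the moment of the scan; a process that opens and closes it quickly could be missed.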
Booting with 'init s' (while still having it autoloaded) and performing the same lsof will also work. Assuming it boots to single user mode fast enough. :)
I booted into single user mode and the module hadn't loaded, as I had it aliased in modprobe.conf. Undid this, modprobed the module (and confirmed with lsmod that it had now loaded), and _60_ seconds later it rebooted spontaneously. The whole time while waiting, the results of 'lsof | grep /dev/watchdog' showed no match.
In my case something in a gnome session start/login appears to be tickling the watchdog. The module would load on startup, but the machine would be stable until you log in (gdm graphical login), or until you issue startx from a text console login. 30 seconds later the box reboots. I have not retried this in the last couple of weeks and have applied current rawhide updates. Will try to reproduce later.
Nigel - would be interested to know in your case if, before you start the gnome stuff, you do the following:

    rmmod i810_tco
    modprobe softdog

then see if you get the same reboot behaviour.
Followup to Alan's query in comment #19. Using the then current kernel (590), my laptop still reboots 30 seconds after starting gnome if i810_tco is loaded. If I do not load i810_tco, and load softdog instead, gnome does not cause a reboot. The default timeout for softdog is 60 seconds and that for i810_tco is 30 seconds, so I repeated this with the softdog timeout explicitly set to 30 seconds - no reboot. I cannot identify anything kicking /dev/watchdog. Is there some other means that could set i810_tco off? I guess I could try instrumenting the module a little and retrying it, to see if I can find who/what is kicking it.
Cool, it's not the kernel directly and it's not us opening /dev/watchdog in error (that was my big concern). Looks like an i81x X server bug; it must be touching something related?? X folks (and if X folks can't see it to fix it then we need to work around it)
I'm not running any sort of X (see my posting above) and yet I still have the problem of spontaneous reboots when i8xx_tco is loaded..
Further testing on my system goes as follows:

If I load i8xx_tco manually, after the system has booted (whether in X or with init s), no spontaneous reboots occur. No output from lsof | grep /dev/watchdog.

If I comment out ("#") the i8xx_tco entry in /etc/hotplug/blacklist, the watchdog module is still not loaded. Hmm, load_module() in rc.sysinit ain't that smart. OK, whatever, I delete the entry. Reboot, and now the module is loaded during the boot.

If the module i8xx_tco is loaded automatically in rc.sysinit during the boot (whether in X or with init s), the system WILL spontaneously reboot. Still no output from lsof | grep /dev/watchdog prior to it rebooting.

Hmm, if I boot with init s, then rmmod i8xx_tco right away, no spontaneous rebooting occurs. I can then edit /etc/hotplug/blacklist to put i8xx_tco back in.

The above was tested with kernel-2.6.8-1.541 and kernel-2.6.8-1.603; both yield the same results. What can be gleaned from this?
Created attachment 104997 [details] Dmesg output for compaq laptop which reboots after gnome start
Created attachment 104998 [details] lspci -v output for compaq laptop which reboots after gnome start
Re Alan's Comment #21: the laptop that's showing the reboot after gnome start problem uses a radeon card - it's not all 8xx based stuff. The dmesg & lspci stuff is attached above.
Normally when a watchdog driver is loaded, it makes sure that the watchdog isn't active. It's only after you open /dev/watchdog (and thus tickle the watchdog) that you really start/activate/kick the watchdog. From then on it counts down.

Also: I checked the attachment of comment #24: it doesn't contain the load of the watchdog module (which should be visible in dmesg as e.g. "i8xx TCO timer: initialized (0x0460). heartbeat=30 sec (nowayout=0)"). I don't think that it is the watchdog module itself that is causing the problem. I'll create a debug version/patch tomorrow evening so that we can easily read the timer's value and see whether or not it is really the TCO that is causing the reboot.
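To illustrate the open/keepalive model described above, here is a small user-space sketch. The function name pet_watchdog and its arguments are hypothetical. It assumes the usual watchdog device behaviour, where a write counts as a keepalive and a 'V' written just before close disarms the timer on drivers that support "magic close" (and only when nowayout is off). Do not point it at a real /dev/watchdog unless a reboot is acceptable.

```shell
#!/bin/sh
# Hypothetical sketch of a user-space keepalive loop.
# $1 = watchdog device, $2 = seconds between keepalives, $3 = keepalive count.
# WARNING: on a real watchdog device, opening it arms the timer.
pet_watchdog() {
    dev=$1 interval=$2 count=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3      # a write is treated as a keepalive
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3          # "magic close" character, if the driver supports it
    } 3>"$dev"                  # fd 3 is opened here and closed when the group ends
}

# Illustrative use: keep the timer fed every 10 seconds, six times.
# pet_watchdog /dev/watchdog 10 6
```

If nothing like this has the device open, a driver that behaves as described should never count down, which is why the empty lsof results in this report are surprising.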
Hmm, the module is indeed not behaving like I thought it would be... after initialization it does a tco_timer_keepalive() instead of what I would expect: tco_timer_stop()... I'll do some testing together with Reuben Farrelly...
Created attachment 105233 [details] i8xx_tco.c debug version

Attached is the debug version of i8xx_tco.c. (Note: I patched the initialization so that it stops the watchdog.)
When running the debug version I get this:

    tco_timer_set_heartbeat: heartbeat=30
    tco_timer_stop: val=8
    i8xx TCO timer: initialized (0x1060). heartbeat=30 sec (nowayout=0)

... and the problem has gone away. The module is still loaded and running, and after 7 mins of uptime the box has not rebooted, nor has anything been logged. This looks good.
I see a fix has been committed to the mainline kernel:

    ChangeSet 1.1988.69.31, 2004/10/17 20:35:47+02:00, wim
    [WATCHDOG] v2.6.9-rc3 i8xx_tco.c-stop_reboot-patch
    Fix for Bugzilla Bug 132719: "watchdog i8xx_tco causing machine to reboot."

Is it safe to close this bugzilla report as an UPSTREAM fixed?
Yup. And if it went into 2.6.9-rc3, please test with 2.6.9-1.640.
The patch went in post 2.6.9 release so it won't be in there.. :(