Bug 126391
Summary: | both 'update' kernels for fedora 2 fail to boot with SCSI | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | William W. Austin <waustin> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 2 | CC: | rob, zaitcev |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | athlon | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-08-05 12:28:58 UTC | Type: | --- |
Description
William W. Austin
2004-06-21 05:10:43 UTC
I got a problem that I think is related: On a freshly installed and updated FC2 system (without scsi card) everything works fine and I can mount a usb harddisk. When plugging in a scsi controller (tested two different adaptec cards, both needing the aic7xxx module) with a hard disk attached, kudzu configures it at the next boot but I cannot mount the scsi disk and the aic7xxx module isn't loaded. I'm not able to load the module unless I remove the ohci_hcd module first. Then the controller and disk appear fine but obviously the usb harddisk can't be mounted anymore. That's quite a problem, not being able to use scsi and usb at the same time, and something that worked fine on FC1 and before.

Tonight I tried loading the latest kernel (kernel-2.6.6-1.435.2.3.i686.rpm + source + docs, btw) and the problem persists. (FWIW I have 4 machines with root on /dev/hda1 and additional scsi drives and all 4 exhibit the same problem). The boot cycle appears normal - the scsi cards (2 in the machine) are detected (29160, 2940), and the devices attached to them are detected, the disk partitions are listed, etc. However after loading the ehci_hcd and ohci_hcd modules, when the system attempts to enable the first scsi swap partition, it hangs. Again, eventually the timeout occurs and the system fails to find ANY scsi partitions, giving root a chance to log in and fix the problem. But of course at this point there is no fix. At this point (single user, attempting to fix problem), if I do a rmmod sd_mod, the system locks completely, requiring a hard reboot. It is also effectively not possible to do an rmmod aic7xxx at this point, since if this is done and then an attempt is made to reload it, the system hangs, again requiring a hard reboot. IMHO this is probably a serious bug since it effectively disables systems which have both ide and scsi drives attached.
On one machine I tried disabling ALL usb on the MB and removing the two lines
    alias usb-controller ehci-hcd
    alias usb-controller1 ohci-hcd
from /etc/modprobe.conf, but I must have missed something because they were still loaded and the problem didn't change.

I'm running a system with all partitions on SCSI drives attached to an Adaptec 2940 U2W controller, and this affects me as well. I'm unable to boot with any kernel above 2.6.5-1.358, and disabling USB isn't an option for me.

I repeated my experiment of disabling all USB in the motherboard bios and this time was able to boot the new kernels successfully - however, for me disabling usb is not an option either. (I monitor the network UPS's via USB, not to mention other USB devices.) FWIW, all 4 machines which exhibit this behaviour (all the machines I have access to in terms of installing a new kernel, anyway) have the same MB, a gigabyte 7nnxp, all with the same bios (nVidia, f17). All have adaptec 29160 controllers, and 2 have a second scsi controller: one is an adaptec 2940U and the other is a 2940UW. With USB disabled in the bios, the modules which seem to trigger the problem (ohci_hcd and ehci_hcd) did load (but did nothing of course). The directory /proc/bus/usb was created, but was empty (as expected). All 4 boxes have 1Gb memory with athlon xp 2500+ cpu's. If you need further info to help track this one down, please contact me. Thanks.

I forgot to mention in comment #4 above that someone had written me, suggesting that if I used the open source nv driver instead of the proprietary nvidia driver for my video board, the problem would go away. It doesn't. Also removing the nvidia video board (5200 ultra, 128mb geforce4) altogether didn't help, but as the only replacement board I had was an older board with a non-accelerated nvidia chip, this may not have been a fair test.

Interestingly, I got the system to start booting. What I've now found is that it seems to be kudzu that's actually hanging my system.
Once I did a chkconfig --level 3,5 kudzu off, the system started booting. Running kudzu from the prompt also hangs the system unless I specify -s.

More experimentation: kudzu is not a factor. If I remove the two lines
    alias usb-controller ehci-hcd
    alias usb-controller1 ohci-hcd
from /etc/modprobe.conf and append them to the end of /etc/modules.conf, the system *appears* to boot normally. HOWEVER after booting, the only thing that you can do with the scsi drives is a df <-options> on them (didn't try an unmount). Any attempt to do, for instance, ls -laF <mount point of any scsi drive> causes the scsi bus to be reset. Several such attempts cause a kernel panic. Repeating: this is repeatable on Adaptec 29160 and 2940UW boards with gigabyte ga-7nnxp (multiple systems, almost identical), and swapping memory or other cards [even removing all other cards except video] does not make a difference. If I disable all USB on the MB, then I can safely and successfully boot any of the non-smp kernels (2.6.6-1.427, 435); however, this is not an option. Also if I boot the 435 kernel and wait for the timeout so that I eventually get the chance to log in (can't find scsi drives ... fsck), if I do an rmmod of the ohci_hcd and ehci_hcd modules, it doesn't help. If after removing these 2, I do an rmmod aic7xxx, the system may hang. If it doesn't and I do a modprobe aic7xxx, it invariably hangs. I have been unable to track this one down in the code - clearly something in the usb modules is interfering with the scsi module, but I can't find it. HELP!!!?!?!?! please.

Nothing new to add (lots of other approaches tried, but all failed); however, I have had email from two other people having problems on athlon systems using the aic7xxx driver. This bug (126391) *may* be related to 125887, but it's hard to tell from this end.

William, a serial console or a netconsole dump might be useful. Do NOT drop it into the comments box, please.
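
For anyone repeating the /etc/modprobe.conf experiment described above on several machines, here is a minimal sketch of that edit in Python. It only comments out the two usb-controller alias lines quoted in the report; the function name `comment_out_usb_aliases` and the sample `scsi_hostadapter` line are hypothetical, made up for illustration. Note that, as reported above, removing these aliases did not by itself stop the modules from loading, so this automates the experiment rather than fixing anything.

```python
import re

# Matches lines such as "alias usb-controller ehci-hcd" or
# "alias usb-controller1 ohci-hcd" (the two entries quoted in the report).
USB_ALIAS = re.compile(r"^\s*alias\s+usb-controller\d*\s+\S+\s*$")


def comment_out_usb_aliases(conf_text):
    """Return (new_text, disabled_lines) with usb-controller aliases commented out.

    Hypothetical helper: it works on the text of a modprobe.conf-style file
    and never touches the real file itself.
    """
    kept = []
    disabled = []
    for line in conf_text.splitlines():
        if USB_ALIAS.match(line):
            disabled.append(line.strip())
            kept.append("# " + line)  # keep the line, but disable it
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", disabled


if __name__ == "__main__":
    # Sample content; the scsi_hostadapter line is an assumed example entry.
    sample = (
        "alias scsi_hostadapter aic7xxx\n"
        "alias usb-controller ehci-hcd\n"
        "alias usb-controller1 ohci-hcd\n"
    )
    new_text, disabled = comment_out_usb_aliases(sample)
    print(new_text, end="")
    print("disabled:", disabled)
```

Commenting the lines out instead of deleting them keeps the original entries in the file, so they can be restored by hand once a fixed kernel is installed.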
<stern.edu> [PATCH] USB: Fix endianness bug in UHCI driver
This patch fixes a byte-swapping error in the UHCI driver. It has been present since 2.6.6 and only got tracked down just now! Thanks a lot to Michel Roelofs for all his help and testing. This should be pushed through to Linus in time to appear in 2.6.8, if possible.

Guess we're mud :p

Last night I downloaded the 2.6.7-1.494.2.2 kernel from the updates directory, and it fixes the boot-hang problem. The system now boots without the boot-hang and without the failure to find the scsi partitions. A new (or previously hidden?) problem remains: loading the kpilot daemon or trying to access my palm pilot via jpilot (the palm pilot is connected via usb) slows the system to a crawl - playing with it for several hours, it *acts* almost like an interrupt conflict, but I haven't been able to track it down further yet. Nonetheless, this current bug can probably be closed since the system now boots successfully -- if I can isolate the usb/slowdown situation further, I will create a new bugzilla report on it. Thanks for the hard work - I appreciate it (and so do all my fedora 2 boxen). :-)