From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Description of problem:
Various errors occur after installation to my SCSI drive that do not happen when installing to an IDE drive. The errors are completely random: I may be kicked out to a login prompt immediately after logging in, some programs may be broken, or I may not notice anything until I try to run X, or various programs in X. The system appears VERY unstable, poorly configured, and not usable, which is not my experience with Red Hat distros. The SCSI drive works fine with Windows (not sure that means anything...) and I'm pretty well-versed in the ways of SCSI setup; I tried disabling the write cache and tried another SCSI drive, but the only way it will run right is to install to an IDE drive.

Version-Release number of selected component (if applicable):
Default SMP kernel supplied with RH 7.1

How reproducible:
Always

Steps to Reproduce:
1. Install RH 7.1 with the SMP kernel (no issues apparent or present during the install). I believe the standard kernel will also reproduce this, but I'm not sure.
2. Try to boot it up. It may. It may not.
3. Various errors occur: the filesystem reports it was not unmounted cleanly after a standard shutdown -r now, I may not be able to log in after the first boot, etc. See below.

Actual Results:
I am a retail owner of both Red Hat 5.1 and 6.2; 6.2 would not run for me because of similar issues, so I downloaded and burned 7.1, thinking the SCSI module would be updated and maybe it was fixed. These CDs are OK; I don't get any read errors from them or anything, and I burned them at 2x just to be on the safe side. I will be as thorough as possible in describing this problem. The problem is also reproducible on RH 6.2, although to a somewhat lesser degree. The description below is STRICTLY based on experience with RH 7.1.

**************

Symptoms: RANDOM

Running the install, I created the /boot partition on HDA, and put /, swap, /usr, and /home on the Seagate Barracuda drive, which is SDD.
(I tried both WITH and WITHOUT write cache enabled, and also tried installing to an IBM U3 drive with the same partitioning scheme.)

I seem to be getting a LOT of filesystem errors. I have run the install about 25 times, and frequently, if I can get the machine to boot at all, after issuing shutdown -r now the system reboots and claims partitions were NOT UNMOUNTED CLEANLY. fsck finds huge numbers of errors. I've tried one of the IBM Ultra3 drives as the install target with the same results. Various things happen or are broken after numerous installs; I don't think the specifics are relevant, but it all indicates to me that files are being written to the disk corrupted.

A couple of examples: I enter "root" and my password at the login prompt, and I am immediately kicked back out to a login prompt. I reinstalled, and *could log in* after selecting NOTHING different in the installer. I tried to run X, which I set up during the install, and the system said it failed to connect to the server. I reinstalled *again* and was able to get into X this time, again with the EXACT SAME OPTIONS in the installer. I tried to download updates, as I realized something was whacked, and the install completed (using Red Hat Network). After that, nothing in X that relies on GTK seems to run anymore. So I went to the directory that had the packages in it, issued rpm -Uvh *, and got a core dump in response. ONE package at a time would work, though. So I issued shutdown -r now again, restarted, and the filesystem was reported as corrupt once more: kernel panic, could not mount root filesystem. I ran fsck from recovery, rebooted, and everything was hosed.

I reinstalled again, and again, and again, with similar results. It's never EXACTLY the same, but I ALWAYS get the "not unmounted cleanly" message after a few reboots if the system is usable at all. Something is really borked here. Could this be a bug in the kernel module?
Note that the patch provided at Justin Gibbs' site apparently won't do me any good, because I'd have to have a system up and running before I could apply the patch, wouldn't I? He's made some updates to the code that are more recent than RH 7.1. I swear I read somewhere that there were issues with the 2940U2W and either U2 drives, or U3 drives on U2 controllers, but I can't remember where I saw that. I am going bonkers trying to get this running. Perhaps there's a way to install his patch before setup is actually up and running? PLEASE HELP! I'll try it! I really want to use my SCSI drive for Linux!

Summary: CANNOT INSTALL A USABLE SYSTEM ON THE SCSI SUBSYSTEM

Expected Results:
RH 7.1 should install and run as normal on this system. The 2940U2W is fully supported, and I have used this exact same card (with only UW drives attached) to run RH 6.2 for a long time.

Additional info:
System description: Asus CUVX-D, 512 MB CAS-2 PC133 SDRAM, 2 PIII 1 GHz Coppermines (sequential serial numbers!), with a 3Com 3c905B-TX, Sound Blaster Live!, GeForce DDR, and Adaptec 2940U2W, running the vanilla RH 7.1 SMP kernel (I must disable MPS 1.4 in the BIOS to avoid an UNKNOWN IO-APIC error at boot, AND IT WORKS WITH 6.2! But that's another story!)

Drives connected to the SCSI card:
- 2 IBM-PSG Tornado 9 WLS, IDs 0 and 1 (Ultra3/SCSI-160)
- 1 Fujitsu MAE3182LP U2 LVD drive, ID 2
- 1 Seagate Cheetah ST39173LW U2 LVD drive, ID 3

The above drives are on the U2 bus, using a U2 cable with an integrated active terminator. On the UW bus of the same card are:
- 1 IBM DGHS18U UW SCSI drive, ID 8
- 1 Fujitsu MAA3182SP UW SCSI drive, ID 9

These drives are in an external enclosure with an active terminator on the enclosure. The SCSI card is at ID 7; termination is *enabled* on the U2 bus (not automatic), and set to automatic on the UW bus (in case the drive box isn't connected). Nothing is configured oddly in the SCSI BIOS; it's pretty much set to the defaults.
All LVD/U3 drives report running in LVD mode; all drives on the UW bus (of course) report running in SE mode.

Connected to the integrated IDE:
- 1 Maxtor 91024D4 (primary master)
- 2 Maxtor 92048U8 drives (primary slave and secondary master)
- 1 TEAC CD-W54E CD-RW drive (secondary slave)

NOTE: I installed to the END of HDA, the Maxtor IDE drive, used the same options that FAIL with the SCSI drive, and it runs beautifully! BUT I WANT TO USE MY SCSI DRIVE!

Further note: This SCSI subsystem runs both Windows 2000 and Windows XP (which WAS installed on the IBM drive that I tested RH 7.1 on a couple of times) with ABSOLUTELY no problems, using these drives as the OS drive. I think this should indicate that it is not a cabling/termination issue. All SCSI IDs are non-conflicting, and both buses are terminated at both ends (the SCSI card should be terminating both buses, and the ends of both chains are terminated).

Even further note: I am choosing the following options in setup, as far as the lines for LILO and such: it installs LILO into the root superblock of HDA2 (/boot), using the default ide-scsi parameter line that the installer fills in for me. I am also leaving "use linear mode" checked, because I tried disabling that and it wouldn't boot at all. I'm pretty much just using what the installer selects for me, thinking that it probably knows best. Oh, and I'm using BootMagic 7 as the boot loader to hook into LILO at the root superblock of HDA2.

What the heck am I doing wrong, or is this a bug in the driver module? If it's a bug, how do I install an updated module at boot time?
First off, unless the SMP kernel won't boot with a 1.4 MPS table, go ahead and change that setting back. The error message you are talking about exists mostly so that we get notified of new APIC IDs; it shouldn't keep anything from working, and the benefits of a 1.4 table outweigh the harmless message.

Now, as to the easy part of my answer. If you want to try Justin Gibbs' driver, boot the installer using the command:

  expert noprobe dd

When it gets to the point where it asks for a driver disk, insert the disk you make by downloading the latest driver image from Justin's ftp site. Then select that driver from the list of available SCSI drivers (make sure you scan the entire list of SCSI drivers, because whichever one comes off the driver update disk should be appended to the end of the list, so don't immediately grab the first Adaptec driver you find). That should allow you to try Justin's driver.

If you don't want to download Justin's latest driver, or if he doesn't have a recent driver disk for a 7.1 install, then drop the dd part off the boot line and select the New Adaptec SCSI driver from the list of available drivers; that should load Justin's driver instead of mine (I think 6.1.13 is in 7.1, but I could be wrong on that; it may be older).

Now, if that doesn't work, then I'm guessing your actual problem is related to PCI or main bus corruption resulting from PCI caching options set in your BIOS. I would go through the BIOS disabling all of the PCI speedup options and see if things start working OK. If that helps, add them back one at a time until you find the culprit. That gives you several things to try; I'm marking this bug as NEEDINFO until you can get the results back to us.
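For reference, writing a raw driver-disk image to a floppy is just a byte-for-byte dd copy. A minimal sketch, using ordinary files as stand-ins (the image name here is hypothetical, and on the real machine the target would be /dev/fd0 rather than a file):

```shell
IMG=driver-disk.img     # hypothetical name of the downloaded driver image
FLOPPY=floppy.img       # stand-in for /dev/fd0 so the sketch runs anywhere

# Fabricate a 1.44 MB image purely for demonstration purposes.
dd if=/dev/zero of="$IMG" bs=1024 count=1440 2>/dev/null

# Write the image raw to the "floppy", exactly as you would to /dev/fd0.
dd if="$IMG" of="$FLOPPY" bs=1440k 2>/dev/null

# Verify the copy is byte-identical before trusting it at install time.
cmp -s "$IMG" "$FLOPPY" && echo "driver disk written OK"
```

On the real hardware, only the last two commands matter: dd the downloaded image onto /dev/fd0, then read the floppy back and compare it against the image.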
I suspect that what's actually happening on your machine is motherboard-related corruption that Windows works around with some sort of motherboard blacklist entry that we don't know about, and I'm hoping that changing things in the BIOS will correct the problem.
Thanks so much for the info, Doug! Enabling the MPS 1.4 setting actually causes a hard lock (the keyboard lights for Caps Lock and such don't even respond...), but that setting works just dandy with the kernel from RH 6.2. What do you make of that? Should I file a separate bug report?

Anyway, about your advice: I do have "PCI 2.1 delayed transaction" enabled in the BIOS, along with a couple of other options that are NOT enabled by default (tweaking...). I will investigate these settings; honestly, I don't have a clue what some of them do (Award BIOSes for non-Intel chipsets always have some oddball settings, and this is my first non-Intel-chipset mainboard). None of these settings seemed to adversely affect performance in Win2K, but they haven't seemed to HELP anything, either. I hadn't even considered that it could be a PCI bus issue... I will test with ALL that stuff disabled (the default BIOS config) and the ORIGINAL driver, and get back to you. This install is only 2 days old, so I don't mind nuking it if I can get it up and running on SCSI!

If that does not work, I will try Justin's driver. His driver page is kind of foreboding about using the driver images vs. the patch... I took that to mean there was something really complicated about using it. It sounds fairly easy, though; I'll just make the driver disk in my working Linux install before I blow it away (I have to, since / is where /boot will go when running from SCSI). Is there a good way to get dmesg output into a text file so I can send that along if it still dumps? Appreciate the info! I love Red Hat 7.1... when it's working!
OK... I've changed this to NOTABUG. I'm not sure if that's right, but here's what I did.

I don't know how to use Justin's driver, because all he has on his page is a source .gz file or the patch .gz file. Maybe I'm missing the obvious, but I don't see a driver disk image described there, and I don't know how to make one from the patch, if that's even possible.

BUT... I disabled PCI 2.1 Delayed Transaction in the BIOS (it is disabled by default; I had enabled it, since it seems like I'd want that...), booted up from the install CD, and used the expert noprobe line. I installed the provided driver, YOUR driver, NOT the NEW EXPERIMENTAL driver (and yes, I realize doing noprobe was relatively pointless in that respect), and did a very basic install. It booted OK, and as a test, I copied the entire /usr directory over to the /home partition, then IMMEDIATELY rebooted with shutdown -r now. On rebooting, /home reported clean. I issued umount /home and ran fsck manually, and it also reported clean (not sure how much difference that makes vs. the bootup check). I did this several times. No problems. X runs. X programs all run. No issues.

I have now reinstalled using my typical selection of packages, which is about a 1.4 GB install, added some updates and such, and it appears to be working. Working very well, I might add. Two installs in a row that work perfectly fine is FAR better than anything I was able to do prior to switching that setting. I think that's a fair test, and I think your analysis of it being a PCI bus/main bus corruption problem was dead-on. I am on the one hand very impressed at how easily you solved that, and on the other disappointed that I didn't think of it... :)

This seems to be one of those situations where X (Linux) seems to be at fault, but in fact X is merely revealing a flaw or bug in Y (my BIOS settings, BIOS, or mainboard) that Z (Windows 2000) did not reveal.
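As an aside, a copy test like the one above can be made a bit more sensitive by checksumming the copied tree rather than relying on fsck alone, since fsck only catches structural damage, not silently corrupted file contents. A sketch using stand-in directories (on the real system the source would be /usr and the destination a directory on the suspect /home partition):

```shell
# Stand-in directories so this sketch runs anywhere without touching /usr.
SRC=/tmp/usr-stand-in
DST=/tmp/home-stand-in
rm -rf "$SRC" "$DST"
mkdir -p "$SRC" "$DST"
printf 'payload one\n' > "$SRC/a"
printf 'payload two\n' > "$SRC/b"

# Copy the tree, preserving attributes, as in the /usr -> /home test.
cp -a "$SRC/." "$DST/"

# Checksum the source tree, then verify the copy against those sums;
# any corruption during the copy (or after a reboot) shows up here
# as a mismatch instead of waiting for fsck to notice.
(cd "$SRC" && find . -type f | sort | xargs md5sum) > /tmp/tree.md5
(cd "$DST" && md5sum -c --quiet /tmp/tree.md5) && echo "trees match"
```

Rerunning the md5sum -c step after the reboot gives a direct answer to whether the data survived intact.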
It seems that Windows 2000 compensates for the havoc that the PCI 2.1 delayed transaction feature in my BIOS causes on the PCI bus. I assumed, what with my mainboard stating it is PCI 2.1 compliant, that I would WANT this feature enabled, yet it's caused me DAYS of frustration. Ahem, perhaps that's why it's disabled by default. :) You are great, Doug. Thanks so much; I don't know how I can thank you enough, really. Now I can get down to the task of learning more about the OS.