Red Hat Bugzilla – Bug 54520
AIC78xx module on 2940U2W w/U3 drives has SEVERE filesystem corruption, various random failures on multiple installs with the same options picked
Last modified: 2007-04-18 12:37:34 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Description of problem:
Various errors occur after installation to my SCSI drive, that do not
happen when installing to IDE drive. Errors are completely random, i.e., I
may be kicked out to a login prompt immediately after logging in, or some
programs may be broken, or I may not notice anything until I try to run X,
or various programs in X. It appears VERY unstable, and poorly configured,
and not usable, which is not my experience with Red Hat distros.
SCSI drive works fine with Windows (not sure that means anything...) and
I'm pretty well-versed in the ways of SCSI setup; tried disabling write
cache, tried another SCSI drive, only way it will run right is to install
to an IDE drive.
Version-Release number of selected component (if applicable):
Default SMP kernel supplied with RH 7.1
Steps to Reproduce:
1. Install RH 7.1 with the SMP kernel (no issues apparent or present during
the install); I believe the standard kernel will reproduce it as well (not sure).
2. Try to boot it up. It may. It may not.
3. Various errors occur; file system reports not unmounted cleanly after
standard shutdown now -r, may not be able to log in after first boot, etc.
Actual Results: I am a retail owner of both RedHat 5.1 and 6.2; 6.2 would
not run for me, because of similar issues, so I downloaded and burned 7.1,
thinking the SCSI module would be updated & maybe it's fixed. These CDs
are OK; I don't get any read errors from them or anything, and I burned
them at 2x just to be on the safe side.
I will be as thorough as possible in describing this problem. This problem
is also reproducible on RH 6.2, although to a somewhat lesser degree.
Description below is STRICTLY based on experience with RH 7.1.
Running the install, I created the /boot partition on HDA, and put /, swap,
/usr, and /home on the Seagate Barracuda drive, which is SDD. (I tried
both WITH and WITHOUT write-cache enabled, and also tried installing to the
IBM U3 drive with the same partitioning scheme.)
I seem to be getting a LOT of filesystem errors. I have run the install
about 25 times, and frequently, if I can get the machine to boot at all,
after issuing the shutdown now -r command, the system reboots and
claims partitions were NOT UNMOUNTED CLEANLY. FSCK finds huge numbers of
errors. I've tried one of the IBM Ultra3 drives as the install target with
the same partitioning scheme, with similar results.
Various things happen or are broken after numerous installs. I don't think
the specifics are relevant, but it all indicates to me that files are
being written to the disk corrupted.
Couple of examples: enter "root" and my password at login prompt, and I am
immediately kicked out to a login prompt.
Reinstalled, and *could log in* after selecting NOTHING different in the
installer. Tried to run X, which I set up during install, and the system
said it failed to connect to the server.
Reinstalled *again* and was able to get into X this time. Again, EXACT
SAME OPTIONS in the installer. Tried to download updates, as I realized
something was whacked, so I installed them (using Red Hat Network) and the
install completed. Nothing in X that relies on GTK seems to run anymore.
So, I went to the dir that had the packages in it and issued rpm -Uvh *,
and I get "core dump" in response. ONE package at a time would work,
though. So, I issued shutdown now -r again, restarted, file system
reported as corrupt once more, kernel panic, could not mount root
filesystem. Ran FSCK from recovery, rebooted, everything's hosed.
Reinstalled again, and again, and again, with similar results; it's
never EXACTLY the same, but I ALWAYS have the "not unmounted cleanly"
after a few reboots, if the system is usable at all.
Something is really borked here.
Could this be a bug in the kernel module? Note that the patch provided at
Justin Gibbs' site apparently won't do me any good, because I'd have to
have a system up and running before I applied the patch, wouldn't I? He's
done some updates to the code that are more recent than RH 7.1.
I swear I read somewhere that there were issues with 2940 U2W and either
U2 drives, or U3 drives on U2 controllers, but I can't remember where I
saw that. I am going bonkers trying to get this running. Perhaps there's a
way to install his patch before setup is actually up and running? PLEASE
HELP! I'll try it! I really want to use my SCSI drive for Linux!
Summary: CANNOT INSTALL USABLE ON SCSI SUBSYSTEM
Expected Results: RH 7.1 should install and run as per normal on this
system. 2940U2W is fully supported, and I have used this exact same card
(with only UW drives attached) to run RH 6.2 for a long time.
Asus CUVX-D, 512 MB CAS-2 PC133 SDRAM, 2 PIII 1 GHz Coppermines
(sequential serial numbers!), w/ 3COM 3c905b-TX, Sound Blaster Live,
Geforce DDR, Adaptec 2940U2W, running vanilla RH 7.1 SMP kernel (must
disable MPS 1.4 in BIOS to avoid UNKNOWN IO-APIC error at boot, AND IT
WORKS WITH 6.2! But that's another story!)
Drives connected to SCSI card:
2 IBM-PSG Tornado 9 WLS, ID 0 and 1(Ultra3/SCSI160)
1 Fujitsu MAE3182LP U2 LVD drive ID 2
1 Seagate Cheetah ST39173LW U2 LVD drive ID 3
The above drives are on the U2 bus, using a U2 cable with an integrated
terminator. On the UW bus of the same card are:
1 IBM DGHS18U UW SCSI drive ID 8
1 Fujitsu MAA3182SP UW SCSI drive ID 9
These drives are in an external enclosure, with an active terminator on
the end of the chain.
SCSI card is at ID7, termination is *enabled* on the U2 bus (not
automatic), termination is set to automatic on the UW bus (in case the
drive box isn't connected.) Nothing configured oddly in SCSI BIOS, pretty
much set to the defaults. All LVD/U3 drives report running in LVD mode,
all drives on UW bus (of course) report running in SE mode.
Connected to integrated IDE:
1 Maxtor 91024D4 (primary master)
2 Maxtor 92048U8 drives (primary slave and secondary master)
1 TEAC CD-w54E CD-RW drive (secondary slave)
NOTE: I installed to the END of HDA, the Maxtor IDE drive. Used same
options that FAIL with SCSI drive, and it runs beautifully! BUT I WANT TO
USE MY SCSI DRIVE!
Further note: This SCSI subsystem runs both Windows 2000, and Windows XP
(which WAS installed on the IBM that I tested RH 7.1 on a couple of
times), with ABSOLUTELY no problems (using these drives as the OS drive).
I think, perhaps, this should indicate that this is not a
cabling/termination issue. All SCSI ID's are non-conflicting, both buses
are terminated at both ends (the SCSI card should be terminating both
buses, and the ends of both chains are terminated.)
Even further note: I am choosing the following options in setup, as far as
the lines for Lilo and such... it's installing Lilo into the root
superblock of HDA2 (/boot) using the default ide-scsi parameter line that
the installer fills in for me. I also am leaving "use linear mode"
checked, because I tried disabling that and it wouldn't boot at all. I'm
pretty much just using what the installer selects for me, thinking that it
probably knows best.
Oh, and using Boot Magic 7 as the boot loader to hook into LILO at the
root superblock of HDA2.
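For context, the installer-generated setup described above corresponds roughly to a lilo.conf along these lines. This is only an illustrative sketch; the kernel image name and partition numbers are guesses, not taken from the actual machine:

```
boot=/dev/hda2            # install LILO into the /boot partition's superblock
linear                    # "use linear mode" left checked
image=/boot/vmlinuz-2.4.2-2smp
    label=linux
    root=/dev/sdd1        # / on the SCSI drive (partition number is a guess)
    append="hdd=ide-scsi" # the installer's ide-scsi line for the CD-RW
    read-only
```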
What the heck am I doing wrong, or is this a bug in the driver module? If
it's a bug, how do I install an updated module at boot-time?
First off, unless the SMP kernel won't boot with a 1.4 MPS table, go ahead and
change it back to that. The error message you are talking about is mostly so
that we get notified of new APIC IDs; it shouldn't keep anything from working,
and the benefits of a 1.4 table outweigh the harmless message.
Now, as to the easy part of my answer. If you want to try Justin Gibbs' driver,
then boot the installer using the command:
expert noprobe dd
and when it gets to the point when it asks for a driver disk you can insert the
disk you make by downloading the latest driver image from Justin's ftp site.
Then you can select that driver from the list of available SCSI drivers (make
sure you scan the entire list of SCSI drivers, because whichever one comes off
of the driver update disk should get appended to the end of the list, so don't
immediately grab the first Adaptec driver you find). That should allow you to
try Justin's driver. If you don't want to download Justin's latest driver, or
if he doesn't have a recent driver disk for a 7.1 install, then drop the dd part
off of the boot line, and select the New Adaptec SCSI driver from the list of
available drivers and it should load Justin's driver instead of mine (I think
6.1.13 is in 7.1, but I could be wrong on that, it may be older).
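For reference, a driver disk is made by raw-copying the downloaded image onto a floppy. A minimal sketch, assuming the image is named aic7xxx.img (the filename is hypothetical; use whatever Justin's ftp site provides):

```shell
# make_driver_disk: raw-copy a driver image onto a target device (or file).
# WARNING: this overwrites whatever is on the target.
# Typical use would be: make_driver_disk aic7xxx.img /dev/fd0
make_driver_disk() {
    image=$1
    target=$2
    dd if="$image" of="$target" bs=1440k 2>/dev/null && sync
}
```

The sync at the end makes sure the write is flushed before the disk is removed.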
Now, if that doesn't work, then I'm guessing that your actual problem is related
to PCI or main bus corruption resulting from possible PCI caching options set in
your BIOS. I would go through the BIOS disabling all of the PCI speedup options
and see if things start working OK. If that helps, then add things back one at
a time until you find the culprit.
That gives you several things to try; I'm marking this bug as NEEDINFO until
you can get the results back to us. I suspect the actual cause is
motherboard-related corruption that Windows works around with some sort of
motherboard blacklist entry we don't know about, and I'm hoping that changing
things in the BIOS will correct the problem.
Thanks so much for the info, Doug!
Enabling the MPS 1.4 setting actually causes a hard lock (the keyboard lights
for caps lock and such don't even respond...), but that works just dandy with
the kernel from RH 6.2. What do you make of that? Should I file a separate bug
for that?
Anyway, about your advice...
I do have "PCI 2.1 delayed transaction" enabled in the BIOS, and a couple of
other options enabled that are NOT enabled by default (tweaking...)
I will investigate these settings; honestly, I don't have a clue what some of
them do (AWARD BIOSes for non-Intel chipsets always have some oddball settings,
and this is my first non-Intel-chipset mainboard).
None of these settings seemed to adversely affect performance in Win2K, but
haven't seemed to HELP anything, either.
I hadn't even considered that it could be a PCI bus issue...
Will test with ALL that stuff disabled (default BIOS config), and the ORIGINAL
driver, and get back to you. This install is only 2 days old, so I don't mind
nuking it if I can get it up and running on SCSI!
If that does not work, I will try Justin's driver. His driver page is kind of
forbidding about using the driver images vs. the patch... I took that to mean
there was something really complicated about using it. It sounds fairly easy;
I'll just make the driver disk in my working Linux install before I blow it
away (have to, / is where /boot will go when running from SCSI.)
Is there a good way to get DMESG output into a text file so I can send that
along if it dumps still?
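For reference, a simple way to capture it, assuming a writable /tmp (the filename is arbitrary):

```shell
# Dump the kernel ring buffer (including SCSI driver messages) to a file.
dmesg > /tmp/dmesg-scsi.txt 2>/dev/null || true
# Count how many lines were captured, just to confirm something was written.
wc -l < /tmp/dmesg-scsi.txt
```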
Appreciate the info! I love Red Hat 7.1... when it's working!
OK... I've changed this to "NOTABUG" - I'm not sure if that's right, but here's
what I did. I don't know how to use Justin's driver, because all he has on his
page is a source .gz file or the patch .gz file. Maybe I'm missing the
obvious, but I don't see a driver disk image described, and I don't know
how to make one from the patch, if you even can. (?)
BUT... I disabled PCI 2.1 Delayed Transaction in the BIOS (it is disabled by
default; I had enabled it - seems like I'd want that...), and I booted up from
the install CD and used the expert noprobe line.
I installed the provided driver - YOUR driver, NOT the NEW EXPERIMENTAL driver -
and yes, I realize doing noprobe was relatively pointless in that respect... but
anyway, I did a very basic install. It booted OK, and as a test, I copied the
entire /usr directory over to the /home partition, then IMMEDIATELY rebooted
with a shutdown now -r.
On rebooting, /home reported clean. I issued a umount /home, ran FSCK manually,
and it also reported clean (not sure how much difference that makes vs. the
bootup check). I did this several times. No problems. X runs. X programs all
run. No issues.
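The manual check described above (unmount, read-only fsck, remount) can be wrapped in a small helper. A sketch, assuming /home lives on a device such as /dev/sdd3 (the partition number is a guess):

```shell
# check_clean: unmount a filesystem, fsck it read-only, then remount it.
# Returns fsck's exit status (0 means the filesystem checked out clean).
check_clean() {
    mountpt=$1    # e.g. /home
    dev=$2        # e.g. /dev/sdd3 (hypothetical device name)
    umount "$mountpt" || return 1
    fsck -n "$dev"        # -n: report problems but never modify the disk
    status=$?
    mount "$mountpt"
    return $status
}
```

The -n flag matters here: it makes the check safe to repeat without risking further changes to a possibly corrupt filesystem.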
I have now reinstalled using my typical selection of packages, which is about a
1.4 GB install, and added some updates and stuff, and it appears to be working.
Working very well, I might add. 2 installs in a row that work perfectly fine is
FAR better than anything I've been able to do prior to switching that setting.
I think that's a fair test, and I think your analysis of it being a PCI
bus/main bus corruption problem was dead-on. I am on the one hand very
impressed at how easily you solved that, and disappointed that I didn't think
of that... :)
This seems to be one of those situations where X (Linux) seems to be at fault,
but in fact, X is merely revealing a flaw or bug in Y (my BIOS settings, BIOS,
or mainboard) that Z (Windows 2000) did not reveal.
It seems that Windows 2000 compensates for the havoc that the PCI 2.1 delayed
transaction feature in my BIOS causes on the PCI bus; I assumed, what with my
mainboard stating it is PCI 2.1 compliant, that I would WANT this feature
enabled, yet it's caused me DAYS of frustration. Ahem, perhaps that's why it's
disabled by default. :)
You are great, Doug. Thanks so much, I don't know how I can thank you enough;
really. Now I can get down to the task of learning more about the OS.