Bug 55797
Summary: | (SCSI CCISS) error in shutdown after glibc 2.2.4 upgrade | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Frank Hirtz <fhirtz> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.1 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2004-09-30 15:39:15 UTC | Type: | --- |
Description
Frank Hirtz
2001-11-06 21:14:49 UTC
Which filesystems are in use? Also, can you give the output of lsmod? A description of the process load on the machine would also be useful. We would also like to know whether any swapfiles are in use, or only partitions.

> Which filesystems are in use?

ext2

> The output of lsmod

On the IBM x330, which hangs every time:

Module                  Size  Used by
ide-cd                 27168   0  (autoclean)
cdrom                  28736   0  (autoclean) [ide-cd]
nfs                    83648   1  (autoclean)
lockd                  53968   1  (autoclean) [nfs]
sunrpc                 69520   1  (autoclean) [nfs lockd]
autofs                 12128   1  (autoclean)
openafs               477472   2
e100                   60800   1  (autoclean)
ipchains               41856   0
ips                    39680   4
aic7xxx               115120   0  (unused)
sd_mod                 11824   4
scsi_mod              100896   3  [ips aic7xxx sd_mod]

On the Compaq, which just gives an error that clears:

Module                  Size  Used by
ide-cd                 27168   0  (autoclean)
cdrom                  28736   0  (autoclean) [ide-cd]
sg                     30064   0  (autoclean) (unused)
nfs                    83648   1  (autoclean)
lockd                  53968   1  (autoclean) [nfs]
sunrpc                 69520   1  (autoclean) [nfs lockd]
autofs                 12128   1  (autoclean)
openafs               477472   2
e100                   60800   1  (autoclean)
ipchains               41856   0
cpqarray               17840   4
sd_mod                 11824   0  (unused)
scsi_mod              100896   2  [sg sd_mod]

> Also, a description of the process load on the machine would be useful.
> We would also like to know if any swapfiles are in use, or if only
> partitions are in use.

The load is almost nothing: just start and reboot, even without anything interesting running. Just one 2 GB swap partition is in use, with barely any of it used at all. Anything else?

Since you're not doing anything extreme, I would have expected to see this much more often in bug reports. Two of the modules the systems have in common, e100 and openafs, are not common in Red Hat Linux installs. Is it possible to 1) use eepro100 and 2) not use openafs (in two separate steps) to see if either one is the culprit?

Using eepro100 is worse: the swap error occurs and the machine hangs and never reboots, every time. I haven't tried taking AFS out of the equation yet, but that'll be next.
Going back to our current config (ext2, initrd with ips, cpqarray and cciss) showed that the order of cpqarray and cciss matters: whichever of the two is loaded last, the machines using that driver don't experience those errors at all. Moving cpqarray and cciss to the beginning of the "--with=" arguments when creating the initrd lets those cards work without generating any swap errors, regardless of their order relative to each other. I still need to check the machine with the IBM RAID card to make sure the resulting initrd still works for it, and also check a machine with no RAID card at all to see whether it can reboot successfully (the initrd we have in production right now causes such machines to hang on reboot, with the swap error during shutdown).

Here's how we currently create our initrd:

$ mkinitrd --with=scsi_mod --with=sd_mod --with=aic7xxx --with=ips --with=cpqarray --with=cciss /boot/initrd.new.img 2.4.9-6smp

That version causes no errors on machines that use the cciss driver, the swap error (but a successful reboot) on machines that use cpqarray, and swap errors and hangs both on machines that use the ips driver and on machines that need none of these drivers.

If we remove the ips-related "--with=" arguments from the command above and use just "--with=cpqarray --with=cciss", we still get the swap errors, which didn't exist at all before we moved off glibc-2.2.2, so the glibc update looks like the cause. Short of going back to glibc-2.2.2, though, I can't verify that for certain.

I tried it on a stock 7.1 box with only the kernel (2.4.9-12smp) and the necessary utilities (mkinitrd, e2fsprogs, filesystem) upgraded. It still fails, so glibc does not look like the root cause after all. Keeping everything else constant and using the mkinitrd line above, the error occurs with the 2.4.9-6 and 2.4.9-12 kernels, but not with kernels 2.4.2 through 2.4.7-10 (the stock 7.2 kernel).
I have found a couple more bits of information on this. Order really does make a difference. The one order that seems to work for all systems, whether they have a Compaq controller, an IBM controller, or no RAID controller at all, loads cpqarray and cciss first:

$ mkinitrd --with=cpqarray --with=cciss --with=scsi_mod --with=sd_mod --with=aic7xxx --with=ips /boot/initrd.new.img 2.4.9-6smp

or

$ mkinitrd --with=cciss --with=cpqarray --with=scsi_mod --with=sd_mod --with=aic7xxx --with=ips /boot/initrd.new.img 2.4.9-6smp

The ordering that works properly also results in no swap being used right after boot (on machines with enough RAM to hold everything that is started), whereas the initrds that cause problems show some swap in use even though it is surely unnecessary. Something seems to get stuck in memory while the initrd is loaded, and that causes reboots to sometimes hang or show swap errors.

It seems that this issue only happens when the cciss driver is used:

mkinitrd --with=cciss --with=scsi_mod --with=sd_mod --with=aic7xxx /boot/initrd.test.img 2.4.9-16smp

(The machine has an Adaptec card, but it seems to do this on any machine.)

Reboot, then run swapoff /dev/sdc1.

Result: swap_dup: swap entry overflow

Info:

# free
             total       used       free     shared    buffers     cached
Mem:        125764      50944      74820          0       6852      21316
-/+ buffers/cache:      22776     102988
Swap:       265032          8     265024
# swapon -s
Filename        Type        Size    Used  Priority
/dev/sdc1       partition   265032  8     -1
# swapoff /dev/sdc1
swap_dup: swap entry overflow
--Hangs console--

(continued on a different console)

# free
             total       used       free     shared    buffers     cached
Mem:        125764      51676      74088          0       6852      21328
-/+ buffers/cache:      22776     102988
Swap:            0          0          0
# swapon -s
Filename        Type        Size    Used  Priority
/dev/sdc1       partition   265032  4     -1

The following patch against 2.4.9-12 fixes the problem.
--- linux/drivers/block/cciss.c~	Tue Oct 30 20:01:03 2001
+++ linux/drivers/block/cciss.c	Tue Oct 30 20:02:05 2001
@@ -156,11 +156,17 @@
 }
 
+
+static void cleanup_cciss_module(void);
+
 EXPORT_NO_SYMBOLS;
 static int __init init_cciss_module(void)
 {
-
-	return ( cciss_init());
+	int i;
+	if (i = cciss_init() ) {
+		cleanup_cciss_module();
+	}
+	return (i);
 }
 
 static void __exit cleanup_cciss_module(void)

Can someone confirm that this fix was folded into later trees?

It's in the current RHEL 2.1 tree, but it does not appear to be in the current Taroon sources.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/