Description of problem:

Hardware: Dell PE 2650 w/ PERC 3/Di SCSI-RAID, 1 GB RAM, 1 RAID-5 array. Latest firmware (BIOS A17, 3/Di 2.7-1 Build 3170, Backplane 1.01).

We are running a couple of identically configured 2650s with fully patched RH9, with squid-2.5 as the primary application. Since we upgraded from 2.4.18-27 to 2.4.20-24, we are getting uptimes of only 3 to 20 days. The servers crash with absolutely no message on the console or anywhere in the file system. The systems are not running out of disk space.

The servers are very high-volume Squid 2.5 HTTP proxy servers, serving around 120 HTTP proxy requests per second at peak times. No kernel tuning. No other applications.

When they crash, they are completely locked up: no reboot, no network, no screen, no message on console, no nothing.

The error seems to be hardware independent, though. We swapped the hardware for an old Transtec machine with a completely different hardware setup. The 2.4.20-* kernels still crash, while the 2.4.18-* kernels are fine.

The error occurs in both SMP and non-SMP 2.4.20-* kernels.

The error occurs only when using Squid, while the version of Squid seems to be irrelevant. We have a bunch of other RH9 machines without Squid, and they do not crash, even when using 2.4.20-*.

[root@proxy root]# uname -a
Linux proxy 2.4.20-27.9smp #1 SMP Thu Dec 11 13:15:04 EST 2003 i686 i686 i386 GNU/Linux

[root@proxy root]# lsmod
Module                  Size  Used by    Not tainted
autofs                 13684   0  (autoclean) (unused)
tg3                    53064   2
keybdev                 2976   0  (unused)
mousedev                5688   0  (unused)
hid                    22404   0  (unused)
input                   6208   0  [keybdev mousedev hid]
usb-ohci               22248   0  (unused)
usbcore                82816   1  [hid usb-ohci]
ext3                   73408   7
jbd                    56368   7  [ext3]
aacraid                32676   8
sd_mod                 13452  16
scsi_mod              110872   2  [aacraid sd_mod]

Any help is welcome.

Regards,
Sven

Version-Release number of selected component (if applicable):

How reproducible:
Always.

Steps to Reproduce:
1. Compile and install Squid-2.5STABLE3 from squid-cache.org on a fully patched RH9 machine.
2. Put a very high load on it (120 requests per second).
3. Wait until it crashes, usually after 3 to 20 days.

Actual results:
Server totally locks up. No reboot, no network, no screen, no message on serial console, no nothing.

Expected results:
Squid shouldn't crash the kernel.

Additional info:
I am not sure which kernel version first showed the error, but it was definitely a 2.4.20-*.
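For step 2 of the steps to reproduce, a comparable request rate could be generated with something like the sketch below. This is only an illustration, not the traffic the affected servers actually saw: the proxy address, the target URL, and the worker count are made-up placeholders, and only the 120 requests per second figure comes from the report. Port 3128 is Squid's default listening port.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumptions (not from the report): proxy address, target URL, worker count.
PROXY = "http://proxy.example.com:3128"   # hypothetical proxy host
TARGET = "http://www.example.com/"        # hypothetical test URL
RATE = 120                                # requests per second, as in the report


def send_offsets(rate, seconds):
    """Return the start offset (in seconds) of every request in the run."""
    return [i / rate for i in range(int(rate * seconds))]


def fetch_via_proxy(url=TARGET, proxy=PROXY, timeout=10):
    """Fetch one URL through the proxy.

    Returns the HTTP status on success, or None on any failure
    (connection errors, timeouts, HTTP error statuses).
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except (OSError, http.client.HTTPException):
        return None


def run_load(seconds, rate=RATE, fetch=fetch_via_proxy, workers=64,
             clock=time.monotonic, sleep=time.sleep):
    """Submit requests at a fixed rate and return the list of results."""
    start = clock()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for offset in send_offsets(rate, seconds):
            # Sleep until this request's scheduled start time, if it is
            # still in the future.
            delay = start + offset - clock()
            if delay > 0:
                sleep(delay)
            futures.append(pool.submit(fetch))
        return [f.result() for f in futures]
```

Calling `run_load(seconds=60)` would send roughly one minute of traffic at the reported peak rate. Reproducing the hang would presumably need this to run for days, and a single repeated URL exercises Squid far less than real mixed traffic does.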
2.4.20-27 included a few VM stability fixes, so it would be worth testing. (Actually, go straight to 2.4.20-28, which also has a security fix.)
Oh, a little misunderstanding here. I meant that we see this behaviour since we moved from 2.4.18-* to 2.4.20-*, including *all* further 2.4.20-* versions since then up to 2.4.20-27. So 2.4.20-28 will probably not fix this problem. Sven
I seem to have a similar problem after going from 2.4.18-* to 2.4.20-*. With some difficulty, I installed a USB printer (HP PSC 2175 all-in-one). Since I installed kernel 2.4.20-*, the kernel crashes immediately whenever the printer is switched on.
We've seen the problem too. Our systems were up for 6 months prior to the recent 2.4.20.x updates, and now they hang as described every couple of days.
end-of-life'd product.