Description of Problem:
RH 7.0 / kernel 2.2.16-22 freezes up and requires reboot.

Version-Release number of selected component (if applicable):
Linux bnfs01.photopoint.com 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
Red Hat Linux release 7.0 (Guinness)

How Reproducible:
Occurs, apparently, at random intervals (including twice today).

Steps to Reproduce:
1. -
2. -
3. -

Actual Results:
-

Expected Results:
-

Additional Information:
My main question is: Would this be a kernel issue, or a hardware issue?

Snip out of /var/log/messages for one occurrence:

Nov 1 04:08:11 bnfs01 sshd2[23268]: protocol version not supported in local: 'Illegal protocol version.'
*** Nov 1 04:08:35 bnfs01 kernel: kfree: Bad obj 82eb6340
*** Nov 1 04:08:35 bnfs01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
*** Nov 1 04:08:35 bnfs01 kernel: current->tss.cr3 = 04fad000, %cr3 = 04fad000
*** Nov 1 04:08:35 bnfs01 kernel: *pde = 00000000
*** Nov 1 04:08:35 bnfs01 kernel: Oops: 0002
*** Nov 1 04:08:35 bnfs01 kernel: CPU: 0
*** Nov 1 04:08:35 bnfs01 kernel: EIP: 0010:[kfree+403/424]
*** Nov 1 04:08:35 bnfs01 kernel: EFLAGS: 00010286
*** Nov 1 04:08:35 bnfs01 kernel: eax: 0000001b ebx: c56307c0 ecx: 0000001e edx: 0000001d
*** Nov 1 04:08:35 bnfs01 kernel: esi: 82eb6340 edi: c336d760 ebp: 000002aa esp: c0c55e68
*** Nov 1 04:08:35 bnfs01 kernel: ds: 0018 es: 0018 ss: 0018
*** Nov 1 04:08:35 bnfs01 kernel: Process updatedb (pid: 23103, process nr: 68, stackpage=c0c55000)
*** Nov 1 04:08:35 bnfs01 kernel: Stack: c56307c0 c2eb62e0 c336d760 000002aa c2eb62e0 c336d760 c01313c4 82eb6340
*** Nov 1 04:08:35 bnfs01 kernel: c0c55ed0 c0c55ed0 c021e264 00000fff c0c55ed0 00000001 00000fff c013235b
*** Nov 1 04:08:35 bnfs01 kernel: fffff2ab 00000fff 00000000 c025a5c0 c021e264 c025a5c0 c3477180 c327cdd0
*** Nov 1 04:08:35 bnfs01 kernel: Call Trace: [prune_dcache+220/300] [try_to_free_inodes+199/264] [grow_inodes+30/384] [get_new_inode+173/280] [get_new_inode+185/280] [iget+88/96] [ext2_lookup+84/124]
*** Nov 1 04:08:35 bnfs01 kernel: [real_lookup+79/160] [lookup_dentry+296/488] [__namei+40/88] [sys_newlstat+14/96] [system_call+52/56] [startup_32+43/285]
*** Nov 1 04:08:35 bnfs01 kernel: Code: c7 05 00 00 00 00 00 00 00 00 83 c4 08 5b 5e 5f 5d 83 c4 08
Nov 1 07:08:36 bnfs01 syslogd 1.3-3: restart.

Snip out of /var/log/messages for another occurrence:

Nov 1 11:12:29 bnfs01 kernel: svc: unknown program 100227 (me 100003)
*** Nov 1 15:26:55 bnfs01 kernel: kfree: Bad obj 80aeefa0
*** Nov 1 15:26:55 bnfs01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
*** Nov 1 15:26:55 bnfs01 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
*** Nov 1 15:26:55 bnfs01 kernel: *pde = 00000000
*** Nov 1 15:26:55 bnfs01 kernel: Oops: 0002
*** Nov 1 15:26:55 bnfs01 kernel: CPU: 0
*** Nov 1 15:26:55 bnfs01 kernel: EIP: 0010:[kfree+403/424]
*** Nov 1 15:26:55 bnfs01 kernel: EFLAGS: 00010282
*** Nov 1 15:26:55 bnfs01 kernel: eax: 0000001b ebx: c0541720 ecx: 00000000 edx: 0000003b
*** Nov 1 15:26:55 bnfs01 kernel: esi: 80aeefa0 edi: c2eb2440 ebp: 0000041c esp: c6105df8
*** Nov 1 15:26:55 bnfs01 kernel: ds: 0018 es: 0018 ss: 0018
*** Nov 1 15:26:55 bnfs01 kernel: Process nfsd (pid: 573, process nr: 40, stackpage=c6105000)
*** Nov 1 15:26:55 bnfs01 kernel: Stack: c0541720 c2eb62e0 c2eb2440 0000041c c2eb62e0 c2eb2440 c01313c4 80aeefa0
*** Nov 1 15:26:55 bnfs01 kernel: c6105e60 c6105e60 c021e264 00000dd7 c6105e60 00000001 00000dd7 c013235b
*** Nov 1 15:26:55 bnfs01 kernel: fffff43b 00000dd7 00000000 c025a280 c021e264 c025a280 c11954e0 c6762330
*** Nov 1 15:26:55 bnfs01 kernel: Call Trace: [prune_dcache+220/300] [try_to_free_inodes+199/264] [grow_inodes+30/384] [inet_sendmsg+0/144] [get_new_inode+185/280] [iget+88/96] [ext2_lookup+84/124]
*** Nov 1 15:26:55 bnfs01 kernel: [real_lookup+79/160] [lookup_dentry+296/488] [<c8163ac4>] [<c816a960>] [<c8161b08>] [<c816a960>] [<c8161437>] [<c816a960>]
*** Nov 1 15:26:55 bnfs01 kernel: [<c814f468>] [<c816ad20>] [<c816a88c>] [<c8161235>] [kernel_thread+35/48]
*** Nov 1 15:26:55 bnfs01 kernel: Code: c7 05 00 00 00 00 00 00 00 00 83 c4 08 5b 5e 5f 5d 83 c4 08
Nov 1 16:52:34 bnfs01 syslogd 1.3-3: restart.

cat /proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 1
cpu MHz : 551.265
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 1101.00

lspci -v:

00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
    Subsystem: Asustek Computer, Inc.: Unknown device 8024
    Flags: bus master, medium devsel, latency 64
    Memory at e4000000 (32-bit, prefetchable)
    Capabilities: <available only to root>

00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03) (prog-if 00 [Normal decode])
    Flags: bus master, 66Mhz, medium devsel, latency 64
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
    I/O behind bridge: 0000d000-0000dfff
    Memory behind bridge: dd800000-dfefffff
    Prefetchable memory behind bridge: e3f00000-e3ffffff

00:04.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
    Flags: bus master, medium devsel, latency 0

00:04.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01) (prog-if 80 [Master])
    Flags: bus master, medium devsel, latency 32
    I/O ports at b800

00:04.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
    Flags: bus master, medium devsel, latency 0, IRQ 12
    I/O ports at b400

00:04.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
    Flags: medium devsel

00:0b.0 PCI bridge: Distributed Processing Technology PCI Bridge (rev 02) (prog-if 00 [Normal decode])
    Flags: bus master, medium devsel, latency 32
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
    Capabilities: <available only to root>

00:0b.1 I2O: Distributed Processing Technology SmartRAID V Controller (rev 02) (prog-if 01)
    Subsystem: Distributed Processing Technology: Unknown device c05a
    Flags: bus master, medium devsel, latency 64, IRQ 10
    BIST result: 00
    Memory at e0000000 (32-bit, prefetchable)
    Capabilities: <available only to root>

00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
    Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
    Flags: bus master, medium devsel, latency 32, IRQ 12
    I/O ports at a800
    Memory at dd000000 (32-bit, non-prefetchable)
    Capabilities: <available only to root>

01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP 1X/2X (rev 5c) (prog-if 00 [VGA])
    Subsystem: ATI Technologies Inc: Unknown device 0084
    Flags: bus master, stepping, medium devsel, latency 64, IRQ 11
    Memory at de000000 (32-bit, non-prefetchable)
    I/O ports at d800
    Memory at dd800000 (32-bit, non-prefetchable)
    Expansion ROM at e3fe0000 [disabled]
    Capabilities: <available only to root>
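A note on the kernel-versus-hardware question: both "kfree: Bad obj" values (82eb6340 and 80aeefa0) lie below c0000000, while the other kernel pointers in the same dumps (esp, edi, ebx) are all at c0000000 or above. The sketch below is only an illustrative check, not a diagnosis. It assumes the usual i386 kernel base of 0xC0000000 (the name PAGE_OFFSET is used here for that assumption) and asks which single-bit flips would turn each bad pointer into a kernel-range address.

```python
# Illustrative check only. PAGE_OFFSET is an assumption: the usual
# i386 kernel base address for the default 3G/1G split.
PAGE_OFFSET = 0xC0000000

# "kfree: Bad obj" values copied from the two oopses in this report.
bad_objs = [0x82EB6340, 0x80AEEFA0]


def single_bit_repairs(addr, base=PAGE_OFFSET):
    """Return (bit, candidate) pairs where flipping exactly one bit of
    a 32-bit addr gives an address at or above the kernel base."""
    repairs = []
    for bit in range(32):
        candidate = addr ^ (1 << bit)
        if candidate >= base:
            repairs.append((bit, candidate))
    return repairs


for addr in bad_objs:
    for bit, fixed in single_bit_repairs(addr):
        print(f"{addr:#010x}: flip bit {bit} -> {fixed:#010x}")
```

For both values the only match is bit 30, giving c2eb6340 and c0aeefa0. That pattern is consistent with a single bit being dropped in memory, but it does not prove a hardware fault; a kernel or driver bug that writes a stray value could leave the same trace.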
Created attachment 36121 [details] lspci -vvv
Created attachment 36122 [details] df -k
Created attachment 36123 [details] dmesg
Since the call trace mentions inodes and ext2, I guessed it could be a disk failure, or maybe a bug in the SCSI driver. I've been browsing around the Net quite a bit looking for similar occurrences, but haven't found any that look like the same problem. Here's one example of what I did find: http://www.uwsg.indiana.edu/hypermail/linux/kernel/9907.1/0823.html Any help/comments/input would be greatly appreciated. :-)
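To make the call trace comparison concrete, here is a small sketch that extracts the symbolized frames from the two Call Trace excerpts in the report and lists the ones they share. The regular expression and helper names are illustrative, not part of any kernel tooling; raw addresses such as [<c8163ac4>] are skipped because they carry no symbol name.

```python
import re

# Call Trace text copied from the two oopses in the report.
TRACE_UPDATEDB = (
    "[prune_dcache+220/300] [try_to_free_inodes+199/264] [grow_inodes+30/384] "
    "[get_new_inode+173/280] [get_new_inode+185/280] [iget+88/96] "
    "[ext2_lookup+84/124] [real_lookup+79/160] [lookup_dentry+296/488] "
    "[__namei+40/88] [sys_newlstat+14/96] [system_call+52/56] [startup_32+43/285]"
)
TRACE_NFSD = (
    "[prune_dcache+220/300] [try_to_free_inodes+199/264] [grow_inodes+30/384] "
    "[inet_sendmsg+0/144] [get_new_inode+185/280] [iget+88/96] "
    "[ext2_lookup+84/124] [real_lookup+79/160] [lookup_dentry+296/488] "
    "[<c8163ac4>] [<c816a960>] [<c8161b08>] [<c816a960>] [<c8161437>] "
    "[<c816a960>] [<c814f468>] [<c816ad20>] [<c816a88c>] [<c8161235>] "
    "[kernel_thread+35/48]"
)

# Matches frames of the form [symbol+offset/size]; hypothetical helper pattern.
FRAME = re.compile(r"\[([A-Za-z_]\w*)\+\d+/\d+\]")


def symbols(trace):
    """Return the symbol names of all symbolized frames, innermost first."""
    return FRAME.findall(trace)


def shared_frames(a, b):
    """Symbols that appear in both traces, in the order they occur in a."""
    in_b = set(symbols(b))
    seen = []
    for name in symbols(a):
        if name in in_b and name not in seen:
            seen.append(name)
    return seen


shared = shared_frames(TRACE_UPDATEDB, TRACE_NFSD)
print(" <- ".join(shared))
```

Both oopses share the path from lookup_dentry through ext2_lookup and get_new_inode down to prune_dcache, with kfree as the faulting function. That shows where the bad pointer was noticed (while freeing dentry and inode cache entries), not necessarily what corrupted it.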
Can you also attach the output of "lsmod"?
Created attachment 36183 [details] lsmod
Created attachment 36184 [details] lsof (just fyi, about 50k)
Created attachment 36185 [details] top
Created attachment 36186 [details] ps auxw
cat /etc/modules.conf:

alias scsi_hostadapter dpt_i2o
alias eth0 3c59x
options 3c59x options=4 full_duplex=1 debug=1
alias parport_lowlevel parport_pc
alias eth1 3c90x
alias usb-controller usb-uhci
Here's a (potentially) missing piece of information: The lsmod output shows dpt_i2o, which is the driver for the Adaptec ATA RAID, Model 2400A, that's in the box. Relevant links are: http://linux.adaptec.com/ http://www.adaptec.com/worldwide/support/driverdetail.html?cat=%2fOperating+System%2fLinux&filekey=aar2400_linux_v221_drv.rpm Adaptec's driver is specifically for RH 7.0, which is the reason we're using that release in this case. I hope their driver isn't the cause of the failure, but I'll send them a link to this page, so they know.
The later 2.4.x kernels include DPT i2o support as standard, in a somewhat cleaned-up form. Please re-open the bug if the problem is still occurring with 2.4.x kernels. Thanks.