Description of problem:
We bought a new computer, installed Red Hat Linux 8.0, and put it into production as a web server. After 2 days it stopped responding to network requests on all TCP/IP ports, although it still responded to pings. After a reboot everything was okay again, but /var/log/messages contained several error messages like this:

---START---------------------------------------------
Mar 9 20:51:34 Pager2 kernel: Unable to handle kernel paging request at virtual address 3e6b6e6d
Mar 9 20:51:34 Pager2 kernel: printing eip:
Mar 9 20:51:34 Pager2 kernel: c013d210
Mar 9 20:51:34 Pager2 kernel: *pde = 00000000
Mar 9 20:51:34 Pager2 kernel: Oops: 0000
Mar 9 20:51:34 Pager2 kernel: eepro100 ipt_REJECT iptable_filter ip_tables mousedev keybdev hid input usb-ohci usbcore raid1
Mar 9 20:51:34 Pager2 kernel: CPU: 0
Mar 9 20:51:34 Pager2 kernel: EIP: 0010:[<c013d210>] Not tainted
Mar 9 20:51:34 Pager2 kernel: EFLAGS: 00010206
Mar 9 20:51:34 Pager2 kernel:
Mar 9 20:51:34 Pager2 kernel: EIP is at page_remove_rmap [kernel] 0x50 (2.4.18-18.8.0)
Mar 9 20:51:34 Pager2 kernel: eax: 3e6b6e69 ebx: cf38fc58 ecx: c16cdf20 edx: d18c11bc
Mar 9 20:51:34 Pager2 kernel: esi: c03006c0 edi: 00100000 ebp: 1f1b2025 esp: cd409ecc
Mar 9 20:51:34 Pager2 kernel: ds: 0018 es: 0018 ss: 0018
Mar 9 20:51:34 Pager2 kernel: Process httpd (pid: 15575, stackpage=cd409000)
Mar 9 20:51:34 Pager2 kernel: Stack: ded759a0 df5f4f38 c16a81a0 c03006c0 d18c11bc 0006f000 c012c8a3 c16a81a0
Mar 9 20:51:34 Pager2 kernel: c036e460 00000001 00000000 00000028 42400000 c290c424 42100000 00000000
Mar 9 20:51:34 Pager2 kernel: c012ac7b d7385320 c290c420 42000000 00100000 42400000 00000000 42100000
Mar 9 20:51:34 Pager2 kernel: Call Trace: [<c012c8a3>] zap_pte_range [kernel] 0x113 (0xcd409ee4))
Mar 9 20:51:34 Pager2 kernel: [<c012ac7b>] do_zap_page_range [kernel] 0x8b (0xcd409f0c))
Mar 9 20:51:34 Pager2 kernel: [<c012b198>] zap_page_range [kernel] 0x58 (0xcd409f40))
Mar 9 20:51:34 Pager2 kernel: [<c012e031>] exit_mmap [kernel] 0xd1 (0xcd409f64))
Mar 9 20:51:34 Pager2 kernel: [<c0119419>] mmput [kernel] 0x39 (0xcd409f8c))
Mar 9 20:51:34 Pager2 kernel: [<c011e026>] do_exit [kernel] 0xa6 (0xcd409f9c))
Mar 9 20:51:34 Pager2 kernel: [<c011e203>] sys_exit [kernel] 0x13 (0xcd409fb8))
Mar 9 20:51:34 Pager2 kernel: [<c0109127>] system_call [kernel] 0x33 (0xcd409fc0))
Mar 9 20:51:34 Pager2 kernel:
Mar 9 20:51:34 Pager2 kernel:
Mar 9 20:51:34 Pager2 kernel: Code: 39 50 04 74 17 89 c3 8b 00 85 c0 75 f3 8d 76 00 8b 5c 24 10
---END----------------------------------------------

Because those errors occurred at the time of highest load, we changed the network card and ran a stress test on this server. After a night with a load average > 20 we found this error message (9 times) in /var/log/messages:

---START-------------------------------------------
Mar 14 03:50:49 Pager2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 000001f8
Mar 14 03:50:49 Pager2 kernel: printing eip:
Mar 14 03:50:49 Pager2 kernel: c013d210
Mar 14 03:50:49 Pager2 kernel: *pde = 00000000
Mar 14 03:50:49 Pager2 kernel: Oops: 0000
Mar 14 03:50:49 Pager2 kernel: 8139too mii ipt_REJECT iptable_filter ip_tables mousedev keybdev hid input usb-ohci usbcore raid1
Mar 14 03:50:49 Pager2 kernel: CPU: 0
Mar 14 03:50:49 Pager2 kernel: EIP: 0010:[<c013d210>] Not tainted
Mar 14 03:50:49 Pager2 kernel: EFLAGS: 00010202
Mar 14 03:50:49 Pager2 kernel:
Mar 14 03:50:49 Pager2 kernel: EIP is at page_remove_rmap [kernel] 0x50 (2.4.18-18.8.0)
Mar 14 03:50:49 Pager2 kernel: eax: 000001f4 ebx: c3cca578 ecx: c2299648 edx: f34eb8b8
Mar 14 03:50:49 Pager2 kernel: esi: c03007c0 edi: 00021000 ebp: 55065025 esp: d183decc
Mar 14 03:50:50 Pager2 kernel: ds: 0018 es: 0018 ss: 0018
Mar 14 03:50:50 Pager2 kernel: Process httpd (pid: 523, stackpage=d183d000)
Mar 14 03:50:50 Pager2 kernel: Stack: f6897b80 f6806568 c2299610 c03007c0 f34eb8b8 00002000 c012c8a3 c2299610
Mar 14 03:50:50 Pager2 kernel: ffffffff 0000ff41 002bc000 00000003 40400000 f1eb7404 4024d000 00000000
Mar 14 03:50:50 Pager2 kernel: c012ac7b f469d7c0 f1eb7400 4022c000 00021000 4062c000 00000000 4024d000
Mar 14 03:50:50 Pager2 kernel: Call Trace: [<c012c8a3>] zap_pte_range [kernel] 0x113 (0xd183dee4))
Mar 14 03:50:50 Pager2 kernel: [<c012ac7b>] do_zap_page_range [kernel] 0x8b (0xd183df0c))
Mar 14 03:50:50 Pager2 kernel: [<c012b198>] zap_page_range [kernel] 0x58 (0xd183df40))
Mar 14 03:50:50 Pager2 kernel: [<c012e031>] exit_mmap [kernel] 0xd1 (0xd183df64))
Mar 14 03:50:50 Pager2 kernel: [<c0119419>] mmput [kernel] 0x39 (0xd183df8c))
Mar 14 03:50:50 Pager2 kernel: [<c011e026>] do_exit [kernel] 0xa6 (0xd183df9c))
Mar 14 03:50:50 Pager2 kernel: [<c011e203>] sys_exit [kernel] 0x13 (0xd183dfb8))
Mar 14 03:50:50 Pager2 kernel: [<c0109127>] system_call [kernel] 0x33 (0xd183dfc0))
Mar 14 03:50:50 Pager2 kernel:
Mar 14 03:50:50 Pager2 kernel:
Mar 14 03:50:50 Pager2 kernel: Code: 39 50 04 74 17 89 c3 8b 00 85 c0 75 f3 8d 76 00 8b 5c 24 10
------END----------------------------------------------------

But the server did not crash. All the error messages appeared within a span of 5 minutes, and after that the server kept running without any problems for 5 hours, until we stopped the test.

Version-Release number of selected component (if applicable):
Kernel version: 2.4.18-18.8.0

How reproducible:
probably always

Steps to Reproduce:
1.
2.
3.

Actual results:
Error messages in the log file and probably a crash

Expected results:
no error messages, no crash

Additional info:
If you need additional info, just tell me what you need.
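
To check how often the oops recurs and whether it always faults at the same place (the report above mentions 9 occurrences, all in page_remove_rmap), a small script along these lines could be run over the lines of /var/log/messages. This is only a sketch and not part of the original report: the function name summarize_oopses and both regular expressions are assumptions derived solely from the log format quoted in this thread, so other kernels or syslog setups may need different patterns.

```python
import re
from collections import Counter

# Assumed pattern: first line of an oops report as it appears in the logs above.
OOPS_START = re.compile(
    r"kernel: Unable to handle kernel (?:NULL pointer dereference|paging request)"
)
# Assumed pattern: e.g. "EIP is at page_remove_rmap [kernel] 0x50 (2.4.18-18.8.0)".
EIP_LINE = re.compile(r"EIP is at (\S+) \[(\S+)\] (0x[0-9a-f]+) \((\S+)\)")


def summarize_oopses(lines):
    """Return (oops_count, Counter keyed by (symbol, offset, kernel_version))."""
    oops_count = 0
    sites = Counter()
    for line in lines:
        if OOPS_START.search(line):
            oops_count += 1
        match = EIP_LINE.search(line)
        if match:
            symbol, _module, offset, version = match.groups()
            sites[(symbol, offset, version)] += 1
    return oops_count, sites


# Two header lines copied from the reports above, used as sample input.
SAMPLE = [
    "Mar 9 20:51:34 Pager2 kernel: Unable to handle kernel paging request at virtual address 3e6b6e6d",
    "Mar 9 20:51:34 Pager2 kernel: EIP is at page_remove_rmap [kernel] 0x50 (2.4.18-18.8.0)",
]

count, sites = summarize_oopses(SAMPLE)
for (symbol, offset, version), hits in sites.most_common():
    print(f"{hits} oops(es) at {symbol}+{offset} on kernel {version}")
```

Passing an open file object instead of SAMPLE works the same way, since the function only iterates over lines. If every oops lands on the same symbol and offset, that points more toward a specific kernel code path than toward random memory corruption, though it does not rule out hardware problems.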
I have the same problem on RH 9 with 2.4.20-9.
Athlon XP 2100+, Asus A7S333 (I have one other machine with the same MB and RH 7.3, 2.4.18-19.7.x, working fine).
It's running only in a test environment, load average < 1.0.

The first time we got the problem (it occurred 3 times this week):

---------------------------------------------------
Apr 13 13:31:56 cintra kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000074
Apr 13 13:31:56 cintra kernel: printing eip:
Apr 13 13:31:56 cintra kernel: c0140dbb
Apr 13 13:31:56 cintra kernel: *pde = 00000000
Apr 13 13:31:56 cintra kernel: Oops: 0000
Apr 13 13:31:56 cintra kernel: cmpci soundcore nfs lockd sunrpc smbfs it87 i2c-proc i2c-isa i2c-core autofs ipchains cls_route cls_u32 cls_fw sch_prio sch_sfq sch_tbf sch_cbq e100 keybdev m
Apr 13 13:31:56 cintra kernel: CPU: 0
Apr 13 13:31:56 cintra kernel: EIP: 0060:[<c0140dbb>] Not tainted
Apr 13 13:31:56 cintra kernel: EFLAGS: 00010202
Apr 13 13:31:56 cintra kernel:
Apr 13 13:31:56 cintra kernel: EIP is at page_referenced [kernel] 0x227 (2.4.20-9)
Apr 13 13:31:56 cintra kernel: eax: c1d1f930 ebx: 00000064 ecx: 00000000 edx: 00000001
Apr 13 13:31:56 cintra kernel: esi: 0000000e edi: c2522400 ebp: 00000000 esp: f7fc3f84
Apr 13 13:31:56 cintra kernel: ds: 0068 es: 0068 ss: 0068
Apr 13 13:31:56 cintra kernel: Process kscand/HighMem (pid: 8, stackpage=f7fc3000)
Apr 13 13:31:56 cintra kernel: Stack: f61c9800 00000000 00000001 f7fc3fb4 c24ca368 c24ca368 c030420c c24ca34c
Apr 13 13:31:56 cintra kernel: 00000003 c0139b0e f7fc2000 c0124b2c 00000001 00000003 f7fc2000 c0304080
Apr 13 13:31:56 cintra kernel: f7fc2000 c013a994 c0304080 00000003 00000000 c0256613 000009c4 c013a898
Apr 13 13:31:56 cintra kernel: Call Trace: [<c0139b0e>] scan_active_list [kernel] 0x36 (0xf7fc3fa8))
Apr 13 13:31:56 cintra kernel: [<c0124b2c>] process_timeout [kernel] 0x0 (0xf7fc3fb0))
Apr 13 13:31:56 cintra kernel: [<c013a994>] kscand [kernel] 0xfc (0xf7fc3fc8))
Apr 13 13:31:56 cintra kernel: [<c013a898>] kscand [kernel] 0x0 (0xf7fc3fe0))
Apr 13 13:31:56 cintra kernel: [<c0107389>] kernel_thread_helper [kernel] 0x5 (0xf7fc3ff0))
Apr 13 13:31:56 cintra kernel:
Apr 13 13:31:56 cintra kernel:
Apr 13 13:31:56 cintra kernel: Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e
---------------------------------------------------

After this message appears in the log, the services keep working for some time and then stop (TCP services, cron, etc.); we also can't log in on the console.
Same problem again during the night with the kernel from: http://people.redhat.com/arjanv/testkernels/
Once I get someone to reboot the machine, I will try installing the kernel from 7.3.
A "me too" report. The process was performing a CPU-intensive model fit, and writing a few megabytes to a NetApp filer. I'm guessing that it is flakiness of the hardware under load. memtest86 shows nothing. lm_sensors shows:

Idle: CPU1: 51 CPU2: 45.0 VRM2: 44.5 VRM1: 30 AGP: 20.0 DDR: 18.5
Oops: CPU1: 61 CPU2: 52.5 VRM2: 51.5 VRM1: 38 AGP: 24.5 DDR: 23.0

I am considering installing the i686 kernel instead, as I've seen it improve stability on older Tyan Thunder boards, presumably because of less stress on the memory controller.

Tyan Thunder S2469
Dual AMD Athlon(tm) MP 2600+
512MB ECC RAM, 1GB SWAP
2x Fujitsu MAN3367MP (SW RAID1)
RH 8.0, 2.4.20-20.8smp athlon

Unable to handle kernel paging request at virtual address 00732a00
printing eip:
c0143c6e
*pde = 00000000
Oops: 0000
nls_iso8859-1 radeon agpgart binfmt_misc parport_pc lp parport autofs nfs lockd sunrpc e1000 iptable_filter ip_tables ide-cd cdrom eeprom w83781d i2c-proc i2c
CPU: 1
EIP: 0010:[<c0143c6e>] Not tainted
EFLAGS: 00010246
EIP is at swap_info_get [kernel] 0x1e (2.4.20-20.8smp)
eax: 00732a00 ebx: 00000025 ecx: 00000000 edx: 00732a00
esi: 00000000 edi: 00732a00 ebp: c26473a4 esp: cad8fe08
ds: 0018 es: 0018 ss: 0018
Process lsfit (pid: 30263, stackpage=cad8f000)
Stack: c03d1260 c26473a0 00000026 00000025 00000000 c014402d 00732a00 00732b00
00000025 00000026 0004d000 c013358c 00732a00 0804c000 c26473a0 c1482ff8
00000026 1c17c045 08400000 d1a01084 0819c000 00000000 c013134b c03ac8a0
Call Trace: [<c014402d>] free_swap_and_cache [kernel] 0x1d (0xcad8fe1c))
[<c013358c>] zap_pte_range [kernel] 0x22c (0xcad8fe34))
[<c013134b>] zap_page_range [kernel] 0x10b (0xcad8fe60))
[<c0134d34>] exit_mmap [kernel] 0xc4 (0xcad8fea4))
[<c011e4cb>] mmput [kernel] 0x5b (0xcad8fec8))
[<c0123e16>] do_exit [kernel] 0xd6 (0xcad8fed8))
[<c012aa73>] sig_exit [kernel] 0xb3 (0xcad8fef4))
[<c012ac74>] dequeue_signal [kernel] 0x64 (0xcad8fefc))
[<c0108e4f>] do_signal [kernel] 0x1bf (0xcad8ff14))
[<e098c540>] nfs_file_write [nfs] 0x90 (0xcad8ff6c))
[<c014bd9e>] sys_write [kernel] 0xfe (0xcad8ff94))
[<c011b1d0>] do_page_fault [kernel] 0x0 (0xcad8ffb8))
[<c0109198>] signal_return [kernel] 0x14 (0xcad8ffc0))
Code: 3b 0d 44 12 3d c0 0f 83 b6 00 00 00 8d 04 cd 00 00 00 00 29
This bug sucks :(

[root@cyber bor]# cat /etc/redhat-release
Red Hat Linux release 8.0 (Psyche)
[root@cyber bor]# uname -a
Linux cyber.kiev.farlep.net 2.4.20-20.8 #1 Mon Aug 18 14:59:07 EDT 2003 i686 i686 i386 GNU/Linux

In my logs the bug starts like this:

Mar 11 05:11:42 cyber syslogd 1.4.1: restart.
Mar 11 06:53:28 cyber named[27502]: lame server resolving 'dyn-230.criscom.net' (in 'criscom.net'?): 212.110.152.2#53
Mar 11 07:51:11 cyber modprobe: modprobe: Can't locate module net-pf-22
Mar 11 07:51:42 cyber last message repeated 520 times
Mar 11 07:52:43 cyber last message repeated 1147 times
Mar 11 07:53:44 cyber last message repeated 1063 times
Mar 11 07:54:45 cyber last message repeated 1100 times
Mar 11 07:55:35 cyber last message repeated 819 times
Mar 11 07:55:35 cyber kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
Mar 11 07:55:35 cyber kernel: printing eip:
Mar 11 07:55:35 cyber kernel: cbb597ee
Mar 11 07:55:35 cyber kernel: *pde = 00000000
Mar 11 07:55:35 cyber kernel: Oops: 0000
Mar 11 07:55:35 cyber kernel: ipt_LOG ipt_state ip_conntrack ipt_REJECT iptable_filter ip_tables eepro100 mii mousedev keybdev hid input usb-uhci usbcore ext3 jbd
Mar 11 07:55:35 cyber kernel: CPU: 0
Mar 11 07:55:35 cyber kernel: EIP: 0010:[<cbb597ee>] Not tainted
Mar 11 07:55:35 cyber kernel: EFLAGS: 00010286
Mar 11 07:55:35 cyber kernel: ......... x many times

After that I get "Segmentation fault" :( After many tries I managed to su and reboot.
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/