Bug 86114
Summary: | "Unable to handle kernel NULL pointer dereference at virtual address 000001f8" while stress test | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Mathias Retzlaff <bug> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.0 | CC: | bill, ivanfmartinez, pfrields, riel |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-09-30 15:40:38 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Mathias Retzlaff
2003-03-14 10:07:53 UTC
I have the same problem on RH 9 with kernel 2.4.20-9, on an Athlon XP 2100+ with an Asus A7S333 motherboard. (I have another machine with the same motherboard running RH 7.3 with kernel 2.4.18-19.7.x, and it works fine.) This machine runs only in a test environment, with a load average below 1.0.

The first time we got the problem (it has occurred 3 times this week):

---------------------------------------------------
Apr 13 13:31:56 cintra kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000074
Apr 13 13:31:56 cintra kernel: printing eip:
Apr 13 13:31:56 cintra kernel: c0140dbb
Apr 13 13:31:56 cintra kernel: *pde = 00000000
Apr 13 13:31:56 cintra kernel: Oops: 0000
Apr 13 13:31:56 cintra kernel: cmpci soundcore nfs lockd sunrpc smbfs it87 i2c-proc i2c-isa i2c-core autofs ipchains cls_route cls_u32 cls_fw sch_prio sch_sfq sch_tbf sch_cbq e100 keybdev m
Apr 13 13:31:56 cintra kernel: CPU: 0
Apr 13 13:31:56 cintra kernel: EIP: 0060:[<c0140dbb>] Not tainted
Apr 13 13:31:56 cintra kernel: EFLAGS: 00010202
Apr 13 13:31:56 cintra kernel:
Apr 13 13:31:56 cintra kernel: EIP is at page_referenced [kernel] 0x227 (2.4.20-9)
Apr 13 13:31:56 cintra kernel: eax: c1d1f930 ebx: 00000064 ecx: 00000000 edx: 00000001
Apr 13 13:31:56 cintra kernel: esi: 0000000e edi: c2522400 ebp: 00000000 esp: f7fc3f84
Apr 13 13:31:56 cintra kernel: ds: 0068 es: 0068 ss: 0068
Apr 13 13:31:56 cintra kernel: Process kscand/HighMem (pid: 8, stackpage=f7fc3000)
Apr 13 13:31:56 cintra kernel: Stack: f61c9800 00000000 00000001 f7fc3fb4 c24ca368 c24ca368 c030420c c24ca34c
Apr 13 13:31:56 cintra kernel: 00000003 c0139b0e f7fc2000 c0124b2c 00000001 00000003 f7fc2000 c0304080
Apr 13 13:31:56 cintra kernel: f7fc2000 c013a994 c0304080 00000003 00000000 c0256613 000009c4 c013a898
Apr 13 13:31:56 cintra kernel: Call Trace: [<c0139b0e>] scan_active_list [kernel] 0x36 (0xf7fc3fa8))
Apr 13 13:31:56 cintra kernel: [<c0124b2c>] process_timeout [kernel] 0x0 (0xf7fc3fb0))
Apr 13 13:31:56 cintra kernel: [<c013a994>] kscand [kernel] 0xfc (0xf7fc3fc8))
Apr 13 13:31:56 cintra kernel: [<c013a898>] kscand [kernel] 0x0 (0xf7fc3fe0))
Apr 13 13:31:56 cintra kernel: [<c0107389>] kernel_thread_helper [kernel] 0x5 (0xf7fc3ff0))
Apr 13 13:31:56 cintra kernel:
Apr 13 13:31:56 cintra kernel:
Apr 13 13:31:56 cintra kernel: Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e
---------------------------------------------------

After this message appears in the log, the services keep working for a while and then stop (TCP services, cron, etc.), and we also cannot log in on the console.

The same problem happened last night with the kernel from http://people.redhat.com/arjanv/testkernels/. When I can get someone to reboot the machine, I will try installing the kernel from 7.3.

A "me too" report. The process was performing a CPU-intensive model fit and writing a few megabytes to a NetApp filer. I'm guessing that it is flaky hardware under load. memtest86 shows nothing. lm_sensors shows:

Sensor | Idle | Oops
---|---|---
CPU1 | 51 | 61
CPU2 | 45.0 | 52.5
VRM2 | 44.5 | 51.5
VRM1 | 30 | 38
AGP | 20.0 | 24.5
DDR | 18.5 | 23.0

I am considering installing the i686 kernel instead. I've seen it improve stability on older Tyan Thunder boards, presumably because it puts less stress on the memory controller.
System:
Tyan Thunder S2469
Dual AMD Athlon(tm) MP 2600+
512MB ECC RAM, 1GB swap
2x Fujitsu MAN3367MP (software RAID1)
RH 8.0, 2.4.20-20.8smp athlon

Unable to handle kernel paging request at virtual address 00732a00
printing eip:
c0143c6e
*pde = 00000000
Oops: 0000
nls_iso8859-1 radeon agpgart binfmt_misc parport_pc lp parport autofs nfs lockd sunrpc e1000 iptable_filter ip_tables ide-cd cdrom eeprom w83781d i2c-proc i2c
CPU: 1
EIP: 0010:[<c0143c6e>] Not tainted
EFLAGS: 00010246

EIP is at swap_info_get [kernel] 0x1e (2.4.20-20.8smp)
eax: 00732a00 ebx: 00000025 ecx: 00000000 edx: 00732a00
esi: 00000000 edi: 00732a00 ebp: c26473a4 esp: cad8fe08
ds: 0018 es: 0018 ss: 0018
Process lsfit (pid: 30263, stackpage=cad8f000)
Stack: c03d1260 c26473a0 00000026 00000025 00000000 c014402d 00732a00 00732b00
00000025 00000026 0004d000 c013358c 00732a00 0804c000 c26473a0 c1482ff8
00000026 1c17c045 08400000 d1a01084 0819c000 00000000 c013134b c03ac8a0
Call Trace: [<c014402d>] free_swap_and_cache [kernel] 0x1d (0xcad8fe1c))
[<c013358c>] zap_pte_range [kernel] 0x22c (0xcad8fe34))
[<c013134b>] zap_page_range [kernel] 0x10b (0xcad8fe60))
[<c0134d34>] exit_mmap [kernel] 0xc4 (0xcad8fea4))
[<c011e4cb>] mmput [kernel] 0x5b (0xcad8fec8))
[<c0123e16>] do_exit [kernel] 0xd6 (0xcad8fed8))
[<c012aa73>] sig_exit [kernel] 0xb3 (0xcad8fef4))
[<c012ac74>] dequeue_signal [kernel] 0x64 (0xcad8fefc))
[<c0108e4f>] do_signal [kernel] 0x1bf (0xcad8ff14))
[<e098c540>] nfs_file_write [nfs] 0x90 (0xcad8ff6c))
[<c014bd9e>] sys_write [kernel] 0xfe (0xcad8ff94))
[<c011b1d0>] do_page_fault [kernel] 0x0 (0xcad8ffb8))
[<c0109198>] signal_return [kernel] 0x14 (0xcad8ffc0))

Code: 3b 0d 44 12 3d c0 0f 83 b6 00 00 00 8d 04 cd 00 00 00 00 29

This bug sucks :(

[root@cyber bor]# cat /etc/redhat-release
Red Hat Linux release 8.0 (Psyche)
[root@cyber bor]# uname -a
Linux cyber.kiev.farlep.net 2.4.20-20.8 #1 Mon Aug 18 14:59:07 EDT 2003 i686 i686 i386 GNU/Linux

In my logs, the bug starts like this:

Mar 11 05:11:42 cyber syslogd 1.4.1: restart.
Mar 11 06:53:28 cyber named[27502]: lame server resolving 'dyn-230.criscom.net' (in 'criscom.net'?): 212.110.152.2#53
Mar 11 07:51:11 cyber modprobe: modprobe: Can't locate module net-pf-22
Mar 11 07:51:42 cyber last message repeated 520 times
Mar 11 07:52:43 cyber last message repeated 1147 times
Mar 11 07:53:44 cyber last message repeated 1063 times
Mar 11 07:54:45 cyber last message repeated 1100 times
Mar 11 07:55:35 cyber last message repeated 819 times
Mar 11 07:55:35 cyber kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
Mar 11 07:55:35 cyber kernel: printing eip:
Mar 11 07:55:35 cyber kernel: cbb597ee
Mar 11 07:55:35 cyber kernel: *pde = 00000000
Mar 11 07:55:35 cyber kernel: Oops: 0000
Mar 11 07:55:35 cyber kernel: ipt_LOG ipt_state ip_conntrack ipt_REJECT iptable_filter ip_tables eepro100 mii mousedev keybdev hid input usb-uhci usbcore ext3 jbd
Mar 11 07:55:35 cyber kernel: CPU: 0
Mar 11 07:55:35 cyber kernel: EIP: 0010:[<cbb597ee>] Not tainted
Mar 11 07:55:35 cyber kernel: EFLAGS: 00010286
Mar 11 07:55:35 cyber kernel: .........

This repeats many times, and afterwards I get "Segmentation fault" :( After many tries I ran su and rebooted.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases. If you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/