Bug 168605
Summary: | Fedora Core 4 Kernel Oops whenever any system activity on TYAN S2892 Thunder K8SE Motherboard | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Vladimir Kangin <v> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | error27, pfrields, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
URL: | http://www.kangin.org/FC4-TYAN_K8SE_S2892-Oopsing.txt | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2006-05-05 01:33:23 UTC | Type: | --- |
Description
Vladimir Kangin
2005-09-17 21:05:16 UTC
Please note that the problem does not appear with kernels kernel-2.6.12-1.1447_FC4 and kernel-2.6.11-1.1369_FC4, but the SMP kernels kernel-smp-2.6.11-1.1369_FC4 and kernel-smp-2.6.12-1.1447_FC4 oops.

Vladimir Kangin

http://people.redhat.com/wtogami/temp/1398/
Can you do the same tests on 1398 and report back your findings?

http://people.redhat.com/davej/kernels/Fedora/FC4/
Please also try 1455+ from here.

Dear Warren,

It seems to be working well with 1398. Would you like feedback on 1455+ too? Could you shed some light on the problem I ran into, please?

Thanks in advance,
Vladimir Kangin

Dear Warren,

I was too fast to confirm success :(

# wget http://www.gtlib.cc.gatech.edu/pub/fedora.redhat/linux/core/4/x86_64/iso/FC4-x86_64-DVD.iso
--07:42:29-- http://www.gtlib.cc.gatech.edu/pub/fedora.redhat/linux/core/4/x86_64/iso/FC4-x86_64-DVD.iso
=> `FC4-x86_64-DVD.iso'
Resolving www.gtlib.cc.gatech.edu... 130.207.108.135, 130.207.108.136, 130.207.108.134
Connecting to www.gtlib.cc.gatech.edu[130.207.108.135]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,932,850,688 [text/plain]
11% [===========> ] 337,846,248 546.46K/s ETA 1:17:06Killed

cat /var/log/messages
Sep 18 07:52:33 s1-ams kernel: Unable to handle kernel paging request at 0000000000004b30 RIP:
Sep 18 07:52:33 s1-ams kernel: <ffffffff80169c22>{free_pages+210}
Sep 18 07:52:33 s1-ams kernel: PGD 139231067 PUD 1396bb067 PMD 0
Sep 18 07:52:33 s1-ams kernel: Oops: 0000 [1] SMP
Sep 18 07:52:33 s1-ams kernel: CPU 1
Sep 18 07:52:33 s1-ams kernel: Modules linked in: md5 ipv6 parport_pc lp parport autofs4 rfcomm l2cap bluetooth sunrpc pcmcia yenta_socket rsrc_nonstatic
Sep 18 07:52:33 s1-ams kernel: Pid: 2944, comm: wget Not tainted 2.6.12-1.1398_FC4smp
Sep 18 07:52:33 s1-ams kernel: RIP: 0010:[<ffffffff80169c22>] <ffffffff80169c22>{free_pages+210}
Sep 18 07:52:33 s1-ams kernel: RSP: 0018:ffff81013f569db0 EFLAGS: 00010216
Sep 18 07:52:33 s1-ams kernel: RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000003
Sep 18 07:52:33 s1-ams kernel: RDX: 0000000000120c00 RSI: 0000000000000000 RDI: ffff810120c00000
Sep 18 07:52:33 s1-ams kernel: RBP: ffff810120c00010 R08: 0000000000000018 R09: 0000000000000000
Sep 18 07:52:33 s1-ams kernel: R10: 0000000000000000 R11: ffffffff80319c70 R12: ffff810120c00010
Sep 18 07:52:33 s1-ams kernel: R13: ffff810120c00000 R14: 0000000000000041 R15: 0000000000000104
Sep 18 07:52:33 s1-ams kernel: FS: 00002aaaaaab6c40(0000) GS:ffffffff8050d800(0000) knlGS:0000000000000000
Sep 18 07:52:33 s1-ams kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 18 07:52:33 s1-ams kernel: CR2: 0000000000004b30 CR3: 0000000137d4f000 CR4: 00000000000006e0
Sep 18 07:52:33 s1-ams kernel: Process wget (pid: 2944, threadinfo ffff81013f568000, task ffff81013ef2e840)
Sep 18 07:52:33 s1-ams kernel: Stack: ffffffff8019d9d5 0000000000000008 ffff8101366fc7c0 0000000000000010
Sep 18 07:52:33 s1-ams kernel: 0000000000000004 000000000000003c ffffffff8019df7b 412262b752f1a9fc
Sep 18 07:52:33 s1-ams kernel: ffff81013f569f40 ffff81013f569f08
Sep 18 07:52:33 s1-ams kernel: Call Trace:<ffffffff8019d9d5>{poll_freewait+85} <ffffffff8019df7b>{do_select+1179}
Sep 18 07:52:33 s1-ams kernel: <ffffffff8019d9f0>{__pollwait+0} <ffffffff8019e5cd>{sys_select+637}
Sep 18 07:52:33 s1-ams kernel: <ffffffff8010ebf6>{tracesys+209}
Sep 18 07:52:33 s1-ams kernel:
Sep 18 07:52:33 s1-ams kernel: Code: 49 8b 89 30 4b 00 00 48 39 ca 72 49 48 b8 ff ff ff 7f ff ff
Sep 18 07:52:33 s1-ams kernel: RIP <ffffffff80169c22>{free_pages+210} RSP <ffff81013f569db0>
Sep 18 07:52:33 s1-ams kernel: CR2: 0000000000004b30
Sep 18 07:52:34 s1-ams kernel: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Sep 18 07:52:34 s1-ams kernel: in_atomic():0, irqs_disabled():1
Sep 18 07:52:34 s1-ams kernel:
Sep 18 07:52:34 s1-ams kernel: Call Trace:<ffffffff8013abd5>{profile_task_exit+21} <ffffffff8013bff2>{do_exit+34}
Sep 18 07:52:34 s1-ams kernel: <ffffffff80265f18>{do_unblank_screen+40} <ffffffff80124286>{do_page_fault+1926}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8035ac32>{thread_return+0} <ffffffff8010f5b5>{error_exit+0}
Sep 18 07:52:34 s1-ams kernel: <ffffffff80319c70>{tcp_poll+0} <ffffffff80169c22>{free_pages+210}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8019d9d5>{poll_freewait+85} <ffffffff8019df7b>{do_select+1179}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8019d9f0>{__pollwait+0} <ffffffff8019e5cd>{sys_select+637}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8010ebf6>{tracesys+209}

Will try 1455+ in a minute.

Regards,
Vladimir Kangin

Dear Warren,

Sep 18 08:14:28 s1-ams kernel: Unable to handle kernel paging request at 0000000000004b30 RIP:
Sep 18 08:14:29 s1-ams kernel: <ffffffff8016239a>{free_pages+210}
Sep 18 08:14:29 s1-ams kernel: PGD 13fa2e067 PUD 1382b5067 PMD 0
Sep 18 08:14:29 s1-ams kernel: Oops: 0000 [1] SMP

The same problem, but the system lasted a bit longer :) It got through 13% of the DVD ISO file.

Please advise,
Vladimir Kangin

Please retry with the latest kernel errata released yesterday (2.6.12-1.1456_FC4).
There's also a more experimental newer test kernel at http://people.redhat.com/davej/kernels/Fedora/FC4/

Dear Dave,

The same problem with kernel-smp 2.6.12-1.1456_FC4 ;(

Vladimir Kangin

Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream kernel (2.6.13.2). As there were ~3500 changes upstream between this and the previous kernel, it's possible your bug has been fixed already. Please retest with this update, and update this bug if necessary.

Thanks.

I saw this bug on a couple of new dual Opteron systems today. One system was running a stock FC4 kernel and the other was running a 2.6.11-1.35 FC3 kernel; it also happens on the FC2 kernel.

The unique thing was that both systems had 2 sticks of 2G RAM on the first CPU. The systems do not crash in FC3 and FC4 when I use "numa=off", but they still crash in FC2 when I use that. The systems do not crash when one stick of RAM is on the first CPU and one is on the second CPU. The systems do not crash when I run 2, 4 or 6 sticks of 1G RAM.

I did try to install the newest errata kernel, but it failed for an unrelated reason and I didn't have time to debug it.

I searched through Bugzilla, and bug 165285 and bug 168907 also have a similar error message, where paging failed at 0000000000004b30.

I'll try to figure out the issues with the errata kernel tomorrow, but I just wanted to post this before I forgot. Could the other people seeing this bug post what RAM they were using?

Hi Dan,

It is 2 sticks of 2G RAM on the first CPU. The memory sticks, from Kingston, are the following:
KVR400D4R3A/2G 2GB PC3200 REG CL3 ECC 183 - Pin DIMM

I will try "numa=off" on one of the systems as a test.

Rock on. This bug reminds me of bug 160135. In bug 160135 the error address was 00000000000018f0. Comment #13 there says "the crash here is happening in 'pfn_to_page' which is called by virt_to_page. It seems that NODE_DATA(nid) is NULL and thus node_start_pfn is NULL." A NULL NODE_DATA(nid) would probably explain all the leading zeros in this bug as well. Comment #66 in bug 160135 is from someone with our bug here. ;)

Anyway... It might already be fixed in the newest errata kernel, if I can make that work.

Hi Dan,

With numa=off the kernel is stable! What is the impact of this parameter on kernel performance?

Vladimir

2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem.

Thank you.

This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in Bugzilla, if this bug is still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

Thank you.

Closing per previous comment.
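
For anyone re-reading the first oops in this report, the "all the leading zeros" observation can be checked against the log itself. The sketch below is only an illustration: it assumes the `Code:` line starts at the faulting RIP (the log carries no marker saying so), and the helper name `decode_mov_load_disp32` is made up for this note. It decodes the first seven bytes of the `Code:` line as a single x86-64 `mov` load and compares base register plus displacement with the reported CR2.

```python
import struct

# 64-bit general-purpose registers indexed by their 4-bit encoding.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load_disp32(code: bytes):
    """Decode 'REX.W 8B /r' with mod=10 (base register + disp32, no SIB).

    Hypothetical helper: it handles only this one instruction form and
    raises ValueError for anything else.
    """
    if len(code) < 7:
        raise ValueError("need at least 7 bytes")
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("not a plain base+disp32 operand")
    dest = REG64[reg | ((rex & 0x4) << 1)]  # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x1) << 3)]   # REX.B extends the r/m field
    (disp,) = struct.unpack_from("<i", code, 3)  # little-endian signed disp32
    return dest, base, disp


# Values copied from the first oops above (kernel 2.6.12-1.1398_FC4smp).
code = bytes.fromhex("49 8b 89 30 4b 00 00")
registers = {"r9": 0x0000000000000000}
cr2 = 0x0000000000004B30

dest, base, disp = decode_mov_load_disp32(code)
fault_address = registers[base] + disp
print(f"mov {disp:#x}(%{base}), %{dest}: address {fault_address:#x}, CR2 {cr2:#x}")
```

Under that assumption the load reads from R09 + 0x4b30, and R09 is zero in the register dump, so the computed address matches CR2. That is consistent with a field being read through a NULL base pointer, as the comment quoted from bug 160135 suggests, but it does not by itself show which structure is involved.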