From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4

Description of problem:
After the FC4 installation completes, any high system activity causes a kernel Oops with the following dump:
http://www.kangin.org/FC4-TYAN_K8SE_S2892-Oopsing.txt

This happens on any attempt to wget an ISO file or to update the system with 'yum update'. Updating the kernel did not solve the problem, and a second identical system shows the same behaviour.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.11-1.1369_FC4
kernel-smp-2.6.12-1.1447_FC4
kernel-2.6.12-1.1447_FC4
kernel-2.6.11-1.1369_FC4

How reproducible:
Always

Steps to Reproduce:
1. Install FC4 on a TYAN S2892 Thunder K8SE with 2x AMD Opteron 252. The remaining parts appear to be irrelevant: this has been tested with different sets of memory, disks, etc., and the result is always the same.
2. Trigger the Oops in one of two ways:
   Option 1. Run "yum update" and wait until "Apply Transactions"; the system then crashes.
   Option 2. Run
   wget http://www.gtlib.cc.gatech.edu/pub/fedora.redhat/linux/core/4/x86_64/iso/FC4-x86_64-DVD.iso
   and wait until 7-10% of the download; the system then crashes.

Actual Results:
The system hangs completely with one of these two errors:

kernel: Unable to handle kernel paging request at 0000000000004b20 RIP:
(causes a hard crash within a moment)

or

kernel: Unable to handle kernel paging request at 0000000000004b30 RIP:
(causes a crash, but the system remains pingable for a period of time)

Expected Results:
Do not crash :)

Additional info:
http://bugzilla.kernel.org/show_bug.cgi?id=5272
Please note that the problem does not appear with the kernels kernel-2.6.12-1.1447_FC4 and kernel-2.6.11-1.1369_FC4; only the kernels with SMP support, kernel-smp-2.6.11-1.1369_FC4 and kernel-smp-2.6.12-1.1447_FC4, Oops.

Vladimir Kangin
http://people.redhat.com/wtogami/temp/1398/
Can you do the same tests on 1398 and report back your findings?

http://people.redhat.com/davej/kernels/Fedora/FC4/
Please also try 1455+ from here.
Dear Warren,

It seems to be working well with 1398. Would you like feedback on 1455+ too? Could you shed some light on the problem we are facing, please?

Thanks in advance,
Vladimir Kangin
Dear Warren,

I was too fast to confirm success :(

# wget http://www.gtlib.cc.gatech.edu/pub/fedora.redhat/linux/core/4/x86_64/iso/FC4-x86_64-DVD.iso
--07:42:29-- http://www.gtlib.cc.gatech.edu/pub/fedora.redhat/linux/core/4/x86_64/iso/FC4-x86_64-DVD.iso
=> `FC4-x86_64-DVD.iso'
Resolving www.gtlib.cc.gatech.edu... 130.207.108.135, 130.207.108.136, 130.207.108.134
Connecting to www.gtlib.cc.gatech.edu[130.207.108.135]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,932,850,688 [text/plain]

11% [===========> ] 337,846,248 546.46K/s ETA 1:17:06Killed

# cat /var/log/messages
Sep 18 07:52:33 s1-ams kernel: Unable to handle kernel paging request at 0000000000004b30 RIP:
Sep 18 07:52:33 s1-ams kernel: <ffffffff80169c22>{free_pages+210}
Sep 18 07:52:33 s1-ams kernel: PGD 139231067 PUD 1396bb067 PMD 0
Sep 18 07:52:33 s1-ams kernel: Oops: 0000 [1] SMP
Sep 18 07:52:33 s1-ams kernel: CPU 1
Sep 18 07:52:33 s1-ams kernel: Modules linked in: md5 ipv6 parport_pc lp parport autofs4 rfcomm l2cap bluetooth sunrpc pcmcia yenta_socket rsrc_nonstatic
Sep 18 07:52:33 s1-ams kernel: Pid: 2944, comm: wget Not tainted 2.6.12-1.1398_FC4smp
Sep 18 07:52:33 s1-ams kernel: RIP: 0010:[<ffffffff80169c22>] <ffffffff80169c22>{free_pages+210}
Sep 18 07:52:33 s1-ams kernel: RSP: 0018:ffff81013f569db0 EFLAGS: 00010216
Sep 18 07:52:33 s1-ams kernel: RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000003
Sep 18 07:52:33 s1-ams kernel: RDX: 0000000000120c00 RSI: 0000000000000000 RDI: ffff810120c00000
Sep 18 07:52:33 s1-ams kernel: RBP: ffff810120c00010 R08: 0000000000000018 R09: 0000000000000000
Sep 18 07:52:33 s1-ams kernel: R10: 0000000000000000 R11: ffffffff80319c70 R12: ffff810120c00010
Sep 18 07:52:33 s1-ams kernel: R13: ffff810120c00000 R14: 0000000000000041 R15: 0000000000000104
Sep 18 07:52:33 s1-ams kernel: FS: 00002aaaaaab6c40(0000) GS:ffffffff8050d800(0000) knlGS:0000000000000000
Sep 18 07:52:33 s1-ams kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 18 07:52:33 s1-ams kernel: CR2: 0000000000004b30 CR3: 0000000137d4f000 CR4: 00000000000006e0
Sep 18 07:52:33 s1-ams kernel: Process wget (pid: 2944, threadinfo ffff81013f568000, task ffff81013ef2e840)
Sep 18 07:52:33 s1-ams kernel: Stack: ffffffff8019d9d5 0000000000000008 ffff8101366fc7c0 0000000000000010
Sep 18 07:52:33 s1-ams kernel: 0000000000000004 000000000000003c ffffffff8019df7b 412262b752f1a9fc
Sep 18 07:52:33 s1-ams kernel: ffff81013f569f40 ffff81013f569f08
Sep 18 07:52:33 s1-ams kernel: Call Trace:<ffffffff8019d9d5>{poll_freewait+85} <ffffffff8019df7b>{do_select+1179}
Sep 18 07:52:33 s1-ams kernel: <ffffffff8019d9f0>{__pollwait+0} <ffffffff8019e5cd>{sys_select+637}
Sep 18 07:52:33 s1-ams kernel: <ffffffff8010ebf6>{tracesys+209}
Sep 18 07:52:33 s1-ams kernel:
Sep 18 07:52:33 s1-ams kernel: Code: 49 8b 89 30 4b 00 00 48 39 ca 72 49 48 b8 ff ff ff 7f ff ff
Sep 18 07:52:33 s1-ams kernel: RIP <ffffffff80169c22>{free_pages+210} RSP <ffff81013f569db0>
Sep 18 07:52:33 s1-ams kernel: CR2: 0000000000004b30
Sep 18 07:52:34 s1-ams kernel: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Sep 18 07:52:34 s1-ams kernel: in_atomic():0, irqs_disabled():1
Sep 18 07:52:34 s1-ams kernel:
Sep 18 07:52:34 s1-ams kernel: Call Trace:<ffffffff8013abd5>{profile_task_exit+21} <ffffffff8013bff2>{do_exit+34}
Sep 18 07:52:34 s1-ams kernel: <ffffffff80265f18>{do_unblank_screen+40} <ffffffff80124286>{do_page_fault+1926}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8035ac32>{thread_return+0} <ffffffff8010f5b5>{error_exit+0}
Sep 18 07:52:34 s1-ams kernel: <ffffffff80319c70>{tcp_poll+0} <ffffffff80169c22>{free_pages+210}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8019d9d5>{poll_freewait+85} <ffffffff8019df7b>{do_select+1179}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8019d9f0>{__pollwait+0} <ffffffff8019e5cd>{sys_select+637}
Sep 18 07:52:34 s1-ams kernel: <ffffffff8010ebf6>{tracesys+209}

I will try 1455+ in a minute.

Regards,
Vladimir Kangin
Dear Warren,

Sep 18 08:14:28 s1-ams kernel: Unable to handle kernel paging request at 0000000000004b30 RIP:
Sep 18 08:14:29 s1-ams kernel: <ffffffff8016239a>{free_pages+210}
Sep 18 08:14:29 s1-ams kernel: PGD 13fa2e067 PUD 1382b5067 PMD 0
Sep 18 08:14:29 s1-ams kernel: Oops: 0000 [1] SMP

The same problem, but the system lasted a bit longer :) It got through 13% of the DVD ISO file.

Please advise,
Vladimir Kangin
Please retry with the latest kernel errata released yesterday (2.6.12-1.1456_FC4).

There's also a more experimental, newer test kernel at http://people.redhat.com/davej/kernels/Fedora/FC4/
Dear Dave, The same problem with kernel-smp 2.6.12-1.1456_FC4 ;( Vladimir Kangin
Mass update to all FC4 bugs: An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream kernel (2.6.13.2). As there were ~3500 changes upstream between this and the previous kernel, it's possible your bug has been fixed already. Please retest with this update, and update this bug if necessary. Thanks.
I saw this bug on a couple of new dual Opteron systems today. One system was running a stock FC4 kernel and the other a 2.6.11-1.35 FC3 kernel; it also happens on the FC2 kernel.

The unique thing was that both systems had 2 sticks of 2G RAM on the first CPU. The systems do not crash in FC3 and FC4 when I use "numa=off", but they still crash in FC2 when I use that. The systems do not crash when one stick of RAM is on the first CPU and one is on the second CPU. The systems do not crash when I run 2, 4 or 6 sticks of 1G RAM.

I did try to install the newest errata kernel, but it failed for an unrelated reason and I didn't have time to debug it.

I searched through bugzilla: bug 165285 and bug 168907 also show a similar error message, where paging failed at 0000000000004b30.

I'll try to figure out the issues with the errata kernel tomorrow, but I just wanted to post this before I forgot. Could the other people seeing this bug post what RAM they were using?
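For anyone answering the RAM question, a rough way to report how memory is spread across NUMA nodes is to read the per-node meminfo files from sysfs. The sketch below is only an illustration, not something used in this bug: it assumes Python 3, assumes the kernel exposes /sys/devices/system/node/node*/meminfo, and assumes each file has a line of the form "Node 0 MemTotal: 4194304 kB". The function names are made up for the example.

```python
import glob
import re


def node_mem_total_kb(meminfo_text):
    """Return MemTotal in kB from one node's meminfo text, or None if absent."""
    match = re.search(r"MemTotal:\s+(\d+)\s*kB", meminfo_text)
    return int(match.group(1)) if match else None


def report(pattern="/sys/devices/system/node/node*/meminfo"):
    """Map each node directory name (e.g. 'node0') to its MemTotal in kB."""
    totals = {}
    for path in sorted(glob.glob(pattern)):
        node = path.split("/")[-2]
        with open(path) as handle:
            totals[node] = node_mem_total_kb(handle.read())
    return totals


if __name__ == "__main__":
    for node, kb in report().items():
        # Flag nodes that report no local memory (all RAM on the other CPU).
        note = "  <-- no local memory" if not kb else ""
        print(f"{node}: {kb} kB{note}")
```

On a board where all DIMMs sit on the first CPU, one would expect the second node to show up with little or no memory, which is the configuration described above.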
Hi Dan,

It is 2 sticks of 2G RAM on the first CPU. The memory sticks are from Kingston, as follows:
KVR400D4R3A/2G 2GB PC3200 REG CL3 ECC 183-Pin DIMM

I will try "numa=off" on one of the systems as a test.
Rock on.

This bug reminds me of bug 160135, where the error address was 00000000000018f0. Comment #13 there says "the crash here is happening in 'pfn_to_page' which is called by virt_to_page. It seems that NODE_DATA(nid) is NULL and thus node_start_pfn is NULL."

A NULL NODE_DATA(nid) would probably explain all the leading zeros in this bug as well: the fault address would just be a small field offset from a NULL pointer. Comment #66 in bug 160135 is from someone with our bug here. ;)

Anyway... It might already be fixed in the newest errata kernel, if I can make that work.
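The NULL-pointer reading can be checked against the oops posted earlier in this bug. The sketch below decodes the first instruction of its "Code:" line (49 8b 89 30 4b 00 00) by hand. It is a minimal decoder for that single instruction form, not a disassembler, and it assumes the Code: bytes start at the faulting instruction; the helper name and register table are made up for the example.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_disp32(code_hex):
    """Decode 'REX.W 8B /r' (mov r64, [base + disp32]).

    Handles only mod=10 without a SIB byte, the form seen in this oops.
    Returns (destination register, base register, displacement).
    """
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    rex, opcode, modrm = raw[0], raw[1], raw[2]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a 64-bit MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the [base + disp32] form is handled")
    reg |= (rex & 0x4) << 1   # REX.R extends the destination register
    rm |= (rex & 0x1) << 3    # REX.B extends the base register
    disp = int.from_bytes(raw[3:7], "little", signed=True)
    return reg, rm, disp


dest, base, disp = decode_mov_disp32("49 8b 89 30 4b 00 00")
r9_from_oops = 0x0000000000000000          # R09 in the register dump
fault = (r9_from_oops + disp) & (2**64 - 1)

print(f"mov {disp:#x}(%{REGS[base]}),%{REGS[dest]}")
print(f"computed fault address: {fault:#018x}")
```

The bytes decode to a load from 0x4b30(%r9) into %rcx, and R09 is zero in the register dump, so the computed address equals the CR2 value 0000000000004b30. That is consistent with reading a field at offset 0x4b30 through a NULL structure pointer; which structure it is cannot be told from the bytes alone.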
Hi Dan,

With numa=off the kernel is stable! What is the impact of this parameter on kernel performance?

Vladimir
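When testing this workaround, it helps to confirm that the running kernel was actually booted with the option. The sketch below is an illustration only (Python 3, hypothetical function names); it reads /proc/cmdline and looks for an exact "numa=off" token. As far as I understand, numa=off makes the kernel treat all memory as a single node, so allocations are no longer kept local to the CPU that uses them; how much that costs depends on the workload.

```python
def has_kernel_param(cmdline, param):
    """Return True if param appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def numa_disabled(path="/proc/cmdline"):
    """Check the running kernel's command line for numa=off."""
    with open(path) as handle:
        return has_kernel_param(handle.read(), "numa=off")


if __name__ == "__main__":
    try:
        print("numa=off active:", numa_disabled())
    except OSError as exc:
        print("could not read kernel command line:", exc)
```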
2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
Closing per previous comment.