Description of problem:
The machine freezes (crashes) during or directly after boot, or a couple of minutes after boot. Sometimes I see BUG messages and/or call traces printed on screen. Usually there is nothing in the logs, but I found one BUG in the logs (attached) for 1.2937.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.20-1.2933.fc6.i686.rpm
The same machine worked well with the fc6 2.6.18 and 2.6.19 kernels.

How reproducible:
Always.

Steps to Reproduce:
1. Install kernel-xen-2.6.20-1.2933.fc6.i686.rpm
2. Reboot
3. Wait

Actual results:
The screen is frozen (the picture is still visible), nothing happens, the keyboard and the network don't work; the system has crashed.

Additional info:
I also tried test kernel 1.2937 for fc6. With that kernel the machine reboots itself instead of freezing. The machine was idle during these tests. The only process usually running is md-raid1 reconstruction (because of the crashes). Hardware is Intel P4 with i955x chipset, AHCI SATA disks, md raid1 + LVM, 2 GB RAM, dom0_mem=256M.
Created attachment 150914 kernel-xen-2.6.20-1.2937.fc6.i686.rpm BUG log
I have the same problem. The log says nothing. Any hints on how I can get more useful debug information? -A-
I have the same problem after updating kernel-xen to 2.6.20-1.2933.fc6.i686.rpm: dom0 reboots after boot, and the domU(s) become stuck/unresponsive (and have to be rebooted) after running for a while. I have restored the previous working kernel, i.e. 2.6.19-1.2911.6.5.fc6xen, on both the host and the VMs. Askar.
I can also confirm the problem. Numerous kernel oops, spurious reboots, entirely unstable.
BUG: unable to handle kernel paging request at virtual address cddd800c printing eip: c0548cbb 0ddd8000 -> *pde = 00000000:72fd9001 0ddd9000 -> *pme = 00000000:03066067 00066000 -> *pte = 00000000:72fd8061 Oops: 0003 [#1] SMP CPU: 1 EIP: 0061:[<c0548cbb>] Not tainted VLI EFLAGS: 00010017 (2.6.20-1.2937.fc6xen #1) EIP is at evtchn_do_upcall+0x55/0x97 eax: 00000006 ebx: 00000000 ecx: cddd7fe4 edx: fffffefa esi: 00000001 edi: f5416000 ebp: fffffffe esp: cddd7fc4 ds: 007b es: 007b ss: 0069 Process modprobe (pid: 856, ti=cddd7000 task=cdeb4930 task.ti=cddd7000) Stack: 00000000 00000001 00000000 cddd7fac cddd7fe4 cddd7000 c0404ff2 cddd7fe4 00876402 00000073 00000212 bfd055d8 0000007b 00000000 00000000 Call Trace: [<c0404ff2>] hypervisor_callback+0x46/0x50 ======================= Code: bd fe ff ff ff 88 d9 89 d8 c1 e0 05 d3 c5 89 04 24 eb 29 0f bc c0 03 04 24 8b 14 85 80 f0 6f c0 83 fa ff 74 12 8b 4c 24 1c f7 d2 <89> 51 28 89 c8 e8 f0 da eb ff eb 05 e8 22 2e 00 00 8b 44 24 04

It's blowing up at drivers/xen/core/evtchn.c:235: do_IRQ(irq, regs); which is a macro -- the part that dies expands to: (regs)->orig_eax = ~(irq); regs is 0xcddd7fe4, which is too close to the end of the stack, so regs->orig_eax lies 12 bytes beyond the stack. How in the world that happened I can't say. Maybe the critical region fixup is broken?
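The arithmetic behind the "12 bytes beyond the stack" observation can be re-checked from the register dump alone. The following is a minimal sketch, not part of the original analysis: all hex values are copied from the oops above, the 4 KiB stack size is an assumption (it is consistent with ti=cddd7000 and the fault landing in the next page), and the 0x28 offset is read from the faulting instruction bytes `89 51 28`, which decode as `mov %edx,0x28(%ecx)`.

```python
# Values copied from the oops in this comment.
TI_BASE = 0xCDDD7000      # "ti=" field: base of the kernel stack / thread_info
REGS_PTR = 0xCDDD7FE4     # ecx: the pt_regs pointer passed to do_IRQ()
FAULT_ADDR = 0xCDDD800C   # "virtual address" reported by the BUG line
EDX = 0xFFFFFEFA          # edx: the value being stored, i.e. ~irq

# Assumption: 4 KiB kernel stacks (not stated in the report).
THREAD_SIZE = 0x1000

# Displacement taken from the faulting bytes "89 51 28" = mov %edx,0x28(%ecx),
# i.e. the offset of orig_eax inside pt_regs for this build.
ORIG_EAX_OFFSET = 0x28

stack_end = TI_BASE + THREAD_SIZE
store_addr = REGS_PTR + ORIG_EAX_OFFSET
overrun_bytes = store_addr - stack_end
irq = ~EDX & 0xFFFFFFFF   # undo the bitwise NOT applied by the macro

print(f"store address : {store_addr:#010x}")
print(f"stack end     : {stack_end:#010x}")
print(f"overrun       : {overrun_bytes} bytes")
print(f"irq           : {irq}")
```

Under these assumptions the computed store address equals the faulting address, which supports the reading that the write to regs->orig_eax itself is what faults, rather than some earlier access.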
I am seeing a slightly different traceback (2933 kernel) which may be the same problem Mar 25 17:13:52 xenda kernel: iret exception: 0000 [#5] Mar 25 17:13:52 xenda kernel: SMP Mar 25 17:13:52 xenda kernel: last sysfs file: /block/ram0/range Mar 25 17:13:52 xenda kernel: Modules linked in: nfs lockd nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc xennet ipv6 dm_mirror dm_multipath dm_mod parport_pc lp parport pcspkr xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd Mar 25 17:13:52 xenda kernel: CPU: 0 Mar 25 17:13:52 xenda kernel: EIP: 0000:[<45fb2d2a>] Not tainted VLI Mar 25 17:13:52 xenda kernel: EFLAGS: 00010000 (2.6.20-1.2933.fc6xen #1) Mar 25 17:13:52 xenda kernel: EIP is at 0x45fb2d2a Mar 25 17:13:52 xenda kernel: eax: 00000000 ebx: 008fb402 ecx: 00000073 edx: 00000286 Mar 25 17:13:52 xenda kernel: esi: bfb75ce8 edi: 0000007b ebp: 00000000 esp: ce90601c Mar 25 17:13:52 xenda kernel: ds: 0000 es: 0000 ss: 0069 Mar 25 17:13:52 xenda kernel: Process cfagent (pid: 5814, ti=ce905000 task=c03a6170 task.ti=ce905000) Mar 25 17:13:52 xenda kernel: Stack: 00000170 00000000 00000000 00067047 00067049 0006704b 0006704c 0006704d Mar 25 17:13:52 xenda kernel: 0006704e 00067051 00067063 00067104 000671e9 000671ea 000671eb 000671ec Mar 25 17:13:52 xenda kernel: 00000000 00000000 ac447517 00060fdd 00000000 00000000 00000000 00000000 Mar 25 17:13:52 xenda kernel: Call Trace: Mar 25 17:13:52 xenda kernel: BUG: unable to handle kernel paging request at virtual address 0006704c Mar 25 17:13:52 xenda kernel: printing eip: Mar 25 17:13:52 xenda kernel: c04055c2 Mar 25 17:13:52 xenda kernel: 0376b000 -> *pde = 00000000:0901d001 Mar 25 17:13:52 xenda kernel: 094e5000 -> *pme = 00000000:00000000 Mar 25 17:13:52 xenda kernel: Oops: 0000 [#6] Mar 25 17:13:52 xenda kernel: SMP Mar 25 17:13:52 xenda kernel: last sysfs file: /block/ram0/range Mar 25 17:13:52 xenda kernel: Modules linked in: nfs lockd nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc xennet ipv6 dm_mirror dm_multipath dm_mod 
parport_pc lp parport pcspkr xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd Mar 25 17:13:52 xenda kernel: CPU: 0 Mar 25 17:13:52 xenda kernel: EIP: 0061:[<c04055c2>] Not tainted VLI Mar 25 17:13:52 xenda kernel: EFLAGS: 00010093 (2.6.20-1.2933.fc6xen #1) Mar 25 17:13:52 xenda kernel: EIP is at dump_trace+0x5c/0x93 Mar 25 17:13:52 xenda kernel: eax: 00067ffd ebx: 0006704c ecx: 039b779e edx: 00ab5f00 Mar 25 17:13:52 xenda kernel: esi: 00000000 edi: 00067000 ebp: c0693fce esp: ce905e7c Mar 25 17:13:52 xenda kernel: ds: 007b es: 007b ss: 0069 Mar 25 17:13:52 xenda kernel: Process cfagent (pid: 5814, ti=ce905000 task=c03a6170 task.ti=ce905000) Mar 25 17:13:52 xenda kernel: Stack: c0693e8e c0693fce 00000018 00000000 c0693fce c0405611 c06e44e0 c0693fce Mar 25 17:13:52 xenda kernel: ce90607f c04056c0 c0693fce c0693fce ce905fe4 ce90601c 00000002 00010000 Mar 25 17:13:52 xenda kernel: ce905fe4 ce90601c c0405856 c0693fce 00000010 c03a6304 000016b6 ce905000 Mar 25 17:13:52 xenda kernel: Call Trace: Mar 25 17:13:52 xenda kernel: [<c0405611>] show_trace_log_lvl+0x18/0x2c Mar 25 17:13:52 xenda kernel: [<c04056c0>] show_stack_log_lvl+0x9b/0xa3 Mar 25 17:13:52 xenda kernel: [<c0405856>] show_registers+0x18e/0x25d Mar 25 17:13:52 xenda kernel: [<c0613405>] notifier_call_chain+0x19/0x29 Mar 25 17:13:52 xenda kernel: [<c0405a58>] die+0x133/0x22f Mar 25 17:13:52 xenda kernel: [<c0406302>] do_iret_error+0xa7/0xb1 Mar 25 17:13:52 xenda kernel: [<c0404e92>] restore_nocheck_notrace+0x7/0xf Mar 25 17:13:52 xenda kernel: [<c0404e93>] restore_nocheck_notrace+0x8/0xf Mar 25 17:13:52 xenda kernel: [<c0404e94>] restore_nocheck_notrace+0x9/0xf Mar 25 17:13:52 xenda kernel: [<c0404e99>] restore_nocheck_notrace+0xe/0xf Mar 25 17:13:52 xenda kernel: [<c042c063>] search_exception_tables+0x14/0x25 Mar 25 17:13:52 xenda kernel: [<c04144ef>] fixup_exception+0xb/0x20 Mar 25 17:13:52 xenda kernel: [<c0611b45>] do_general_protection+0x11c/0x16f Mar 25 17:13:52 xenda kernel: [<c04068d1>] do_IRQ+0xc6/0xdd Mar 25 
17:13:52 xenda kernel: [<c0611a29>] do_general_protection+0x0/0x16f Mar 25 17:13:52 xenda kernel: [<c040625b>] do_iret_error+0x0/0xb1 Mar 25 17:13:52 xenda kernel: [<c061162d>] error_code+0x35/0x3c Mar 25 17:13:52 xenda kernel: ======================= Mar 25 17:13:52 xenda kernel: Code: 9a d4 01 00 00 89 df 81 e7 00 f0 ff ff eb 0e 8b 4c 24 18 89 f2 89 e8 ff 51 08 83 c3 04 39 fb 76 29 8d 87 fd 0f 00 00 39 c3 73 1f <8b> 33 89 f0 e8 61 6a 02 00 85 c0 74 e2 eb d5 8b 4f 34 85 c9 74 Mar 25 17:13:52 xenda kernel: EIP: [<c04055c2>] dump_trace+0x5c/0x93 SS:ESP 0069:ce905e7c Mar 25 17:13:52 xenda kernel: <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20 Mar 25 17:13:52 xenda kernel: in_atomic():0, irqs_disabled():1 Mar 25 17:13:52 xenda kernel: [<c043059a>] down_read+0x12/0x28 Mar 25 17:13:52 xenda kernel: [<c0438c0a>] acct_collect+0x38/0x13e Mar 25 17:13:52 xenda kernel: [<c041fe2b>] do_exit+0x1b1/0x6f6 Mar 25 17:13:52 xenda kernel: [<c0405b2f>] die+0x20a/0x22f Mar 25 17:13:52 xenda kernel: [<c061326f>] do_page_fault+0xab1/0xc2e Mar 25 17:13:52 xenda kernel: [<c06114ff>] _spin_unlock_irqrestore+0x8/0x16 Mar 25 17:13:52 xenda kernel: [<c06114ff>] _spin_unlock_irqrestore+0x8/0x16 Mar 25 17:13:52 xenda kernel: [<c041d829>] release_console_sem+0x192/0x1d1 Mar 25 17:13:52 xenda kernel: [<c041de9a>] vprintk+0x2de/0x2e8 Mar 25 17:13:52 xenda kernel: [<c06127be>] do_page_fault+0x0/0xc2e Mar 25 17:13:52 xenda kernel: [<c061162d>] error_code+0x35/0x3c Mar 25 17:13:52 xenda kernel: [<c04100d8>] MPBIOS_trigger+0x4b/0xbc Mar 25 17:13:52 xenda kernel: [<c04055c2>] dump_trace+0x5c/0x93 Mar 25 17:13:52 xenda kernel: [<c0405611>] show_trace_log_lvl+0x18/0x2c Mar 25 17:13:52 xenda kernel: [<c04056c0>] show_stack_log_lvl+0x9b/0xa3 Mar 25 17:13:52 xenda kernel: [<c0405856>] show_registers+0x18e/0x25d Mar 25 17:13:52 xenda kernel: [<c0613405>] notifier_call_chain+0x19/0x29 Mar 25 17:13:52 xenda kernel: [<c0405a58>] die+0x133/0x22f Mar 25 17:13:52 xenda kernel: 
[<c0406302>] do_iret_error+0xa7/0xb1 Mar 25 17:13:52 xenda kernel: [<c0404e92>] restore_nocheck_notrace+0x7/0xf Mar 25 17:13:52 xenda kernel: [<c0404e93>] restore_nocheck_notrace+0x8/0xf Mar 25 17:13:52 xenda kernel: [<c0404e94>] restore_nocheck_notrace+0x9/0xf Mar 25 17:13:52 xenda kernel: [<c0404e99>] restore_nocheck_notrace+0xe/0xf Mar 25 17:13:52 xenda kernel: [<c042c063>] search_exception_tables+0x14/0x25 Mar 25 17:13:52 xenda kernel: [<c04144ef>] fixup_exception+0xb/0x20 Mar 25 17:13:52 xenda kernel: [<c0611b45>] do_general_protection+0x11c/0x16f Mar 25 17:13:52 xenda kernel: [<c04068d1>] do_IRQ+0xc6/0xdd Mar 25 17:13:52 xenda kernel: [<c0611a29>] do_general_protection+0x0/0x16f Mar 25 17:13:52 xenda kernel: [<c040625b>] do_iret_error+0x0/0xb1 Mar 25 17:13:52 xenda kernel: [<c061162d>] error_code+0x35/0x3c Mar 25 17:13:52 xenda kernel: =======================
Comment #5 is from kernel 2937 with the March 21 update applied. So that bug is still unfixed.
I can also confirm the behaviour. It is independent of the architecture as well: I see it on both 32-bit and 64-bit machines. The 32-bit machine has an Intel CPU and the 64-bit machine an AMD Athlon CPU. The kernel I noticed it on is kernel-xen-2.6.20-1.2933.
(In reply to comment #7)
> Comment #5 is from kernel 2937 with the March 21 update applied.
>
> So that bug is still unfixed.

I don't see any acknowledgment from any Fedora dev people.
Michael's logs are similar to mine. 2.6.20-1.2933.fc6xen is completely unstable. Reboots spontaneously, sometimes after a few minutes, sometimes a few hours.
I'm seeing similar issues as well, with FC5 kernel 2307. The 'at evtchn_do_upcall' bit in comment #5 from Chuck Ebbert looks familiar. I don't have any logs yet, because I'm still working to get the machine back up under a previous kernel (fsck on several large filesystems).
Created attachment 151652 syslog of crashes from FC5 2307 xen0 kernel
The symptoms described here look different, but they may have the same cause as bug #233937.
Created attachment 151660 System log excerpts
I've backleveled to 2.6.19-1.2911.6.5.fc6xen and my problems have gone away. Completely stable again. I've sent SOME of the errors from my system log in the previous attachment - sorry, I'm new to this Bugzilla thing. All I see in the logs are these errors, and reboots. Sometimes they coincide, sometimes not.
*** Bug 235313 has been marked as a duplicate of this bug. ***
Created attachment 151820 Oops on Asus A7M-266D SMP motherboard. Here is one of the Oopses, from an X crash that the system survived (but not for long: the system hung a minute later).
I'm getting pretty much the same thing on fc5 2307 xen0, but it's not freezing or rebooting. Here is my Oops: xen kernel: iret exception: 0000 [#2] xen kernel: SMP xen kernel: CPU: 1 xen kernel: EIP: 2868:[<e8000000>] Not tainted VLI xen kernel: EFLAGS: 082444c7 (2.6.20-1.2307.fc5xen0 #1) xen kernel: EIP is at 0xe8000000 xen kernel: eax: 00000000 ebx: 00e29aa1 ecx: 00000073 edx: 00310246 xen kernel: esi: b6e54c98 edi: 0000007b ebp: 00000000 esp: c84be01c xen kernel: ds: 0000 es: 0000 ss: 0069 xen kernel: Process beagle-build-in (pid: 11390, ti=c84bd000 task=f3f15930 task.ti=c84bd000) xen kernel: Stack: 08387e20 042444c7 08299a30 0e2404c7 e8000000 0017284c 299ae0b8 2444c708 xen kernel: 3876000c 24448908 2444c708 2fc9df04 2404c708 00000000 174927e8 9a30b800 xen kernel: 44c70829 76000c24 44890838 44c70824 c9e20424 04c7082f 00000024 4902e800 xen kernel: Call Trace: xen kernel: Oops: 0000 [#3] xen kernel: SMP xen kernel: CPU: 1 xen kernel: EIP: 0061:[<c1005562>] Not tainted VLI xen kernel: EFLAGS: 00310097 (2.6.20-1.2307.fc5xen0 #1) xen kernel: EIP is at dump_trace+0x5c/0x93 xen kernel: eax: 299aeffd ebx: 299ae0b8 ecx: 00d5ab89 edx: 004ce880 xen kernel: esi: 082fcd95 edi: 299ae000 ebp: c125b39e esp: c84bde7c xen kernel: ds: 007b es: 007b ss: 0069 xen kernel: Process beagle-build-in (pid: 11390, ti=c84bd000 task=f3f15930 task.ti=c84bd000) xen kernel: Stack: c125b25e c125b39e 00000018 00000000 c125b39e c10055b1 c12af4e0 c125b39e xen kernel: c84be07f c1005660 c125b39e c125b39e c84bdfe4 c84be01c 00000002 082444c7 xen kernel: c84bdfe4 c84be01c c10057f6 c125b39e 00000010 f3f15adc 00002c7e c84bd000 xen kernel: Call Trace: xen kernel: [<c10055b1>] show_trace_log_lvl+0x18/0x2c xen kernel: [<c1005660>] show_stack_log_lvl+0x9b/0xa3 xen kernel: [<c10057f6>] show_registers+0x18e/0x25d xen kernel: [<c1216dbc>] notifier_call_chain+0x19/0x29 xen kernel: [<c10059f8>] die+0x133/0x22f xen kernel: [<c10062ab>] do_iret_error+0xa7/0xb1 xen kernel: [<c1004e2a>] restore_nocheck_notrace+0x7/0xf 
xen kernel: [<c1004e2b>] restore_nocheck_notrace+0x8/0xf xen kernel: [<c1004e2c>] restore_nocheck_notrace+0x9/0xf xen kernel: [<c1004e31>] restore_nocheck_notrace+0xe/0xf xen kernel: [<c102f487>] search_exception_tables+0x14/0x25 xen kernel: [<c101739f>] fixup_exception+0xb/0x20 xen kernel: [<c12158e5>] do_general_protection+0x11c/0x16f xen kernel: [<c1006879>] do_IRQ+0xc6/0xdd xen kernel: [<c12157c9>] do_general_protection+0x0/0x16f xen kernel: [<c1006204>] do_iret_error+0x0/0xb1 xen kernel: [<c12153cd>] error_code+0x35/0x3c xen kernel: ======================= xen kernel: Code: 9a f4 01 00 00 89 df 81 e7 00 f0 ff ff eb 0e 8b 4c 24 18 89 f2 89 e8 ff 51 08 83 c3 04 39 fb 76 29 8d 87 fd 0f 00 00 39 c3 73 1f <8b> 33 89 f0 e8 e5 9e 02 00 85 c0 74 e2 eb d5 8b 4f 34 85 c9 74 xen kernel: EIP: [<c1005562>] dump_trace+0x5c/0x93 SS:ESP 0069:c84bde7c
Also seeing similar problems. Adding the info just in case it helps QA catch problems like this before unstable kernels make it to the general public. BUG: unable to handle kernel paging request at virtual address e1b2800c printing eip: c0548ceb 22c7a000 -> *pde = 00000001:1fa6f001 21c6f000 -> *pme = 00000000:06103067 00103000 -> *pte = 80000001:1fd28061 Oops: 0003 [#1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/irq Modules linked in: nfsd exportfs lockd nfs_acl sunrpc ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_multipath video sbs i2c_ec i2c_core dock button battery asus_acpi backlight ac parport_pc lp parport sg ata_piix libata pcspkr ide_cd bnx2 serio_raw cdrom serial_core dm_snapshot dm_zero dm_mirror dm_mod megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 0 EIP: 0061:[<c0548ceb>] Not tainted VLI EFLAGS: 00010017 (2.6.20-1.2933.fc6xen #1) EIP is at evtchn_do_upcall+0x55/0x97 eax: 00000018 ebx: 00000000 ecx: e1b27fe4 edx: ffffffef esi: 00000001 edi: f5416000 ebp: fffffffe esp: e1b27fc4 ds: 007b es: 007b ss: 0069 Process sshd (pid: 6621, ti=e1b27000 task=ed7351b0 task.ti=e1b27000) Stack: 00000000 00000000 00000009 e1b27fac e1b27fe4 e1b27000 c0404ff2 e1b27fe4 00cf1402 00000073 00000246 bff5c40c 0000007b 00000000 00000000 Call Trace: [<c0404ff2>] hypervisor_callback+0x46/0x50 ======================= Code: bd fe ff ff ff 88 d9 89 d8 c1 e0 05 d3 c5 89 04 24 eb 29 0f bc c0 03 04 24 8b 14 85 80 f0 6f c0 83 fa ff 74 12 8b 4c 24 1c f7 d2 <89> 51 28 89 c8 e8 16 db eb ff eb 05 e8 22 2e 00 00 8b 44 24 04 EIP: [<c0548ceb>] evtchn_do_upcall+0x55/0x97 SS:ESP 0069:e1b27fc4 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():0, irqs_disabled():1 [<c043059a>] down_read+0x12/0x28 [<c0438c0a>] acct_collect+0x38/0x13e [<c041fe2b>] do_exit+0x1b1/0x6f6 [<c0405b2f>] die+0x20a/0x22f [<c061326f>] do_page_fault+0xab1/0xc2e [<c042daad>] autoremove_wake_function+0x0/0x35 [<c06127be>] do_page_fault+0x0/0xc2e [<c061162d>] error_code+0x35/0x3c [<c0548ceb>] evtchn_do_upcall+0x55/0x97 [<c0404ff2>] hypervisor_callback+0x46/0x50 =======================
- Just to add another person to the list of problems... I have 6 machines... all different in terms of processor/memory/disks... as well as being in different locations... They ALL are suffering from the problems that others have mentioned above... ------------------------- kernel 2911 works... ------------------------- kernel 2933 Suffers from the problems above... kernel 2944 Suffers from the problems above... ------------------------- ============== Apr 15 20:17:35 www kernel: Linux version 2.6.20-1.2944.fc6xen (brewbuilder.redhat.com) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #1 SMP Tue Apr 10 19:12:19 EDT 2007 ============== Apr 15 20:17:59 www kernel: BUG: unable to handle kernel paging request at virtual address e8c2b00c Apr 15 20:17:59 www kernel: printing eip: Apr 15 20:17:59 www kernel: c054936b Apr 15 20:17:59 www kernel: 293a3000 -> *pde = 00000000:56dfc001 Apr 15 20:17:59 www kernel: 297fc000 -> *pme = 00000000:0313e067 Apr 15 20:17:59 www kernel: 0013e000 -> *pte = 80000000:5762b061 Apr 15 20:17:59 www kernel: Oops: 0003 [#1] Apr 15 20:17:59 www kernel: SMP Apr 15 20:17:59 www kernel: last sysfs file: /devices/pci0000:00/0000:00:1c.1/0000:04:00.0/irq Apr 15 20:17:59 www kernel: Modules linked in: autofs4 hidp l2cap bluetooth sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_ftp nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables dm_multipath video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 lp floppy sg pcspkr iTCO_wdt iTCO_vendor_support tg3 i2c_i801 ide_cd i2c_core parport_pc parport serial_core cdrom dm_snapshot dm_zero dm_mirror dm_mod ahci ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Apr 15 20:17:59 www kernel: CPU: 0 Apr 15 20:17:59 www kernel: EIP: 0061:[<c054936b>] Not tainted VLI Apr 15 20:17:59 www kernel: EFLAGS: 
00010013 (2.6.20-1.2944.fc6xen #1) Apr 15 20:17:59 www kernel: EIP is at evtchn_do_upcall+0x55/0x97 Apr 15 20:17:59 www kernel: eax: 00000001 ebx: 00000000 ecx: e8c2afe4 edx: fffffeff Apr 15 20:17:59 www kernel: esi: 00000001 edi: f5416000 ebp: fffffffe esp: e8c2afc4 Apr 15 20:17:59 www kernel: ds: 007b es: 007b ss: 0069 Apr 15 20:17:59 www kernel: Process MailScanner (pid: 2885, ti=e8c2a000 task=ea092df0 task.ti=e8c2a000) Apr 15 20:17:59 www kernel: Stack: 00000000 00000000 0b86b000 e8c2afac e8c2afe4 e8c2a000 c0404ff2 e8c2afe4 Apr 15 20:17:59 www kernel: 009f1402 00000073 00000212 bfc0b2dc 0000007b 00000000 00000000 Apr 15 20:17:59 www kernel: Call Trace: Apr 15 20:17:59 www kernel: [<c0404ff2>] hypervisor_callback+0x46/0x50 Apr 15 20:17:59 www kernel: ======================= Apr 15 20:17:59 www kernel: Code: bd fe ff ff ff 88 d9 89 d8 c1 e0 05 d3 c5 89 04 24 eb 29 0f bc c0 03 04 24 8b 14 85 80 f0 6f c0 83 fa ff 74 12 8b 4c 24 1c f7 d2 <89> 51 28 89 c8 e8 40 d4 eb ff eb 05 e8 22 2e 00 00 8b 44 24 04 Apr 15 20:17:59 www kernel: EIP: [<c054936b>] evtchn_do_upcall+0x55/0x97 SS:ESP 0069:e8c2afc4 Apr 15 20:17:59 www kernel: <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20 Apr 15 20:17:59 www kernel: in_atomic():0, irqs_disabled():1 Apr 15 20:18:00 www kernel: [<c04303e6>] down_read+0x12/0x28 Apr 15 20:18:02 www kernel: [<c0438a56>] acct_collect+0x38/0x13e Apr 15 20:18:02 www kernel: [<c041fc77>] do_exit+0x1b1/0x6f6 Apr 15 20:18:02 www kernel: [<c0405b2f>] die+0x20a/0x22f Apr 15 20:18:03 www kernel: [<c061396f>] do_page_fault+0xab1/0xc2e Apr 15 20:18:03 www kernel: [<c0613625>] do_page_fault+0x767/0xc2e Apr 15 20:18:03 www kernel: [<c0457f4d>] vma_merge+0xfd/0x19a Apr 15 20:18:04 www kernel: [<c04583c5>] do_brk+0x169/0x212 Apr 15 20:18:04 www kernel: [<c0612ebe>] do_page_fault+0x0/0xc2e Apr 15 20:18:04 www kernel: [<c0611d2d>] error_code+0x35/0x3c Apr 15 20:18:04 www kernel: [<c054936b>] evtchn_do_upcall+0x55/0x97 Apr 15 20:18:04 www kernel: 
[<c0404ff2>] hypervisor_callback+0x46/0x50 Apr 15 20:18:04 www kernel: =======================
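When comparing reports like the ones in this thread, it can help to pull the same few fields out of each oops instead of reading the dumps by eye. The sketch below is only an illustration: `parse_oops` is a hypothetical helper, the regular expressions assume the i386 oops layout seen in these comments, and the sample text is an excerpt of the 2944 report with the syslog prefixes dropped. The 4 KiB stack size is an assumption.

```python
import re

# Excerpt of the 2.6.20-1.2944.fc6xen oops above (syslog prefixes removed).
OOPS = """\
BUG: unable to handle kernel paging request at virtual address e8c2b00c
EIP is at evtchn_do_upcall+0x55/0x97
eax: 00000001 ebx: 00000000 ecx: e8c2afe4 edx: fffffeff
Process MailScanner (pid: 2885, ti=e8c2a000 task=ea092df0 task.ti=e8c2a000)
"""

THREAD_SIZE = 0x1000  # assumption: 4 KiB kernel stacks


def parse_oops(text):
    """Extract the fault address, EIP symbol, eax..edx and ti base from an oops."""
    fault = int(re.search(r"virtual address ([0-9a-f]{8})", text).group(1), 16)
    symbol = re.search(r"EIP is at (\S+)", text).group(1)
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"\b(e[a-d]x): ([0-9a-f]{8})", text)
    }
    ti = int(re.search(r"\bti=([0-9a-f]{8})", text).group(1), 16)
    return {"fault": fault, "symbol": symbol, "regs": regs, "ti": ti}


info = parse_oops(OOPS)
offset_from_ecx = info["fault"] - info["regs"]["ecx"]
past_stack_end = info["fault"] - (info["ti"] + THREAD_SIZE)

print(info["symbol"])
print(f"fault is ecx+{offset_from_ecx:#x}, {past_stack_end} bytes past the stack end")
```

For this excerpt the fault address again sits at ecx+0x28, just past the end of the task's stack page, which is the same pattern as in the evtchn_do_upcall reports for 2933 and 2937.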
2944 didn't resolve the issue for me, either. (In reply to comment #20)
Me too. Lots of reboots. Occasional Oops messages/lockups.
Me too: repeated problems with 2.6.20-1.2944.fc6xen. Non-xen is OK. Apr 16 14:19:25 msslin kernel: iret exception: 0000 [#1] Apr 16 14:19:25 msslin kernel: SMP Apr 16 14:19:25 msslin kernel: last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:03:00.0/i2c-4/name Apr 16 14:19:25 xxxxxx kernel: Modules linked in: bridge netloop netbk blktap blkbk autofs4 sunrpc nf_conntrack_ftp nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables dm_multipath video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 lp floppy snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm pcspkr parport_pc ohci1394 nvidia(P)(U) i2c_nforce2 snd_mpu401 skge parport snd_mpu401_uart snd_rawmidi ide_cd forcedeth snd_timer cdrom snd_seq_device serio_raw ieee1394 i2c_core serial_core ns558 gameport snd_page_alloc snd soundcore dm_snapshot dm_zero dm_mirror dm_mod sata_nv libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Apr 16 14:19:25 xxxxxx kernel: CPU: 0 Apr 16 14:19:25 xxxxxx kernel: EIP: 4688:[<0891467c>] Tainted: P VLI Apr 16 14:19:25 xxxxxx kernel: EFLAGS: 08914694 (2.6.20-1.2944.fc6xen #1) Apr 16 14:19:25 xxxxxx kernel: EIP is at 0x891467c Apr 16 14:19:25 xxxxxx kernel: eax: 00000000 ebx: 0045d402 ecx: 00000073 edx: 00000202 Apr 16 14:19:25 xxxxxx kernel: esi: bfdc6ad8 edi: 0000007b ebp: 00000000 esp: c2df801c Apr 16 14:19:25 xxxxxx kernel: ds: 0000 es: 0000 ss: 0069 Apr 16 14:19:25 xxxxxx kernel: Process firefox-bin (pid: 3692, ti=c2df7000 task=c6cd80b0 task.ti=c2df7000) Apr 16 14:19:25 xxxxxx kernel: Stack: 089146a0 c2123020 c2dfa000 000000a8 c2df80a8 0000001b ffffffff 6d200000 Apr 16 14:19:25 xxxxxx kernel: 00000001 00000002 00000003 00000004 00000005 00000006 00000007 00000008 Apr 16 14:19:25 xxxxxx kernel: 00000009 0000000a 0000000b 0000000c 0000000d 0000000e 0000000f 00000010 Apr 16 14:19:25 xxxxxx kernel: Call Trace:
Apr 16 14:19:25 xxxxxx kernel: general protection fault: 0000 [#2] ... etc.
(In reply to comment #23) > Me too: repeated problems with 2.6.20-1.2944.fc6xen. > Non-xen is OK. Could you post or attach the rest of the Oops message? The call trace information can be very useful.
Created attachment 154014 /var/log/messages for 2 crashes from "comment 23" Log to go with "Comment 23" from pjs1
Created attachment 154114 oops log from /var/log/messages
(In reply to comment #26)
> Created an attachment (id=154114)
> oops log from /var/log/messages

I mean, this is to show that the problem persists with 2.6.20-1.2948.fc6xen.
Wouldn't it be good to add some normal kernel guys to the CC list? And maybe start doing a divide and conquer of kernels between 2.6.19-1.2911.6.5.fc6xen and kernel-xen-2.6.20-1.2933.fc6 to try to pinpoint which change contains the bug? Can anyone at Red Hat start building intermediate kernels for us to try? Is there any talk about this bug on any lkml list?
(In reply to comment #28)
> Wouldn't it be good to add some normal kernel guys to the CC list?

The problem doesn't exist in our non-xen kernel, so it is probably in the xen-specific parts of the code.

> And maybe start doing a divide and conquer of kernels between
> 2.6.19-1.2911.6.5.fc6xen and kernel-xen-2.6.20-1.2933.fc6 to try
> to pinpoint which change contains the bug? Can anyone at Red Hat start
> building intermediate kernels for us to try?

Unfortunately it is not easy to make intermediate kernels, because kernel-xen is the result of merging 2.6.20 with the xen patch (or, from another point of view, of porting the xen code to the newer kernel), and the bug was probably introduced during the merge process, which is manual. Doing a bisect would require re-merging the xen patch for every intermediate version we wanted to test. This is not straightforward and could introduce other bugs (or even introduce the same bug into older kernels during the process, which wouldn't tell us anything about what is wrong with the 2.6.20 xen patch).

> Is there any talk on any lkml list about this bug?

Probably not, as it is very specific to kernel-xen on Fedora.
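For readers unfamiliar with the "divide and conquer" idea discussed above, here is a minimal sketch of how a bisect narrows down the first bad build. It assumes an ordered list of builds where the first is known good and the last is known bad, and that once a build is bad every later one is bad too. The intermediate build names (`snapshot-a` etc.) and the `is_bad` check are hypothetical; no such intermediate kernel-xen builds exist in this report, which is exactly the difficulty described in the reply.

```python
from typing import Callable, List, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad build.

    Assumes builds are ordered oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and badness never reverts in later builds.
    """
    lo, hi = 0, len(builds) - 1  # lo: known-good index, hi: known-bad index
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # the first bad build is at mid or earlier
        else:
            lo = mid  # the first bad build is after mid
    return builds[hi]


# Only the first and last names come from this report; the snapshots are made up.
builds = [
    "2.6.19-1.2911.6.5.fc6xen",
    "snapshot-a",
    "snapshot-b",
    "snapshot-c",
    "kernel-xen-2.6.20-1.2933.fc6",
]
pretend_bad = {"snapshot-c", "kernel-xen-2.6.20-1.2933.fc6"}  # assumption for the demo
tested: List[str] = []


def is_bad(build: str) -> bool:
    # Stand-in for "install the build, reboot, wait for a crash".
    tested.append(build)
    return build in pretend_bad


culprit = first_bad_build(builds, is_bad)
print(culprit, tested)
```

Each test halves the remaining range, so the number of boots grows only with the logarithm of the number of builds. The catch in this case is that every intermediate build would itself need a manual re-merge of the xen patch.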
This bug seems most appropriate to my case. My system (2.6.20-1.2948.fc6xen, Dom0, no DomU) hasn't rebooted or crashed yet, but it reported the following "kernel oops" during an idle period (I cannot correlate any activity with that "kernel oops"):

iret exception: 0000 [#1]
SMP
last sysfs file: /class/net/eth0/broadcast
Modules linked in: bridge netloop netbk blktap blkbk ipv6 sunrpc xt_limit iptable_filter ip_tables x_tables dm_mirror dm_mod video sbs i2c_ec dock button ba
CPU: 0
EIP: 0000:[<00000000>] Not tainted VLI
EFLAGS: 00000000 (2.6.20-1.2948.fc6xen #1)
EIP is at 0x0
eax: 00000000 ebx: 007543cb ecx: 00000073 edx: 00210246
esi: bfaeff30 edi: 0000007b ebp: 00000000 esp: ebf1001c
ds: 0000 es: 0000 ss: 0069
Process awk (pid: 9759, ti=ebf0f000 task=c0a273b0 task.ti=ebf0f000)
Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
=======================
Code: Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 0069:ebf1001c
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c04303e6>] down_read+0x12/0x28
 [<c0438a56>] acct_collect+0x38/0x13e
 [<c041fc77>] do_exit+0x1b1/0x6f6
 [<c0405b2f>] die+0x20a/0x22f
 [<c0406302>] do_iret_error+0xa7/0xb1
 [<c0404e92>] restore_nocheck_notrace+0x7/0xf
 [<c0404e94>] restore_nocheck_notrace+0x9/0xf
 [<c0404e99>] restore_nocheck_notrace+0xe/0xf
 [<c042beaf>] search_exception_tables+0x14/0x25
 [<c041444f>] fixup_exception+0xb/0x20
 [<c06122f5>] do_general_protection+0x11c/0x16f
 [<c040687b>] do_IRQ+0xc6/0xdb
 [<c06121d9>] do_general_protection+0x0/0x16f
 [<c040625b>] do_iret_error+0x0/0xb1
 [<c0611ddd>] error_code+0x35/0x3c
=======================

Let me know if I should start a separate bug or if this comment is enough. BTW, I've seen a request on the mailing list to test the 2.6.20-1.2948.fc6xen kernel against this kind of issue.

I "jumped on the wagon" only recently, so I don't know whether any previous kernel build would crash my system, but 2.6.20-1.2948.fc6xen doesn't (during boot). The 2.6.20-1.2948.fc6xen kernel still produced the single "kernel oops" above while running (not sure if anything running died as a result). I'll post more if I observe more of the above.
The next interesting thing is that rawhide's kernel-xen-2.6.20-2925.5.fc7 works fine for me. It must be possible to base the fc6 kernel patch on fc7's source, especially when both kernels are from the 2.6.20 series, isn't it? -A-
Bleh, crashed when I wrote comment #31 :(
I would have been surprised if 2.6.20-2925.5.fc7 worked while fc6 failed, because they're based on identical Xen merge trees. We have identified a problem with the merge on 32-bit which is definitely responsible for a large number of hangs/crashes. There's a new rawhide kernel with a fix available if you're able to test: http://koji.fedoraproject.org/packages/kernel-xen-2.6/2.6.20/2925.8.fc7/ If it gets reasonably positive feedback we'll update fc6 with the same patches.
(In reply to comment #33) All test packages are welcome :) I'll report back with my impressions. -A-
As a followup to comment #33, see http://www.google.com/notebook/public/15861144119222811466/BDRgoQgoQkrfw2aci. It has information about the next 4 tracebacks from my 2.6.20-1.2948.fc6xen system; the 6th crash was fatal for my system. I'll let you know if I have any success with 2.6.20-2925.5.fc7.
Myroslav: I assume it was just a typo, but just in case... 2.6.20-2925.5.fc7 does not have the fix - make sure you try 2.6.20-2925.8.fc7 from the link in #33. The tracebacks you posted on google are all consistent with the bug fixed in -2925.8.fc7
I have had the new kernel up for almost two hours, the only problem I saw was a lockdep BUG (bug 239601), but I haven't noticed any effects.
(In reply to comment #37) Same behavior with the test kernel. It hangs during boot, but when I connect to the "hanged" computer and restart the X server, everything looks fine.
*** Bug 236461 has been marked as a duplicate of this bug. ***
*** Bug 236471 has been marked as a duplicate of this bug. ***
*** Bug 238852 has been marked as a duplicate of this bug. ***
*** Bug 238403 has been marked as a duplicate of this bug. ***
I have a bug open on this same kernel for crashing on boot also. That bug is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234283 . So are all these somehow related?
bug 234283 doesn't seem to be related to this bug, as this bug is for a kernel-xen problem, whose cause was found.
(In reply to comment #36) > Myroslav: I assume it was just a typo, but just in case... 2.6.20-2925.5.fc7 > does not have the fix - make sure you try 2.6.20-2925.8.fc7 from the link in > #33. The tracebacks you posted on google are all consistent with the bug fixed > in -2925.8.fc7 I failed to install the 2.6.20-2925.8.fc7 kernel on my FC6 system... I do not have F7 at hand to try it out. If anyone has any hints on how an F7 kernel can be installed on an FC6 system, please post them or e-mail me in private, and I can try. BTW, in comment #35 I made 2 errors: I meant to follow up on my comment #30, and the kernel I tried was 2.6.20-2925.8.fc7.
Progress made

May 16 16:42:10 linux kernel: BUG: at kernel/lockdep.c:1858 trace_hardirqs_on()
May 16 16:42:10 linux rpc.statd[2179]: statd running as root. chown /var/lib/nfs/statd/sm to choose different user
May 16 16:42:10 linux kernel: [<c1005d9e>] show_trace_log_lvl+0x1a/0x2f
May 16 16:42:10 linux kernel: [<c1006347>] show_trace+0x12/0x14
May 16 16:42:10 linux kernel: [<c10063c2>] dump_stack+0x16/0x18
May 16 16:42:10 linux kernel: [<c1037435>] trace_hardirqs_on+0xc4/0x143
May 16 16:42:10 linux kernel: [<c10055d4>] restore_all+0x3b/0x3e
May 16 16:42:10 linux kernel: =======================
May 16 16:42:10 linux kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
May 16 16:42:10 linux kernel: scsi 0:0:1:0: Attached scsi generic sg1 type 5
May 16 16:42:10 linux kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
May 16 16:42:10 linux kernel: sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
May 16 16:42:10 linux kernel: Uniform CD-ROM driver Revision: 3.20
May 16 16:42:10 linux kernel: e100: Intel(R) PRO/100 Network Driver, 3.5.17-k2-NAPI
May 16 16:42:10 linux kernel: e100: Copyright(c) 1999-2006 Intel Corporation
May 16 16:42:10 linux kernel: ACPI: PCI Interrupt 0000:06:08.0[A] -> GSI 20 (level, low) -> IRQ 21
May 16 16:42:10 linux kernel: e100: eth0: e100_probe: addr 0x50000000, irq 21, MAC addr 00:16:76:0B:64:0F
May 16 16:42:10 linux kernel: intel_rng: Firmware space is locked read-only. If you can't or
May 16 16:42:10 linux kernel: intel_rng: don't want to disable this in firmware setup, and if
May 16 16:42:10 linux kernel: intel_rng: you are certain that your system has a functional
May 16 16:42:10 linux kernel: intel_rng: RNG, try using the 'no_fwh_detect' option.
May 16 16:42:10 linux kernel: iTCO_vendor_support: vendor-support=0
May 16 16:42:10 linux kernel: iTCO_wdt: Intel TCO WatchDog Timer Driver v1.01 (11-Nov-2006)
May 16 16:42:10 linux kernel: iTCO_wdt: Found a ICH7 or ICH7R TCO device (Version=2, TCOBASE=0x0460)
May 16 16:42:10 linux kernel: iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

Even so, I am now working from the XEN kernel. By the way, I just installed today's i810 and other X-org updates. Perhaps progress can now move forward once this last hiccup is fixed. Leslie

I can rerun and produce a tailored pair of dump files (messages and Xorg). Just ask and you shall receive.
(In reply to comment #46) I believe you are now hitting this bug: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239601
Well, here are some test results. LOOP POINT: If I boot and go to a normal user, I can log onto the system and use it. I can log off and log on again as both a normal user and root. Seems OK. But if I boot and the first user is root, the blue screen of death appears. Keyboard lights work, but there is nothing on the display but black. No mouse either. The only recourse is the hardware system reset. Back to LOOP POINT: I tried the above a few times, with consistent lockups when root is the very first logon.
*** Bug 233937 has been marked as a duplicate of this bug. ***
Fix committed to CVS.
The fix is on kernel-xen version 2.6.20-1.2952.fc6, that is available on the updates-testing repository.
(In reply to comment #51) I've booted into 2.6.20-1.2952.fc6 successfully and will be watching it over the next couple of days while building a full xen environment. We've had a variety of networking-related issues with 2.6.20-fc6xen (pre 1.2952) kernels, and we'll see if they appear in this newer one. > The fix is on kernel-xen version 2.6.20-1.2952.fc6, that is available on the > updates-testing repository.
Works well for me on 3 guests and 1 host for approx. one day. Thank you. :)
(In reply to comment #52) > We've had a variety of > networking-related issues with 2.6.20-fc6xen (pre 1.2952) kernels, and we'll see > if they appear in this newer one. Maybe your networking problems are related to bug #223258, which is fixed in rawhide/F7 but not yet fixed in FC6.
(In reply to comment #54) > (In reply to comment #52) > > We've had a variety of > > networking-related issues with 2.6.20-fc6xen (pre 1.2952) kernels, and we'll > > see if they appear in this newer one. > > Maybe your networking problems are related to bug #223258, which is fixed in > rawhide/F7 but not yet fixed in FC6. It looks like they are not. Dom0 was doing NAT for the xen guests (we do not allow internal MAC/IP addresses to leave the box), and ip_conntrack failed to do proper connection tracking. As far as I remember, ICMP packets were able to leave the xen guest and ICMP replies returned there, but anything TCP-related failed to route properly (the packets were not being marked as RELATED by ip_conntrack at Dom0).
Wednesday May 23rd. Well, with the update to the 2925.9 XEN kernel and supporting files, my system now consistently locks up at the point where it has to switch to Gnome. The boot process is OK, but the problem is in the switch. At first I had to do a first boot with XEN and, after the lockup, reboot; then XEN would show the Gnome prompt. That led me to believe that there is some uninitialized memory that, after the boot process, is initialized for the next reboot. So, with 2925.8 fc7, the system was working as I described earlier, but now, after installing this latest XEN kernel, we have the problem again. I have installed vmlinuz-2.6.20-2925.8.fc7xen and vmlinuz-2.6.20-2925.9.fc7xen. One more comment: the cpu microcode module is used for non-XEN kernels; should it be included in the XEN version? I ask because I am getting an error message about it being missing for XEN. (My processor is an intel d930, motherboard intel d945gnt, memory 1 gig, graphics driver i810.)
Created attachment 155293 [details] Abridged Message file (showing XEN problems) I pruned away the non-XEN stuff.
I've been running 2.6.20-1.2952.fc6xen for almost 2 days already (mostly idle). No reboots, no lockups, no oops... The only thing I've got is the following "4gb seg fixup", which may be related to #215201: May 24 04:12:56 anon kernel: 4gb seg fixup, process prelink (pid 21353), cs:ip 73:08083da1 May 24 04:12:56 anon last message repeated 9 times May 24 04:12:56 anon init: Trying to re-exec init
I am going to give 2.6.20-1.2952.fc6xen a try soon. It looks like this latest kernel-xen fixed all the instability issues of the previous kernel-xen releases (the last 3, perhaps). The last working kernel here is 2.6.19-1.2911.6.5.fc6xen. Askar
*** Bug 236737 has been marked as a duplicate of this bug. ***
I have updated kernel-xen to 2.6.20-1.2952.fc6xen on one of our hosts and it's been working fine for the last 17 hours, with nothing in the logs. Dom0 and the domUs (5) are working just fine. I hope we are finally back on track :) I'll be watching this host for 24+ hours, then will go on to update kernel-xen on the other 2 hosts. Thanks. Askar
With kernel-xen-2.6.20-1.2952.fc6 I've been stable for 3.5 days now (upgraded from kernel-xen-2.6.19-1.2911.6.5.fc6). It looks like the instability problems related to the interim xen kernels have been resolved.
kernel-xen-2.6.20-1.2952.fc6 went to FC6 updates on May 30th: http://fedoraproject.org/wiki/FSA/FC6/FEDORA-2007-513 Closing bug.
*** Bug 236474 has been marked as a duplicate of this bug. ***
Most excellent-o... this fix... fixed all my problems! Thanks very much-o!
my apologies to greno
*** Bug 238350 has been marked as a duplicate of this bug. ***