Bug 234008 - kernel-xen-2.6.20-1.2933 freezes/crashes after boot
Summary: kernel-xen-2.6.20-1.2933 freezes/crashes after boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel-xen
Version: 6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Eduardo Habkost
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 233937 235313 236461 236471 236474 236737 238350 238403 238852
Depends On:
Blocks: 238432
Reported: 2007-03-26 16:37 UTC by Pasi Karkkainen
Modified: 2007-11-30 22:12 UTC (History)
CC List: 31 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-06-07 17:07:13 UTC
Type: ---
Embargoed:


Attachments
kernel-xen-2.6.20-1.2937.fc6.i686.rpm BUG log (4.13 KB, text/plain)
2007-03-26 16:37 UTC, Pasi Karkkainen
syslog of crashes from FC5 2307 xen0 kernel (18.53 KB, text/plain)
2007-04-04 12:33 UTC, Robert Story
System log excerpts (70.57 KB, text/plain)
2007-04-04 14:45 UTC, Blair Steenerson
Oops on Asus A7M-266D SMP motherboard. (6.05 KB, text/plain)
2007-04-06 00:01 UTC, Carl-Johan Kjellander
/var/log/messages for 2 crashes from "comment 23" (11.98 KB, text/plain)
2007-05-03 09:44 UTC, Phil Smith
oops log from /var/log/messages (5.79 KB, text/plain)
2007-05-04 12:57 UTC, Emil Jerabek
Abridged Message file (showing XEN problems) (254.51 KB, text/plain)
2007-05-23 20:43 UTC, Leslie Satenstein

Description Pasi Karkkainen 2007-03-26 16:37:33 UTC
Description of problem:

The machine freezes (crashes) during boot or a couple of minutes after boot.
Sometimes I see BUG messages and/or call traces printed on screen. Usually
there is nothing in the logs, but I found one BUG in the logs (attached) for
1.2937.

Version-Release number of selected component (if applicable):

kernel-xen-2.6.20-1.2933.fc6.i686.rpm

Same machine worked well with fc6 2.6.18 and 2.6.19 kernels.

How reproducible:

Always. 

Steps to Reproduce:
1. Install kernel-xen-2.6.20-1.2933.fc6.i686.rpm
2. Reboot
3. Wait
  
Actual results:

Screen is frozen (but you can see the picture), nothing happens, keyboard
doesn't work, network doesn't work, system is crashed. 


Additional info:

I also tried test kernel 1.2937 for fc6. With that kernel the machine reboots
itself instead of freezing.

Machine was/is idle during these tests. Only process usually running is md-raid1
reconstruction (because of the crashes).

Hardware is Intel P4 with i955x chipset, ahci sata disks, md raid1 + lvm, 2 GB
RAM, dom0_mem=256M.

Comment 1 Pasi Karkkainen 2007-03-26 16:37:33 UTC
Created attachment 150914 [details]
kernel-xen-2.6.20-1.2937.fc6.i686.rpm BUG log

Comment 2 Adam Tkac 2007-03-27 08:56:59 UTC
I have the same problems. The log says nothing. Any hints on how I can get more
useful debugging information?

-A-

Comment 3 Askar Ali Khan 2007-03-27 12:28:40 UTC
I have the same problem after updating kernel-xen to 2.6.20-1.2933.fc6.i686.rpm:
dom0 reboots after boot, and the domU(s) become stuck/unresponsive (and have to
be rebooted) after running for a while.

I have restored the previous working kernel, i.e. 2.6.19-1.2911.6.5.fc6xen, on
both host and VMs.

Askar.

Comment 4 Gerry Reno 2007-03-27 22:32:53 UTC
I can also confirm the problem.
Numerous kernel oops, spurious reboots, entirely unstable.



Comment 5 Chuck Ebbert 2007-03-27 22:59:40 UTC
BUG: unable to handle kernel paging request at virtual address cddd800c
 printing eip:
c0548cbb
0ddd8000 -> *pde = 00000000:72fd9001
0ddd9000 -> *pme = 00000000:03066067
00066000 -> *pte = 00000000:72fd8061
Oops: 0003 [#1]
SMP 
CPU:    1
EIP:    0061:[<c0548cbb>]    Not tainted VLI
EFLAGS: 00010017   (2.6.20-1.2937.fc6xen #1)
EIP is at evtchn_do_upcall+0x55/0x97
eax: 00000006   ebx: 00000000   ecx: cddd7fe4   edx: fffffefa
esi: 00000001   edi: f5416000   ebp: fffffffe   esp: cddd7fc4
ds: 007b   es: 007b   ss: 0069
Process modprobe (pid: 856, ti=cddd7000 task=cdeb4930 task.ti=cddd7000)
Stack: 00000000 00000001 00000000 cddd7fac cddd7fe4 cddd7000 c0404ff2 cddd7fe4 
       00876402 00000073 00000212 bfd055d8 0000007b 00000000 00000000 
Call Trace:
 [<c0404ff2>] hypervisor_callback+0x46/0x50
 =======================
Code: bd fe ff ff ff 88 d9 89 d8 c1 e0 05 d3 c5 89 04 24 eb 29 0f bc c0 03 04 24
8b 14 85 80 f0 6f c0 83 fa ff 74 12 8b 4c 24 1c f7 d2 <89> 51 28 89 c8 e8 f0 da
eb ff eb 05 e8 22 2e 00 00 8b 44 24 04 


It's blowing up at drivers/xen/core/evtchn.c:235:

       do_IRQ(irq, regs);

which is a macro -- the part that dies expands to:

       (regs)->orig_eax = ~(irq);

regs is 0xcddd7fe4 which is too close to the end of stack and 
regs->orig_eax is beyond the stack by 12 bytes.

How in the world that happened I can't say. Maybe the critical
region fixup is broken?
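
The 12-byte figure can be rechecked from the register dump. The following is a minimal sketch in Python, not part of the original analysis. The register values are copied from the oops above; the 0x28 offset is read from the faulting instruction bytes (`<89> 51 28`, a store of edx to 0x28(%ecx)). The 4 KiB kernel stack size is an assumption, and the helper name `stack_overrun` is hypothetical.

```python
# Values copied from the oops in this comment.
ECX_REGS = 0xCDDD7FE4     # ecx: the pointer used as `regs`
THREAD_INFO = 0xCDDD7000  # ti= on the "Process modprobe" line (base of the stack)
FAULT_ADDR = 0xCDDD800C   # "unable to handle kernel paging request at ... cddd800c"
EDX = 0xFFFFFEFA          # edx: the value being stored, i.e. ~irq

# Assumption, not stated in the report: 4 KiB kernel stacks.
THREAD_SIZE = 0x1000
# Displacement taken from the faulting instruction bytes "<89> 51 28".
ORIG_EAX_OFFSET = 0x28


def stack_overrun(regs, thread_info, offset, thread_size=THREAD_SIZE):
    """Return (store_address, bytes_past_end_of_stack) for a store at regs+offset."""
    store = regs + offset
    stack_end = thread_info + thread_size
    return store, store - stack_end


store, past = stack_overrun(ECX_REGS, THREAD_INFO, ORIG_EAX_OFFSET)
irq = ~EDX & 0xFFFFFFFF   # undo the bitwise NOT applied before the store

print(hex(store), past, irq)
```

Under these assumptions the computed store address equals the reported fault address, which agrees with regs->orig_eax landing 12 bytes past the end of the stack.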

Comment 6 Michael Young 2007-03-28 10:07:32 UTC
I am seeing a slightly different traceback (2933 kernel), which may be the same
problem:
Mar 25 17:13:52 xenda kernel: iret exception: 0000 [#5]
Mar 25 17:13:52 xenda kernel: SMP
Mar 25 17:13:52 xenda kernel: last sysfs file: /block/ram0/range
Mar 25 17:13:52 xenda kernel: Modules linked in: nfs lockd nfs_acl autofs4 hidp
rfcomm l2cap bluetooth sunrpc xennet ipv6 dm_mirror dm_multipath dm_mod
parport_pc lp parport pcspkr xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Mar 25 17:13:52 xenda kernel: CPU:    0
Mar 25 17:13:52 xenda kernel: EIP:    0000:[<45fb2d2a>]    Not tainted VLI
Mar 25 17:13:52 xenda kernel: EFLAGS: 00010000   (2.6.20-1.2933.fc6xen #1)
Mar 25 17:13:52 xenda kernel: EIP is at 0x45fb2d2a
Mar 25 17:13:52 xenda kernel: eax: 00000000   ebx: 008fb402   ecx: 00000073  
edx: 00000286
Mar 25 17:13:52 xenda kernel: esi: bfb75ce8   edi: 0000007b   ebp: 00000000  
esp: ce90601c
Mar 25 17:13:52 xenda kernel: ds: 0000   es: 0000   ss: 0069
Mar 25 17:13:52 xenda kernel: Process cfagent (pid: 5814, ti=ce905000
task=c03a6170 task.ti=ce905000)
Mar 25 17:13:52 xenda kernel: Stack: 00000170 00000000 00000000 00067047
00067049 0006704b 0006704c 0006704d
Mar 25 17:13:52 xenda kernel:        0006704e 00067051 00067063 00067104
000671e9 000671ea 000671eb 000671ec
Mar 25 17:13:52 xenda kernel:        00000000 00000000 ac447517 00060fdd
00000000 00000000 00000000 00000000
Mar 25 17:13:52 xenda kernel: Call Trace:
Mar 25 17:13:52 xenda kernel: BUG: unable to handle kernel paging request at
virtual address 0006704c
Mar 25 17:13:52 xenda kernel:  printing eip:
Mar 25 17:13:52 xenda kernel: c04055c2
Mar 25 17:13:52 xenda kernel: 0376b000 -> *pde = 00000000:0901d001
Mar 25 17:13:52 xenda kernel: 094e5000 -> *pme = 00000000:00000000
Mar 25 17:13:52 xenda kernel: Oops: 0000 [#6]
Mar 25 17:13:52 xenda kernel: SMP
Mar 25 17:13:52 xenda kernel: last sysfs file: /block/ram0/range
Mar 25 17:13:52 xenda kernel: Modules linked in: nfs lockd nfs_acl autofs4 hidp
rfcomm l2cap bluetooth sunrpc xennet ipv6 dm_mirror dm_multipath dm_mod
parport_pc lp parport pcspkr xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Mar 25 17:13:52 xenda kernel: CPU:    0
Mar 25 17:13:52 xenda kernel: EIP:    0061:[<c04055c2>]    Not tainted VLI
Mar 25 17:13:52 xenda kernel: EFLAGS: 00010093   (2.6.20-1.2933.fc6xen #1)
Mar 25 17:13:52 xenda kernel: EIP is at dump_trace+0x5c/0x93
Mar 25 17:13:52 xenda kernel: eax: 00067ffd   ebx: 0006704c   ecx: 039b779e  
edx: 00ab5f00
Mar 25 17:13:52 xenda kernel: esi: 00000000   edi: 00067000   ebp: c0693fce  
esp: ce905e7c
Mar 25 17:13:52 xenda kernel: ds: 007b   es: 007b   ss: 0069
Mar 25 17:13:52 xenda kernel: Process cfagent (pid: 5814, ti=ce905000
task=c03a6170 task.ti=ce905000)
Mar 25 17:13:52 xenda kernel: Stack: c0693e8e c0693fce 00000018 00000000
c0693fce c0405611 c06e44e0 c0693fce
Mar 25 17:13:52 xenda kernel:        ce90607f c04056c0 c0693fce c0693fce
ce905fe4 ce90601c 00000002 00010000
Mar 25 17:13:52 xenda kernel:        ce905fe4 ce90601c c0405856 c0693fce
00000010 c03a6304 000016b6 ce905000
Mar 25 17:13:52 xenda kernel: Call Trace:
Mar 25 17:13:52 xenda kernel:  [<c0405611>] show_trace_log_lvl+0x18/0x2c
Mar 25 17:13:52 xenda kernel:  [<c04056c0>] show_stack_log_lvl+0x9b/0xa3
Mar 25 17:13:52 xenda kernel:  [<c0405856>] show_registers+0x18e/0x25d
Mar 25 17:13:52 xenda kernel:  [<c0613405>] notifier_call_chain+0x19/0x29
Mar 25 17:13:52 xenda kernel:  [<c0405a58>] die+0x133/0x22f
Mar 25 17:13:52 xenda kernel:  [<c0406302>] do_iret_error+0xa7/0xb1
Mar 25 17:13:52 xenda kernel:  [<c0404e92>] restore_nocheck_notrace+0x7/0xf
Mar 25 17:13:52 xenda kernel:  [<c0404e93>] restore_nocheck_notrace+0x8/0xf
Mar 25 17:13:52 xenda kernel:  [<c0404e94>] restore_nocheck_notrace+0x9/0xf
Mar 25 17:13:52 xenda kernel:  [<c0404e99>] restore_nocheck_notrace+0xe/0xf
Mar 25 17:13:52 xenda kernel:  [<c042c063>] search_exception_tables+0x14/0x25
Mar 25 17:13:52 xenda kernel:  [<c04144ef>] fixup_exception+0xb/0x20
Mar 25 17:13:52 xenda kernel:  [<c0611b45>] do_general_protection+0x11c/0x16f
Mar 25 17:13:52 xenda kernel:  [<c04068d1>] do_IRQ+0xc6/0xdd
Mar 25 17:13:52 xenda kernel:  [<c0611a29>] do_general_protection+0x0/0x16f
Mar 25 17:13:52 xenda kernel:  [<c040625b>] do_iret_error+0x0/0xb1
Mar 25 17:13:52 xenda kernel:  [<c061162d>] error_code+0x35/0x3c
Mar 25 17:13:52 xenda kernel:  =======================
Mar 25 17:13:52 xenda kernel: Code: 9a d4 01 00 00 89 df 81 e7 00 f0 ff ff eb 0e
8b 4c 24 18 89 f2 89 e8 ff 51 08 83 c3 04 39 fb 76 29 8d 87 fd 0f 00 00 39 c3 73
1f <8b> 33 89 f0 e8 61 6a 02 00 85 c0 74 e2 eb d5 8b 4f 34 85 c9 74
Mar 25 17:13:52 xenda kernel: EIP: [<c04055c2>] dump_trace+0x5c/0x93 SS:ESP
0069:ce905e7c
Mar 25 17:13:52 xenda kernel:  <3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:20
Mar 25 17:13:52 xenda kernel: in_atomic():0, irqs_disabled():1
Mar 25 17:13:52 xenda kernel:  [<c043059a>] down_read+0x12/0x28
Mar 25 17:13:52 xenda kernel:  [<c0438c0a>] acct_collect+0x38/0x13e
Mar 25 17:13:52 xenda kernel:  [<c041fe2b>] do_exit+0x1b1/0x6f6
Mar 25 17:13:52 xenda kernel:  [<c0405b2f>] die+0x20a/0x22f
Mar 25 17:13:52 xenda kernel:  [<c061326f>] do_page_fault+0xab1/0xc2e
Mar 25 17:13:52 xenda kernel:  [<c06114ff>] _spin_unlock_irqrestore+0x8/0x16
Mar 25 17:13:52 xenda kernel:  [<c06114ff>] _spin_unlock_irqrestore+0x8/0x16
Mar 25 17:13:52 xenda kernel:  [<c041d829>] release_console_sem+0x192/0x1d1
Mar 25 17:13:52 xenda kernel:  [<c041de9a>] vprintk+0x2de/0x2e8
Mar 25 17:13:52 xenda kernel:  [<c06127be>] do_page_fault+0x0/0xc2e
Mar 25 17:13:52 xenda kernel:  [<c061162d>] error_code+0x35/0x3c
Mar 25 17:13:52 xenda kernel:  [<c04100d8>] MPBIOS_trigger+0x4b/0xbc
Mar 25 17:13:52 xenda kernel:  [<c04055c2>] dump_trace+0x5c/0x93
Mar 25 17:13:52 xenda kernel:  [<c0405611>] show_trace_log_lvl+0x18/0x2c
Mar 25 17:13:52 xenda kernel:  [<c04056c0>] show_stack_log_lvl+0x9b/0xa3
Mar 25 17:13:52 xenda kernel:  [<c0405856>] show_registers+0x18e/0x25d
Mar 25 17:13:52 xenda kernel:  [<c0613405>] notifier_call_chain+0x19/0x29
Mar 25 17:13:52 xenda kernel:  [<c0405a58>] die+0x133/0x22f
Mar 25 17:13:52 xenda kernel:  [<c0406302>] do_iret_error+0xa7/0xb1
Mar 25 17:13:52 xenda kernel:  [<c0404e92>] restore_nocheck_notrace+0x7/0xf
Mar 25 17:13:52 xenda kernel:  [<c0404e93>] restore_nocheck_notrace+0x8/0xf
Mar 25 17:13:52 xenda kernel:  [<c0404e94>] restore_nocheck_notrace+0x9/0xf
Mar 25 17:13:52 xenda kernel:  [<c0404e99>] restore_nocheck_notrace+0xe/0xf
Mar 25 17:13:52 xenda kernel:  [<c042c063>] search_exception_tables+0x14/0x25
Mar 25 17:13:52 xenda kernel:  [<c04144ef>] fixup_exception+0xb/0x20
Mar 25 17:13:52 xenda kernel:  [<c0611b45>] do_general_protection+0x11c/0x16f
Mar 25 17:13:52 xenda kernel:  [<c04068d1>] do_IRQ+0xc6/0xdd
Mar 25 17:13:52 xenda kernel:  [<c0611a29>] do_general_protection+0x0/0x16f
Mar 25 17:13:52 xenda kernel:  [<c040625b>] do_iret_error+0x0/0xb1
Mar 25 17:13:52 xenda kernel:  [<c061162d>] error_code+0x35/0x3c
Mar 25 17:13:52 xenda kernel:  =======================
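
Because the traces in this report differ in detail, one way to compare them is to list the kernel version and faulting location of each oops. The following is a minimal sketch in Python, not a tool used by anyone in this report. The function name `summarize_oopses` and both regular expressions are hypothetical, and they assume the i386 oops layout shown above, where the EFLAGS line precedes the "EIP is at" line.

```python
import re

# Patterns for the two oops fields of interest, as they appear in this report.
VERSION_RE = re.compile(r"EFLAGS: [0-9a-f]{8}\s+\((\S+) #\d+\)")
LOCATION_RE = re.compile(r"EIP is at (\S+)")


def summarize_oopses(log_text):
    """Return one (kernel_version, faulting_location) pair per oops in log_text."""
    summaries = []
    version = None
    for line in log_text.splitlines():
        match = VERSION_RE.search(line)
        if match:
            # Remember the version until the matching "EIP is at" line shows up.
            version = match.group(1)
            continue
        match = LOCATION_RE.search(line)
        if match:
            summaries.append((version, match.group(1)))
            version = None
    return summaries


# Four lines copied from the log above.
sample = """\
Mar 25 17:13:52 xenda kernel: EFLAGS: 00010000   (2.6.20-1.2933.fc6xen #1)
Mar 25 17:13:52 xenda kernel: EIP is at 0x45fb2d2a
Mar 25 17:13:52 xenda kernel: EFLAGS: 00010093   (2.6.20-1.2933.fc6xen #1)
Mar 25 17:13:52 xenda kernel: EIP is at dump_trace+0x5c/0x93
"""

for version, location in summarize_oopses(sample):
    print(version, location)
```

Run over a full /var/log/messages excerpt, this gives one line per oops, which makes it easier to see whether two reports die in the same function.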

Comment 7 Chuck Ebbert 2007-03-28 16:02:20 UTC
Comment #5 is from kernel 2937 with the March 21 update applied.

So that bug is still unfixed.


Comment 8 Shashin Shinde 2007-03-29 14:08:13 UTC
I can also confirm the behaviour. It occurs regardless of architecture: I see
it on 32-bit as well as 64-bit machines. The 32-bit machine has an Intel CPU and
the 64-bit one an AMD Athlon CPU. The kernel I noticed it on is
kernel-xen-2.6.20-1.2933.

Comment 9 Askar Ali Khan 2007-03-29 15:26:35 UTC
(In reply to comment #7)
> Comment #5 is from kernel 2937 with the March 21 update applied.
> 
> So that bug is still unfixed.
> 

I don't see any acknowledgment from any fedora dev people. 

Comment 10 Blair Steenerson 2007-04-03 03:00:49 UTC
Michael's logs are similar to mine.  2.6.20-1.2933.fc6xen is completely 
unstable.  Reboots spontaneously, sometimes after a few minutes, sometimes a 
few hours.

Comment 11 Robert Story 2007-04-04 11:27:20 UTC
I'm seeing similar issues as well, with FC5 kernel 2307. The 'at
evtchn_do_upcall' bit in comment #5 from Chuck Ebbert looks familiar. I don't
have any logs, because I'm still working to get the machine back up under a
previous kernel (fsck on several large filesystems).

Comment 12 Robert Story 2007-04-04 12:33:51 UTC
Created attachment 151652 [details]
syslog of crashes from FC5 2307 xen0 kernel

Comment 13 Eduardo Habkost 2007-04-04 14:03:11 UTC
The symptoms described here look different, but they may have the same cause
as bug #233937.

Comment 14 Blair Steenerson 2007-04-04 14:45:17 UTC
Created attachment 151660 [details]
System log excerpts

Comment 15 Blair Steenerson 2007-04-04 14:52:15 UTC
I've backleveled to 2.6.19-1.2911.6.5.fc6xen and my problems have gone away. 
Completely stable again.

I've sent SOME of the errors from my system log in the previous attachment -
sorry, I'm new to this Bugzilla thing.

All I see in the logs are these errors, and reboots.  Sometimes they coincide,
sometimes not.

Comment 16 Carl-Johan Kjellander 2007-04-05 23:53:28 UTC
*** Bug 235313 has been marked as a duplicate of this bug. ***

Comment 17 Carl-Johan Kjellander 2007-04-06 00:01:33 UTC
Created attachment 151820 [details]
Oops on Asus A7M-266D SMP motherboard.

Here is one of the Oopses during an X crash that the system survived. (But not
for long, it hung the system a minute later)

Comment 18 Ricardo Cantu 2007-04-10 14:08:45 UTC
I'm getting pretty much the same thing on fc5 2307 xen0, but it's not freezing
or rebooting. Here is my Oops:

xen kernel: iret exception: 0000 [#2]
xen kernel: SMP
xen kernel: CPU:    1
xen kernel: EIP:    2868:[<e8000000>]    Not tainted VLI
xen kernel: EFLAGS: 082444c7   (2.6.20-1.2307.fc5xen0 #1)
xen kernel: EIP is at 0xe8000000
xen kernel: eax: 00000000   ebx: 00e29aa1   ecx: 00000073   edx: 00310246
xen kernel: esi: b6e54c98   edi: 0000007b   ebp: 00000000   esp: c84be01c
xen kernel: ds: 0000   es: 0000   ss: 0069
xen kernel: Process beagle-build-in (pid: 11390, ti=c84bd000 task=f3f15930
task.ti=c84bd000)
xen kernel: Stack: 08387e20 042444c7 08299a30 0e2404c7 e8000000 0017284c
299ae0b8 2444c708
xen kernel:        3876000c 24448908 2444c708 2fc9df04 2404c708 00000000
174927e8 9a30b800
xen kernel:        44c70829 76000c24 44890838 44c70824 c9e20424 04c7082f
00000024 4902e800
xen kernel: Call Trace:
xen kernel: Oops: 0000 [#3]
xen kernel: SMP
xen kernel: CPU:    1
xen kernel: EIP:    0061:[<c1005562>]    Not tainted VLI
xen kernel: EFLAGS: 00310097   (2.6.20-1.2307.fc5xen0 #1)
xen kernel: EIP is at dump_trace+0x5c/0x93
xen kernel: eax: 299aeffd   ebx: 299ae0b8   ecx: 00d5ab89   edx: 004ce880
xen kernel: esi: 082fcd95   edi: 299ae000   ebp: c125b39e   esp: c84bde7c
xen kernel: ds: 007b   es: 007b   ss: 0069
xen kernel: Process beagle-build-in (pid: 11390, ti=c84bd000 task=f3f15930
task.ti=c84bd000)
xen kernel: Stack: c125b25e c125b39e 00000018 00000000 c125b39e c10055b1
c12af4e0 c125b39e
xen kernel:        c84be07f c1005660 c125b39e c125b39e c84bdfe4 c84be01c
00000002 082444c7
xen kernel:        c84bdfe4 c84be01c c10057f6 c125b39e 00000010 f3f15adc
00002c7e c84bd000
xen kernel: Call Trace:
xen kernel:  [<c10055b1>] show_trace_log_lvl+0x18/0x2c
xen kernel:  [<c1005660>] show_stack_log_lvl+0x9b/0xa3
xen kernel:  [<c10057f6>] show_registers+0x18e/0x25d
xen kernel:  [<c1216dbc>] notifier_call_chain+0x19/0x29
xen kernel:  [<c10059f8>] die+0x133/0x22f
xen kernel:  [<c10062ab>] do_iret_error+0xa7/0xb1
xen kernel:  [<c1004e2a>] restore_nocheck_notrace+0x7/0xf
xen kernel:  [<c1004e2b>] restore_nocheck_notrace+0x8/0xf
xen kernel:  [<c1004e2c>] restore_nocheck_notrace+0x9/0xf
xen kernel:  [<c1004e31>] restore_nocheck_notrace+0xe/0xf
xen kernel:  [<c102f487>] search_exception_tables+0x14/0x25
xen kernel:  [<c101739f>] fixup_exception+0xb/0x20
xen kernel:  [<c12158e5>] do_general_protection+0x11c/0x16f
xen kernel:  [<c1006879>] do_IRQ+0xc6/0xdd
xen kernel:  [<c12157c9>] do_general_protection+0x0/0x16f
xen kernel:  [<c1006204>] do_iret_error+0x0/0xb1
xen kernel:  [<c12153cd>] error_code+0x35/0x3c
xen kernel:  =======================
xen kernel: Code: 9a f4 01 00 00 89 df 81 e7 00 f0 ff ff eb 0e 8b 4c 24 18 89 f2
89 e8 ff 51 08 83 c3 04 39 fb 76 29 8d 87 fd 0f 00 00 39 c3 73 1f <8b> 33 89 f0
e8 e5 9e 02 00 85 c0 74 e2 eb d5 8b 4f 34 85 c9 74
xen kernel: EIP: [<c1005562>] dump_trace+0x5c/0x93 SS:ESP 0069:c84bde7c


Comment 19 Phil Lobbes 2007-04-10 20:47:55 UTC
Also seeing similar problems.  Adding the info just in case it helps QA catch
problems like this before unstable kernels make it to the general public.

BUG: unable to handle kernel paging request at virtual address e1b2800c
 printing eip:
c0548ceb
22c7a000 -> *pde = 00000001:1fa6f001
21c6f000 -> *pme = 00000000:06103067
00103000 -> *pte = 80000001:1fd28061
Oops: 0003 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: nfsd exportfs lockd nfs_acl sunrpc ipv6 ib_iser rdma_cm ib_cm
iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi
dm_multipath video sbs i2c_ec i2c_core dock button battery asus_acpi backlight
ac parport_pc lp parport sg ata_piix libata pcspkr ide_cd bnx2 serio_raw cdrom
serial_core dm_snapshot dm_zero dm_mirror dm_mod megaraid_sas sd_mod scsi_mod
ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0061:[<c0548ceb>]    Not tainted VLI
EFLAGS: 00010017   (2.6.20-1.2933.fc6xen #1)
EIP is at evtchn_do_upcall+0x55/0x97
eax: 00000018   ebx: 00000000   ecx: e1b27fe4   edx: ffffffef
esi: 00000001   edi: f5416000   ebp: fffffffe   esp: e1b27fc4
ds: 007b   es: 007b   ss: 0069
Process sshd (pid: 6621, ti=e1b27000 task=ed7351b0 task.ti=e1b27000)
Stack: 00000000 00000000 00000009 e1b27fac e1b27fe4 e1b27000 c0404ff2 e1b27fe4
       00cf1402 00000073 00000246 bff5c40c 0000007b 00000000 00000000
Call Trace:
 [<c0404ff2>] hypervisor_callback+0x46/0x50
 =======================
Code: bd fe ff ff ff 88 d9 89 d8 c1 e0 05 d3 c5 89 04 24 eb 29 0f bc c0 03 04 24
 8b 14 85 80 f0 6f c0 83 fa ff 74 12 8b 4c 24 1c f7 d2 <89> 51 28 89 c8 e8 16 db
 eb ff eb 05 e8 22 2e 00 00 8b 44 24 04
EIP: [<c0548ceb>] evtchn_do_upcall+0x55/0x97 SS:ESP 0069:e1b27fc4
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043059a>] down_read+0x12/0x28
 [<c0438c0a>] acct_collect+0x38/0x13e
 [<c041fe2b>] do_exit+0x1b1/0x6f6
 [<c0405b2f>] die+0x20a/0x22f
 [<c061326f>] do_page_fault+0xab1/0xc2e
 [<c042daad>] autoremove_wake_function+0x0/0x35
 [<c06127be>] do_page_fault+0x0/0xc2e
 [<c061162d>] error_code+0x35/0x3c
 [<c0548ceb>] evtchn_do_upcall+0x55/0x97
 [<c0404ff2>] hypervisor_callback+0x46/0x50
 =======================


Comment 20 antony osullivan 2007-04-16 17:07:23 UTC
-
Just to add another person to the list of problems...

I have 6 machines... all different in terms of processor/memory/disks... as 
well as being in different locations...  They ALL are suffering from the 
problems that others have mentioned above...

-------------------------

kernel 2911 works...

-------------------------

kernel 2933 Suffers from the problems above...
kernel 2944 Suffers from the problems above...

-------------------------

==============
Apr 15 20:17:35 www kernel: Linux version 2.6.20-1.2944.fc6xen 
(brewbuilder.redhat.com) (gcc version 4.1.1 20070105 (Red Hat 
4.1.1-51)) #1 SMP Tue Apr 10 19:12:19 EDT 2007
==============

Apr 15 20:17:59 www kernel: BUG: unable to handle kernel paging request at 
virtual address e8c2b00c
Apr 15 20:17:59 www kernel:  printing eip:
Apr 15 20:17:59 www kernel: c054936b
Apr 15 20:17:59 www kernel: 293a3000 -> *pde = 00000000:56dfc001
Apr 15 20:17:59 www kernel: 297fc000 -> *pme = 00000000:0313e067
Apr 15 20:17:59 www kernel: 0013e000 -> *pte = 80000000:5762b061
Apr 15 20:17:59 www kernel: Oops: 0003 [#1]
Apr 15 20:17:59 www kernel: SMP
Apr 15 20:17:59 www kernel: last sysfs 
file: /devices/pci0000:00/0000:00:1c.1/0000:04:00.0/irq
Apr 15 20:17:59 www kernel: Modules linked in: autofs4 hidp l2cap bluetooth 
sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp 
libiscsi scsi_transport_iscsi nf_conntrack_ftp nf_conntrack_netbios_ns 
ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter 
ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables 
dm_multipath video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 
lp floppy sg pcspkr iTCO_wdt iTCO_vendor_support tg3 i2c_i801 ide_cd i2c_core 
parport_pc parport serial_core cdrom dm_snapshot dm_zero dm_mirror dm_mod ahci 
ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Apr 15 20:17:59 www kernel: CPU:    0
Apr 15 20:17:59 www kernel: EIP:    0061:[<c054936b>]    Not tainted VLI
Apr 15 20:17:59 www kernel: EFLAGS: 00010013   (2.6.20-1.2944.fc6xen #1)
Apr 15 20:17:59 www kernel: EIP is at evtchn_do_upcall+0x55/0x97
Apr 15 20:17:59 www kernel: eax: 00000001   ebx: 00000000   ecx: e8c2afe4   
edx: fffffeff
Apr 15 20:17:59 www kernel: esi: 00000001   edi: f5416000   ebp: fffffffe   
esp: e8c2afc4
Apr 15 20:17:59 www kernel: ds: 007b   es: 007b   ss: 0069
Apr 15 20:17:59 www kernel: Process MailScanner (pid: 2885, ti=e8c2a000 
task=ea092df0 task.ti=e8c2a000)
Apr 15 20:17:59 www kernel: Stack: 00000000 00000000 0b86b000 e8c2afac e8c2afe4 
e8c2a000 c0404ff2 e8c2afe4
Apr 15 20:17:59 www kernel:        009f1402 00000073 00000212 bfc0b2dc 0000007b 
00000000 00000000
Apr 15 20:17:59 www kernel: Call Trace:
Apr 15 20:17:59 www kernel:  [<c0404ff2>] hypervisor_callback+0x46/0x50
Apr 15 20:17:59 www kernel:  =======================
Apr 15 20:17:59 www kernel: Code: bd fe ff ff ff 88 d9 89 d8 c1 e0 05 d3 c5 89 
04 24 eb 29 0f bc c0 03 04 24 8b 14 85 80 f0 6f c0 83 fa ff 74 12 8b 4c 24 1c 
f7 d2 <89> 51 28 89 c8 e8 40 d4 eb ff eb 05 e8 22 2e 00 00 8b 44 24 04
Apr 15 20:17:59 www kernel: EIP: [<c054936b>] evtchn_do_upcall+0x55/0x97 SS:ESP 
0069:e8c2afc4
Apr 15 20:17:59 www kernel:  <3>BUG: sleeping function called from invalid 
context at kernel/rwsem.c:20
Apr 15 20:17:59 www kernel: in_atomic():0, irqs_disabled():1
Apr 15 20:18:00 www kernel:  [<c04303e6>] down_read+0x12/0x28
Apr 15 20:18:02 www kernel:  [<c0438a56>] acct_collect+0x38/0x13e
Apr 15 20:18:02 www kernel:  [<c041fc77>] do_exit+0x1b1/0x6f6
Apr 15 20:18:02 www kernel:  [<c0405b2f>] die+0x20a/0x22f
Apr 15 20:18:03 www kernel:  [<c061396f>] do_page_fault+0xab1/0xc2e
Apr 15 20:18:03 www kernel:  [<c0613625>] do_page_fault+0x767/0xc2e
Apr 15 20:18:03 www kernel:  [<c0457f4d>] vma_merge+0xfd/0x19a
Apr 15 20:18:04 www kernel:  [<c04583c5>] do_brk+0x169/0x212
Apr 15 20:18:04 www kernel:  [<c0612ebe>] do_page_fault+0x0/0xc2e
Apr 15 20:18:04 www kernel:  [<c0611d2d>] error_code+0x35/0x3c
Apr 15 20:18:04 www kernel:  [<c054936b>] evtchn_do_upcall+0x55/0x97
Apr 15 20:18:04 www kernel:  [<c0404ff2>] hypervisor_callback+0x46/0x50
Apr 15 20:18:04 www kernel:  =======================


Comment 21 Jeff Pajor 2007-04-20 23:02:48 UTC
2944 didn't resolve the issue for me, either.

(In reply to comment #20)
> -
> Just to add another person to the list of problems...
> 
> I have 6 machines... all different in terms of processor/memory/disks... as 
> well as being in different locations...  They ALL are suffering from the 
> problems that others have mentioned above...
> 
> -------------------------
> 
> kernel 2911 works...
> 
> -------------------------
> 
> kernel 2933 Suffers from the problems above...
> kernel 2944 Suffers from the problems above...
> 
> -------------------------
> 
> ==============
> Apr 15 20:17:35 www kernel: Linux version 2.6.20-1.2944.fc6xen 
> (brewbuilder.redhat.com) (gcc version 4.1.1 20070105 (Red Hat 
> 4.1.1-51)) #1 SMP Tue Apr 10 19:12:19 EDT 2007
> ==============
> 
> Apr 15 20:17:59 www kernel: BUG: unable to handle kernel paging request at 
> virtual address e8c2b00c
> Apr 15 20:17:59 www kernel:  printing eip:
> Apr 15 20:17:59 www kernel: c054936b
> Apr 15 20:17:59 www kernel: 293a3000 -> *pde = 00000000:56dfc001
> Apr 15 20:17:59 www kernel: 297fc000 -> *pme = 00000000:0313e067
> Apr 15 20:17:59 www kernel: 0013e000 -> *pte = 80000000:5762b061
> Apr 15 20:17:59 www kernel: Oops: 0003 [#1]
> Apr 15 20:17:59 www kernel: SMP
> Apr 15 20:17:59 www kernel: last sysfs 
> file: /devices/pci0000:00/0000:00:1c.1/0000:04:00.0/irq
> Apr 15 20:17:59 www kernel: Modules linked in: autofs4 hidp l2cap bluetooth 
> sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp 
> libiscsi scsi_transport_iscsi nf_conntrack_ftp nf_conntrack_netbios_ns 
> ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter 
> ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables 
> dm_multipath video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 
> lp floppy sg pcspkr iTCO_wdt iTCO_vendor_support tg3 i2c_i801 ide_cd i2c_core 
> parport_pc parport serial_core cdrom dm_snapshot dm_zero dm_mirror dm_mod ahci 
> ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
> Apr 15 20:17:59 www kernel: CPU:    0
> Apr 15 20:17:59 www kernel: EIP:    0061:[<c054936b>]    Not tainted VLI
> Apr 15 20:17:59 www kernel: EFLAGS: 00010013   (2.6.20-1.2944.fc6xen #1)
> Apr 15 20:17:59 www kernel: EIP is at evtchn_do_upcall+0x55/0x97
> Apr 15 20:17:59 www kernel: eax: 00000001   ebx: 00000000   ecx: e8c2afe4   
> edx: fffffeff
> Apr 15 20:17:59 www kernel: esi: 00000001   edi: f5416000   ebp: fffffffe   
> esp: e8c2afc4
> Apr 15 20:17:59 www kernel: ds: 007b   es: 007b   ss: 0069
> Apr 15 20:17:59 www kernel: Process MailScanner (pid: 2885, ti=e8c2a000 
> task=ea092df0 task.ti=e8c2a000)
> Apr 15 20:17:59 www kernel: Stack: 00000000 00000000 0b86b000 e8c2afac e8c2afe4 
> e8c2a000 c0404ff2 e8c2afe4
> Apr 15 20:17:59 www kernel:        009f1402 00000073 00000212 bfc0b2dc 0000007b 
> 00000000 00000000
> Apr 15 20:17:59 www kernel: Call Trace:
> Apr 15 20:17:59 www kernel:  [<c0404ff2>] hypervisor_callback+0x46/0x50
> Apr 15 20:17:59 www kernel:  =======================
> Apr 15 20:17:59 www kernel: Code: bd fe ff ff ff 88 d9 89 d8 c1 e0 05 d3 c5 89 
> 04 24 eb 29 0f bc c0 03 04 24 8b 14 85 80 f0 6f c0 83 fa ff 74 12 8b 4c 24 1c 
> f7 d2 <89> 51 28 89 c8 e8 40 d4 eb ff eb 05 e8 22 2e 00 00 8b 44 24 04
> Apr 15 20:17:59 www kernel: EIP: [<c054936b>] evtchn_do_upcall+0x55/0x97 SS:ESP 
> 0069:e8c2afc4
> Apr 15 20:17:59 www kernel:  <3>BUG: sleeping function called from invalid 
> context at kernel/rwsem.c:20
> Apr 15 20:17:59 www kernel: in_atomic():0, irqs_disabled():1
> Apr 15 20:18:00 www kernel:  [<c04303e6>] down_read+0x12/0x28
> Apr 15 20:18:02 www kernel:  [<c0438a56>] acct_collect+0x38/0x13e
> Apr 15 20:18:02 www kernel:  [<c041fc77>] do_exit+0x1b1/0x6f6
> Apr 15 20:18:02 www kernel:  [<c0405b2f>] die+0x20a/0x22f
> Apr 15 20:18:03 www kernel:  [<c061396f>] do_page_fault+0xab1/0xc2e
> Apr 15 20:18:03 www kernel:  [<c0613625>] do_page_fault+0x767/0xc2e
> Apr 15 20:18:03 www kernel:  [<c0457f4d>] vma_merge+0xfd/0x19a
> Apr 15 20:18:04 www kernel:  [<c04583c5>] do_brk+0x169/0x212
> Apr 15 20:18:04 www kernel:  [<c0612ebe>] do_page_fault+0x0/0xc2e
> Apr 15 20:18:04 www kernel:  [<c0611d2d>] error_code+0x35/0x3c
> Apr 15 20:18:04 www kernel:  [<c054936b>] evtchn_do_upcall+0x55/0x97
> Apr 15 20:18:04 www kernel:  [<c0404ff2>] hypervisor_callback+0x46/0x50
> Apr 15 20:18:04 www kernel:  =======================
> 



Comment 22 Michael DeHaan 2007-04-30 20:02:59 UTC
Me too.

Lots of reboots.  Occasional Oops messages/lockups.


Comment 23 Phil Smith 2007-05-02 14:23:38 UTC
Me too: repeated problems with 2.6.20-1.2944.fc6xen.
Non-xen is OK.

Apr 16 14:19:25 msslin kernel: iret exception: 0000 [#1]
Apr 16 14:19:25 msslin kernel: SMP
Apr 16 14:19:25 msslin kernel: last sysfs file:
/devices/pci0000:00/0000:00:1e.0/0000:03:00.0/i2c-4/name
Apr 16 14:19:25 xxxxxx kernel: Modules linked in: bridge netloop netbk blktap
blkbk autofs4 sunrpc nf_conntrack_ftp nf_conntrack_netbios_ns ipt_REJECT
nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter
ip_tables x_tables dm_multipath video sbs i2c_ec dock button battery asus_acpi
backlight ac ipv6 lp floppy snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm pcspkr
parport_pc ohci1394 nvidia(P)(U) i2c_nforce2 snd_mpu401 skge parport
snd_mpu401_uart snd_rawmidi ide_cd forcedeth snd_timer cdrom snd_seq_device
serio_raw ieee1394 i2c_core serial_core ns558 gameport snd_page_alloc snd
soundcore dm_snapshot dm_zero dm_mirror dm_mod sata_nv libata sd_mod scsi_mod
ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Apr 16 14:19:25 xxxxxx kernel: CPU:    0
Apr 16 14:19:25 xxxxxx kernel: EIP:    4688:[<0891467c>]    Tainted: P      VLI
Apr 16 14:19:25 xxxxxx kernel: EFLAGS: 08914694   (2.6.20-1.2944.fc6xen #1)
Apr 16 14:19:25 xxxxxx kernel: EIP is at 0x891467c
Apr 16 14:19:25 xxxxxx kernel: eax: 00000000   ebx: 0045d402   ecx: 00000073  
edx: 00000202
Apr 16 14:19:25 xxxxxx kernel: esi: bfdc6ad8   edi: 0000007b   ebp: 00000000  
esp: c2df801c
Apr 16 14:19:25 xxxxxx kernel: ds: 0000   es: 0000   ss: 0069
Apr 16 14:19:25 xxxxxx kernel: Process firefox-bin (pid: 3692, ti=c2df7000
task=c6cd80b0 task.ti=c2df7000)
Apr 16 14:19:25 xxxxxx kernel: Stack: 089146a0 c2123020 c2dfa000 000000a8
c2df80a8 0000001b ffffffff 6d200000
Apr 16 14:19:25 xxxxxx kernel:        00000001 00000002 00000003 00000004
00000005 00000006 00000007 00000008
Apr 16 14:19:25 xxxxxx kernel:        00000009 0000000a 0000000b 0000000c
0000000d 0000000e 0000000f 00000010
Apr 16 14:19:25 xxxxxx kernel: Call Trace:
Apr 16 14:19:25 xxxxxx kernel: general protection fault: 0000 [#2]
...
etc.

Comment 24 Eduardo Habkost 2007-05-02 21:32:21 UTC
(In reply to comment #23)
> Me too: repeated problems with 2.6.20-1.2944.fc6xen.
> Non-xen is OK.

Could you post or attach the rest of the Oops message? The call trace 
information can be very useful.

Comment 25 Phil Smith 2007-05-03 09:44:33 UTC
Created attachment 154014 [details]
/var/log/messages for 2 crashes from "comment 23"

Log to go with "Comment 23" from pjs1

Comment 26 Emil Jerabek 2007-05-04 12:57:37 UTC
Created attachment 154114 [details]
oops log from /var/log/messages

Comment 27 Emil Jerabek 2007-05-04 13:01:04 UTC
(In reply to comment #26)
> Created an attachment (id=154114) [edit]
> oops log from /var/log/messages
> 

I mean, this is to show that the problem persists with 2.6.20-1.2948.fc6xen.

Comment 28 Carl-Johan Kjellander 2007-05-04 14:11:53 UTC
Wouldn't it be good to add some normal kernel guys to the CC list?

And maybe start doing a divide and conquer of kernels between
2.6.19-1.2911.6.5.fc6xen and kernel-xen-2.6.20-1.2933.fc6 to try
to pinpoint what change contains the bug? Can anyone at Red Hat start
doing intermediate kernels for us to try?

Is there any talk on any lkml list on this bug?


Comment 29 Eduardo Habkost 2007-05-08 20:56:17 UTC
(In reply to comment #28)
> Wouldn't it be good to add some normal kernel guys to the CC list?

The problem doesn't exist in our non-xen kernel, so it is probably in the 
xen-specific parts of the code.

> 
> And maybe start doing a divide and conquer of kernels between
> 2.6.19-1.2911.6.5.fc6xen and kernel-xen-2.6.20-1.2933.fc6 to try
> to pinpoint what change contains the bug? Can anyone at Red Hat start
> doing intermediate kernels for us to try?

Unfortunately it is not easy to make intermediate kernels, because kernel-xen 
is the result of merging 2.6.20 with the xen patch (or, from another point of 
view, of porting the xen code to the newer kernel), and the bug was probably 
introduced during the merge process, which is manual. Doing a bisect would 
require re-merging the xen patch for every intermediate version we wanted to 
test. That is not straightforward and could introduce other bugs (or even 
introduce the same bug into the older kernels during the process, which 
wouldn't tell us anything about what is wrong with the 2.6.20 xen patch).
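
For reference, the bisect being discussed is a binary search over an ordered 
series of builds. The following is only a minimal sketch, assuming that 
intermediate builds existed and could each be tested reliably, which is not 
the case for kernel-xen for the reasons given above. The names first_bad, 
builds and is_bad, and the build labels, are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is true.

    Assumptions: builds are ordered oldest to newest, the first build is
    known good, the last build is known bad, and every build after the
    first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # the first bad build is at mid or earlier
        else:
            lo = mid  # the first bad build is after mid
    return builds[hi]


if __name__ == "__main__":
    # Hypothetical build labels; in practice is_bad() would mean
    # "boot this build and see whether it crashes".
    labels = [f"build-{i}" for i in range(16)]
    print(first_bad(labels, lambda b: int(b.split("-")[1]) >= 11))
```

Each step halves the range, so about log2(N) builds have to be tested. That 
only helps when every intermediate build is trustworthy, which a manual 
re-merge cannot guarantee.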

> 
> Is there any talk on any lkml list on this bug?
> 

Probably not, as it is very specific to kernel-xen on Fedora.

Comment 30 Myroslav Opyr 2007-05-09 16:37:00 UTC
This bug seems the most appropriate for my case. My system (2.6.20-1.2948.fc6xen,
Dom0, no DomU) hasn't rebooted or crashed yet, but it reported the following
"kernel oops" during an idle period (I cannot correlate any activity with that
"kernel oops"):

iret exception: 0000 [#1]
SMP
last sysfs file: /class/net/eth0/broadcast
Modules linked in: bridge netloop netbk blktap blkbk ipv6 sunrpc xt_limit
iptable_filter ip_tables x_tables dm_mirror dm_mod video sbs i2c_ec dock button ba
CPU:    0
EIP:    0000:[<00000000>]    Not tainted VLI
EFLAGS: 00000000   (2.6.20-1.2948.fc6xen #1)
EIP is at 0x0
eax: 00000000   ebx: 007543cb   ecx: 00000073   edx: 00210246
esi: bfaeff30   edi: 0000007b   ebp: 00000000   esp: ebf1001c
ds: 0000   es: 0000   ss: 0069
Process awk (pid: 9759, ti=ebf0f000 task=c0a273b0 task.ti=ebf0f000)
Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 0069:ebf1001c
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c04303e6>] down_read+0x12/0x28
 [<c0438a56>] acct_collect+0x38/0x13e
 [<c041fc77>] do_exit+0x1b1/0x6f6
 [<c0405b2f>] die+0x20a/0x22f
 [<c0406302>] do_iret_error+0xa7/0xb1
 [<c0404e92>] restore_nocheck_notrace+0x7/0xf
 [<c0404e94>] restore_nocheck_notrace+0x9/0xf
 [<c0404e99>] restore_nocheck_notrace+0xe/0xf
 [<c042beaf>] search_exception_tables+0x14/0x25
 [<c041444f>] fixup_exception+0xb/0x20
 [<c06122f5>] do_general_protection+0x11c/0x16f
 [<c040687b>] do_IRQ+0xc6/0xdb
 [<c06121d9>] do_general_protection+0x0/0x16f
 [<c040625b>] do_iret_error+0x0/0xb1
 [<c0611ddd>] error_code+0x35/0x3c
 =======================

Let me know if I should open a separate bug or if this comment is enough.

BTW, I've seen a request on the mailing list to test the 2.6.20-1.2948.fc6xen
kernel against this kind of issue. I "jumped on the wagon" only recently, so I
don't know whether any previous kernel build would crash my system, but
2.6.20-1.2948.fc6xen doesn't (during boot). The 2.6.20-1.2948.fc6xen kernel
still produced the single "kernel oops" above while running (not sure if
anything running died as a result). I'll post more if I observe more of the above.
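
To make traces like the one above easier to compare between reports, the
frame lines can be pulled apart mechanically. This is a minimal sketch that
assumes every frame has the form "[<address>] symbol+0xoffset/0xsize", as in
the oops output in this bug; parse_frames and FRAME_RE are names made up for
this example, and lines without a symbol (such as "EIP is at 0x0") are simply
skipped.

```python
import re

# One call-trace frame, e.g. " [<c04303e6>] down_read+0x12/0x28"
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(text):
    """Return a list of (address, symbol, offset, size) tuples."""
    frames = []
    for match in FRAME_RE.finditer(text):
        addr, symbol, offset, size = match.groups()
        frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


SAMPLE = """\
 [<c04303e6>] down_read+0x12/0x28
 [<c0438a56>] acct_collect+0x38/0x13e
 [<c041fc77>] do_exit+0x1b1/0x6f6
 [<c0405b2f>] die+0x20a/0x22f
"""

if __name__ == "__main__":
    for addr, symbol, offset, size in parse_frames(SAMPLE):
        # addr - offset is the start address of the function
        print(f"{symbol:<14} start=0x{addr - offset:08x} "
              f"offset=0x{offset:x} size=0x{size:x}")
```

Comparing only the symbol column of two reports is usually enough to see
whether they share the same do_exit/die path.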

Comment 31 Adam Tkac 2007-05-11 09:24:39 UTC
Another interesting thing is that rawhide's kernel-xen-2.6.20-2925.5.fc7 works
fine for me. It must be possible to base a patch for the fc6 kernel on fc7's
source, especially since both kernels are in the 2.6.20 series, mustn't it?

-A-

Comment 32 Adam Tkac 2007-05-11 09:39:38 UTC
Bleh, crashed when I wrote comment #31 :(

Comment 33 Daniel Berrangé 2007-05-11 11:19:12 UTC
I would have been surprised if 2.6.20-2925.5.fc7 worked while fc6 failed,
because they're based on identical Xen merge trees.

We have identified a problem with the merge on 32-bit which is definitely
responsible for a large number of hangs/crashes. There's a new rawhide kernel
with a fix available if you're able to test:

http://koji.fedoraproject.org/packages/kernel-xen-2.6/2.6.20/2925.8.fc7/

If it gets reasonably positive feedback we'll update fc6 with the same patches.

Comment 34 Adam Tkac 2007-05-11 12:17:10 UTC
(In reply to comment #33)
All test packages are welcome :) I'll report back with my impressions.

-A-

Comment 35 Myroslav Opyr 2007-05-11 13:19:01 UTC
As a followup to comment #33, see
http://www.google.com/notebook/public/15861144119222811466/BDRgoQgoQkrfw2aci, it
has information about next 4 tracebacks of my 2.6.20-1.2948.fc6xen and 6th crash
was fatal for my system. I'll let you know if I have any success with
2.6.20-2925.5.fc7.

Comment 36 Daniel Berrangé 2007-05-11 13:24:04 UTC
Myroslav: I assume it was just a typo, but just in case... 2.6.20-2925.5.fc7
does not have the fix - make sure you try 2.6.20-2925.8.fc7 from the link in
#33. The tracebacks you posted on google are all consistent with the bug fixed
in -2925.8.fc7

Comment 37 Michael Young 2007-05-11 13:37:15 UTC
I have had the new kernel up for almost two hours, the only problem I saw was a
lockdep BUG (bug 239601), but I haven't noticed any effects.

Comment 38 Adam Tkac 2007-05-11 13:48:32 UTC
(In reply to comment #37)
Same behavior with the test kernel. It hangs during boot, but when I connect to
the "hung" computer and restart the X server, everything looks fine.

Comment 39 Eduardo Habkost 2007-05-16 14:47:34 UTC
*** Bug 236461 has been marked as a duplicate of this bug. ***

Comment 40 Eduardo Habkost 2007-05-16 14:50:12 UTC
*** Bug 236471 has been marked as a duplicate of this bug. ***

Comment 41 Eduardo Habkost 2007-05-16 14:52:28 UTC
*** Bug 238852 has been marked as a duplicate of this bug. ***

Comment 42 Eduardo Habkost 2007-05-16 16:38:27 UTC
*** Bug 238403 has been marked as a duplicate of this bug. ***

Comment 43 Gerry Reno 2007-05-16 17:01:24 UTC
I have a bug open on this same kernel for crashing on boot also.  That bug is
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234283 .  So are all these
somehow related?


Comment 44 Eduardo Habkost 2007-05-16 17:08:24 UTC
bug 234283 doesn't seem to be related to this bug, as this bug is for a 
kernel-xen problem, whose cause was found.

Comment 45 Myroslav Opyr 2007-05-16 18:52:11 UTC
(In reply to comment #36)
> Myroslav: I assume it was just a typo, but just in case... 2.6.20-2925.5.fc7
> does not have the fix - make sure you try 2.6.20-2925.8.fc7 from the link in
> #33. The tracebacks you posted on google are all consistent with the bug fixed
> in -2925.8.fc7

I failed to install the 2.6.20-2925.8.fc7 kernel on my FC6 system... I do not
have F7 at hand to try it out. If anyone has any hints on how an F7 kernel can
be installed on an FC6 system, please post them or e-mail me in private and I
will try.

BTW, in comment #35 I made 2 errors, I meant to followup my comment #30 and I
tried the 2.6.20-2925.8.fc7 kernel.

Comment 46 Leslie Satenstein 2007-05-16 20:51:56 UTC
Progress made

May 16 16:42:10 linux kernel: BUG: at kernel/lockdep.c:1858 trace_hardirqs_on()
May 16 16:42:10 linux rpc.statd[2179]: statd running as root. chown
/var/lib/nfs/statd/sm to choose different user
May 16 16:42:10 linux kernel:  [<c1005d9e>] show_trace_log_lvl+0x1a/0x2f
May 16 16:42:10 linux kernel:  [<c1006347>] show_trace+0x12/0x14
May 16 16:42:10 linux kernel:  [<c10063c2>] dump_stack+0x16/0x18
May 16 16:42:10 linux kernel:  [<c1037435>] trace_hardirqs_on+0xc4/0x143
May 16 16:42:10 linux kernel:  [<c10055d4>] restore_all+0x3b/0x3e
May 16 16:42:10 linux kernel:  =======================
May 16 16:42:10 linux kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
May 16 16:42:10 linux kernel: scsi 0:0:1:0: Attached scsi generic sg1 type 5
May 16 16:42:10 linux kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
May 16 16:42:10 linux kernel: sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw
xa/form2 cdda tray
May 16 16:42:10 linux kernel: Uniform CD-ROM driver Revision: 3.20
May 16 16:42:10 linux kernel: e100: Intel(R) PRO/100 Network Driver, 3.5.17-k2-NAPI
May 16 16:42:10 linux kernel: e100: Copyright(c) 1999-2006 Intel Corporation
May 16 16:42:10 linux kernel: ACPI: PCI Interrupt 0000:06:08.0[A] -> GSI 20
(level, low) -> IRQ 21
May 16 16:42:10 linux kernel: e100: eth0: e100_probe: addr 0x50000000, irq 21,
MAC addr 00:16:76:0B:64:0F
May 16 16:42:10 linux kernel: intel_rng: Firmware space is locked read-only. If
you can't or
May 16 16:42:10 linux kernel: intel_rng: don't want to disable this in firmware
setup, and if
May 16 16:42:10 linux kernel: intel_rng: you are certain that your system has a
functional
May 16 16:42:10 linux kernel: intel_rng: RNG, try using the 'no_fwh_detect' option.
May 16 16:42:10 linux kernel: iTCO_vendor_support: vendor-support=0
May 16 16:42:10 linux kernel: iTCO_wdt: Intel TCO WatchDog Timer Driver v1.01
(11-Nov-2006)
May 16 16:42:10 linux kernel: iTCO_wdt: Found a ICH7 or ICH7R TCO device
(Version=2, TCOBASE=0x0460)
May 16 16:42:10 linux kernel: iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)


Even so, I am now working from the XEN kernel.  By the way, I just installed
today's i810 and other X-org updates. Perhaps progress can now move forward once
this last hiccup is fixed.

Leslie

I can rerun, producing a tailored pair of dump files (messages and Xorg).
Just ask and you shall receive.


Comment 47 Itamar Reis Peixoto 2007-05-16 20:57:30 UTC
(In reply to comment #46)

I believe you now have this bug:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239601

Comment 48 Leslie Satenstein 2007-05-16 21:11:45 UTC
Well, here are some test results

LOOP POINT

If I boot and go to a normal user, I can log onto the system and use it.
I can logoff and log on again as both normal and root user. 
Seems OK.

But

If I boot, and the first user is root. The blue screen of death appears.
Keyboard lights work, but nothing on the display but black.  No mouse either. 
Only recourse is the hardware system reset.

Back to LOOP point,

I tried the above a few times, with consistent lockups when root is the very
first logon.



Comment 49 Eduardo Habkost 2007-05-21 13:47:24 UTC
*** Bug 233937 has been marked as a duplicate of this bug. ***

Comment 50 Eduardo Habkost 2007-05-21 22:22:31 UTC
Fix committed to CVS.

Comment 51 Eduardo Habkost 2007-05-23 16:52:16 UTC
The fix is on kernel-xen version 2.6.20-1.2952.fc6, that is available on the 
updates-testing repository.
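
For anyone unsure whether an installed FC6 kernel-xen already carries the fix,
the release string can be compared against 2.6.20-1.2952. This is a rough
sketch only: it assumes release strings shaped like "2.6.20-1.2952.fc6xen"
(as printed by uname -r), it compares the numeric fields as plain tuples
rather than using real RPM version comparison, and release_key, has_fix and
FIXED are names made up for the example.

```python
import re

# Assumed layout: <version>-<release>.fc6[xen], e.g. "2.6.20-1.2952.fc6xen"
RELEASE_RE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.fc6(?:xen)?$")

# First FC6 kernel-xen build reported in this bug to contain the fix.
FIXED = (2, 6, 20, 1, 2952)


def release_key(release):
    """Turn '2.6.20-1.2952.fc6xen' into (2, 6, 20, 1, 2952)."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised FC6 kernel release: {release!r}")
    version, rel = match.groups()
    return tuple(int(part) for part in (version + "." + rel).split("."))


def has_fix(release):
    """True if the release sorts at or after 2.6.20-1.2952 (simplified)."""
    return release_key(release) >= FIXED


if __name__ == "__main__":
    for name in ("2.6.19-1.2911.6.5.fc6xen",
                 "2.6.20-1.2948.fc6xen",
                 "2.6.20-1.2952.fc6xen"):
        print(name, has_fix(name))
```

The check is only meaningful for FC6 builds; the fc7 packages use a different 
release numbering and are rejected by the pattern.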

Comment 52 Myroslav Opyr 2007-05-23 17:33:07 UTC
(In reply to comment #51)

I've booted into 2.6.20-1.2952.fc6 successfully and will be watching it over
the next couple of days while building a full xen environment. We've had a
variety of networking-related issues with 2.6.20-fc6xen (pre 1.2952) kernels,
and we'll see if they appear in this newer one.

> The fix is on kernel-xen version 2.6.20-1.2952.fc6, that is available on the 
> updates-testing repository.



Comment 53 Jan ONDREJ 2007-05-23 17:39:36 UTC
Works well for me on 3 guests and 1 host for approx. one day.
Thank you. :)


Comment 54 Eduardo Habkost 2007-05-23 17:42:35 UTC
(In reply to comment #52)
> We've had a variety of
> networking-related issues with 2.6.20-fc6xen (pre 1.2952) kernels, and we'll
> see if they appear in this newer one.

Maybe your networking problems are related to bug #223258, that is fixed on 
rawhide/F7, but not fixed on FC6 yet.

Comment 55 Myroslav Opyr 2007-05-23 17:57:52 UTC
(In reply to comment #54)
> (In reply to comment #52)
> > We've had a variety of
> > networking-related issues with 2.6.20-fc6xen (pre 1.2952) kernels, and we'll 
> > see if they appear in this newer one.
> 
> Maybe your networking problems are related to bug #223258, that is fixed on 
> rawhide/F7, but not fixed on FC6 yet.

It looks like not. Dom0 was doing NAT for the xen guests (we do not allow
internal MAC/IP addresses to leave the box) and ip_conntrack failed to do
proper connection tracking.

As far as I remember, ICMP packets were able to leave the xen guest and ICMP
replies returned there, but anything TCP-related failed to route properly (the
packets were not being marked as RELATED by ip_conntrack on Dom0).

Comment 56 Leslie Satenstein 2007-05-23 20:34:04 UTC
Wednesday May 23rd
Well, with the update to the 2925.9 XEN kernel and supporting files, my system
now consistently locks up at the time it has to switch to Gnome.

The boot process is ok, but the problem is in switching.

At first I had to do a first boot with XEN, after lockup, reboot, and then XEN
would show the Gnome prompt. That led me to believe that there is some
uninitialized memory that, after the boot process, is initialized for the next
reboot.

So, with 2925.8 Fc7, the system was working as I described earlier, but now,
after installing this latest XEN kernel, we have the problem again.
I have installed... 

 vmlinuz-2.6.20-2925.8.fc7xen
and 
 vmlinuz-2.6.20-2925.9.fc7xen

One more comment: the CPU microcode module is used for non-XEN kernels; should
it be included in the XEN version? I ask because I am getting an error
message about it being missing for XEN.

(My processor is an Intel D930, the motherboard an Intel D945GNT, memory 1 GB,
graphics driver i810.)

Comment 57 Leslie Satenstein 2007-05-23 20:43:13 UTC
Created attachment 155293 [details]
Abridged Message file (showing XEN problems)

I pruned away the non-XEN stuff.

Comment 58 Myroslav Opyr 2007-05-25 14:34:37 UTC
I've been running 2.6.20-1.2952.fc6xen for almost 2 days already (mostly idle).
No reboots, no lockups, no oops...

The only thing I've got is the following "4gb seg fixup", which may be related
to #215201: 
May 24 04:12:56 anon kernel: 4gb seg fixup, process prelink (pid 21353), cs:ip
73:08083da1
May 24 04:12:56 anon last message repeated 9 times
May 24 04:12:56 anon init: Trying to re-exec init

Comment 59 Askar Ali Khan 2007-06-03 17:42:15 UTC
I am going to give 2.6.20-1.2952.fc6xen a try soon; it looks like this latest
kernel-xen fixed all the instability issues of the previous kernel-xen builds
(the last 3, perhaps).

The last working kernel here is 2.6.19-1.2911.6.5.fc6xen.

Askar

Comment 60 Eduardo Habkost 2007-06-04 14:05:15 UTC
*** Bug 236737 has been marked as a duplicate of this bug. ***

Comment 61 Askar Ali Khan 2007-06-07 11:05:57 UTC
I have updated kernel-xen to 2.6.20-1.2952.fc6xen on one of our hosts and it's
been working fine for the last 17 hours, with nothing in the logs. Dom0 and the
DomUs (5) are working just fine; I hope we are finally back on track :)

I'll be watching this host for 24+ hours, then will go on to update kernel-xen
on the other 2 hosts.

Thanks. Askar

Comment 62 Phil Lobbes 2007-06-07 13:43:54 UTC
With kernel-xen-2.6.20-1.2952.fc6 I've been stable for 3.5 days now (upgraded
from kernel-xen-2.6.19-1.2911.6.5.fc6).  Looks like the instability problems
related to the interim xen kernels have been resolved.

Comment 63 Eduardo Habkost 2007-06-07 17:07:13 UTC
kernel-xen-2.6.20-1.2952.fc6 went to FC6 updates on May 30th:
http://fedoraproject.org/wiki/FSA/FC6/FEDORA-2007-513

Closing bug.

Comment 64 Eduardo Habkost 2007-06-07 17:10:44 UTC
*** Bug 236474 has been marked as a duplicate of this bug. ***

Comment 65 antony osullivan 2007-07-20 15:19:02 UTC
Most excellent-o... this fix... fixed all my problems!  Thanks very much-o!

Comment 66 Greg Huber 2007-07-25 13:06:31 UTC
my apologies to greno 

Comment 67 Eduardo Habkost 2007-10-15 13:46:20 UTC
*** Bug 238350 has been marked as a duplicate of this bug. ***

