Bug 236474

Summary: kernel-xen-2.6.20-1.2944.fc6 reboots
Product: [Fedora] Fedora Reporter: Askar Ali Khan <asraikhn>
Component: kernel-xen Assignee: Eduardo Habkost <ehabkost>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 6 CC: bstein, itamar, katzj, sts+redhat-bugzilla, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-06-07 17:10:40 UTC Type: ---

Description Askar Ali Khan 2007-04-14 20:33:05 UTC
Description of problem:

The machine crashes and reboots after a few minutes (30 minutes at most) and is
very unstable. Logs are included below.

Version-Release number of selected component (if applicable):

kernel-xen-2.6.20-1.2944.fc6

How reproducible:

It's a production Xen-based host, so I can't afford more testing on it. It
rebooted 2 times, and I can't afford to stay on this buggy kernel, so I fell back to
the last working kernel, i.e. 2.6.19-1.2911.6.5.fc6xen, and everything is normal again.

Steps to Reproduce:
1. Update to kernel-xen-2.6.20-1.2944.fc6 via yum
2. Reboot
3. Wait until the machine reboots on its own
  
Actual results:
Sometimes it reboots after 20 minutes and sometimes after 15, with nothing on screen.
In the logs I find these:

Apr 14 19:04:27 xxxxx kernel: CPU:    0
Apr 14 19:04:27 xxxxx kernel: EIP:    0000:[<00000000>]    Not tainted VLI
Apr 14 19:04:27 xxxxx kernel: EFLAGS: 00000000   (2.6.20-1.2944.fc6xen #1)
Apr 14 19:04:27 xxxxx kernel: EIP is at 0x0
Apr 14 19:04:27 xxxxx kernel: eax: 00000000   ebx: 00809f01   ecx: 00000073   edx: 00200246
Apr 14 19:04:27 xxxxx kernel: esi: b7fca384   edi: 0000007b   ebp: 00000000   esp: eb09b01c
Apr 14 19:04:27 xxxxx kernel: ds: 0000   es: 0000   ss: 0069
Apr 14 19:04:27 xxxxx kernel: Process tapdisk (pid: 2838, ti=eb09a000 task=ec7e8b30 task.ti=eb09a000)
Apr 14 19:04:27 xxxxx kernel: Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Apr 14 19:04:27 xxxxx kernel:        00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Apr 14 19:04:27 xxxxx kernel:        00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Apr 14 19:04:27 xxxxx kernel: Call Trace:
Apr 14 19:04:27 xxxxx kernel:  =======================
Apr 14 19:04:27 xxxxx kernel: Code:  Bad EIP value.
Apr 14 19:04:27 xxxxx kernel: EIP: [<00000000>] 0x0 SS:ESP 0069:eb09b01c



Apr 14 19:07:13 xxxxx ntpd[1889]: synchronized to 69.20.226.105, stratum 2
Apr 14 19:12:04 xxxxx kernel: tap tap-6-2049: 2 getting info
Apr 14 19:12:04 xxxxx kernel: tap tap-6-2050: 2 getting info
Apr 14 19:12:04 xxxxx kernel: tap tap-6-2051: 2 getting info
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: Reached Fail_flush
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: Reached Fail_flush
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid kernel buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: invalid user buffer -- could not remap it
Apr 14 19:12:29 xxxxx kernel: blk_tap: Reached Fail_flush

Thanks. Askar

Comment 1 Itamar Reis Peixoto 2007-04-14 20:40:01 UTC
I have added the noreboot option to grub, and my dom0 seems to survive without
rebooting.

Comment 2 Itamar Reis Peixoto 2007-04-14 20:42:08 UTC
noreboot didn't fix it; the machine died after 1 hour of uptime.

[root@serv ~]# (XEN) (file=extable.c, line=77) Pre-exception: ff1619f4 -> ff163ce3
(XEN) (file=traps.c, line=1518) GPF (4814): ff163d28 -> ff163d39
(XEN) (file=traps.c, line=1518) GPF (0000): ff161ba1 -> ff161cba
(XEN) domain_crash_sync called from entry.S (ff163d78)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.0.3-0-1.2944.fc6  x86_32p  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) EIP:    4817:[<c1ebcee4>]
(XEN) EFLAGS: 40690fed   CONTEXT: guest
(XEN) eax: 00000000   ebx: c04013a7   ecx: 00000061   edx: 00000246
(XEN) esi: c04080fa   edi: c1ec2500   ebp: ffffffff   esp: c073425a
(XEN) cr0: 8005003b   cr4: 000006f0   cr3: 77d56000   cr2: b7ff5000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: 4817
(XEN) Guest stack trace from esp=c073425a:
(XEN)    00000000 00040000 00000000 00000000 00000000 00200000 00040000 00000000
(XEN)    02000000 00000000 00000000 00000000 428c0000 428cc073 0001c073 4ead0000
(XEN)    ffffdead ffffffff e658ffff ccccc093 cccccccc cccccccc cccccccc cccccccc
(XEN)    cccccccc 6ec0cccc 4140c0e4 8d40c073 dd78ed7e e3c0c0d3 0002ee09 00000000
(XEN)    000d0000 02000000 00000000 00000000 1eed0100 ffffdeaf ffffffff 0000ffff
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00040000
(XEN)    00000000 00040000 00000000 00000000 00000000 00200000 00030000 00000000
(XEN)    02000000 00000000 00000000 00000000 434c0000 434cc073 0001c073 4ead0000
(XEN)    ffffdead ffffffff eb00ffff ccccc093 cccccccc cccccccc cccccccc cccccccc
(XEN)    cccccccc 4740cccc 7368c073 0000c046 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 1eed0100 ffffdeaf ffffffff 0000ffff
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 440c0000 440cc073 0001c073 4ead0000
(XEN)    ffffdead ffffffff 0000ffff cccc0000 cccccccc cccccccc cccccccc cccccccc
(XEN)    cccccccc 4080cccc 7368c073 0000c046 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 1eed0100 ffffdeaf ffffffff 0000ffff
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 44cc0000 44ccc073 0001c073 4ead0000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.


Comment 3 S Senator 2007-04-14 21:09:41 UTC
One way to reproduce appears to be as follows:
1. Export an NFS file system from the domain0 to the domainU.
   Export it over a brouter interface to the domainU.
2. Mount it in the domainU.
3. Copy a file >2 GB from a non-NFS-mounted partition to the NFS-mounted partition.
Notes:
- The domain0 may lose access to any host accessible on the brouter shortly
before the crash. This is hard to catch in time without logging from a console port.
- Changing the domU to use UDP instead of TCP, or reducing the rsize/wsize= NFS
mount options in the domU, appears to delay, but not prevent, triggering this.
- This behavior did not occur in 2.6.19-based kernels.
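
Step 3 above could be scripted roughly as follows. This is only a minimal sketch, not the exact commands used: the helper names make_sparse_file and copy_in_chunks are made up for illustration, and the paths in the note after the code are hypothetical placeholders for a local partition and the NFS mount in the domainU.

```python
import os

CHUNK = 1024 * 1024  # copy in 1 MiB chunks


def make_sparse_file(path, size):
    """Create a file of `size` bytes without writing them all (sparse where supported)."""
    with open(path, "wb") as f:
        f.truncate(size)


def copy_in_chunks(src, dst, chunk_size=CHUNK):
    """Copy src to dst in fixed-size chunks and return the number of bytes written."""
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(buf)
            written += len(buf)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data is pushed out before returning
    return written
```

For example, make_sparse_file("/tmp/bigfile", 3 * 1024**3) followed by copy_in_chunks("/tmp/bigfile", "/mnt/nfs/bigfile") would push roughly 3 GiB over the NFS mount. Both paths are assumptions.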


Comment 4 Askar Ali Khan 2007-04-30 15:17:07 UTC
No updates?

Looks like we have to stick with 2.6.19-1.2911.6.5.fc6xen, which is the last
working kernel-xen.

Or are they planning to fix it when releasing 2.6.21.x? :)

Thanks. Askar

Comment 5 Itamar Reis Peixoto 2007-04-30 15:21:42 UTC
2.6.19-1.2911.6.5.fc6xen doesn't work for me.

Does anyone have an estimated time for a new release of the Xen packages in Fedora?

Comment 6 Eduardo Habkost 2007-04-30 23:20:22 UTC
Why doesn't 2.6.19 work? Does it have problems for you, too?

I don't have an estimate of the time needed to debug the instability/rebooting bugs.
But there may be some work (in parallel) to update the FC6 kernel to 2.6.21
soon, and there is a possibility that the 2.6.21 update will solve the instability
problems. At least I hope so.  :)

Comment 7 Jeff Layton 2007-05-02 20:56:29 UTC
I'm seeing some similar crashes on my home machine, and was able to get the
stack trace on the serial console by adding the noreboot option to the xen
kernel command line (without that, the box would just spontaneously reboot
before ever outputting it):

(XEN) domain_crash_sync called from entry.S (ff161d99)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.0.3-0-1.2948.fc6  x86_32p  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) EIP:    0061:[<c0404e99>]
(XEN) EFLAGS: 00210292   CONTEXT: guest
(XEN) eax: 00000000   ebx: 007e564f   ecx: 00000073   edx: 00200297
(XEN) esi: bfccf44c   edi: 0000007b   ebp: 00000000   esp: e72ee010
(XEN) cr0: 80050033   cr4: 000006f0   cr3: 9aa7b000   cr2: 08df40) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0069   cs: 0061
(XEN) Guest stack trace from esp=e72ee010:
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 2f610025 00000000 00000000 00000000 2f612025 00000000
(XEN)    (XEN)    2f617025 00000000 2f618025 00000000 2f619025 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    2f61f025 00000000 2f620025 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 000000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 00000000 00000000000
(XEN)    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 2f650025 00000000 2f651025 00000000 00000000 00000000
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.


It says "Not tainted", but I actually have the nvidia module loaded in this
case. I have, however, been able to reproduce the problem without it.

The 2.6.19 xen kernels were pretty stable on this machine, but I've had no luck
whatsoever with the 2.6.20 series. Both the 2.6.19 and 2.6.20 xen kernels, however,
work fine on my work machine. So there seems to be something hardware-specific
about these problems. My home box is an AMD X2, and work is a dual dual-core Xeon
box, so there are some not-insignificant differences.

I'll be happy to collect info or test kernels/boot options if you can suggest
anything...
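
For anyone who wants to capture the same trace: the noreboot option goes on the hypervisor (xen.gz) line of the grub entry, not on the dom0 kernel module line. Below is a rough sketch of a helper that adds it, assuming a grub.conf stanza where the hypervisor is loaded with a "kernel /xen.gz-..." line. The function name add_xen_option and the SAMPLE stanza are illustrative only and were not taken from this machine.

```python
# Hypothetical grub.conf stanza, used only to illustrate the assumed layout.
SAMPLE = (
    "title Fedora Core (2.6.20-1.2948.fc6xen)\n"
    "\troot (hd0,0)\n"
    "\tkernel /xen.gz-2.6.20-1.2948.fc6\n"
    "\tmodule /vmlinuz-2.6.20-1.2948.fc6xen ro root=/dev/VolGroup00/LogVol00\n"
    "\tmodule /initrd-2.6.20-1.2948.fc6xen.img\n"
)


def add_xen_option(grub_conf_text, option="noreboot"):
    """Append `option` to every hypervisor line (kernel /xen.gz...) that lacks it."""
    out = []
    for line in grub_conf_text.splitlines():
        parts = line.split()
        # Assumption: the hypervisor line starts with "kernel" and names a xen.gz image.
        is_xen_line = len(parts) >= 2 and parts[0] == "kernel" and "xen.gz" in parts[1]
        if is_xen_line and option not in parts[2:]:
            line = line.rstrip() + " " + option
        out.append(line)
    return "\n".join(out) + "\n"
```

The function only transforms text and leaves the module lines alone. Writing the result back to the real grub.conf is deliberately left out, so the change can be reviewed first.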


Comment 8 Askar Ali Khan 2007-05-03 10:49:40 UTC
So we got another update for kernel-xen (kernel-xen.i686 2.6.20-1.2948.fc6). I don't
know whether it fixes the reboot/crash problem or not.

Thanks.

Comment 9 Eduardo Habkost 2007-05-23 16:55:22 UTC
This may be the same problem reported in bug #234008. Could you test using
kernel-xen-2.6.20-1.2952.fc6, which is available in the Fedora Core 6
updates-testing repository?

Comment 10 Eduardo Habkost 2007-06-04 12:20:27 UTC
Any test results using kernel-xen-2.6.20-1.2952.fc6?

Comment 11 Askar Ali Khan 2007-06-04 15:47:38 UTC
I'll give kernel-xen-2.6.20-1.2952.fc6, which is available via yum, a try and
then update here if the problem persists.

Askar.

Comment 12 Eduardo Habkost 2007-06-05 18:41:47 UTC
Thanks. I will keep the "needinfo" flag set, so the system will remind me that I
am waiting for the 2.6.20-1.2952.fc6 test results when I check the list of
open bugs.  :)

Comment 13 Askar Ali Khan 2007-06-07 11:03:34 UTC
I have updated kernel-xen to 2.6.20-1.2952.fc6xen on one of our hosts and it has
been working fine for the last 17 hours, with nothing in the logs. Dom0 and the domUs (5)
are working just fine. I hope we are finally back on track :)

I'll watch this host for 24+ hours, then update kernel-xen on the other 2
hosts.

Thanks. Askar

Comment 14 Eduardo Habkost 2007-06-07 17:10:40 UTC
Thanks for the information. Marking this bug as another instance of bug 
#234008.

*** This bug has been marked as a duplicate of 234008 ***