Bug 606052

Summary: F13 XEN guest kernel crashes on F8 XEN hypervisor
Product: [Fedora] Fedora Reporter: Jacques Amar <laxfedorabug>
Component: kernel-xen Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: low    
Version: 13 CC: drjones
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-08-04 22:50:56 UTC Type: ---

Description Jacques Amar 2010-06-20 07:39:16 UTC
Description of problem:
The F13 kernel-xen running on an F8 Xen hypervisor crashes constantly.

Version-Release number of selected component (if applicable):
2.6.33.5-124.fc13.x86_64

How reproducible:
Start the Xen virtual machine; it eventually crashes.

Steps to Reproduce:
1. Start a VM.
2. Wait ...

Actual results:
The first sign is an unresponsive shell, with the VM's CPU usage at 100% in xm top.


Expected results:


Additional info:
I have 4 VMs and they all behave the same way. If I boot back into the F12 kernel, there are no issues.

Example from the logs:

Jun 13 13:24:00 lax1 kernel: CPU 0
Jun 13 13:24:00 lax1 kernel: Pid: 269, comm: kjournald Not tainted 2.6.33.5-112.fc13.x86_64 #1 /
Jun 13 13:24:00 lax1 kernel: RIP: e030:[<ffffffff8100122a>]  [<ffffffff8100122a>] hypercall_page+0x22a/0x1006
Jun 13 13:24:00 lax1 kernel: RSP: e02b:ffff88000300f990  EFLAGS: 00000246
Jun 13 13:24:00 lax1 kernel: RAX: 0000000000030001 RBX: ffff88002a05c228 RCX: ffffffff8100122a
Jun 13 13:24:00 lax1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jun 13 13:24:00 lax1 kernel: RBP: ffff88000300f9a8 R08: ffff880003dd7480 R09: ffff8800031dd208
Jun 13 13:24:00 lax1 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: ffff88002f296000
Jun 13 13:24:00 lax1 kernel: R13: 0000000000011200 R14: ffffffff810c29be R15: 0000000000011200
Jun 13 13:24:00 lax1 kernel: FS:  00007ff03a679740(0000) GS:ffff880003dc5000(0000) knlGS:0000000000000000
Jun 13 13:24:00 lax1 kernel: CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 13 13:24:00 lax1 kernel: CR2: 0000000000ca3bb8 CR3: 000000000a154000 CR4: 0000000000000660
Jun 13 13:24:00 lax1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 13 13:24:00 lax1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Jun 13 13:24:00 lax1 kernel: Process kjournald (pid: 269, threadinfo ffff88000300e000, task ffff88000333dd40)
Jun 13 13:24:00 lax1 kernel: Stack:
Jun 13 13:24:00 lax1 kernel: ffff880002fe18b0 0000000000000000 ffffffff81005f05 ffff88000300fa50
Jun 13 13:24:00 lax1 kernel: <0> ffffffff810065e2 ffff880002fe18b0 0000000000000001 ffff8800031dd208
Jun 13 13:24:00 lax1 kernel: <0> ffff880003dd7480 0000000000010200 0000000000011200 0000000000000000
Jun 13 13:24:00 lax1 kernel: Call Trace:
Jun 13 13:24:00 lax1 kernel: [<ffffffff81005f05>] ? xen_force_evtchn_callback+0xd/0xf
Jun 13 13:24:00 lax1 kernel: [<ffffffff810065e2>] check_events+0x12/0x20
Jun 13 13:24:00 lax1 kernel: [<ffffffff810065cf>] ? xen_restore_fl_direct_end+0x0/0x1
Jun 13 13:24:00 lax1 kernel: [<ffffffff810f5873>] ? kmem_cache_alloc+0xa2/0x10f
Jun 13 13:24:00 lax1 kernel: [<ffffffff810065cf>] ? xen_restore_fl_direct_end+0x0/0x1
Jun 13 13:24:00 lax1 kernel: [<ffffffff810c29be>] mempool_alloc_slab+0x10/0x12
Jun 13 13:24:00 lax1 kernel: [<ffffffff810c2aa2>] mempool_alloc+0x6c/0x11e
Jun 13 13:24:00 lax1 kernel: [<ffffffff810c2aa2>] ? mempool_alloc+0x6c/0x11e
Jun 13 13:24:00 lax1 kernel: [<ffffffff8134e4b1>] alloc_tio+0x21/0x39
Jun 13 13:24:00 lax1 kernel: [<ffffffff8134fa4a>] __split_and_process_bio+0x23c/0x529
Jun 13 13:24:00 lax1 kernel: [<ffffffff81005f05>] ? xen_force_evtchn_callback+0xd/0xf
Jun 13 13:24:00 lax1 kernel: [<ffffffff813500c6>] dm_request+0x1c8/0x1db
Jun 13 13:24:00 lax1 kernel: [<ffffffff811ea91b>] generic_make_request+0x2c8/0x321
Jun 13 13:24:00 lax1 kernel: [<ffffffff810c2aa2>] ? mempool_alloc+0x6c/0x11e
Jun 13 13:24:00 lax1 kernel: [<ffffffff811eaa41>] submit_bio+0xcd/0xea
Jun 13 13:24:00 lax1 kernel: [<ffffffff81120cc5>] submit_bh+0xef/0x111
Jun 13 13:24:00 lax1 kernel: [<ffffffff8119c8a9>] journal_commit_transaction+0x9c8/0xfd7
Jun 13 13:24:00 lax1 kernel: [<ffffffff810586f0>] ? try_to_del_timer_sync+0x6e/0x7c
Jun 13 13:24:00 lax1 kernel: [<ffffffff810065cf>] ? xen_restore_fl_direct_end+0x0/0x1
Jun 13 13:24:00 lax1 kernel: [<ffffffff8119f987>] kjournald+0xe3/0x220
Jun 13 13:24:00 lax1 kernel: [<ffffffff8106480b>] ? autoremove_wake_function+0x0/0x34
Jun 13 13:24:00 lax1 kernel: [<ffffffff8142ae38>] ? _raw_spin_unlock_irqrestore+0x14/0x16
Jun 13 13:24:00 lax1 kernel: [<ffffffff8119f8a4>] ? kjournald+0x0/0x220
Jun 13 13:24:00 lax1 kernel: [<ffffffff810643bb>] kthread+0x7a/0x82
Jun 13 13:24:00 lax1 kernel: [<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
Jun 13 13:24:00 lax1 kernel: [<ffffffff81009d21>] ? int_ret_from_sys_call+0x7/0x1b
Jun 13 13:24:00 lax1 kernel: [<ffffffff8142b29d>] ? retint_restore_args+0x5/0x6
Jun 13 13:24:00 lax1 kernel: [<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10
Jun 13 13:24:00 lax1 kernel: Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
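When reading a trace like this one, entries printed with a "?" before the symbol are generally addresses the kernel found on the stack but could not confirm as part of the real call chain, so they are often stale. A minimal Python sketch for keeping only the confirmed frames from saved syslog lines follows. It assumes the "[<address>] ? function+offset/size" layout shown in this log; parse_frame and reliable_frames are hypothetical helper names, not part of any kernel tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches call-trace entries in the layout used by this log, e.g.
#   [<ffffffff810065e2>] check_events+0x12/0x20
#   [<ffffffff81005f05>] ? xen_force_evtchn_callback+0xd/0xf
# A "? " before the symbol marks an entry the unwinder could not confirm.
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    addr: int       # address printed inside [<...>]
    func: str       # symbol name
    offset: int     # offset into the symbol
    size: int       # total size of the symbol
    reliable: bool  # False when the entry was printed with "?"


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one syslog line; return None if it holds no trace entry."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return Frame(
        addr=int(m["addr"], 16),
        func=m["func"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        reliable=m["unreliable"] is None,
    )


def reliable_frames(lines: Iterable[str]) -> List[str]:
    """Return the symbol names of confirmed frames, innermost first."""
    frames = (parse_frame(line) for line in lines)
    return [f.func for f in frames if f is not None and f.reliable]
```

Fed the "Call Trace:" lines above, this keeps the chain from check_events up through mempool_alloc, alloc_tio, dm_request, submit_bio, journal_commit_transaction and kjournald to kthread and kernel_thread_helper, and drops the "?" entries such as xen_restore_fl_direct_end.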

Comment 1 Andrew Jones 2010-06-21 12:01:04 UTC
Since the F12 kernel you are using works and the F13 kernel doesn't, could you please try to bisect the problem using the kernels available here?

http://kojipkgs.fedoraproject.org/packages/kernel/

Thanks,
Andrew

Comment 2 Jacques Amar 2010-06-21 23:16:23 UTC
(In reply to comment #1)
> Since the f12 kernel you are using works, and the f13 doesn't, can you please
> try to bisect it down using the kernels available here?
> 
> http://kojipkgs.fedoraproject.org/packages/kernel/
> 
> Thanks,
> Andrew    

Quick attempts before more thorough testing:

Last F12: kernel-2.6.32.14-134.fc12.x86_64.rpm -> STABLE
First F13: kernel-2.6.33.1-17.fc13.x86_64.rpm -> crashes during boot

I will try more kernels in the 2.6.33 branch (the latest, the earliest, and one from the middle of the date range).
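Testing the latest, the earliest and a middle build is the first step of a binary search over the prebuilt Koji kernels requested in comment 1. A minimal Python sketch follows, assuming a hypothetical is_good() callback that boots a given build in a guest and reports whether it stays up. The nvr_key ordering is a simplified assumption for names like the ones in this bug and is not rpm's own version comparison.

```python
import re
from typing import Callable, Sequence, Tuple

# Simplified ordering for names like "kernel-2.6.33.5-124.fc13.x86_64" or
# "kernel-2.6.32.14-134.fc12.x86_64.rpm". This is NOT rpm's own version
# comparison: it compares only the numeric version and release fields and
# ignores the .fcNN dist tag and any arch/extension suffix.
NVR_RE = re.compile(r"kernel-([\d.]+)-([\d.]+)\.fc\d+(?:\.\w+)*")


def nvr_key(name: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    m = NVR_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognised kernel package name: {name}")
    version, release = m.groups()
    return (tuple(int(p) for p in version.split(".")),
            tuple(int(p) for p in release.split(".")))


def first_bad(builds: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary-search for the first build where is_good() returns False.

    Assumes builds are sorted oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and once a build is bad every later one is too.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[hi]
```

Each is_good() call stands for one test boot, so the number of boots grows only logarithmically with the number of builds. The search assumes builds stay bad once they go bad; because a later build (2.6.33.5-133) turned out to be fixed, the range handed to first_bad would have to end before that fix.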

Comment 3 Jacques Amar 2010-08-04 22:50:56 UTC
Fixed in kernel-2.6.33.5-133.fc13.x86_64, and still fixed in kernel-2.6.33.6-147.2.4.fc13.x86_64.

I *think* that xen-libs-3.4.3-2.fc13.x86_64 was the catalyst.

Previous F13 versions always crashed.