I extracted the Xen patches from kernel-xen-2.6-2.6.25-1.fc9.src.rpm and applied them to 2.6.25. I am running this kernel as a XenU guest on Xen 3.2.0. Sometimes, when the VM is swapping, I get this kernel panic:

DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff8055c000, task ffffffff8051d320)
Stack: 0000000000000001 ffffffff8055dcc8 ffffffff805278c0 0000000000000000 ffffffff8055dcc8 ffffffff8055dcc8 ffff880001388080 0000000000000000 000000000000000b ffffffff8055ddd8 ffffffff8051d320 ffff88000fc48f00
Call Trace:
 [<ffffffff80211ffd>] ? oops_end+0x7d/0x80
 [<ffffffff8021340c>] ? do_invalid_op+0x8c/0xa0
 [<ffffffff804878d0>] ? xen_failsafe_callback+0x0/0x10
 [<ffffffff8023bd46>] ? alloc_pid+0x296/0x3d0
 [<ffffffff8027e71d>] ? kmem_cache_alloc+0x6d/0xd0
 [<ffffffff804874ab>] ? error_exit+0x0/0x61
 [<ffffffff804878d0>] ? xen_failsafe_callback+0x0/0x10
 [<ffffffff804878d0>] ? xen_failsafe_callback+0x0/0x10
 [<ffffffff8020e668>] ? xen_clocksource_read+0x38/0x60
 [<ffffffff8020c4b2>] ? xen_mc_flush+0x52/0x180
 [<ffffffff8020c0d6>] ? xen_load_tls+0x76/0xa0
 [<ffffffff8020f908>] ? __switch_to+0x88/0x370
 [<ffffffff8020be4b>] ? xen_write_cr3+0x15b/0x170
 [<ffffffff80485d92>] ? thread_return+0x0/0x15e
 [<ffffffff8020c3c0>] ? xen_idle+0x0/0x50
 [<ffffffff802101a0>] ? cpu_idle+0x60/0x70
Code: 83 bb 08 07 00 00 00 74 05 e8 5f 2d 16 00 48 8b bb 60 07 00 00 48 85 ff 74 05 e8 fe 9c 05 00 48 c7 03 40 00 00 00 e8 b2 74 25 00 <0f> 0b eb fe 8b 8b 20 01 00 00 83 f9 ff 0f 85 27 ff ff ff 44 8b
RIP [<ffffffff8022e73e>] do_exit+0x51e/0x750
 RSP <ffffffff8055dca8>
---[ end trace b060fa16d852ccc5 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Created attachment 303872 [details] Another kernel panic
Created attachment 303873 [details] Another kernel panic #2
Created attachment 303874 [details] config

Attaching two similar kernel panics from different test loads, plus the kernel config file. The original kernel panic happened while compiling gcc under heavy swapping; the next one is from a stressed mysql server, and the other one is from postfix with some milter spam checks. The GCC compilation stressed swap heavily, but mysql and postfix barely swapped.
Jan: thanks for giving this a shot. If you want to keep up with the latest x86_64 work, try building the master branch from: http://git.et.redhat.com/?p=xen-pvops-64.git

The issue here is that a kernel bug can cause xen_failsafe_callback to be called, but we don't currently implement it. In other words, the underlying cause is a kernel bug, but we should implement xen_failsafe_callback anyway so that such bugs are handled more gracefully. See bug #442949 for another example.

Eduardo: re-assigning to you. Note that this isn't a kernel-xen RPM build, but the x86_64 patches applied to 2.6.25.
Just for future reference - we're talking about patches from the xen-pvops-64-2.6.25-rc9-markmc1 tag here, probably
Jan: try again with the xen-pvops-64-2.6.25-rc9-ehabkost2 tag
Thanks Mark. I am currently in Las Vegas; I will give it a try as soon as I return to Slovakia (19th May).
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Mark, I am still getting a crash. I hope I extracted the patches correctly (I am new to git). I extracted them with the command below and applied them to 2.6.25.4:

git-format-patch 120dd64cacd4fb796bca0acba3665553f1d9ecaa..3e1f4183fe564f53cf3642cc2ad9582c683a33ed

(the first is the Linux-2.6.25-rc9 commit, the second is the 'Clear %fs on xen_load_tls()' commit)

The workload was asterisk doing a 4-way phone conference over the network.

RBP: ffff88000ee7de14 R08: ffff88000ee7c000 R09: 0000000000000001
R10: 0000000000000001 R11: ffffffff8020eab0 R12: ffff88000ff4f180
R13: ffff88000ee7de2c R14: ffff88000ee7de08 R15: 0000000000000000
FS: 00007f230c0716f0(0063) GS:ffffffff8054f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000ed24000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process asterisk (pid: 2799, threadinfo ffff88000ee7c000, task ffff88000fc7c780)
Stack: ffffffff8028f93d ffff88000ee6e600 ffff88000ee7dbb8 ffff88000ee6e870 ffff88000ee7df68 0000000040ce6770 0000000000000000 ffffffff802906d0 0000000000000000 0000000300000000 ffff88000ffebc00 ffffffff00000000
Call Trace:
 [<ffffffff8028f93d>] ? do_sys_poll+0x14d/0x380
 [<ffffffff802906d0>] ? __pollwait+0x0/0x130
 [<ffffffff802266a0>] ? default_wake_function+0x0/0x10
 [<ffffffff802266a0>] ? default_wake_function+0x0/0x10
 [<ffffffff802266a0>] ? default_wake_function+0x0/0x10
 [<ffffffff803de97b>] ? sock_sendmsg+0xcb/0x100
 [<ffffffff8039c31a>] ? rb_insert_color+0x8a/0xf0
 [<ffffffff802892a8>] ? pipe_read+0x2d8/0x420
 [<ffffffff8023e3a0>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff80288fd9>] ? pipe_read+0x9/0x420
 [<ffffffff80282289>] ? do_sync_read+0xd9/0x120
 [<ffffffff803de261>] ? sockfd_lookup_light+0x41/0x80
 [<ffffffff803df554>] ? sys_sendto+0x174/0x180
 [<ffffffff804874ab>] ? error_exit+0x0/0x61
 [<ffffffff804874d8>] ? error_exit+0x2d/0x61
 [<ffffffff804874ab>] ? error_exit+0x0/0x61
 [<ffffffff8020e668>] ? xen_clocksource_read+0x38/0x60
 [<ffffffff80290cee>] ? sys_poll+0x2e/0x90
 [<ffffffff80211110>] ? sysret_check+0x1c/0x64
 [<ffffffff802110ea>] ? system_call_after_saveargs+0x38/0x3d
 [<ffffffff80487930>] ? xen_system_call_entry+0x0/0x35
Code: c3 0f 1f 40 00 e9 5b fc ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 00 00 00 00 48 8b 80 18 06 00 00 c7 06 00 00 00 00 <83> 38 01 75 1f 48 8b 50 08 3b 3a 73 11 89 f8 48 c1 e0 03 48 03
RIP [<ffffffff80283796>] fget_light+0x16/0x90
 RSP <ffff88000ee7db80>
CR2: 0000000000000000
---[ end trace b484998d28b71338 ]---
Fixing recursive fault but reboot is needed!
------------[ cut here ]------------
kernel BUG at /usr/src/linux-2.6.25-gentoo-r1/kernel/exit.c:1022!
invalid opcode: 0000 [3]
CPU 0
Pid: 0, comm: swapper Tainted: G D 2.6.25-gentoo-r1 #4
RIP: e030:[<ffffffff8022e73e>] [<ffffffff8022e73e>] do_exit+0x51e/0x750
RSP: e02b:ffffffff8055dca8 EFLAGS: 00010246
RAX: ffffffff8055dfd8 RBX: ffff88000fc7c780 RCX: 0000000000000100
RDX: 0000000000000000 RSI: ffff88000fd5a780 RDI: ffffffff805a9068
RBP: 0000000000000020 R08: ffffffff8055c000 R09: 00000000ffffffff
R10: 0000000000000001 R11: ffffffff8044fab0 R12: ffff88000fc7c860
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f230c0716f0(0000) GS:ffffffff8054f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000000ec4c000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff8055c000, task ffffffff8051d320)
Stack: 0000000000000001 ffffffff8055dcc8 ffffffff805278c0 0000000000000000 ffffffff8055dcc8 ffffffff8055dcc8 ffff880001388080 0000000000000000 000000000000000b ffffffff8055ddd8 ffffffff8051d320 ffff88000fc7c780
Call Trace:
 [<ffffffff80211ffd>] ? oops_end+0x7d/0x80
 [<ffffffff8021340c>] ? do_invalid_op+0x8c/0xa0
 [<ffffffff804878d0>] ? xen_failsafe_callback+0x0/0x10
 [<ffffffff8023bd46>] ? alloc_pid+0x296/0x3d0
 [<ffffffff8027e71d>] ? kmem_cache_alloc+0x6d/0xd0
 [<ffffffff804874ab>] ? error_exit+0x0/0x61
 [<ffffffff804878d0>] ? xen_failsafe_callback+0x0/0x10
 [<ffffffff804878d0>] ? xen_failsafe_callback+0x0/0x10
 [<ffffffff8020e668>] ? xen_clocksource_read+0x38/0x60
 [<ffffffff804359c0>] ? udp_poll+0x0/0x100
 [<ffffffff8020c4b2>] ? xen_mc_flush+0x52/0x180
 [<ffffffff8020c0d6>] ? xen_load_tls+0x76/0xa0
 [<ffffffff8020f908>] ? __switch_to+0x88/0x370
 [<ffffffff80485d92>] ? thread_return+0x0/0x15e
 [<ffffffff8020c3c0>] ? xen_idle+0x0/0x50
 [<ffffffff802101a0>] ? cpu_idle+0x60/0x70
Code: 83 bb 08 07 00 00 00 74 05 e8 5f 2d 16 00 48 8b bb 60 07 00 00 48 85 ff 74 05 e8 fe 9c 05 00 48 c7 03 40 00 00 00 e8 b2 74 25 00 <0f> 0b eb fe 8b 8b 20 01 00 00 83 f9 ff 0f 85 27 ff ff ff 44 8b
RIP [<ffffffff8022e73e>] do_exit+0x51e/0x750
 RSP <ffffffff8055dca8>
---[ end trace b484998d28b71338 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
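
When comparing several oopses like the ones in this report, it can help to reduce each call trace to its symbol names and check for a shared frame such as xen_failsafe_callback. The following is only a minimal sketch, assuming the `[<address>] ? symbol+0xoffset/0xsize` entry format seen in these traces; `parse_call_trace` and `FRAME_RE` are hypothetical helper names, not part of any kernel tool. It treats an entry with a leading `?` as an unverified stack entry, which is an assumption about how this kernel prints its traces.

```python
import re

# Matches entries such as "[<ffffffff80211ffd>] ? oops_end+0x7d/0x80".
# The optional "?" group is captured so it can be reported per frame.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{16})>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_call_trace(text):
    """Return a list of (address, symbol, offset, size, verified) tuples.

    `verified` is False when the entry carries a "?" marker (assumed to
    mean the address was found on the stack but not confirmed).
    """
    frames = []
    for addr, marker, symbol, offset, size in FRAME_RE.findall(text):
        frames.append(
            (int(addr, 16), symbol, int(offset, 16), int(size, 16), not marker)
        )
    return frames


# Three entries copied from the first call trace in this report.
sample = (
    "Call Trace: [<ffffffff80211ffd>] ? oops_end+0x7d/0x80 "
    "[<ffffffff8021340c>] ? do_invalid_op+0x8c/0xa0 "
    "[<ffffffff804878d0>] ? xen_failsafe_callback+0x0/0x10"
)

frames = parse_call_trace(sample)
symbols = [frame[1] for frame in frames]
print(symbols)
print("xen_failsafe_callback" in symbols)
```

The same function also accepts the `RIP [<...>] do_exit+0x51e/0x750` line, since the `?` marker is optional in the pattern.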
I have tried xen-pvops-2.6.26-rc5-ehabkost2 on 2 VMs. They have been running fine for more than 24 hours. Previously, I saw crashes in less than 1 hour. I will slowly start migrating less-important production VMs to test it further. I am marking this bug as fixed.