Bug 671161
| Summary: | xen microcode WARN on save-restore | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Andrew Jones <drjones> |
| Component: | kernel | Assignee: | Andrew Jones <drjones> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.0 | CC: | anton, imammedo, jwest, mjenner, qwan, xen-maint |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.32-117.el6 | Doc Type: | Bug Fix |
| Doc Text: |
If the microcode module was loaded, saving and restoring a Xen guest
returned a warning message and a backtrace error. With this update,
backtrace errors are no longer returned, and saving and restoring a Xen
guest works as expected.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-05-23 20:38:32 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 710632 | ||
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

I haven't had time to try and come up with a clean way to handle this when running on Xen for 6.1. Can we just blacklist this module when on Xen somehow?

Andrew, it seems someone already tried to fix it: http://marc.info/?l=linux-kernel&m=126105863415715&w=2
Could you try the patch as well and check whether it fixes this case? I will push it upstream. Thanks,

Andrew, here are the kernels with the patch integrated: http://people.redhat.com/aarapov/kernel/

The patch is in the mm tree: http://marc.info/?l=linux-mm-commits&m=129720269928412&w=2
- would you backport it?
- is this bug a blocker?

Yes, I need to backport it now for 6.1 since it's pretty ugly to get a big backtrace on every save/restore. I couldn't test it until now as save/restore had other issues (bug 676009).

Other issues resolved. I've now tested this patch and will post it today.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-117.el6

RHEL6.0 kernel 2.6.32-71.24.1 is affected by this WARNING too. Moreover, if the microcode module is loaded and has a valid microcode blob, then a PV guest with more than one VCPU will hit BUG_ON(raw_smp_processor_id() != cpu) in the vendor-specific apply_microcode function on restore and crash like this:

------------[ cut here ]------------
kernel BUG at arch/x86/kernel/microcode_amd.c:142!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/platform/microcode/firmware/microcode/loading
CPU 0
Modules linked in: microcode(U) ipv6 dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Modules linked in: microcode(U) ipv6 dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 571, comm: kstop/0 Tainted: G W ---------------- 2.6.32-71.24.1.el6.x86_64nxen #24
RIP: e030:[<ffffffffa0080062>] [<ffffffffa0080062>] apply_microcode_amd+0xb2/0xc0 [microcode]
RSP: e02b:ffff88007be03d40 EFLAGS: 00010097
RAX: 0000000000000000 RBX: ffffc90000322000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff81cf0018 RDI: 0000000000000001
RBP: ffff88007be03d70 R08: 0000000000000000 R09: ffffffff8156f780
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000004000000000 R15: ffff8800021bdec8
FS: 00007ff285a427a0(0000) GS:ffff88000219f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007c7b4000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Process kstop/0 (pid: 571, threadinfo ffff88007be02000, task ffff88007b008080)
Stack:
ffffffff81810848 ffff8800021cab28 0000004000000000 0000000000000001
<0> ffffffff81810848 ffff8800021cab28 ffff88007be03d90 ffffffffa007f0ad
<0> 000000000000000b ffffffffa00817c0 ffff88007be03dc0 ffffffff8139be2e
Call Trace:
[<ffffffffa007f0ad>] mc_sysdev_resume+0x4d/0x70 [microcode]
[<ffffffff8139be2e>] __sysdev_resume+0x4e/0xe0
[<ffffffff8139bf49>] sysdev_resume+0x89/0x190
[<ffffffff810c6260>] ? stop_cpu+0x0/0xf0
[<ffffffff81355df2>] xen_suspend+0x92/0xf0
[<ffffffff810c6309>] stop_cpu+0xa9/0xf0
[<ffffffff8108c780>] worker_thread+0x170/0x2a0
[<ffffffff8100f33d>] ? xen_force_evtchn_callback+0xd/0x10
[<ffffffff8100fb62>] ? check_events+0x12/0x20
[<ffffffff81091e50>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108c610>] ? worker_thread+0x0/0x2a0
[<ffffffff81091ae6>] kthread+0x96/0xa0
[<ffffffff810141ca>] child_rip+0xa/0x20
[<ffffffff81013393>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff81013b1d>] ? retint_restore_args+0x5/0x6
[<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 0f 1f 40 00 89 da 44 89 e6 48 c7 c7 68 11 08 a0 31 c0 e8 09 ee 49 e1 b8 ff ff ff ff eb d4 <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83
RIP [<ffffffffa0080062>] apply_microcode_amd+0xb2/0xc0 [microcode]
RSP <ffff88007be03d40>
---[ end trace 7f34b47b4668bb2c ]---
Kernel panic - not syncing: Fatal exception

Comment 17 shows that this is a good candidate for 6.0.z. Adding zstream keyword.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
When saving and restoring a Xen guest, the guest spews the following warning to the console/dmesg on every restore if the microcode module was loaded:

------------[ cut here ]------------
WARNING: at arch/x86/kernel/microcode_core.c:451 mc_sysdev_resume+0x67/0x70 [microcode]() (Not tainted)
Modules linked in: sunrpc ipv6 xt_physdev iptable_filter ip_tables dm_mirror dm_region_hash dm_log microcode xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mod [last unloaded: scsi_wait_scan]
Pid: 5, comm: migration/0 Not tainted 2.6.32-99.el6.x86_64 #1
Call Trace:
[<ffffffff81063917>] warn_slowpath_common+0x87/0xc0
[<ffffffff8106396a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa009e0c7>] mc_sysdev_resume+0x67/0x70 [microcode]
[<ffffffff8132961e>] __sysdev_resume+0x4e/0xe0
[<ffffffff81329739>] sysdev_resume+0x89/0x190
[<ffffffff812e4e82>] xen_suspend+0x92/0xf0
[<ffffffff810be35b>] stop_machine_cpu_stop+0x9b/0xe0
[<ffffffff810be2c0>] ? stop_machine_cpu_stop+0x0/0xe0
[<ffffffff810be1ea>] cpu_stopper_thread+0xda/0x1b0
[<ffffffff814c6166>] ? thread_return+0x4e/0x778
[<ffffffff8100733d>] ? xen_force_evtchn_callback+0xd/0x10
[<ffffffff81007b62>] ? check_events+0x12/0x20
[<ffffffff81007b4f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff810be110>] ? cpu_stopper_thread+0x0/0x1b0
[<ffffffff81089a76>] kthread+0x96/0xa0
[<ffffffff8100c1ca>] child_rip+0xa/0x20
[<ffffffff8100b393>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8100bb1d>] ? retint_restore_args+0x5/0x6
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
---[ end trace a27ac8656b6a708e ]---

The code that produces this warning (below) has been around forever. However, we haven't seen the problem before because the module was never loaded. Recent changes to the microcode_ctl package

# rpm -q --changelog microcode_ctl
* Wed Nov 24 2010 Anton Arapov <anton> - 1:1.17-4
- Update to microcode-20101123.dat
- Make microcode_ctl event driven
- Resolves: rhbz#578107
...

have started loading the module automatically on some platforms.
This also occurs with upstream kernel code, so there isn't a fix currently available. To fix it we either need to find a way in the kernel to satisfy the WARN_ON or, to at least fix it in RHEL, ensure that the microcode module doesn't get automatically loaded on Xen platforms.

436 static int mc_sysdev_resume(struct sys_device *dev)
437 {
438         int cpu = dev->id;
439         struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
440
441         if (!cpu_online(cpu))
442                 return 0;
443
444         /*
445          * All non-bootup cpus are still disabled,
446          * so only CPU 0 will apply ucode here.
447          *
448          * Moreover, there can be no concurrent
449          * updates from any other places at this point.
450          */
451         WARN_ON(cpu != 0);