Bug 815785 - kdump fails with lapic error in xen hvm guest
Summary: kdump fails with lapic error in xen hvm guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Don Zickus
QA Contact: Virtualization Bugs
URL:
Whiteboard: xen
Depends On:
Blocks: 653816
Reported: 2012-04-24 13:56 UTC by Qixiang Wan
Modified: 2012-06-20 13:59 UTC
CC List: 6 users

Fixed In Version: kernel-2.6.32-269.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 13:59:07 UTC
Target Upstream Version:
Embargoed:


Attachments
second kernel call trace with LAPIC error (10.35 KB, text/plain)
2012-04-24 14:03 UTC, Qixiang Wan
second kernel call trace and continue, then reboot after "lost interrupt" error (24.81 KB, text/plain)
2012-04-24 14:05 UTC, Qixiang Wan


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:0862 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update 2012-06-20 12:55:00 UTC

Description Qixiang Wan 2012-04-24 13:56:58 UTC
Description of problem:
When using kdump in a RHEL6.3 xen HVM guest, the second kernel prints a call trace and hangs with the following error:

------------[ cut here ]------------
WARNING: at arch/x86/kernel/apic/apic.c:1304 setup_local_APIC+0x189/0x290() (Not tainted)
Hardware name: HVM domU
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.32-263.el6.x86_64 #1
Call Trace:
 [<ffffffff8106b6b7>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b70a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff814f6d49>] ? setup_local_APIC+0x189/0x290
 [<ffffffff81c30402>] ? native_smp_prepare_cpus+0x2bd/0x389
 [<ffffffff81c21740>] ? kernel_init+0x112/0x2fe
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81c2162e>] ? kernel_init+0x0/0x2fe
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a725 ]---
Spurious LAPIC timer interrupt on cpu 0
do_IRQ: 0.73 No irq handler for vector (irq -1)

It's more reproducible when the crash is triggered while scp'ing data from the guest. Otherwise, the second kernel may continue the boot process after the above call trace and then hang with a "lost interrupt" error.
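
A minimal sketch for checking a saved console log of the second kernel for the messages quoted above. It assumes the guest console output was captured to a file; the function name and the log path are hypothetical, and the patterns are only the strings that appear in this report.

```shell
#!/bin/sh
# Return 0 if the given console log contains one of the symptoms
# quoted in this report, non-zero otherwise.
# $1: path to a saved console log of the second (kdump) kernel (hypothetical path)
has_lapic_kdump_symptom() {
    grep -qE 'Spurious LAPIC timer interrupt|No irq handler for vector|lost interrupt' "$1"
}
```

Example use: has_lapic_kdump_symptom /path/to/guest-console.log && echo "symptom found"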

It's very likely caused by commit 0a267f9:
[x86] kdump: No need to disable ioapic in crash path (Don Zickus) [783322]

Version-Release number of selected component (if applicable):
kernel-2.6.32-263

How reproducible:
100%

Steps to Reproduce:
1. Install a xen HVM guest
2. Enable kdump with the steps in https://access.redhat.com/knowledge/solutions/92943, pasting the necessary steps for RHEL6.3 here:

[1] Add the kernel command line parameter xen_emul_unplug=never to the kernel's command line and boot. This boots using the emulated devices (and appropriate drivers) and without paravirt drivers.
[2] Start the kdump service: service kdump start. This will generate a dumprd with the drivers necessary for the emulated devices.
[3] Edit /etc/modprobe.d/blacklist.conf by adding the three lines shown below to blacklist the drivers used for the emulated devices. This will ensure that even if the host presents the emulated devices to the guest, the guest will use the paravirt drivers instead.

blacklist ata_piix
blacklist 8139too
blacklist 8139cp

[4] Remove the xen_emul_unplug=never kernel command line parameter added in step 1, add the kernel command line parameter xen_emul_unplug=unnecessary, and reboot.
[5] Ensure that the kdump service has started: service kdump status
[6] Run echo c >/proc/sysrq-trigger to force a crash that should invoke kdump
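
Step [3] above can be sketched as a small helper that appends the three blacklist lines only when they are missing, so running it twice does not duplicate entries. This is an illustration, not part of the original steps: the function name is hypothetical, and the optional path argument exists only so the helper can be tried on a scratch file first.

```shell
#!/bin/sh
# Append the blacklist entries from step [3] to a modprobe config file,
# skipping any entry that is already present.
# $1: config file (defaults to the path named in step [3])
add_blacklist_entries() {
    conf="${1:-/etc/modprobe.d/blacklist.conf}"
    for mod in ata_piix 8139too 8139cp; do
        # -q quiet, -x whole-line match, -F fixed string (no regex)
        if ! grep -qxF "blacklist $mod" "$conf" 2>/dev/null; then
            printf 'blacklist %s\n' "$mod" >> "$conf"
        fi
    done
}
```

Called without arguments as root, it edits /etc/modprobe.d/blacklist.conf; called with a path, it edits that file instead.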

Actual results:
The second kernel prints a call trace and hangs.

Expected results:
kdump should work

Additional info:

Comment 1 Qixiang Wan 2012-04-24 14:03:32 UTC
Created attachment 579867 [details]
second kernel call trace with LAPIC error

This error is more reproducible if the crash is triggered while scp'ing data from the guest.

Comment 2 Qixiang Wan 2012-04-24 14:05:58 UTC
Created attachment 579868 [details]
second kernel call trace and continue, then reboot after "lost interrupt" error

If no data is being scp'ed from the guest when the crash is triggered, the guest has a chance to continue booting after the call trace, but it will reboot later after a "lost interrupt" error.

Comment 3 Andrew Jones 2012-04-24 18:05:45 UTC
I've started a brew build here

https://brewweb.devel.redhat.com/taskinfo?taskID=4334583

that has 0a267f9 reverted for testing.

Comment 4 Qixiang Wan 2012-04-25 03:08:26 UTC
(In reply to comment #3)
> I've started a brew build here
> 
> https://brewweb.devel.redhat.com/taskinfo?taskID=4334583
> 
> that has 0a267f9 reverted for testing.

Tested this build, kdump works well without any call trace.

Comment 5 Andrew Jones 2012-04-25 07:54:53 UTC
Thanks for the testing qwan!

I'll start chatting with dzickus about this.

Comment 6 RHEL Program Management 2012-04-25 08:10:07 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 7 Andrew Jones 2012-04-25 12:40:42 UTC
This brew build has the patch (hack) below to try and keep 0a267f9

https://brewweb.devel.redhat.com/taskinfo?taskID=4336690

I'm not sure if we want to do this, but I guess we can test it to see if it
even works for starters.



diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c1b0780..1ec6287 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -30,6 +30,8 @@
 #include <asm/virtext.h>
 #include <asm/iommu.h>

+#include <xen/xen.h>
+

 int in_crash_kexec;

@@ -103,6 +105,10 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
        cpu_emergency_svm_disable();

        lapic_shutdown();
+#if defined(CONFIG_X86_IO_APIC)
+       if (xen_hvm_domain())
+               disable_IO_APIC();
+#endif
        if (mcp55_rewrite) {
                u32 cfg;
                printk(KERN_CRIT "REWRITING MCP55 CFG REG\n");

Comment 8 Qixiang Wan 2012-04-25 12:58:00 UTC
(In reply to comment #7)
> This brew build has the patch (hack) below to try and keep 0a267f9
> 
> https://brewweb.devel.redhat.com/taskinfo?taskID=4336690
> 
> I'm not sure if we want to do this, but I guess we can test it to see if it
> even works for starters.

It works in the same environment.

Comment 9 Andrew Jones 2012-05-02 14:24:21 UTC
Don posted a 'revert 0a267f9' patch under this BZ, so kicking it to POST. He'll revisit the issue for 6.4.

Comment 10 Jarod Wilson 2012-05-02 16:21:46 UTC
Patch(es) available on kernel-2.6.32-269.el6

Comment 13 Qixiang Wan 2012-05-03 05:57:21 UTC
Verified with kernel-2.6.32-269.el6. With this build, kdump service can start
successfully and works well in xen HVM guest.

The latest build contains the following fixes:
Bug 810222 - Revert "[virt] xen: mask MTRR feature from guest BZ#750758" (fix
in -262)
Bug 811815 - [FJ6.2 Bug]: kdump service fails with the message "Kdump is
unsupported on this kernel" (fix in -266)
Bug 815785 - kdump fails with lapic error in xen hvm guest (fix in -269).

With all three of the above fixes integrated, kdump in a RHEL6.3 xen HVM guest
works well now, so I am verifying these 3 bugs together.

Test steps:

[1] Add the kernel command line parameter xen_emul_unplug=never to the kernel's
command line and boot.
[2] Start the kdump service.
[3] Blacklist the drivers used for xen emulated devices by adding the following
three lines to /etc/modprobe.d/blacklist.conf:

blacklist ata_piix
blacklist 8139too
blacklist 8139cp

[4] Remove the xen_emul_unplug=never kernel command line parameter added in
step 1 and add the kernel command line xen_emul_unplug=unnecessary, then
reboot.
[5] Ensure that the kdump service has started.
[6] Run echo c >/proc/sysrq-trigger to force a crash that should invoke kdump
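
Before repeating these test steps, it may help to confirm that the guest is running a build at or after kernel-2.6.32-269.el6, the "Fixed In Version" of this bug. The sketch below is an assumption-laden illustration: the function name is hypothetical, it only understands release strings of the form 2.6.32-<build>.el6..., and it compares just the first build number, so it says nothing about z-stream kernels that might carry a backport.

```shell
#!/bin/sh
# Return 0 if the kernel release string is a 2.6.32 build numbered 269 or later.
# Return 2 if the string is not a 2.6.32-* release (the check does not apply).
# $1: release string, defaults to the running kernel (uname -r)
has_bz815785_fix() {
    release="${1:-$(uname -r)}"
    case "$release" in
        2.6.32-*) ;;
        *) return 2 ;;
    esac
    build="${release#2.6.32-}"   # e.g. 263.el6.x86_64
    build="${build%%.*}"         # e.g. 263
    [ "$build" -ge 269 ]
}
```

For example, has_bz815785_fix 2.6.32-263.el6.x86_64 fails, since -263 is the build this bug was reported against.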

Comment 15 errata-xmlrpc 2012-06-20 13:59:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html

