Bug 505491

Summary: 32-bit Dom0 Cannot Boot in RHEL5.4
Product: Red Hat Enterprise Linux 5 Reporter: Qian Cai <qcai>
Component: kernel-xen Assignee: Don Dugger <ddugger>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 5.4 CC: clalance, dzickus, knoel, rlerch, sghosh, xen-maint
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
On older 32-bit systems with only 1 CPU and an IOAPIC, the 32-bit Xen dom0 kernel may fail to boot with a message similar to:

(XEN) irq.c:794: dom0: pirq 2 or vector 144 already mapped
(XEN) io_apic.c:2207:
(XEN) ioapic_guest_write: apic=0, pin=2, old_irq=-1, new_irq=4
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009e4
(XEN) ioapic_guest_write: Attempt to add IO-APIC pin for in-use IRQ!

To work around this problem, pass the option "noapic" to the hypervisor while booting the kernel. This bug exists in the RHEL-5.4 beta kernel, but should be addressed before RHEL-5.4 Final.
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-09-02 08:19:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                                    Flags
xm dmesg from -128 xen kernel                                  none
dmesg from -128 dom0                                           none
xm dmesg from -153 xen kernel (collected via serial console)   none
dmesg from -153 dom0                                           none

Description Qian Cai 2009-06-12 05:23:45 UTC
Description of problem:
A 32-bit-only system, athlon5.rhts.bos.redhat.com (AMD Athlon(tm) XP 1500+), fails to boot kernel-xen. The RHEL5.3 (-128.el5) kernel-xen worked fine on the same machine.

 Xen version 3.1.2-153.el5 (mockbuild) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) Wed Jun 10 17:50:32 EDT 2009
 Latest ChangeSet: unavailable

(XEN) Command line: com1=115200,8n1 noreboot
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009f800 (usable)
(XEN)  000000000009f800 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000001fff0000 (usable)
(XEN)  000000001fff0000 - 000000001fff3000 (ACPI NVS)
(XEN)  000000001fff3000 - 0000000020000000 (ACPI data)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ffff0000 - 0000000100000000 (reserved)
(XEN) System RAM: 511MB (523836kB)
(XEN) Xen heap: 9MB (10092kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) PAE enabled, limit: 16 GB
(XEN) Processor #0 6:6 APIC version 16
(XEN) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 1328.317 MHz processor.
(XEN) CPU0: AMD Athlon(tm) XP 1500+ stepping 02
(XEN) Total of 1 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) Platform timer overflows in 2 jiffies.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 1 CPUs
(XEN) I/O virtualisation disabled
(XEN) *** LOADING DOMAIN 0 ***
(XEN) elf_parse_binary: phdr: paddr=0xc0400000 memsz=0x280708
(XEN) elf_parse_binary: phdr: paddr=0xc0681000 memsz=0x162000
(XEN) elf_parse_binary: memory: 0xc0400000 -> 0xc07e3000
(XEN) elf_xen_parse_note: GUEST_OS = "linux"
(XEN) elf_xen_parse_note: GUEST_VERSION = "2.6"
(XEN) elf_xen_parse_note: XEN_VERSION = "xen-3.0"
(XEN) elf_xen_parse_note: VIRT_BASE = 0xc0000000
(XEN) elf_xen_parse_note: PADDR_OFFSET = 0xc0000000
(XEN) elf_xen_parse_note: ENTRY = 0xc0400000
(XEN) elf_xen_parse_note: HYPERCALL_PAGE = 0xc0401000
(XEN) elf_xen_parse_note: FEATURES = "writable_page_tables|writable_descriptor_tables|auto_translated_physmap|pae_pgdir_above_4gb|supervisor_mode_kernel"
(XEN) elf_xen_parse_note: PAE_MODE = "yes"
(XEN) elf_xen_parse_note: LOADER = "generic"
(XEN) elf_xen_addr_calc_check: addresses:
(XEN)     virt_base        = 0xc0000000
(XEN)     elf_paddr_offset = 0xc0000000
(XEN)     virt_offset      = 0x0
(XEN)     virt_kstart      = 0xc0400000
(XEN)     virt_kend        = 0xc07e3000
(XEN)     virt_entry       = 0xc0400000
(XEN)  Xen  kernel: 32-bit, PAE, lsb
(XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0xc0400000 -> 0xc07e3000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   000000001d000000->000000001e000000 (106770 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0400000->c07e3000
(XEN)  Init. ramdisk: c07e3000->c0edac00
(XEN)  Phys-Mach map: c0edb000->c0f47448
(XEN)  Start info:    c0f48000->c0f4846c
(XEN)  Page tables:   c0f49000->c0f56000
(XEN)  Boot stack:    c0f56000->c0f57000
(XEN)  TOTAL:         c0000000->c1000000
(XEN)  ENTRY ADDRESS: c0400000
(XEN) Dom0 has maximum 1 VCPUs
(XEN) elf_load_binary: phdr 0 at 0xc0400000 -> 0xc0680708
(XEN) elf_load_binary: phdr 1 at 0xc0681000 -> 0xc072bcc4
(XEN) Initrd len 0x6f7c00, start at 0xc07e3000
(XEN) Scrubbing Free RAM: done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) Freed 104kB init memory.
Linux version 2.6.18-153.el5xen (mockbuild.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Jun 10 18:06:38 EDT 2009
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 000000001b912000 (usable)
0MB HIGHMEM available.
441MB LOWMEM available.
Using x86 segment limits to approximate NX protection
DMI 2.2 present.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists.  Total pages: 112914
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0750000 soft=c0730000
PID hash table entries: 2048 (order: 11, 8192 bytes)
Xen reported: 1328.317 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Software IO TLB enabled: 
 Aperture:     2 megabytes
 Kernel range: 0x00000000c014b000 - 0x00000000c034b000
vmalloc area: dc800000-f4ffe000, maxmem 2d7fe000
Memory: 424576k/451656k available (2162k kernel code, 18648k reserved, 887k data, 176k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 3323.11 BogoMIPS (lpj=6646233)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 13k freed
ACPI: Core revision 20060707
ENABLING IO-APIC IRQs
(XEN) ioapic_guest_write: apic=0, pin=0, old_irq=-1, new_irq=0
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009f0
(XEN) ioapic_guest_write: Attempt to add IO-APIC pin for in-use IRQ!
(XEN) irq.c:794: dom0: pirq 2 or vector 144 already mapped
(XEN) ioapic_guest_write: apic=0, pin=2, old_irq=-1, new_irq=4
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009e4
(XEN) ioapic_guest_write: Attempt to add IO-APIC pin for in-use IRQ!

Version-Release number of selected component (if applicable):
kernel-2.6.18-152.el5
kernel-2.6.18-153.el5

How reproducible:
always

Comment 2 Chris Lalancette 2009-06-12 07:32:18 UTC
Initial bisect shows that this problem started occurring in the -141 kernel; prior to that, it booted just fine.  Further refinement shows that this is a hypervisor problem.  That is, a -141 dom0 kernel with a -141 hypervisor shows the problem, but a -141 dom0 kernel with a -140 hypervisor does not.  In the -141 release, the VT-d and AMD IOMMU code was added to the hypervisor, so something in there is likely causing the problem.

I'm continuing to investigate.

Chris Lalancette

Comment 3 Chris Lalancette 2009-06-12 10:30:34 UTC
Right.  So the trouble begins here:

    old_vector = domain_irq_to_vector(d, pirq);
    old_pirq = domain_vector_to_irq(d, vector);

    if ( (old_vector && (old_vector != vector) ) ||
         (old_pirq && (old_pirq != pirq)) )
    {
        dprintk(XENLOG_G_ERR, "dom%d: pirq %d or vector %d already mapped\n",
                d->domain_id, pirq, vector);
        return -EINVAL;
    }

in arch/x86/irq.c.  What's happening is that "old_vector" is being set to 226 (0xe2), which is non-zero and differs from the requested vector, so we fall into the if() statement and the mapping fails.  Now, domain_irq_to_vector is actually a macro in include/asm-x86/irq.h:

#define domain_irq_to_vector(d, irq) ((d)->arch.pirq_vector[irq] ?: \
                                      IO_APIC_IRQ(irq) ? 0 : LEGACY_VECTOR(irq))

In point of fact, that macro *was* changed with the VT-d stuff; it used to be a much simpler:

#define domain_irq_to_vector(d, irq) ((d)->arch.pirq_vector[(irq)])

Now let's look at what is going on.  During bootup, if we detect an IOAPIC in the system (which we do), then we run through arch/x86/io_apic.c:setup_IO_APIC().  In there, we end up setting "io_apic_irqs" to 0xffffffff, and calling "init_IO_APIC_traps()".  init_IO_APIC_traps looks like this:

static inline void init_IO_APIC_traps(void)
{
    int irq;
    /* Xen: This is way simpler than the Linux implementation. */
    for (irq = 0; irq < 16 ; irq++)
        if (IO_APIC_IRQ(irq) && !IO_APIC_VECTOR(irq))
            make_8259A_irq(irq);
}

So, it runs through all of the legacy IRQs, and sees if they are managed by the IO_APIC.  Now, since we earlier said that all IRQs are managed by the IO_APIC, the first check is true for all legacy IRQs.  However, the second check (!IO_APIC_VECTOR()) is only true for irq #2 (why?  I need to figure this out).  That means that we end up masking irq #2 out of the io_apic_irqs mask.  Later on, when we go to map that IRQ into domain 0, it fails because that IRQ is masked out, and so domain_irq_to_vector() ends up returning 0xe2, the LEGACY_VECTOR for irq #2, and then the "if (old_vector)" test fires.

Chris Lalancette

Comment 4 Chris Lalancette 2009-06-12 12:06:17 UTC
I sent this mail to virt-intel-list, and while it duplicates some of Comment #3, I think it explains the situation a bit better:

Hello all,
     Internal QE just pointed us at a regression on old hardware, which I
tracked down to the VT-d patches.  The hardware in question is an old AMD
Athlon machine, pre-64bit.  Prior to the VT-d patches going in (in the
2.6.18-141xen kernel), this machine would boot and run Xen just fine.  However,
after the VT-d patches, this machine no longer boots.  The problem manifests
itself as a hang during dom0 boot, and you get the following messages on the
console:

Freeing SMP alternatives: 13k freed
ACPI: Core revision 20060707
ENABLING IO-APIC IRQs
(XEN) ioapic_guest_write: apic=0, pin=0, old_irq=-1, new_irq=0
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009f0
(XEN) ioapic_guest_write: Attempt to add IO-APIC pin for in-use IRQ!
(XEN) irq.c:794: dom0: pirq 2 or vector 144 already mapped
(XEN) ioapic_guest_write: apic=0, pin=2, old_irq=-1, new_irq=4
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009e4
(XEN) ioapic_guest_write: Attempt to add IO-APIC pin for in-use IRQ!

The important (different) bit here is that "pirq 2 or vector 144 already mapped"
message.  I tracked it down to this change:

-#define domain_irq_to_vector(d, irq) ((d)->arch.pirq_vector[(irq)])
+#define domain_irq_to_vector(d, irq) ((d)->arch.pirq_vector[irq] ?: \
                                      IO_APIC_IRQ(irq) ? 0 : LEGACY_VECTOR(irq))

in the hypervisor in include/asm-x86/irq.h.  What seems to be happening is that
during ACPI parse, we find out that we have an IOAPIC, so we set acpi_ioapic to
1.  Then, during IOAPIC setup in setup_IO_APIC(), we set the io_apic_irqs mask
to ~0, meaning that we think all IRQs are routed through the IOAPIC (because
that's what ACPI told us).  However, during the rest of IOAPIC setup, in
particular during setup_IO_APIC_irqs(), (I think) we find that this is not the
case, and IRQ 2 isn't actually routed through the IOAPIC.  Because of this,
init_IO_APIC_traps() masks IRQ 2 out of io_apic_irqs.

Now, dom0 starts up and starts mapping pirq's.  When it gets to pirq 2,
map_domain_pirq() does:

old_vector = domain_irq_to_vector(d, pirq);

Looking at the "new" version of the domain_irq_to_vector macro above, though, we
see that d->arch.pirq_vector[2] == 0 (since we've not tried to map this yet).
Therefore, we check the next part, which is IO_APIC_IRQ(2), but from the above,
we know that pirq 2 is *not* mapped through the IOAPIC.  Because of that, we
drop through and return LEGACY_VECTOR(2), which ends up being 0xe2.  This return
value causes this check:

    if ( (old_vector && (old_vector != vector) ) ||
         (old_pirq && (old_pirq != pirq)) )
    {
        dprintk(XENLOG_G_ERR, "dom%d: pirq %d or vector %d already mapped\n",
                d->domain_id, pirq, vector);
        return -EINVAL;
    }

to fire, and hence the mapping (and the boot) to fail.

I'm not entirely sure what the fix would be.  It seems to me that the
domain_irq_to_vector macro needs to take into account pirqs which aren't routed
through the IOAPIC, but also aren't "LEGACY" vectors.  Does anybody have
thoughts about how to go about fixing this?  For reference, this is also broken
with an upstream hypervisor.

Chris Lalancette

Comment 6 Chris Lalancette 2009-06-16 13:12:49 UTC
Created attachment 348102 [details]
xm dmesg from -128 xen kernel

Comment 7 Chris Lalancette 2009-06-16 13:13:30 UTC
Created attachment 348103 [details]
dmesg from -128 dom0

Comment 8 Chris Lalancette 2009-06-16 13:14:06 UTC
Created attachment 348104 [details]
xm dmesg from -153 xen kernel (collected via serial console)

Comment 9 Chris Lalancette 2009-06-16 13:14:34 UTC
Created attachment 348105 [details]
dmesg from -153 dom0

Comment 10 Chris Lalancette 2009-06-16 14:03:58 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
On older 32-bit systems with only 1 CPU and an IOAPIC, the 32-bit Xen dom0 kernel may fail to boot with a message similar to:

(XEN) irq.c:794: dom0: pirq 2 or vector 144 already mapped
(XEN) io_apic.c:2207: 
(XEN) ioapic_guest_write: apic=0, pin=2, old_irq=-1, new_irq=4
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009e4
(XEN) ioapic_guest_write: Attempt to add IO-APIC pin for in-use IRQ!

To work around this problem, pass the option "noapic" to the hypervisor while booting the kernel.  This bug exists in the RHEL-5.4 beta kernel, but should be addressed before RHEL-5.4 Final.

Comment 15 Don Zickus 2009-06-30 20:22:44 UTC
The fix is in kernel-2.6.18-156.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 20 errata-xmlrpc 2009-09-02 08:19:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Comment 21 Chris Lalancette 2010-07-19 13:07:46 UTC
Clearing a needinfo request.

Chris Lalancette