Bug 442538 - kernel panic in gnttab_map when booting RHEL4 x86_64 FV xen guest
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: All Linux
Priority: high  Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Don Dutile
QA Contact: Martin Jenner
Keywords: Regression
Reported: 2008-04-15 09:36 EDT by Jeff Layton
Modified: 2014-06-18 03:37 EDT
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 15:29:01 EDT

Attachments
Posted patch for 4.7 (2.22 KB, patch)
2008-04-15 14:52 EDT, Don Dutile
screen shot of kernel panic with Xen (17.80 KB, image/png)
2008-04-30 05:52 EDT, Alexander Todorov
can't open /dev/xvda error (149.37 KB, image/png)
2008-05-19 09:08 EDT, Alexander Todorov

Description Jeff Layton 2008-04-15 09:36:51 EDT
Opening this as a kernel bug, but it may be a xen issue...

I got this panic when booting a RHEL4 x86_64 FV xen guest. This is from
-68.34.ELsmp, but I've seen the same panic on a -68.32 based kernel as well. Let
me know if you need access to the machine...


Bootdata ok (command line is ro root=/dev/VolGroup00/LogVol00
console=ttyS0,115200 nohpet)
Linux version 2.6.9-68.34.ELsmp (brewbuilder@hs20-bc1-7.build.redhat.com) (gcc
version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP Mon Apr 14 16:50:55 EDT 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fffac00 (usable)
 BIOS-e820: 000000001fffac00 - 0000000020000000 (reserved)
No NUMA configuration found
Faking a node at 0000000000000000-000000001fffa000
Bootmem setup node 0 0000000000000000-000000001fffa000
DMI 2.4 present.
ACPI: PM-Timer IO Port: 0x1f48
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
Processor #2 6:15 APIC version 16
Setting APIC routing to flat
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 low level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 7 low level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 low level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 30000000 (gap: 20000000:e0000000)
Checking aperture...
Built 1 zonelists
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 nohpet
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 65536 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 2992.653 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Memory: 508516k/524264k available (2160k kernel code, 0k reserved, 1352k data,
208k init)
Calibrating delay using timer specific routine.. 5995.99 BogoMIPS (lpj=2997999)
Security Scaffold v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU0: Intel(R) Xeon(R) CPU            5160  @ 3.00GHz stepping 0b
per-CPU timeslice cutoff: 4096.17 usecs.
task migration cache decay timeout: 4 msecs.
Booting processor 1/2 rip 6000 rsp 1001fc05f58
Initializing CPU#1
Calibrating delay using timer specific routine.. 5984.27 BogoMIPS (lpj=2992135)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
Intel(R) Xeon(R) CPU            5160  @ 3.00GHz stepping 0b
Total of 2 processors activated (11980.26 BogoMIPS).
activating NMI Watchdog ... done.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (0)!
Using local APIC timer interrupts.
Detected 6.250 MHz APIC timer.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
time.c: Using PIT/TSC based timekeeping.
checking if image is initramfs... it is
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Link [LNKA] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *7 10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 7 10 *11)
xen_mem: Initialising balloon driver.
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:00:01.2[A] -> GSI 20 (level, low) -> IRQ 169
GSI 17 sharing vector 0xB1 and IRQ 17
ACPI: PCI Interrupt 0000:00:01.3[D] -> GSI 23 (level, low) -> IRQ 177
GSI 18 sharing vector 0xB9 and IRQ 18
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 28 (level, low) -> IRQ 185
GSI 19 sharing vector 0xC1 and IRQ 19
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 32 (level, low) -> IRQ 193
PCI-DMA: Disabling IOMMU.
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1208266359.835:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key 9A2F7475FE63665B
- User ID: Red Hat, Inc. (Kernel Module GPG key)
Limiting direct PCI/PCI transfers.
PCI: PIIX3: Enabling Passive Release on 0000:00:01.0
Activating ISA DMA hang workarounds.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Processor [PR00] (supports C1)
ACPI: Processor [PR01] (supports C1)
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 68 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16450
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 28 (level, low) -> IRQ 185
Xen version 3.1.
Hypercall area is 1 pages.
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at gnttab:527
invalid operand: 0000 [1] SMP 
CPU 0 
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.9-68.34.ELsmp
RIP: 0010:[<ffffffff8025f365>] <ffffffff8025f365>{gnttab_map+82}
RSP: 0018:000001001fc01d38  EFLAGS: 00010282
RAX: ffffffffffffffea RBX: 0000000000020000 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: 000001001fc01d38 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000246
R10: 0000000000000246 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000001000 R14: 0000000000000000 R15: 00000000f3000000
FS:  0000000000000000(0000) GS:ffffffff8050c380(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000045efc0 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo 000001001fc00000, task 00000100011e47f0)
Stack: 0000000100007ff0 00000000ffffffff 00000001000f3000 0000000000000020 
       0000000000000000 ffffffff8025f67e 0000000000001000 0000000000000000 
       00000000f3000000 ffffffff801763a9 
Call Trace:<ffffffff8025f67e>{gnttab_resume+68}
<ffffffff801763a9>{alloc_page_interleave+61} 
       <ffffffff8052f7ed>{gnttab_init+132}
<ffffffff8025edd2>{platform_pci_init+889} 
       <ffffffff801f8a0c>{pci_device_probe+110} <ffffffff8024f075>{bus_match+57} 
       <ffffffff8024f173>{driver_attach+68} <ffffffff8024f48f>{bus_add_driver+143} 
       <ffffffff801f877c>{pci_register_driver+119}
<ffffffff8052f741>{platform_pci_module_init+13} 
       <ffffffff8010c58a>{init+474} <ffffffff80110f87>{child_rip+8} 
       <ffffffff8020d5f8>{acpi_ds_init_one_object+0} <ffffffff8010c3b0>{init+0} 
       <ffffffff80110f7f>{child_rip+0} 

Code: 0f 0b 27 47 34 80 ff ff ff ff 0f 02 ff c9 eb a7 48 83 c4 28 
RIP <ffffffff8025f365>{gnttab_map+82} RSP <000001001fc01d38>
 <0>Kernel panic - not syncing: Oops
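For anyone reading the oops by hand: the first two bytes of the Code: line (0f 0b) are ud2, which is how BUG() traps on x86_64. As far as I recall, 2.6.9-era x86_64 kernels follow the ud2 with a packed record holding an 8-byte pointer to the file name and a 2-byte line number; that layout is an assumption here, not something stated in this report. Under that assumption, the Python sketch below (helper names are made up for illustration) recovers the line number from the bytes and reinterprets RAX as a signed value. Treating RAX as the return code of the failing call is also an assumption.

```python
import errno
import struct

# "Code:" bytes copied from the oops, starting at the faulting RIP.
CODE = bytes.fromhex(
    "0f 0b 27 47 34 80 ff ff ff ff 0f 02 ff c9 eb a7 48 83 c4 28"
)


def decode_bug_record(code: bytes):
    """Decode an assumed BUG() record: ud2, 8-byte file pointer, 2-byte line.

    The layout is an assumption about 2.6.9-era x86_64 kernels.
    """
    if code[:2] != b"\x0f\x0b":
        raise ValueError("code does not start with ud2 (0f 0b)")
    # "<" selects little-endian with no padding: Q = 8 bytes, H = 2 bytes.
    file_ptr, line = struct.unpack_from("<QH", code, 2)
    return file_ptr, line


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit register value as two's complement."""
    return value - (1 << 64) if value & (1 << 63) else value


if __name__ == "__main__":
    file_ptr, line = decode_bug_record(CODE)
    print(f"file name pointer: {file_ptr:#x}")
    print(f"line number:       {line}")

    rax = as_signed64(0xFFFFFFFFFFFFFFEA)
    # Look up the errno name only if the value looks like a negative errno.
    name = errno.errorcode.get(-rax, "unknown") if rax < 0 else "n/a"
    print(f"RAX as signed:     {rax} ({name})")
```

With the bytes from this report the line number comes out as 527, matching "Kernel BUG at gnttab:527", and RAX reads as -22, which would be -EINVAL if it really is an error return.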
Comment 1 Jeff Layton 2008-04-15 09:44:27 EDT
-68.31.EL looks to be fine. I suspect the regression was introduced in
-68.32.EL. The delta between 31 and 32 is mostly xen patches, so this does look
more like a xen bug.
Comment 3 Chris Lalancette 2008-04-15 11:08:50 EDT
Jeff,
     Yeah, I do believe this is a Xen bug, but I do not believe it is a dup of
442298.  Don Dutile is aware of this problem and is working on it; I'll CC him here.

Chris Lalancette
Comment 4 Jeff Layton 2008-04-15 11:09:57 EDT
Thanks Chris...

Yep, confirmed. I just tested a kernel with the patch for 442298 and it still
panics on boot. Let me know if you need me to test a patch or anything...
Comment 5 Don Dutile 2008-04-15 11:51:47 EDT
Jeff,

Found the source of the bug; the >3 VNIF patch introduced new code for the
pv-on-hvm code path, but it (obviously) was never tested.
RHEL4.7 actually has pv-on-hvm support built into it, and thus crashes when
booted as a FV guest.

I have found a fix (and a few more memory-leaking bugs & fixes), which I 
will be posting today.
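
The call trace in comment #0 looks consistent with this. It runs through platform_pci_init, which to my understanding is the probe routine of the Xen platform PCI driver that a FV guest uses to get PV drivers (the pv-on-hvm path), before reaching gnttab_init and gnttab_resume. A rough Python sketch for pulling the symbols out of a pasted trace and checking that ordering follows. The helper names are hypothetical, and entries such as alloc_page_interleave are probably stale stack contents rather than real callers.

```python
import re

# First few lines of the call trace from comment #0 (innermost frame first).
TRACE = """
Call Trace:<ffffffff8025f67e>{gnttab_resume+68}
<ffffffff801763a9>{alloc_page_interleave+61}
       <ffffffff8052f7ed>{gnttab_init+132}
<ffffffff8025edd2>{platform_pci_init+889}
       <ffffffff801f8a0c>{pci_device_probe+110} <ffffffff8024f075>{bus_match+57}
"""


def trace_symbols(text: str) -> list:
    """Return symbol names from '<addr>{symbol+offset}' entries, in order."""
    return re.findall(r"<[0-9a-f]+>\{(\w+)\+\d+\}", text)


def appears_in_order(needle, haystack) -> bool:
    """True if every item of needle occurs in haystack in the same order."""
    it = iter(haystack)
    # 'sym in it' consumes the iterator up to the match, enforcing order.
    return all(sym in it for sym in needle)


if __name__ == "__main__":
    symbols = trace_symbols(TRACE)
    # Reverse so the list reads caller first, callee last.
    callers_first = list(reversed(symbols))
    expected = ["platform_pci_init", "gnttab_init", "gnttab_resume"]
    print(" -> ".join(callers_first))
    print("pv-on-hvm path present:", appears_in_order(expected, callers_first))
```

This only checks ordering in the printed trace; it says nothing about which frames are genuine, so it is a reading aid and not a proof.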

Comment 6 Don Dutile 2008-04-15 14:52:40 EDT
Created attachment 302504 [details]
Posted patch for 4.7
Comment 8 Alexander Todorov 2008-04-30 05:52:32 EDT
Created attachment 304205 [details]
screen shot of kernel panic with Xen

Can you confirm that the attached call trace is the same as the one in comment
#0?
It looks the same to me but I get this when I try to install the guest. This is
with RHEL4-U7-re20080424.0/x86_64/FV.

Thanks.
Comment 11 Vivek Goyal 2008-05-16 13:52:48 EDT
Committed in 70.EL. RPMS are available at http://people.redhat.com/vgoyal/
Comment 12 Alexander Todorov 2008-05-19 09:04:52 EDT
Vivek,
I can confirm that the panic is not there. I was able to boot a guest with the
70.EL kernel to the installation program. However I encountered a new bug and
will need some help to triage it.

Anaconda (stage2) can't open device /dev/xvda (the disk) while probing for
available disk space (this is at the beginning of the install). This is a parted
error. Trying to open the device with parted/fdisk from the console (while in
the install environment) doesn't work either. One of the anaconda folks told me
that this is probably the kernel's fault, as the device perms seemed OK to him.
Comment 13 Alexander Todorov 2008-05-19 09:08:02 EDT
Created attachment 305937 [details]
can't open /dev/xvda error

Additional info: 
this is RHEL4-U7-re20080515.0.

ia64 - FV guest works fine
x86_64 - FV guest leads to the above error
Comment 14 Chris Lalancette 2008-05-19 09:51:16 EDT
Something is wrong with the way you started that installation.  The only way
anaconda should be looking for /dev/xvda is if it was starting a PV
installation, not a FV installation.  What was the virt-install command-line you
used, or, alternatively, what were the steps you went through in virt-manager to
start this install?

Chris Lalancette
Comment 15 Alexander Todorov 2008-05-19 09:57:51 EDT
Ignore comments #12 and #13 - filed as bug #447315

I can confirm that the kernel panic is gone.
Comment 19 errata-xmlrpc 2008-07-24 15:29:01 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html
