Bug 734708 - xen modules - unable to handle kernel NULL pointer dereference
Summary: xen modules - unable to handle kernel NULL pointer dereference
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.7
Hardware: i686
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514491
Reported: 2011-08-31 08:29 UTC by Josef Lusticky
Modified: 2012-02-21 03:54 UTC (History)

Fixed In Version: kernel-2.6.18-287.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-21 03:54:02 UTC
Target Upstream Version:
Embargoed:


Attachments
don't hardcode is_running_on_xen() for pv-on-hvm drivers (1.07 KB, patch)
2011-09-12 16:26 UTC, Laszlo Ersek


Links
System: Red Hat Product Errata
ID: RHSA-2012:0150
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise Linux 5.8 kernel update
Last Updated: 2012-02-21 07:35:24 UTC

Description Josef Lusticky 2011-08-31 08:29:54 UTC
Description of problem:
Removing a module causes a kernel panic reporting "unable to handle kernel NULL pointer dereference at virtual address 00000180".
The address is always 00000180.
The panic is triggered by either "modprobe xen-balloon" or "modprobe -r xen-platform-pci".

Version-Release number of selected component (if applicable):
283.el5 kernel

How reproducible:


Steps to Reproduce:
1. modprobe -a xen-balloon xen-platform-pci xen-vnif
2. modprobe -r xen-platform-pci
3. modprobe -r xen-vnif
  
Actual results:
Kernel panic with following output:
WARNING: Error removing xen_platform_pci (/lib/modules/2.6.18-283.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000180
 printing eip:
00000180
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file: /class/misc/aer_inject/dev
Modules linked in: xen_balloon xen_platform_pci ipoib_helper i5k_amb hwmon capifs bas_gigaset usb_gigaset gigaset isdn slhc crc_ccitt i2c_amd756 ovcamchip reed_solomon chipreg ide_cd nsc_gpio i8xx_tco ipmi_si ipmi_msghandler nls_cp932 td
CPU:    1
EIP:    0060:[<00000180>]    Tainted: G     ---- VLI
EFLAGS: 00010202   (2.6.18-283.el5 #1) 
EIP is at 0x180
eax: 00000180   ebx: 00000001   ecx: ec095e84   edx: ffff0001
esi: eb8953c0   edi: ec095e98   ebp: eb8953f8   esp: ec095e7c
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 22483, ti=ec095000 task=ee371550 task.ti=ec095000)
Stack: f885a0a0 f896464a 00000000 00000000 00000000 00000000 00007ff0 f8965c00 
       eb895000 c043fa06 ec095ed8 ec095efc f8965c00 00000000 00000000 00000000 
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Call Trace:
 [<f885a0a0>] balloon_init+0xa0/0xc5 [xen_balloon]
 [<c043fa06>] sys_init_module+0x1af3/0x1cb8
 [<c0473ed1>] __kmalloc+0x0/0x72
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<00000180>] 0x180 SS:ESP 0068:ec095e7c
 <0>Kernel panic - not syncing: Fatal exception


Expected results:
No kernel panic

Additional info:

Comment 1 Andrew Jones 2011-08-31 08:45:42 UTC
So the problem is that after loading xen modules on a bare-metal kernel and then attempting to unload those modules, we panic. This could probably just be thrown out with a "just don't do that" statement, but we can consider adding a simple check to the modules' init functions so that they bail out if they're not on xen.

Comment 2 Josef Lusticky 2011-08-31 09:24:47 UTC
I also got a panic just by running "modprobe xen-balloon", with no module removal needed:

[root@dell-pe1650-02 ~]# modprobe xen-balloon
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000180
 printing eip:
00000180
*pde = 2596a067
Oops: 0000 [#1]
SMP 
last sysfs file: /block/ram0/dev
Modules linked in: xen_balloon xen_platform_pci autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac parport_pc lp parport floppy sg pcspkr scb2_flash mtdcore chipreg serio_raw tpm_tis i2c_piix4 i2c_core e1000 ide_cd cdrom tpm tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod aacraid sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<00000180>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-274.el5 #1) 
EIP is at 0x180
eax: 00000180   ebx: 00000001   ecx: e4563e84   edx: ffff0001
esi: f5c273c0   edi: e4563e98   ebp: f5c273f8   esp: e4563e7c
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 3258, ti=e4563000 task=f5d67550 task.ti=e4563000)
Stack: f88f90a0 f8d1664a 00000000 00000000 00000000 00000000 00007ff0 f8d17c00 
       f5c27000 c043fa0a e4563ed8 e4563efc f8d17c00 00000000 00000000 00000000 
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Call Trace:
 [<f88f90a0>] balloon_init+0xa0/0xc5 [xen_balloon]
 [<c043fa0a>] sys_init_module+0x1af3/0x1cb8
 [<c0473ecd>] __kmalloc+0x0/0x72
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<00000180>] 0x180 SS:ESP 0068:e4563e7c
 <0>Kernel panic - not syncing: Fatal exception

Comment 3 Andrew Jones 2011-08-31 09:40:33 UTC
It's still a shooting yourself in the foot type of situation, since you don't need a xen balloon driver if you don't have xen. However, I believe in gun control so I'll look into taking the gun away from the sysadmin with a simple if-statement.

Comment 4 Laszlo Ersek 2011-09-12 14:27:26 UTC
balloon_init() already starts with a call to is_running_on_xen(). Unfortunately, is_running_on_xen() is a function-like macro that always evaluates to 1; it is defined in <include/asm-i386/mach-xen/asm/hypervisor.h>. The idea is probably that the module is only ever built with CONFIG_XEN.

This compile-time-static definition is wrong when we're inserting the module in an HVM guest or a bare metal kernel.

http://xenbits.xensource.com/linux-2.6.18-xen.hg/rev/407

After this change, modprobe shouldn't even be able to load (attempt to initialize) whatever depends on is_running_on_xen(), unless "xen-platform-pci.ko" is loaded. If "xen-platform-pci.ko" is loaded in the bare metal kernel, then is_running_on_xen() will return false.
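
The difference between the hardcoded macro and a runtime check can be sketched in user-space C. This is only an illustration under assumptions, not the actual patch: the hypercall_stubs pointer here is a local stand-in for the symbol exported by xen-platform-pci, the exact form of the patched macro is assumed, and balloon_init_sketch() is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the pointer exported by xen-platform-pci. Assumption: it
 * stays NULL until the hypercall page has actually been set up. */
static char *hypercall_stubs = NULL;

/* Old behaviour: a compile-time constant, true even on bare metal. */
#define is_running_on_xen_hardcoded() 1

/* Assumed shape of the runtime check: true only if stubs are installed. */
#define is_running_on_xen() (!!hypercall_stubs)

/* Hypothetical init routine mirroring the early exit in balloon_init(). */
static int balloon_init_sketch(void)
{
    if (!is_running_on_xen())
        return -ENODEV;   /* user space sees "No such device" */
    return 0;             /* a real driver would continue initializing */
}
```

With the stand-in pointer left NULL, balloon_init_sketch() returns -ENODEV instead of going on to call through an uninitialized hypercall page.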

Comment 5 Laszlo Ersek 2011-09-12 14:29:26 UTC
(In reply to comment #4)
> If "xen-platform-pci.ko" is loaded in the bare
> metal kernel, then is_running_on_xen() will return false.

See cpuid(0x40000000) in get_hypercall_stubs().
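
For readers unfamiliar with that check, here is a rough user-space sketch of how a hypervisor signature test on CPUID leaf 0x40000000 can look. It takes the register values as parameters so that it runs anywhere. The function name is hypothetical, and the minimum-leaf threshold of 0x40000002 is an assumption about what get_hypercall_stubs() requires; the "XenVMMXenVMM" signature in ebx, ecx, edx is the documented Xen convention.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the register values read from CPUID leaf 0x40000000 carry
 * the Xen signature and advertise at least leaf 0x40000002, else 0.
 * (The 0x40000002 threshold is an assumption for this sketch.) */
static int cpuid_looks_like_xen(uint32_t eax, uint32_t ebx,
                                uint32_t ecx, uint32_t edx)
{
    char signature[13];

    /* The 12-byte vendor string is spread over ebx, ecx, edx in order. */
    memcpy(signature + 0, &ebx, 4);
    memcpy(signature + 4, &ecx, 4);
    memcpy(signature + 8, &edx, 4);
    signature[12] = '\0';

    if (strcmp(signature, "XenVMMXenVMM") != 0)
        return 0;             /* bare metal or a different hypervisor */
    return eax >= 0x40000002u; /* eax holds the highest hypervisor leaf */
}
```

On bare metal the leaf does not return the Xen signature, so a check of this kind fails and the hypercall stubs are never installed.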

Comment 6 Laszlo Ersek 2011-09-12 16:26:09 UTC
Created attachment 522731 [details]
don't hardcode is_running_on_xen() for pv-on-hvm drivers

This allows these modules to fail gracefully when they are inadvertently loaded on native kernels.

(Backport from linux-2.6.18-xen.hg changeset 407:5c61cd349b20.)

Comment 7 RHEL Program Management 2011-09-13 09:49:51 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 Laszlo Ersek 2011-09-14 09:34:22 UTC
(Re)tested PV-on-HVM drivers (disk and network) in an HVM guest; they work.

Bare metal:

[root@lacos-workstation 2.6.18-284.el5.pv_modprobe_bz734708]# insmod \
    ./kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko
insmod: error inserting './kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko':
-1 Unknown symbol in module

This is because the patch makes the balloon driver dependent on "hypercall_stubs", which is defined by xen-platform-pci. dmesg:

xen_balloon: Unknown symbol xenbus_scanf
xen_balloon: Unknown symbol xen_features
xen_balloon: Unknown symbol register_xenstore_notifier
xen_balloon: Unknown symbol register_xenbus_watch
xen_balloon: Unknown symbol hypercall_stubs

[root@lacos-workstation ~]# modprobe xen-balloon
FATAL: Error inserting xen_balloon (/lib/modules/2.6.18-284.el5.pv_modprobe_bz734708/kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko): No such device

(This added xen_platform_pci first, and then the patched is_running_on_xen() macro worked.) Similarly for xen-vbd. xen-vnif depends on xen-balloon.

xen_platform_pci is permanent.

Comment 14 Jarod Wilson 2011-10-10 21:46:50 UTC
Patch(es) available in kernel-2.6.18-287.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 15 Jarod Wilson 2011-10-10 21:50:01 UTC
Patch(es) available in kernel-2.6.18-287.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 17 Qin Guan 2011-12-13 10:00:51 UTC
Verified this fix with 2.6.18-300.el5. The bug can also be reproduced with the released 5.7 kernel (2.6.18-274.el5).

Verify steps:
1. Install a RHEL5.8 host with kernel 2.6.18-300.el5.

2. Boot the normal Linux kernel (without xen).

3. Install kernel-xen with the same version as the kernel.

4. Load the xen modules with modprobe; FATAL messages are reported for xen-balloon, xen-vnif and xen-vbd:
# modprobe xen-platform-pci
# modprobe xen-balloon
FATAL: Error inserting xen_balloon (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko): No such device
# modprobe xen-vnif
FATAL: Error inserting xen_vnif (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/netfront/xen-vnif.ko): No such device
# modprobe xen-vbd
FATAL: Error inserting xen_vbd (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/blkfront/xen-vbd.ko): No such device

5. Check that only the xen_platform_pci module is loaded:
# lsmod | grep xen
xen_platform_pci      118125  0 [permanent]

6. Try to remove the xen modules; they cannot be removed, and the following error/warning messages are shown:
# modprobe -r xen-platform-pci
FATAL: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
# modprobe -r xen-vnif
WARNING: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
# modprobe -r xen-vbd
WARNING: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
# modprobe -r xen-balloon
WARNING: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy

7. Repeat steps 4 and 5 10 times; no Call Trace or crash occurs.

Comment 18 errata-xmlrpc 2012-02-21 03:54:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html

