Bug 734708
Summary: | xen modules - unable to handle kernel NULL pointer dereference
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Josef Lusticky <jlustick>
Component: | kernel-xen
Assignee: | Laszlo Ersek <lersek>
Status: | CLOSED ERRATA
QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | low
Priority: | low
Version: | 5.7
CC: | drjones, leiwang, lersek, qguan, qwan, xen-maint, yuzhou
Target Milestone: | rc
Hardware: | i686
OS: | Linux
Fixed In Version: | kernel-2.6.18-287.el5
Doc Type: | Bug Fix
Last Closed: | 2012-02-21 03:54:02 UTC
Bug Blocks: | 514491
Description
Josef Lusticky
2011-08-31 08:29:54 UTC
So the problem is that after loading xen modules on a bare-metal kernel and then attempting to unload those modules, we panic. This could probably just be thrown out with a "just don't do that" statement, but we can consider adding a simple check in the modules' inits to bail if they're not on xen.

I also got a panic just using "modprobe xen-balloon", no removing needed:

[root@dell-pe1650-02 ~]# modprobe xen-balloon
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000180
 printing eip:
00000180
*pde = 2596a067
Oops: 0000 [#1]
SMP
last sysfs file: /block/ram0/dev
Modules linked in: xen_balloon xen_platform_pci autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac parport_pc lp parport floppy sg pcspkr scb2_flash mtdcore chipreg serio_raw tpm_tis i2c_piix4 i2c_core e1000 ide_cd cdrom tpm tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod aacraid sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<00000180>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-274.el5 #1)
EIP is at 0x180
eax: 00000180   ebx: 00000001   ecx: e4563e84   edx: ffff0001
esi: f5c273c0   edi: e4563e98   ebp: f5c273f8   esp: e4563e7c
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 3258, ti=e4563000 task=f5d67550 task.ti=e4563000)
Stack: f88f90a0 f8d1664a 00000000 00000000 00000000 00000000 00007ff0 f8d17c00
       f5c27000 c043fa0a e4563ed8 e4563efc f8d17c00 00000000 00000000 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<f88f90a0>] balloon_init+0xa0/0xc5 [xen_balloon]
 [<c043fa0a>] sys_init_module+0x1af3/0x1cb8
 [<c0473ecd>] __kmalloc+0x0/0x72
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<00000180>] 0x180 SS:ESP 0068:e4563e7c
 <0>Kernel panic - not syncing: Fatal exception

It's still a shooting-yourself-in-the-foot type of situation, since you don't need a xen balloon driver if you don't have xen. However, I believe in gun control, so I'll look into taking the gun away from the sysadmin with a simple if-statement.

balloon_init() already starts with a call to is_running_on_xen(). Unfortunately, is_running_on_xen is a function-like macro that always returns 1; it is defined in <include/asm-i386/mach-xen/asm/hypervisor.h>. The idea is probably that the module is only ever built with CONFIG_XEN. This compile-time-static definition is wrong when we're inserting the module in an HVM guest or a bare metal kernel.

http://xenbits.xensource.com/linux-2.6.18-xen.hg/rev/407

After this change, modprobe shouldn't even be able to load (attempt to initialize) whatever depends on is_running_on_xen(), unless "xen-platform-pci.ko" is loaded. If "xen-platform-pci.ko" is loaded in the bare metal kernel, then is_running_on_xen() will return false.

(In reply to comment #4)
> If "xen-platform-pci.ko" is loaded in the bare
> metal kernel, then is_running_on_xen() will return false.

See cpuid(0x40000000) in get_hypercall_stubs().

Created attachment 522731 [details]
don't hardcode is_running_on_xen() for pv-on-hvm drivers
Allowing graceful failure of these modules when inadvertently loaded on native kernels.
(Backport from linux-2.6.18-xen.hg changeset 407:5c61cd349b20.)
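For readers following along, the idea behind the change can be sketched in user space roughly as below. This is not the attached patch. It assumes the patched is_running_on_xen() simply tests whether hypercall_stubs has been set up, and that a module init bails out with -ENODEV ("No such device") otherwise. All sketch_* names are hypothetical, and the CPUID register values are passed in as parameters instead of executing the cpuid instruction, so the sketch runs anywhere.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the symbol that xen-platform-pci defines. It stays NULL
 * until a Xen hypervisor has been detected. */
char *hypercall_stubs = NULL;

/* Assumed shape of the patched macro: a runtime test instead of a
 * hardcoded 1. */
#define is_running_on_xen() (hypercall_stubs != NULL)

/* Hypothetical helper: unpack a register value into 4 bytes, lowest byte
 * first, which is how CPUID signature strings are laid out. */
static void sketch_unpack_le32(uint32_t v, char *out)
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (char)((v >> (8 * i)) & 0xff);
}

/* Returns 1 if the ebx/ecx/edx values of CPUID leaf 0x40000000 spell the
 * Xen signature "XenVMMXenVMM", 0 otherwise. */
int sketch_is_xen_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[13];

    sketch_unpack_le32(ebx, sig + 0);
    sketch_unpack_le32(ecx, sig + 4);
    sketch_unpack_le32(edx, sig + 8);
    sig[12] = '\0';
    return strcmp(sig, "XenVMMXenVMM") == 0;
}

/* Hypothetical stand-in for get_hypercall_stubs(): only publish the stubs
 * pointer when the signature check passes. */
int sketch_get_hypercall_stubs(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    static char fake_stub_page[1]; /* placeholder, not a real hypercall page */

    if (!sketch_is_xen_signature(ebx, ecx, edx))
        return -ENODEV;
    hypercall_stubs = fake_stub_page;
    return 0;
}

/* Hypothetical stand-in for balloon_init(): refuse to initialize when no
 * hypervisor was found, instead of jumping through an unset pointer. */
int sketch_balloon_init(void)
{
    if (!is_running_on_xen())
        return -ENODEV;
    return 0;
}
```

With all-zero register values (as on bare metal, in this model) the stubs pointer stays NULL and sketch_balloon_init() fails cleanly; with the Xen signature values it succeeds.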
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

(Re)tested PV-on-HVM drivers (disk and network) in an HVM guest; they work.

Bare metal:

[root@lacos-workstation 2.6.18-284.el5.pv_modprobe_bz734708]# insmod \
    ./kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko
insmod: error inserting './kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko': -1 Unknown symbol in module

This is because the patch makes the balloon driver dependent on "hypercall_stubs", which is defined by xen-platform-pci.

dmesg:

xen_balloon: Unknown symbol xenbus_scanf
xen_balloon: Unknown symbol xen_features
xen_balloon: Unknown symbol register_xenstore_notifier
xen_balloon: Unknown symbol register_xenbus_watch
xen_balloon: Unknown symbol hypercall_stubs

[root@lacos-workstation ~]# modprobe xen-balloon
FATAL: Error inserting xen_balloon (/lib/modules/2.6.18-284.el5.pv_modprobe_bz734708/kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko): No such device

(This added xen_platform_pci first, and then the patched is_running_on_xen() macro worked.)

Similarly for xen-vbd. xen-vnif depends on xen-balloon. xen_platform_pci is permanent.

Patch(es) available in kernel-2.6.18-287.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Verified this fix with 2.6.18-300.el5. The bug can also be reproduced with the released 5.7 kernel (2.6.18-274.el5).

Verify steps:

1. Install a RHEL5.8 host with kernel 2.6.18-300.el5.
2. Boot up with the normal linux kernel (without xen).
3. Install kernel-xen with the same version as the kernel.
4. Add the xen modules with modprobe; this gives FATAL messages for xen-balloon, xen-vnif and xen-vbd:
   # modprobe xen-platform-pci
   # modprobe xen-balloon
   FATAL: Error inserting xen_balloon (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/balloon/xen-balloon.ko): No such device
   # modprobe xen-vnif
   FATAL: Error inserting xen_vnif (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/netfront/xen-vnif.ko): No such device
   # modprobe xen-vbd
   FATAL: Error inserting xen_vbd (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/blkfront/xen-vbd.ko): No such device
5. Check that only the module xen_platform_pci is loaded:
   # lsmod | grep xen
   xen_platform_pci 118125 0 [permanent]
6. Remove the xen modules; they cannot be removed, and the following FATAL/WARNING messages appear:
   # modprobe -r xen-platform-pci
   FATAL: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
   # modprobe -r xen-vnif
   WARNING: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
   # modprobe -r xen-vbd
   WARNING: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
   # modprobe -r xen-balloon
   WARNING: Error removing xen_platform_pci (/lib/modules/2.6.18-298.el5/kernel/drivers/xenpv_hvm/platform-pci/xen-platform-pci.ko): Device or resource busy
7. Repeat steps 4 and 5 ten times; no Call Trace or crash happens.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html