Bug 619268 - rmmod kvm modules cause host kernel panic
Summary: rmmod kvm modules cause host kernel panic
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm   
Version: 5.6
Hardware: All Linux
Target Milestone: rc
: ---
Assignee: Eduardo Habkost
QA Contact: Virtualization Bugs
Depends On:
Blocks: Rhel5KvmTier1
Reported: 2010-07-29 06:22 UTC by Suqin Huang
Modified: 2013-01-09 22:57 UTC (History)

Fixed In Version: kvm-83-199.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2011-01-13 23:37:07 UTC


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2011:0028
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: kvm security and bug fix update
Last Updated: 2011-01-13 11:03:39 UTC

Description Suqin Huang 2010-07-29 06:22:16 UTC
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start kvm_stat, then repeatedly load and unload the kvm modules:

modprobe  ksm
modprobe  kvm_amd
modprobe  kvm
modprobe -r ksm
modprobe -r kvm_amd
modprobe -r kvm
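
The sequence above can be repeated in a loop. The following is only a sketch, not the script used by the reporter: the DRY_RUN, VENDOR_MODULE and ITERATIONS variables and the function names are assumptions made for illustration. By default it only prints the commands. With DRY_RUN=0 it runs them for real, which on an affected host is expected to panic the kernel while kvm_stat is running in another terminal.

```shell
#!/bin/sh
# Sketch only: loops the load/unload sequence from the steps above.
# DRY_RUN defaults to 1 so that nothing is loaded or unloaded by accident;
# set DRY_RUN=0 (as root, on a disposable host) to really run the commands.
DRY_RUN="${DRY_RUN:-1}"
VENDOR_MODULE="${VENDOR_MODULE:-kvm_amd}"   # assumption: kvm_intel on Intel hosts

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One load/unload cycle, in the same order as the report.
cycle_modules() {
    run modprobe ksm
    run modprobe "$VENDOR_MODULE"
    run modprobe kvm
    run modprobe -r ksm
    run modprobe -r "$VENDOR_MODULE"
    run modprobe -r kvm
}

# Repeat the cycle the requested number of times.
repro_loop() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "# iteration $i"
        cycle_modules
        i=$((i + 1))
    done
}

repro_loop "${ITERATIONS:-2}"
```

For example, ITERATIONS=500 DRY_RUN=0 would run 500 real cycles; the dry-run default is chosen so that copying the script cannot crash a machine by accident.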


Actual results:
The host kernel panics.

Expected results:

Additional info:
1. host kernel:

2. kernel panic:
unable to handle kernel paging request at ffffffff88410f20 RIP: 
 [<ffffffff8001ea91>] __dentry_open+0x6e/0x1dc
PGD 203067 PUD 205063 PMD 21bd81067 PTE 0
Oops: 0000 [1] SMP 
last sysfs file: /module/kvm/version
CPU 2 
Modules linked in: nls_utf8 vfat fat nfsd exportfs auth_rpcgss tun nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table bridge ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep tpm_infineon snd tpm tg3 i2c_piix4 shpchp tpm_bios i2c_core sr_mod cdrom soundcore amd64_edac_mod edac_mc serio_raw sg pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 16409, comm: kvm_stat Tainted: G      2.6.18-209.el5 #1
RIP: 0010:[<ffffffff8001ea91>]  [<ffffffff8001ea91>] __dentry_open+0x6e/0x1dc
RSP: 0018:ffff81021a9ffe68  EFLAGS: 00010282
RAX: ffffffff88410f20 RBX: ffff81021cb03980 RCX: ffff81021cb03980
RDX: 0000000000000000 RSI: ffff8101077ae280 RDI: ffff81021f447228
RBP: ffff81021fdc1910 R08: 0000000000000000 R09: ffffff9c8c94d000
R10: ffff81021cb03980 R11: ffffffff8002c51d R12: 00000000ffffff9c
R13: 0000000000000000 R14: ffff8101077ae280 R15: ffff81021f447228
FS:  00002b0588461190(0000) GS:ffff81021fc1ce40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff88410f20 CR3: 000000021d36a000 CR4: 00000000000006e0
Process kvm_stat (pid: 16409, threadinfo ffff81021a9fe000, task ffff81021d2550c0)
Stack:  00002b058c94c000 0000000000008000 0000000000008000 00000000ffffff9c
 0000000000000011 ffff81016e3bb000 0000000000000000 ffffffff800275aa
 ffff81021f447228 ffff8101077ae280 ffff81021a9ffdb8 ffff81021a9ffdb8
Call Trace:
 [<ffffffff800275aa>] do_filp_open+0x2a/0x38
 [<ffffffff80019e81>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 48 8b 10 48 85 d2 74 1e 65 8b 04 25 2c 00 00 00 83 3a 02 74 
RIP  [<ffffffff8001ea91>] __dentry_open+0x6e/0x1dc
 RSP <ffff81021a9ffe68>
CR2: ffffffff88410f20
 <0>Kernel panic - not syncing: Fatal exception

Comment 3 Eduardo Habkost 2010-09-16 20:19:58 UTC
Found the likely cause: debugfs_remove() is called when kvm-intel (or kvm-amd) is unloaded, but the .owner field of the debugfs fops structs points to the kvm.ko module.

As a result, we may end up calling debugfs_remove() while the KVM debugfs files are still open, and that shouldn't be allowed to happen.

Comment 4 Eduardo Habkost 2010-09-17 21:19:56 UTC
After some analysis, I concluded that the wrong .owner field may be a problem once the file is already open, but the crash here happens before fops_get() returns inside __dentry_open(). The main issue seems to be a potential race in __dentry_open(). I will ask for feedback on rhkernel-list, since I may have missed something when reading the sys_open() and module unload code paths.

Rewording BZ summary to make it explicit that a module unload is necessary to reproduce it (making it less serious).

Comment 5 Eduardo Habkost 2010-09-17 22:10:59 UTC
Even leaving aside the potential race condition in __dentry_open(), the crash can be reproduced more easily, without any race, by just doing this on an Intel machine:

modprobe kvm
modprobe kvm-amd
rmmod kvm
cat /sys/kernel/debug/kvm/largepages

Maybe there is a race condition too, but the easier-to-reproduce case doesn't involve one, just a failure to clean up after errors in kvm_init().
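
One way to look for that leftover state without opening any debugfs file is sketched below. It is a hypothetical helper, not part of the fix: the function name and the idea of comparing the debugfs directory against /proc/modules are assumptions, and it only checks for existence, so it should not trigger the crash itself.

```shell
#!/bin/sh
# Hypothetical helper: reports whether KVM debugfs entries outlive the kvm module.
# It only tests for existence and never opens the debugfs files.
# Both paths are parameters so the check can be tried against sample files.
kvm_debugfs_stale() {
    debug_dir="${1:-/sys/kernel/debug/kvm}"
    modules_file="${2:-/proc/modules}"

    # No directory means there is nothing left behind.
    [ -d "$debug_dir" ] || return 1

    # A line starting with "kvm " means the core module is still loaded.
    if grep -q '^kvm ' "$modules_file" 2>/dev/null; then
        return 1
    fi
    return 0
}

if kvm_debugfs_stale; then
    echo "stale: KVM debugfs directory exists but kvm is not loaded"
else
    echo "ok: no stale KVM debugfs directory detected"
fi
```

If the check reports a stale directory after "rmmod kvm", reading any file under it (as the cat command above does) is the step expected to crash an affected host.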

@Suqin Huang: are you able to reproduce this only when using "modprobe kvm-amd" on Intel machines (or vice versa), or is it also reproducible when you load the right module (kvm-intel on an Intel machine or kvm-amd on an AMD machine)?

Comment 6 Suqin Huang 2010-09-19 03:01:40 UTC
I could not rmmod the kvm_intel module on an AMD machine.
The bug reproduces when I load and unload the right module.

Comment 12 Suqin Huang 2010-11-03 09:14:55 UTC
Tried 500 times.
Fixed on kvm-83-207.el5.
Kernel: 2.6.18-230.el5

Comment 14 Amos Kong 2010-12-17 02:48:14 UTC
This bug was reproduced with 2.6.18-194.30.1.el5 + kvm-83-164.el5_5.30.
Do we need to clone it to 5.5.z?

Comment 17 errata-xmlrpc 2011-01-13 23:37:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

