Description of problem:

Version-Release number of selected component (if applicable):
kvm-83-191.el5

How reproducible:
sometimes

Steps to Reproduce:
1. start kvm_stat, then repeatedly load and unload the kvm modules:
   modprobe ksm
   modprobe kvm_amd
   modprobe kvm
   modprobe -r ksm
   modprobe -r kvm_amd
   modprobe -r kvm
2.
3.

Actual results:
host kernel panic

Expected results:

Additional info:
1. host kernel: 2.6.18-209.el5
2. kernel panic:

unable to handle kernel paging request at ffffffff88410f20 RIP:
 [<ffffffff8001ea91>] __dentry_open+0x6e/0x1dc
PGD 203067 PUD 205063 PMD 21bd81067 PTE 0
Oops: 0000 [1] SMP
last sysfs file: /module/kvm/version
CPU 2
Modules linked in: nls_utf8 vfat fat nfsd exportfs auth_rpcgss tun nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table bridge ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep tpm_infineon snd tpm tg3 i2c_piix4 shpchp tpm_bios i2c_core sr_mod cdrom soundcore amd64_edac_mod edac_mc serio_raw sg pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 16409, comm: kvm_stat Tainted: G 2.6.18-209.el5 #1
RIP: 0010:[<ffffffff8001ea91>] [<ffffffff8001ea91>] __dentry_open+0x6e/0x1dc
RSP: 0018:ffff81021a9ffe68 EFLAGS: 00010282
RAX: ffffffff88410f20 RBX: ffff81021cb03980 RCX: ffff81021cb03980
RDX: 0000000000000000 RSI: ffff8101077ae280 RDI: ffff81021f447228
RBP: ffff81021fdc1910 R08: 0000000000000000 R09: ffffff9c8c94d000
R10: ffff81021cb03980 R11: ffffffff8002c51d R12: 00000000ffffff9c
R13: 0000000000000000 R14: ffff8101077ae280 R15: ffff81021f447228
FS: 00002b0588461190(0000) GS:ffff81021fc1ce40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff88410f20 CR3: 000000021d36a000 CR4: 00000000000006e0
Process kvm_stat (pid: 16409, threadinfo ffff81021a9fe000, task ffff81021d2550c0)
Stack: 00002b058c94c000 0000000000008000 0000000000008000 00000000ffffff9c
 0000000000000011 ffff81016e3bb000 0000000000000000 ffffffff800275aa
 ffff81021f447228 ffff8101077ae280 ffff81021a9ffdb8 ffff81021a9ffdb8
Call Trace:
 [<ffffffff800275aa>] do_filp_open+0x2a/0x38
 [<ffffffff80019e81>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 48 8b 10 48 85 d2 74 1e 65 8b 04 25 2c 00 00 00 83 3a 02 74
RIP [<ffffffff8001ea91>] __dentry_open+0x6e/0x1dc
 RSP <ffff81021a9ffe68>
CR2: ffffffff88410f20
 <0>Kernel panic - not syncing: Fatal exception
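The load/unload steps in the report can be scripted roughly as below. This is only a sketch, not the reporter's actual test: the RUN, ITERATIONS and VENDOR_MOD variables are hypothetical knobs added for illustration, and the script defaults to a dry run that only prints the commands. To actually exercise the bug, kvm_stat would have to be running in another terminal and the script run as root with RUN set to an empty string, on a host that can be allowed to panic.

```shell
#!/bin/sh
# Sketch of a stress loop for the reproduction steps in this report.
# Hypothetical knobs (not part of the original report):
#   RUN        - command prefix; defaults to "echo", i.e. a dry run that
#                only prints the commands. Set RUN= (empty) to execute them.
#   ITERATIONS - number of load/unload cycles (default 3).
#   VENDOR_MOD - kvm_amd (as in the report) or kvm_intel.
RUN=${RUN-echo}
ITERATIONS=${ITERATIONS:-3}
VENDOR_MOD=${VENDOR_MOD:-kvm_amd}

# One load/unload cycle, in the order listed in the report.
cycle_modules() {
    $RUN modprobe ksm
    $RUN modprobe "$VENDOR_MOD"
    $RUN modprobe kvm
    $RUN modprobe -r ksm
    $RUN modprobe -r "$VENDOR_MOD"
    $RUN modprobe -r kvm
}

# Repeat the cycle ITERATIONS times.
stress() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        cycle_modules
        i=$((i + 1))
    done
}

stress
```

For a real run, something like `RUN= ITERATIONS=500 sh ./stress.sh` (the file name is arbitrary) would perform 500 cycles.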
Found the likely cause: debugfs_remove() is called when kvm-intel (or kvm-amd) is unloaded, but the .owner field of the debugfs fops structs points to the kvm.ko module. An open debugfs file therefore holds a reference on kvm.ko only, not on the vendor module, so we may end up calling debugfs_remove() while the KVM debugfs files are still open, and that shouldn't be allowed to happen.
After some analysis, I concluded that the wrong .owner field may be a problem once the file is already open, but the crash here happens before fops_get() returns inside __dentry_open(). The main issue seems to be a potential race in __dentry_open(). I will ask for feedback on rhkernel-list, as I may have missed something when reading the sys_open() and module unload code paths.

Rewording the BZ summary to make it explicit that a module unload is necessary to reproduce it (making it less serious).
Even without the potential race condition in __dentry_open(), the crash can be reproduced more easily, with no complex race involved, by just doing this on an Intel machine:

    modprobe kvm
    modprobe kvm-amd
    rmmod kvm
    cat /sys/kernel/debug/kvm/largepages

Maybe there is a race condition too, but the easier-to-reproduce case doesn't involve one, just a failure to clean up after errors in kvm_init().

@Suqin Huang: are you able to reproduce this only when using "modprobe kvm-amd" on Intel machines (or vice versa), or is it also reproducible when you load the right module (kvm-intel on an Intel machine or kvm-amd on an AMD machine)?
I could not rmmod the kvm_intel module on an AMD machine. The bug reproduces when I load/unload the right module.
Tried 500 times: fixed on kvm-83-207.el5, kernel: 2.6.18-230.el5
This bug was reproduced with 2.6.18-194.30.1.el5 + kvm-83-164.el5_5.30. Do we need to clone it to 5.5.z?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0028.html