Bug 734360
Summary: | "opcontrol --deinit" cause kernel panic inside guest os. | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Kirby Zhou <kirbyzhou> |
Component: | kernel | Assignee: | Jiri Olsa <jolsa> |
kernel sub component: | Oprofile | QA Contact: | Michael Petlan <mpetlan> |
Status: | CLOSED ERRATA | Severity: | high |
Priority: | high | CC: | jwilleford, kernel-mgr, mcermak, robert.richter, yanwang |
Version: | 6.1 | Target Milestone: | rc |
Target Release: | 6.6 | Hardware: | Unspecified |
OS: | Unspecified | Last Closed: | 2015-07-22 07:57:31 UTC |
Fixed In Version: | kernel-2.6.32-536.el6 | Doc Type: | Bug Fix |
Bug Blocks: | 1164899, 1209644 | | |
Attachments: |
Caught with virtual serial port:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
PGD 7bd06067 PUD 79503067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus
CPU 1
Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log i2c_piix4 i2c_core microcode virtio_net virtio_balloon ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log i2c_piix4 i2c_core microcode virtio_net virtio_balloon ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
Pid: 1625, comm: rmmod Not tainted 2.6.32-131.12.1.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffff813396f9>] [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
RSP: 0018:ffff880079c8de98 EFLAGS: 00010286
RAX: ffff880079c8c000 RBX: 0000000000000000 RCX: ffffffffa0348f80
RDX: 0000000000000000 RSI: ffffffffa03496b0 RDI: ffffffff81afe800
RBP: ffff880079c8dea8 R08: ffffffff81bfdf40 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffffa0348f20
R13: ffff880079c8df18 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f8fddfda700(0000) GS:ffff880002280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000037392000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 1625, threadinfo ffff880079c8c000, task ffff8800377f6ac0)
Stack:
 0000000000000880 ffffffffa03496e0 ffff880079c8deb8 ffffffffa0343e25
<0> ffff880079c8dec8 ffffffffa034396e ffff880079c8ded8 ffffffffa0346ed8
<0> ffff880079c8df78 ffffffff810a9af4 ffff880079c8df48 ffff880079c8df58
Call Trace:
 [<ffffffffa0343e25>] op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa034396e>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0346ed8>] oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af
RIP [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
 RSP <ffff880079c8de98>
CR2: 0000000000000000
---[ end trace 28f965972e3c9cc3 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1625, comm: rmmod Tainted: G D ---------------- 2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814da648>] ? panic+0x78/0x143
 [<ffffffff814de694>] ? oops_end+0xe4/0x100
 [<ffffffff81040c9b>] ? no_context+0xfb/0x260
 [<ffffffff81040f25>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff814dad57>] ? thread_return+0x4e/0x777
 [<ffffffff8104104e>] ? bad_area+0x4e/0x60
 [<ffffffff81041773>] ? __do_page_fault+0x3c3/0x480
 [<ffffffff8105fa7a>] ? __cond_resched+0x2a/0x40
 [<ffffffff814db5d0>] ? _cond_resched+0x30/0x40
 [<ffffffff814db61c>] ? wait_for_common+0x3c/0x180
 [<ffffffff814e067e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff814dda05>] ? page_fault+0x25/0x30
 [<ffffffff813396f9>] ? sysdev_unregister+0x49/0x80
 [<ffffffff813396cb>] ? sysdev_unregister+0x1b/0x80
 [<ffffffffa0343e25>] ? op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa034396e>] ? oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0346ed8>] ? oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] ? sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b

How was oprofile set up? For "opcontrol --deinit" to do anything, an "opcontrol --init" or "opcontrol --setup ..." must have been performed first. How was the oprofile module set up?
Also, what is the output of /dev/oprofile/cpu_type when the module is loaded on the guest machine? Maybe include the contents of /root/.oprofile/daemonrc for information on how oprofile was set up.

What are the details on the guest VM configuration?
- number of processors for guest
- amount of memory

There is no '/root/.oprofile/daemonrc' on either the host or the guest. The guest configuration is listed below.

Host ~]# virsh dumpxml 9
<domain type='kvm' id='9'>
  <name>rhel6.1-kvm-203</name>
  <uuid>96116a30-011a-381d-fd72-6e40da4a495c</uuid>
  <memory>2097152</memory>
  <currentMemory>2097152</currentMemory>
  <vcpu>4</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu match='exact'>
    <model>Westmere</model>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/vgext1/lv-rhel6.1-kvm-203'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <interface type='bridge'>
      <mac address='52:54:00:6d:43:85'/>
      <source bridge='br0'/>
      <target dev='vnet6'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:6d:43:86'/>
      <source bridge='br1'/>
      <target dev='vnet7'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5903' autoport='yes' listen='0.0.0.0'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Host ~]# free
             total       used       free     shared    buffers     cached
Mem:      49417812   10078216   39339596          0     114696     483292
-/+ buffers/cache:    9480228   39937584
Swap:      8388600          0    8388600

Additionally, '--deinit' also causes a RHEL6 guest to reboot under a RHEL5-XEN-PV hypervisor. But it did not cause any problem on the KVM hypervisor host itself.

For "opcontrol --deinit" to remove the oprofile module, the module needs to have been loaded with some opcontrol, modprobe, or insmod command. How is the oprofile module getting loaded on the guest? Something must be loading it. Could you look to see what is loading the oprofile module?

So there is no /dev/oprofile/cpu_type file? If so, then "opcontrol" is unlikely to be doing the initial load.
Could you supply output of the following from the host: cat /proc/cpuinfo From the guest machine the output of: opcontrol --init cat /dev/oprofile/cpu_type cat /proc/cpuinfo ~]# cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Westmere E56xx/L56xx/X56xx (Nehalem-C) stepping : 1 cpu MHz : 2400.104 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm bogomips : 4800.20 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Westmere E56xx/L56xx/X56xx (Nehalem-C) stepping : 1 cpu MHz : 2400.104 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm bogomips : 4800.20 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Westmere E56xx/L56xx/X56xx (Nehalem-C) stepping : 1 cpu MHz : 2400.104 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 
sse4_2 x2apic popcnt aes hypervisor lahf_lm bogomips : 4800.20 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Westmere E56xx/L56xx/X56xx (Nehalem-C) stepping : 1 cpu MHz : 2400.104 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm bogomips : 4800.20 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ~]# lsmod Module Size Used by ipv6 322291 26 xfs 982056 1 exportfs 4202 1 xfs ext3 133539 1 jbd 54480 1 ext3 dm_mirror 14067 0 dm_region_hash 12136 1 dm_mirror dm_log 10120 2 dm_mirror,dm_region_hash microcode 112781 0 virtio_net 15741 0 virtio_balloon 4281 0 i2c_piix4 12574 0 i2c_core 31274 1 i2c_piix4 ext4 359671 3 mbcache 7918 2 ext3,ext4 jbd2 88768 1 ext4 virtio_blk 5692 3 pata_acpi 3667 0 ata_generic 3611 0 ata_piix 22652 0 virtio_pci 6653 0 virtio_ring 7169 4 virtio_net,virtio_balloon,virtio_blk,virtio_pci virtio 4824 4 virtio_net,virtio_balloon,virtio_blk,virtio_pci dm_mod 75539 17 dm_mirror,dm_log ]# cat /dev/oprofile/cpu_type cat: /dev/oprofile/cpu_type: No such file or directory ~]# opcontrol --deinit Unloading oprofile module BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff813396f9>] sysdev_unregister+0x49/0x80 PGD 79f2e067 PUD 79ce9067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus CPU 3 Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_net virtio_balloon i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi 
ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan] Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_net virtio_balloon i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan] Pid: 4941, comm: rmmod Not tainted 2.6.32-131.12.1.el6.x86_64 #1 KVM RIP: 0010:[<ffffffff813396f9>] [<ffffffff813396f9>] sysdev_unregister+0x49/0x80 RSP: 0018:ffff88007950be98 EFLAGS: 00010286 RAX: ffff88007950a000 RBX: 0000000000000000 RCX: ffffffffa00e4f80 RDX: 0000000000000000 RSI: ffffffffa00e56b0 RDI: ffffffff81afe800 RBP: ffff88007950bea8 R08: ffffffff81bfdf40 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffffa00e4f20 R13: ffff88007950bf18 R14: 0000000000000000 R15: 0000000000000001 FS: 00007f134d62b700(0000) GS:ffff880002380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000037abb000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rmmod (pid: 4941, threadinfo ffff88007950a000, task ffff8800370a9500) Stack: 0000000000000880 ffffffffa00e56e0 ffff88007950beb8 ffffffffa00dfe25 <0> ffff88007950bec8 ffffffffa00df96e ffff88007950bed8 ffffffffa00e2ed8 <0> ffff88007950bf78 ffffffff810a9af4 ffff88007950bf48 ffff88007950bf58 Call Trace: [<ffffffffa00dfe25>] op_nmi_exit+0x15/0x30 [oprofile] [<ffffffffa00df96e>] oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa00e2ed8>] oprofile_exit+0x18/0x1a [oprofile] [<ffffffff810a9af4>] sys_delete_module+0x194/0x260 [<ffffffff814e0600>] ? 
arch_prepare_kprobe+0x50/0x90 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af RIP [<ffffffff813396f9>] sysdev_unregister+0x49/0x80 RSP <ffff88007950be98> CR2: 0000000000000000 ---[ end trace 6ce825a202449e30 ]--- Message from syslogd@djt_8_203 at Sep 1 11:41:11 ... kernel:Oops: 0000 [#1] SMP Message from syslogd@djt_8_203 atKernel panic - not syncing: Fatal exception Sep 1 11:41:11 ... kernel:last sysfs file: /sys/devices/pci0000:00/000P0:00:id: 4941, comm: rmmod Tainted: G D ---------------- 2.6.32-131.12.1.el6.x86_64 #1 06.0/local_cpus Message from syslogd@djt_8_203 at Sep 1 11:41:11 ... kernel:Stack: Message from syslogd@djt_8_203 at Sep 1 11:41:11 ... kernel:Call Trace: Message from syslogd@djt_8_203 at Sep 1 11:41:11 ... kernel:Code: 24 08Call Trace: 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 [d8< 4f8 c7fffffff814da648>] ? panic+0x78/0x143 c7 00 e8 af Message from syslogd@djt_8_203 at Sep 1 11:41:11 ... kernel:CR2: 0000000000000000 Message from syslogd@djt_8_203 at Sep 1 11:41:11 ... kernel:Kernel pa[n<icf -f fnfffff814de694>] ? oops_end+0xe4/0x100 ot syncing: Fatal exception [<ffffffff81040c9b>] ? no_context+0xfb/0x260 [<ffffffff81040f25>] ? __bad_area_nosemaphore+0x125/0x1e0 [<ffffffff814dad57>] ? thread_return+0x4e/0x777 [<ffffffff8104104e>] ? bad_area+0x4e/0x60 [<ffffffff81041773>] ? __do_page_fault+0x3c3/0x480 [<ffffffff8105fa7a>] ? __cond_resched+0x2a/0x40 [<ffffffff814db5d0>] ? _cond_resched+0x30/0x40 [<ffffffff814db61c>] ? wait_for_common+0x3c/0x180 [<ffffffff814e067e>] ? do_page_fault+0x3e/0xa0 [<ffffffff814dda05>] ? page_fault+0x25/0x30 [<ffffffff813396f9>] ? sysdev_unregister+0x49/0x80 [<ffffffff813396cb>] ? 
sysdev_unregister+0x1b/0x80 [<ffffffffa00dfe25>] ? op_nmi_exit+0x15/0x30 [oprofile] [<ffffffffa00df96e>] ? oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa00e2ed8>] ? oprofile_exit+0x18/0x1a [oprofile] [<ffffffff810a9af4>] ? sys_delete_module+0x194/0x260 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b ~]# opcontrol --init [@djt_8_203 ~]# lsmod Module Size Used by oprofile 46533 1 ipv6 322291 26 xfs 982056 1 exportfs 4202 1 xfs ext3 133539 1 jbd 54480 1 ext3 dm_mirror 14067 0 dm_region_hash 12136 1 dm_mirror dm_log 10120 2 dm_mirror,dm_region_hash microcode 112781 0 virtio_balloon 4281 0 virtio_net 15741 0 i2c_piix4 12574 0 i2c_core 31274 1 i2c_piix4 ext4 359671 3 mbcache 7918 2 ext3,ext4 jbd2 88768 1 ext4 virtio_blk 5692 3 pata_acpi 3667 0 ata_generic 3611 0 ata_piix 22652 0 virtio_pci 6653 0 virtio_ring 7169 4 virtio_balloon,virtio_net,virtio_blk,virtio_pci virtio 4824 4 virtio_balloon,virtio_net,virtio_blk,virtio_pci dm_mod 75539 17 dm_mirror,dm_log ~]# cat /dev/oprofile/cpu_type timer ~]# opcontrol --deinit Unloading oprofile module BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff813396f9>] sysdev_unregister+0x49/0x80 PGD 77cc5067 PUD 7ab06067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus CPU 3 Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan] Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan] Pid: 1623, comm: rmmod Not tainted 2.6.32-131.12.1.el6.x86_64 #1 KVM RIP: 
0010:[<ffffffff813396f9>] [<ffffffff813396f9>] sysdev_unregister+0x49/0x80 RSP: 0018:ffff88007bc63e98 EFLAGS: 00010286 RAX: ffff88007bc62000 RBX: 0000000000000000 RCX: ffffffffa0348f80 RDX: 0000000000000000 RSI: ffffffffa03496b0 RDI: ffffffff81afe800 RBP: ffff88007bc63ea8 R08: ffffffff81bfdf40 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffffa0348f20 R13: ffff88007bc63f18 R14: 0000000000000000 R15: 0000000000000001 FS: 00007f37cf143700(0000) GS:ffff880002380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000373bb000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rmmod (pid: 1623, threadinfo ffff88007bc62000, task ffff8800376f4a80) Stack: 0000000000000880 ffffffffa03496e0 ffff88007bc63eb8 ffffffffa0343e25 <0> ffff88007bc63ec8 ffffffffa034396e ffff88007bc63ed8 ffffffffa0346ed8 <0> ffff88007bc63f78 ffffffff810a9af4 ffff88007bc63f48 ffff88007bc63f58 Call Trace: [<ffffffffa0343e25>] op_nmi_exit+0x15/0x30 [oprofile] [<ffffffffa034396e>] oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa0346ed8>] oprofile_exit+0x18/0x1a [oprofile] [<ffffffff810a9af4>] sys_delete_module+0x194/0x260 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af RIP [<ffffffff813396f9>] sysdev_unregister+0x49/0x80 RSP <ffff88007bc63e98> CR2: 0000000000000000 ---[ end trace 173aac3dcae7e69a ]--- Message from syslogd@djt_8_203 at Sep 1 12:33:49Kern e.l. p.a nkeirc - not syncing: Fatal exception nel:Oops: 0000 [#1] SMP Message from syslogd@djt_8_203 at Sep 1 12:33:49 ... 
kernel:last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus Message from syslogd@djt_8Pid: 1623, comm: rmmod Tainted: G D ---------------- 2.6.32-131.12.1.el6.x86_64 #1 _203 at Sep 1 12:33:49 ... kernel:Stack: Message from syslogd@djt_8_203 at Sep 1 12:33:49 ... kernel:Call Trace: Message from syslogd@djt_8_203 at Sep 1 12:33:49 ... 0kernel:Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 0C0a l0l0 0T0r ac0e0: 0 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af [<ffffffff814da648>] ? panic+0x78/0x143 Message from syslogd@djt_8_203 at Sep 1 12:33:49 ... kernel:CR2: 0000000000000000 Message f [r<ofm fsyfsfffff814de694>] ? oops_end+0xe4/0x100 logd@djt_8_203 at Sep 1 12:33:49 ... kernel:Kernel panic - not syncing: Fatal exception [<ffffffff81040c9b>] ? no_context+0xfb/0x260 [<ffffffff81040f25>] ? __bad_area_nosemaphore+0x125/0x1e0 [<ffffffff814dad57>] ? thread_return+0x4e/0x777 [<ffffffff8104104e>] ? bad_area+0x4e/0x60 [<ffffffff81041773>] ? __do_page_fault+0x3c3/0x480 [<ffffffff8105fa7a>] ? __cond_resched+0x2a/0x40 [<ffffffff814db5d0>] ? _cond_resched+0x30/0x40 [<ffffffff814db61c>] ? wait_for_common+0x3c/0x180 [<ffffffff814e067e>] ? do_page_fault+0x3e/0xa0 [<ffffffff814dda05>] ? page_fault+0x25/0x30 [<ffffffff813396f9>] ? sysdev_unregister+0x49/0x80 [<ffffffff813396cb>] ? sysdev_unregister+0x1b/0x80 [<ffffffffa0343e25>] ? op_nmi_exit+0x15/0x30 [oprofile] [<ffffffffa034396e>] ? oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa0346ed8>] ? oprofile_exit+0x18/0x1a [oprofile] [<ffffffff810a9af4>] ? sys_delete_module+0x194/0x260 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b There is a known bug in the stable kernel for the case oprofile is compiled as module and runs in timer mode. Will send a fix. -Robert (In reply to comment #10) > ~]# cat /dev/oprofile/cpu_type > timer What does dmesg | grep -i oprofile show? 
There are 2 different timer modes:

oprofile: using timer interrupt.

or

oprofile: using NMI timer interrupt.

Created attachment 526916 [details]
oprofile, x86: Fix crash when unloading module
I analyzed the code and think the following happens:

The guest reports a Westmere CPU (model 44/2ch). Oprofile does not support this model number and tries to fall back to i386/arch_perfmon. This fails in the guest since X86_FEATURE_ARCH_PERFMON is not set there (needs to be confirmed). Now oprofile_arch_init() fails and oprofile_timer_init() is set up instead. On oprofile_exit(), both are torn down: oprofile_timer_exit() *and* oprofile_arch_exit(). But oprofile_arch_exit() must not be called, because oprofile_arch_init() failed. It tries to unregister a sysdev that was never registered and crashes.

-Robert

(In reply to comment #13)
> Created attachment 526916 [details]
> oprofile, x86: Fix crash when unloading module

Please note that I could only compile-test this patch due to missing hardware setup.

Thanks,
-Robert

Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Created attachment 527491 [details]
oprofile, x86: Fix crash when unloading module (NMI timer mode)
This second patch is similar but for the oprofile x86 implementation. I could reproduce this bug and tested the fix with the upstream kernel too. I will send the fixes to lkml soon.

-Robert

Created attachment 529061 [details]
[BZ 734360][PATCH 1/2] oprofile: Fix crash when unloading module (hr timer mode)

Created attachment 529062 [details]
[BZ 734360][PATCH 2/2] oprofile, x86: Fix crash when unloading module (nmi timer mode)

I have updated the fixes with the version that I sent out to lkml for review. There is also one section mismatch fix.

-Robert

Robert, have you verified this bug against the current release, RHEL 6.5?

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

Patch(es) available on kernel-2.6.32-536.el6

I am still not able to reproduce the bug on a RHEL 6.1 KVM guest.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html |
Created attachment 520544 [details]
kernelpanic

Description of problem:
On a RHEL6 guest system, typing 'opcontrol --deinit' makes the guest OS hang with a kernel panic.

Version-Release number of selected component (if applicable):
HostOS
kernel-2.6.32-131.12.1.el6.x86_64
libvirt-0.8.7-18.el6_1.1.x86_64
qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64
GuestOS
kernel-2.6.32-131.12.1.el6.x86_64
oprofile-0.9.6-12.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a RHEL-6.1 guest on a RHEL-6.1 host.
2. Type 'opcontrol --deinit' in the guest OS.

Actual results:
kernel panic

Expected results:

Additional info:
See attachment