Bug 734360

Summary: "opcontrol --deinit" causes kernel panic inside guest OS.
Product: Red Hat Enterprise Linux 6
Component: kernel
Kernel sub component: Oprofile
Version: 6.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Kirby Zhou <kirbyzhou>
Assignee: Jiri Olsa <jolsa>
QA Contact: Michael Petlan <mpetlan>
CC: jwilleford, kernel-mgr, mcermak, robert.richter, yanwang
Target Milestone: rc
Target Release: 6.6
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-2.6.32-536.el6
Doc Type: Bug Fix
Last Closed: 2015-07-22 07:57:31 UTC
Bug Blocks: 1164899, 1209644
Attachments:
- kernelpanic
- oprofile, x86: Fix crash when unloading module
- oprofile, x86: Fix crash when unloading module (NMI timer mode)
- [BZ 734360][PATCH 1/2] oprofile: Fix crash when unloading module (hr timer mode)
- [BZ 734360][PATCH 2/2] oprofile, x86: Fix crash when unloading module (nmi timer mode)

Description Kirby Zhou 2011-08-30 07:56:58 UTC
Created attachment 520544 [details]
kernelpanic

Description of problem:

On a RHEL 6 guest system, running 'opcontrol --deinit' hangs the guest OS with a kernel panic.

Version-Release number of selected component (if applicable):

HostOS
kernel-2.6.32-131.12.1.el6.x86_64
libvirt-0.8.7-18.el6_1.1.x86_64
qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64

GuestOS
kernel-2.6.32-131.12.1.el6.x86_64
oprofile-0.9.6-12.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Boot a RHEL 6.1 guest on a RHEL 6.1 host.
2. Run 'opcontrol --deinit' in the guest OS.
  
Actual results:

kernel panic

Expected results:


Additional info:

See attachment

Comment 1 Kirby Zhou 2011-08-30 08:11:10 UTC
Caught with the virtual serial port:


BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
PGD 7bd06067 PUD 79503067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus
CPU 1 
Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log i2c_piix4 i2c_core microcode virtio_net virtio_balloon ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]

Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log i2c_piix4 i2c_core microcode virtio_net virtio_balloon ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
Pid: 1625, comm: rmmod Not tainted 2.6.32-131.12.1.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffff813396f9>]  [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
RSP: 0018:ffff880079c8de98  EFLAGS: 00010286
RAX: ffff880079c8c000 RBX: 0000000000000000 RCX: ffffffffa0348f80
RDX: 0000000000000000 RSI: ffffffffa03496b0 RDI: ffffffff81afe800
RBP: ffff880079c8dea8 R08: ffffffff81bfdf40 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffffa0348f20
R13: ffff880079c8df18 R14: 0000000000000000 R15: 0000000000000001
FS:  00007f8fddfda700(0000) GS:ffff880002280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000037392000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 1625, threadinfo ffff880079c8c000, task ffff8800377f6ac0)
Stack:
 0000000000000880 ffffffffa03496e0 ffff880079c8deb8 ffffffffa0343e25
<0> ffff880079c8dec8 ffffffffa034396e ffff880079c8ded8 ffffffffa0346ed8
<0> ffff880079c8df78 ffffffff810a9af4 ffff880079c8df48 ffff880079c8df58
Call Trace:
 [<ffffffffa0343e25>] op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa034396e>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0346ed8>] oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af 
RIP  [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
 RSP <ffff880079c8de98>
CR2: 0000000000000000
---[ end trace 28f965972e3c9cc3 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1625, comm: rmmod Tainted: G      D    ----------------   2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814da648>] ? panic+0x78/0x143
 [<ffffffff814de694>] ? oops_end+0xe4/0x100
 [<ffffffff81040c9b>] ? no_context+0xfb/0x260
 [<ffffffff81040f25>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff814dad57>] ? thread_return+0x4e/0x777
 [<ffffffff8104104e>] ? bad_area+0x4e/0x60
 [<ffffffff81041773>] ? __do_page_fault+0x3c3/0x480
 [<ffffffff8105fa7a>] ? __cond_resched+0x2a/0x40
 [<ffffffff814db5d0>] ? _cond_resched+0x30/0x40
 [<ffffffff814db61c>] ? wait_for_common+0x3c/0x180
 [<ffffffff814e067e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff814dda05>] ? page_fault+0x25/0x30
 [<ffffffff813396f9>] ? sysdev_unregister+0x49/0x80
 [<ffffffff813396cb>] ? sysdev_unregister+0x1b/0x80
 [<ffffffffa0343e25>] ? op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa034396e>] ? oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0346ed8>] ? oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] ? sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b

Comment 3 William Cohen 2011-08-30 13:04:59 UTC
How was oprofile set up? For "opcontrol --deinit" to do anything, "opcontrol --init" or "opcontrol --setup ..." must have been run first. How was the oprofile module set up? Also, what is the output of /dev/oprofile/cpu_type when the module is loaded on the guest machine?

Comment 4 William Cohen 2011-08-30 13:33:35 UTC
Please also include the contents of /root/.oprofile/daemonrc, which records how oprofile was set up.

What are the details on the guest VM configuration?
-number of processors for guest
-amount of memory

Comment 5 Kirby Zhou 2011-08-31 09:33:15 UTC
There is no '/root/.oprofile/daemonrc' on either the host or the guest.
The guest configuration is listed below.

Host ~]# virsh dumpxml 9
<domain type='kvm' id='9'>
  <name>rhel6.1-kvm-203</name>
  <uuid>96116a30-011a-381d-fd72-6e40da4a495c</uuid>
  <memory>2097152</memory>
  <currentMemory>2097152</currentMemory>
  <vcpu>4</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu match='exact'>
    <model>Westmere</model>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/vgext1/lv-rhel6.1-kvm-203'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <interface type='bridge'>
      <mac address='52:54:00:6d:43:85'/>
      <source bridge='br0'/>
      <target dev='vnet6'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:6d:43:86'/>
      <source bridge='br1'/>
      <target dev='vnet7'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5903' autoport='yes' listen='0.0.0.0'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Host ~]# free
             total       used       free     shared    buffers     cached
Mem:      49417812   10078216   39339596          0     114696     483292
-/+ buffers/cache:    9480228   39937584
Swap:      8388600          0    8388600

Comment 6 Kirby Zhou 2011-08-31 09:35:25 UTC
Additionally, '--deinit' also causes a RHEL 6 guest to reboot under a RHEL5-XEN-PV hypervisor.

But it did not cause any problem on the KVM hypervisor host itself.

Comment 7 William Cohen 2011-08-31 14:54:22 UTC
For "opcontrol --deinit" to remove the oprofile module, the module needs to have been loaded with some opcontrol, modprobe, or insmod command. How is the oprofile module getting loaded on the guest? Something must be loading it. Could you look to see what is loading the oprofile module?

So there is no /dev/oprofile/cpu_type file? If so, then "opcontrol" is unlikely to be doing the initial load.

Could you supply output of the following from the host:

cat /proc/cpuinfo

From the guest machine the output of:

opcontrol --init
cat /dev/oprofile/cpu_type
cat /proc/cpuinfo

Comment 8 Kirby Zhou 2011-09-01 03:39:35 UTC
~]# cat /proc/cpuinfo  
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping        : 1
cpu MHz         : 2400.104
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm
bogomips        : 4800.20
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping        : 1
cpu MHz         : 2400.104
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm
bogomips        : 4800.20
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping        : 1
cpu MHz         : 2400.104
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm
bogomips        : 4800.20
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping        : 1
cpu MHz         : 2400.104
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm
bogomips        : 4800.20
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Comment 9 Kirby Zhou 2011-09-01 03:42:45 UTC
 ~]# lsmod
Module                  Size  Used by
ipv6                  322291  26 
xfs                   982056  1 
exportfs                4202  1 xfs
ext3                  133539  1 
jbd                    54480  1 ext3
dm_mirror              14067  0 
dm_region_hash         12136  1 dm_mirror
dm_log                 10120  2 dm_mirror,dm_region_hash
microcode             112781  0 
virtio_net             15741  0 
virtio_balloon          4281  0 
i2c_piix4              12574  0 
i2c_core               31274  1 i2c_piix4
ext4                  359671  3 
mbcache                 7918  2 ext3,ext4
jbd2                   88768  1 ext4
virtio_blk              5692  3 
pata_acpi               3667  0 
ata_generic             3611  0 
ata_piix               22652  0 
virtio_pci              6653  0 
virtio_ring             7169  4 virtio_net,virtio_balloon,virtio_blk,virtio_pci
virtio                  4824  4 virtio_net,virtio_balloon,virtio_blk,virtio_pci
dm_mod                 75539  17 dm_mirror,dm_log

~]# cat /dev/oprofile/cpu_type
cat: /dev/oprofile/cpu_type: No such file or directory

~]# opcontrol --deinit
Unloading oprofile module
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
PGD 79f2e067 PUD 79ce9067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus
CPU 3 
Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_net virtio_balloon i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan]

Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_net virtio_balloon i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan]
Pid: 4941, comm: rmmod Not tainted 2.6.32-131.12.1.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffff813396f9>]  [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
RSP: 0018:ffff88007950be98  EFLAGS: 00010286
RAX: ffff88007950a000 RBX: 0000000000000000 RCX: ffffffffa00e4f80
RDX: 0000000000000000 RSI: ffffffffa00e56b0 RDI: ffffffff81afe800
RBP: ffff88007950bea8 R08: ffffffff81bfdf40 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffffa00e4f20
R13: ffff88007950bf18 R14: 0000000000000000 R15: 0000000000000001
FS:  00007f134d62b700(0000) GS:ffff880002380000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000037abb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 4941, threadinfo ffff88007950a000, task ffff8800370a9500)
Stack:
 0000000000000880 ffffffffa00e56e0 ffff88007950beb8 ffffffffa00dfe25
<0> ffff88007950bec8 ffffffffa00df96e ffff88007950bed8 ffffffffa00e2ed8
<0> ffff88007950bf78 ffffffff810a9af4 ffff88007950bf48 ffff88007950bf58
Call Trace:
 [<ffffffffa00dfe25>] op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa00df96e>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa00e2ed8>] oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af 
RIP  [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
 RSP <ffff88007950be98>
CR2: 0000000000000000
---[ end trace 6ce825a202449e30 ]---

Message from syslogd@djt_8_203 at Sep  1 11:41:11 ...
 kernel:Oops: 0000 [#1] SMP 

Message from syslogd@djt_8_203 at Sep  1 11:41:11 ...
 kernel:last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus

Message from syslogd@djt_8_203 at Sep  1 11:41:11 ...
 kernel:Stack:

Message from syslogd@djt_8_203 at Sep  1 11:41:11 ...
 kernel:Call Trace:

Message from syslogd@djt_8_203 at Sep  1 11:41:11 ...
 kernel:Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af 

Message from syslogd@djt_8_203 at Sep  1 11:41:11 ...
 kernel:CR2: 0000000000000000

Message from syslogd@djt_8_203 at Sep  1 11:41:11 ...
 kernel:Kernel panic - not syncing: Fatal exception

Kernel panic - not syncing: Fatal exception
Pid: 4941, comm: rmmod Tainted: G      D    ----------------   2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814da648>] ? panic+0x78/0x143
 [<ffffffff814de694>] ? oops_end+0xe4/0x100
 [<ffffffff81040c9b>] ? no_context+0xfb/0x260
 [<ffffffff81040f25>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff814dad57>] ? thread_return+0x4e/0x777
 [<ffffffff8104104e>] ? bad_area+0x4e/0x60
 [<ffffffff81041773>] ? __do_page_fault+0x3c3/0x480
 [<ffffffff8105fa7a>] ? __cond_resched+0x2a/0x40
 [<ffffffff814db5d0>] ? _cond_resched+0x30/0x40
 [<ffffffff814db61c>] ? wait_for_common+0x3c/0x180
 [<ffffffff814e067e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff814dda05>] ? page_fault+0x25/0x30
 [<ffffffff813396f9>] ? sysdev_unregister+0x49/0x80
 [<ffffffff813396cb>] ? sysdev_unregister+0x1b/0x80
 [<ffffffffa00dfe25>] ? op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa00df96e>] ? oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa00e2ed8>] ? oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] ? sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b

Comment 10 Kirby Zhou 2011-09-01 04:34:29 UTC
~]# opcontrol --init
[@djt_8_203 ~]# lsmod 
Module                  Size  Used by
oprofile               46533  1 
ipv6                  322291  26 
xfs                   982056  1 
exportfs                4202  1 xfs
ext3                  133539  1 
jbd                    54480  1 ext3
dm_mirror              14067  0 
dm_region_hash         12136  1 dm_mirror
dm_log                 10120  2 dm_mirror,dm_region_hash
microcode             112781  0 
virtio_balloon          4281  0 
virtio_net             15741  0 
i2c_piix4              12574  0 
i2c_core               31274  1 i2c_piix4
ext4                  359671  3 
mbcache                 7918  2 ext3,ext4
jbd2                   88768  1 ext4
virtio_blk              5692  3 
pata_acpi               3667  0 
ata_generic             3611  0 
ata_piix               22652  0 
virtio_pci              6653  0 
virtio_ring             7169  4 virtio_balloon,virtio_net,virtio_blk,virtio_pci
virtio                  4824  4 virtio_balloon,virtio_net,virtio_blk,virtio_pci
dm_mod                 75539  17 dm_mirror,dm_log

~]# cat /dev/oprofile/cpu_type
timer

~]# opcontrol --deinit        
Unloading oprofile module
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
PGD 77cc5067 PUD 7ab06067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus
CPU 3 
Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan]

Modules linked in: oprofile(-) ipv6 xfs exportfs ext3 jbd dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan]
Pid: 1623, comm: rmmod Not tainted 2.6.32-131.12.1.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffff813396f9>]  [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
RSP: 0018:ffff88007bc63e98  EFLAGS: 00010286
RAX: ffff88007bc62000 RBX: 0000000000000000 RCX: ffffffffa0348f80
RDX: 0000000000000000 RSI: ffffffffa03496b0 RDI: ffffffff81afe800
RBP: ffff88007bc63ea8 R08: ffffffff81bfdf40 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffffa0348f20
R13: ffff88007bc63f18 R14: 0000000000000000 R15: 0000000000000001
FS:  00007f37cf143700(0000) GS:ffff880002380000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000373bb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 1623, threadinfo ffff88007bc62000, task ffff8800376f4a80)
Stack:
 0000000000000880 ffffffffa03496e0 ffff88007bc63eb8 ffffffffa0343e25
<0> ffff88007bc63ec8 ffffffffa034396e ffff88007bc63ed8 ffffffffa0346ed8
<0> ffff88007bc63f78 ffffffff810a9af4 ffff88007bc63f48 ffff88007bc63f58
Call Trace:
 [<ffffffffa0343e25>] op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa034396e>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0346ed8>] oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af 
RIP  [<ffffffff813396f9>] sysdev_unregister+0x49/0x80
 RSP <ffff88007bc63e98>
CR2: 0000000000000000
---[ end trace 173aac3dcae7e69a ]---

Message from syslogd@djt_8_203 at Sep  1 12:33:49 ...
 kernel:Oops: 0000 [#1] SMP 

Message from syslogd@djt_8_203 at Sep  1 12:33:49 ...
 kernel:last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/local_cpus

Message from syslogd@djt_8_203 at Sep  1 12:33:49 ...
 kernel:Stack:

Message from syslogd@djt_8_203 at Sep  1 12:33:49 ...
 kernel:Call Trace:

Message from syslogd@djt_8_203 at Sep  1 12:33:49 ...
 kernel:Code: 24 08 48 8b 59 08 eb 23 66 2e 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 85 c0 74 0d 4c 89 e7 ff d0 48 8b 13 49 8b 4c 24 08 48 89 d3 <48> 8b 13 48 8d 41 08 48 39 c3 0f 18 0a 75 d8 48 c7 c7 00 e8 af 

Message from syslogd@djt_8_203 at Sep  1 12:33:49 ...
 kernel:CR2: 0000000000000000

Message from syslogd@djt_8_203 at Sep  1 12:33:49 ...
 kernel:Kernel panic - not syncing: Fatal exception

Kernel panic - not syncing: Fatal exception
Pid: 1623, comm: rmmod Tainted: G      D    ----------------   2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814da648>] ? panic+0x78/0x143
 [<ffffffff814de694>] ? oops_end+0xe4/0x100
 [<ffffffff81040c9b>] ? no_context+0xfb/0x260
 [<ffffffff81040f25>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff814dad57>] ? thread_return+0x4e/0x777
 [<ffffffff8104104e>] ? bad_area+0x4e/0x60
 [<ffffffff81041773>] ? __do_page_fault+0x3c3/0x480
 [<ffffffff8105fa7a>] ? __cond_resched+0x2a/0x40
 [<ffffffff814db5d0>] ? _cond_resched+0x30/0x40
 [<ffffffff814db61c>] ? wait_for_common+0x3c/0x180
 [<ffffffff814e067e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff814dda05>] ? page_fault+0x25/0x30
 [<ffffffff813396f9>] ? sysdev_unregister+0x49/0x80
 [<ffffffff813396cb>] ? sysdev_unregister+0x1b/0x80
 [<ffffffffa0343e25>] ? op_nmi_exit+0x15/0x30 [oprofile]
 [<ffffffffa034396e>] ? oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa0346ed8>] ? oprofile_exit+0x18/0x1a [oprofile]
 [<ffffffff810a9af4>] ? sys_delete_module+0x194/0x260
 [<ffffffff814e0600>] ? arch_prepare_kprobe+0x50/0x90
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b

Comment 11 Robert Richter 2011-10-06 09:40:30 UTC
There is a known bug in the stable kernel that occurs when oprofile is compiled as a module and runs in timer mode.

Will send a fix.

-Robert

Comment 12 Robert Richter 2011-10-06 13:04:24 UTC
(In reply to comment #10)
> ~]# cat /dev/oprofile/cpu_type
> timer

What does dmesg | grep -i oprofile show?

There are 2 different timer modes:

 oprofile: using timer interrupt.

or

 oprofile: using NMI timer interrupt.
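
To tell the two modes apart in a script, the check above can be sketched as follows. This is a minimal sketch that assumes the dmesg output has already been captured as text. The two message strings are the ones quoted above, while the function name and the mode labels are hypothetical names chosen for illustration.

```python
import re

# Message texts as quoted in this comment; the labels are this sketch's own names.
TIMER_MODES = {
    "oprofile: using timer interrupt.": "timer",
    "oprofile: using NMI timer interrupt.": "nmi_timer",
}


def oprofile_timer_mode(dmesg_text):
    """Return 'timer', 'nmi_timer', or None if neither message appears."""
    for line in dmesg_text.splitlines():
        # Drop an optional "[   12.345678] " timestamp prefix, if present.
        msg = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line).strip()
        if msg in TIMER_MODES:
            return TIMER_MODES[msg]
    return None
```

Feeding it the output of "dmesg | grep -i oprofile" from the guest would give the mode, or None when oprofile logged neither message.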

Comment 13 Robert Richter 2011-10-07 15:33:20 UTC
Created attachment 526916 [details]
oprofile, x86: Fix crash when unloading module

Comment 14 Robert Richter 2011-10-07 15:33:34 UTC
I analyzed the code and think the following happens:

The guest reports a Westmere cpu (model 44/2ch).

Oprofile does not support this model number and tries to fall back to i386/arch_perfmon. This fails in the guest since X86_FEATURE_ARCH_PERFMON is not set there (needs to be confirmed).

Now oprofile_arch_init() fails and oprofile_timer_init() is set up instead.

On oprofile_exit(), both oprofile_timer_exit() *and* oprofile_arch_exit() are called. But oprofile_arch_exit() must not be called, because oprofile_arch_init() failed. It tries to unregister a sysdev that does not exist and crashes.

-Robert
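
The sequence described in this comment can be modeled outside the kernel. The following is a toy Python model only, not kernel code and not the attached patch. The class, its method names, and the "guarded" exit path are assumptions made for illustration, and the guard shows just one plausible way to pair exit with whichever init succeeded.

```python
class ToyOprofile:
    """Toy model of the init/exit pairing described above (hypothetical names)."""

    def __init__(self, arch_supported):
        self.arch_supported = arch_supported
        self.sysdev = None       # stands in for the sysdev the arch code registers
        self.timer_mode = False
        self.calls = []

    def arch_init(self):
        if not self.arch_supported:
            return False         # e.g. unsupported CPU model and no arch_perfmon
        self.sysdev = object()
        return True

    def arch_exit(self):
        if self.sysdev is None:
            # Models the NULL pointer dereference seen in sysdev_unregister().
            raise RuntimeError("unregistering a sysdev that was never registered")
        self.sysdev = None
        self.calls.append("arch_exit")

    def timer_exit(self):
        self.calls.append("timer_exit")

    def init(self):
        if not self.arch_init():
            self.timer_mode = True   # fall back to timer mode

    def exit_buggy(self):
        # Tears down both paths regardless of which init succeeded.
        self.timer_exit()
        self.arch_exit()

    def exit_guarded(self):
        # Assumed guard: undo only the path that was actually set up.
        if self.timer_mode:
            self.timer_exit()
        else:
            self.arch_exit()
```

With arch_supported=False, exit_buggy() raises, mirroring the crash on module unload, while exit_guarded() only runs the timer teardown.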

Comment 15 Robert Richter 2011-10-07 15:34:37 UTC
(In reply to comment #13)
> Created attachment 526916 [details]
> oprofile, x86: Fix crash when unloading module

Please note that I could only compile-test this patch because I do not have the hardware setup.

Thanks,

-Robert

Comment 16 RHEL Program Management 2011-10-07 16:02:54 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 17 Robert Richter 2011-10-11 16:13:47 UTC
Created attachment 527491 [details]
oprofile, x86: Fix crash when unloading module (NMI timer mode)

Comment 18 Robert Richter 2011-10-11 16:18:16 UTC
This second patch is similar, but for the oprofile x86 implementation. I could reproduce this bug and tested the fix with the upstream kernel too.

I will send the fixes to lkml soon.

-Robert

Comment 19 Robert Richter 2011-10-19 17:24:24 UTC
Created attachment 529061 [details]
[BZ 734360][PATCH 1/2] oprofile: Fix crash when unloading module (hr timer mode)

Comment 20 Robert Richter 2011-10-19 17:25:12 UTC
Created attachment 529062 [details]
[BZ 734360][PATCH 2/2] oprofile, x86: Fix crash when unloading module (nmi timer mode)

Comment 21 Robert Richter 2011-10-19 17:27:37 UTC
I have updated the fixes with the version that I sent out to lkml for review. There is also one section mismatch fix.

-Robert

Comment 23 Jason Willeford 2014-01-23 21:14:19 UTC
Robert,
Have you verified this bug against the current release, RHEL 6.5?

Comment 27 RHEL Program Management 2014-12-19 14:19:53 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 28 Rafael Aquini 2015-02-21 04:06:46 UTC
Patch(es) available on kernel-2.6.32-536.el6

Comment 31 Michael Petlan 2015-04-09 16:24:52 UTC
I am still not able to reproduce the bug on a RHEL 6.1 KVM guest.

Comment 34 errata-xmlrpc 2015-07-22 07:57:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html