Bug 985246 - NMIs don't work on RHEL6.5 guest
Status: CLOSED DUPLICATE of bug 928284
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Luiz Capitulino
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-07-17 03:18 EDT by ShupingCui
Modified: 2016-05-16 00:06 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-04 15:07:38 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
dmesg1-nmi_watchdog=1 (24.58 KB, text/plain)
2013-08-27 22:01 EDT, ShupingCui
dmesg1-nmi_watchdog=2 (24.58 KB, text/plain)
2013-08-27 22:02 EDT, ShupingCui
dmesg_host (50.60 KB, text/plain)
2013-10-09 04:06 EDT, ShupingCui
dmesg_guest (23.03 KB, text/plain)
2013-10-09 04:07 EDT, ShupingCui

Description ShupingCui 2013-07-17 03:18:18 EDT
Description of problem:
The RHEL6.5 guest's NMI counter does not increase when the guest boots with nmi_watchdog=1.

Version-Release number of selected component (if applicable):
Host:
kernel-3.10.0-0.rc7.64.el7.x86_64
qemu-kvm-1.5.1-2.el7.x86_64 
Guest:
2.6.32-395.el6.i686

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest:
/root/staf-kvm-devel/autotest-devel/client/tests/virt/qemu/qemu \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20130708-104617-FPLdNLY4,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20130708-104617-FPLdNLY4,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20130708-104617-FPLdNLY4,path=/tmp/seabios-20130708-104617-FPLdNLY4,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20130708-104617-FPLdNLY4,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 \
    -drive file='/root/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-6.5-32-virtio.raw',if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=on,format=raw,aio=native \
    -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,bootindex=0 \
    -device virtio-net-pci,netdev=idXvBPgx,mac='9a:90:91:92:93:94',bus=pci.0,addr=0x3,id='idJRemuC' \
    -netdev tap,id=idXvBPgx,vhost=on,fd=20 \
    -m 4096 \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
    -cpu 'Opteron_G2' \
    -M pc \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga cirrus \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -no-kvm-pit-reinjection \
    -enable-kvm
2. Add 'nmi_watchdog=1' to the guest kernel command line and reboot.
3. In the guest, check the NMI counter:
# grep NMI /proc/interrupts
4. In the guest, wait 60 seconds and check the NMI counter again:
# grep NMI /proc/interrupts
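Steps 3 and 4 compare two readings of the NMI row by eye. Below is a minimal Python sketch of the same check, assuming the usual /proc/interrupts layout in which the NMI row lists one count per CPU followed by a text description. `nmi_total`, `nmi_counter_increased` and `watch_nmi` are hypothetical helpers written for illustration; they are not part of the autotest suite used in this report.

```python
import time


def nmi_total(interrupts_text: str) -> int:
    """Sum the per-CPU counts on the NMI row of /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break  # the description ("Non-maskable interrupts") starts here
                total += int(field)
            return total
    raise ValueError("no NMI row found")


def nmi_counter_increased(before: str, after: str) -> bool:
    """True if the NMI total grew between two /proc/interrupts snapshots."""
    return nmi_total(after) > nmi_total(before)


def watch_nmi(seconds: float = 60, path: str = "/proc/interrupts") -> bool:
    """Steps 3 and 4: read the counters, wait, read them again, compare."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return nmi_counter_increased(before, after)
```

Run inside the guest, `watch_nmi()` returns False whenever the NMI counters stay flat, which is the failure described in this report.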

Actual results:
The NMI counter did not increase after 60 seconds.

Expected results:
The NMI counter should have increased after 60 seconds.

Additional info:
host cpuinfo:
processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 67
model name	: Dual-Core AMD Opteron(tm) Processor 1216
stepping	: 3
cpu MHz		: 1000.000
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips	: 2009.25
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
Comment 2 Luiz Capitulino 2013-08-22 17:44:04 EDT
This works for me. The differences between my test and yours are:

1. My guest kernel is newer: 2.6.32-412.el6.i686
2. I didn't try with the exact same qemu-kvm command line
3. I built qemu-kvm-rhel7 from source on an F18 host

Can you please try the following:

1. Update your guest *and* host

2. Make sure you added nmi_watchdog=1 correctly by doing:

 $ grep nmi_watchdog /proc/cmdline

3. If after steps 1 and 2 things still don't work, can you try again with a reduced command-line?

Btw, you should be able to see some NMI interrupts in /proc/interrupts right after boot.
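Grepping /proc/cmdline for nmi_watchdog confirms the parameter is present but would also match a value such as nmi_watchdog=0. A minimal sketch that extracts the actual value, assuming whitespace-separated name=value tokens with no quoted values; `kernel_param` and `running_kernel_param` are hypothetical helper names.

```python
def kernel_param(cmdline: str, name: str):
    """Return the value of `name` on a kernel command line.

    Returns None if the parameter is absent and "" if it has no "=value".
    If it appears more than once, this sketch returns the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def running_kernel_param(name: str, path: str = "/proc/cmdline"):
    """Look up `name` on the command line of the running kernel."""
    with open(path) as f:
        return kernel_param(f.read(), name)
```

In the guest, `running_kernel_param("nmi_watchdog")` should return "1" for the setup requested here.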
Comment 3 ShupingCui 2013-08-27 05:19:22 EDT
(In reply to Luiz Capitulino from comment #2)

Hi Luiz,
I tried the steps above, but it still doesn't work for me. Could you help check it?

1. Host and guest kernels:
(host) # uname -r
3.10.0-9.el7.x86_64
(guest) # uname -r
2.6.32-412.el6.i686
2. # grep nmi_watchdog /proc/cmdline
ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=129M@0M rd_LVM_LV=VolGroup/LogVol_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM nmi_watchdog=1
3. # grep NMI /proc/interrupts
NMI:          0          0          0          0   Non-maskable interrupts


Best Regards,
Shuping
Comment 4 Luiz Capitulino 2013-08-27 09:55:11 EDT
Can you post your guest's dmesg?

Also, you could try with nmi_watchdog=2. I'll try to reproduce on a RHEL7 host (this can take a few days).
Comment 5 ShupingCui 2013-08-27 22:01:46 EDT
Created attachment 791219
dmesg1-nmi_watchdog=1
Comment 6 ShupingCui 2013-08-27 22:02:42 EDT
Created attachment 791220
dmesg1-nmi_watchdog=2
Comment 7 ShupingCui 2013-08-27 22:05:19 EDT
(In reply to Luiz Capitulino from comment #4)

Hi Luiz,
I attached two dmesg files; please check them.

Thanks,
Shuping
Comment 8 Luiz Capitulino 2013-09-09 11:16:54 EDT
Can you try with "-cpu host"?

The problem seems to be that this only works with -cpu host. With any CPU model other than host, the PMU subsystem fails to initialize hardware events:

"""
Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only
"""

And the NMI watchdog depends on hardware events to work:

"""
NMI watchdog disabled (cpu0): hardware events not enabled
"""
Comment 9 Luiz Capitulino 2013-09-11 13:27:44 EDT
I talked to Paolo about this one. The problem is that we don't do PMU emulation right now, so the only way to get this working is to pass -cpu host.

There's a plan to implement PMU emulation, but it's not high priority.

Paolo is going to open a new BZ requesting PMU emulation; when he does, I'll make this BZ depend on it.
Comment 10 juzhang 2013-09-11 22:26:38 EDT
(In reply to Luiz Capitulino from comment #8)

Hi Scui,

Can you give it a try?

Best Regards,
Junyi
Comment 11 ShupingCui 2013-09-11 23:01:05 EDT
(In reply to Luiz Capitulino from comment #8)

Hi Luiz,

It still doesn't work with "-cpu host". Please help check it, and correct me if I'm wrong.

Command line:
/usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -drive file='/root/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-6.5-32-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=off,format=qcow2,aio=native \
    -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,bootindex=0 \
    -device virtio-net-pci,netdev=idBNVFfD,mac='9a:20:21:22:23:24',bus=pci.0,addr=0x3,id='idMpTZ9w' \
    -netdev tap,id=idBNVFfD,vhost=on \
    -m 2048 \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
    -cpu host \
    -M pc \
    -vnc :0 \
    -vga cirrus \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -no-kvm-pit-reinjection \
    -enable-kvm \
    -monitor stdio

host:
# uname -r
3.10.0-0.rc7.64.el7.x86_64
# rpm -qa | grep qemu-kvm
qemu-kvm-1.5.3-2.el7.x86_64

guest:
# uname -r
2.6.32-417.el6.i686
# cat /proc/cmdline 
ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=129M@0M rd_LVM_LV=VolGroup/LogVol_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM nmi_watchdog=1


Best Regards,
Shuping
Comment 12 Luiz Capitulino 2013-09-13 09:34:12 EDT
You're trying this on the same host as in the description, right? Could you please provide the host's dmesg?

Also, are you still checking /proc/interrupts to see whether it's working, or are you grepping for "NMI watchdog disabled" in the guest's dmesg? I think you should do both.
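For the dmesg half of that check, a minimal sketch that scans guest dmesg text with regular expressions. It assumes only the message wordings quoted in this bug: the RHEL6 form "NMI watchdog enabled/disabled ..." and the RHEL7 form "NMI watchdog: enabled/disabled ...", plus the "Performance Events:" line that reports "software events only" when the PMU is unusable. Other kernel versions may word these differently; `watchdog_status` and `pmu_software_only` are hypothetical helpers.

```python
import re

# Matches "NMI watchdog enabled, ..." (RHEL6) and "NMI watchdog: disabled ..." (RHEL7).
_WATCHDOG_RE = re.compile(r"NMI watchdog:?\s+(enabled|disabled)")
_PERF_RE = re.compile(r"Performance Events:\s*(.*)")


def watchdog_status(dmesg: str):
    """Return 'enabled', 'disabled', or None, taken from the last watchdog line."""
    status = None
    for line in dmesg.splitlines():
        m = _WATCHDOG_RE.search(line)
        if m:
            status = m.group(1)
    return status


def pmu_software_only(dmesg: str) -> bool:
    """True if the first 'Performance Events:' line reports software events only."""
    for line in dmesg.splitlines():
        m = _PERF_RE.search(line)
        if m:
            return "software events only" in m.group(1)
    return False
```

Combined with the /proc/interrupts counter, a guest where `watchdog_status` returns "disabled" and `pmu_software_only` returns True matches the failing logs in this report.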
Comment 18 ShupingCui 2013-10-07 23:04:19 EDT
Hi Luiz,

I tested on my local machine and found the following messages in the guest's dmesg with "-cpu 'SandyBridge'":
Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
NMI watchdog disabled (cpu0): hardware events not enabled

With "-cpu host" it now works fine; the guest's dmesg shows:
Performance Events: 16-deep LBR, SandyBridge events, Intel PMU driver.
PEBS disabled due to CPU errata.
NMI watchdog enabled, takes one hw-pmu counter.


Host info:
3.10.0-31.el7.x86_64
qemu-kvm-1.5.3-7.el7.x86_64
[root@localhost ~]# cat /var/log/dmesg | grep PMU
[    0.281392] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, Intel PMU driver.
[    0.349927] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Comment 19 Luiz Capitulino 2013-10-08 09:07:41 EDT
Thanks for testing, but we still need you to test on the machine you mentioned in comment 11, because there the guest fails even when you specify -cpu host.
Comment 20 ShupingCui 2013-10-09 04:05:36 EDT
(In reply to Luiz Capitulino from comment #19)

Hi Luiz,

I tested on the machine I mentioned in comment 11, and it still doesn't work with "-cpu host".

Command line:
/root/staf-kvm-devel/autotest-devel/client/tests/virt/qemu/qemu \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20131009-111427-PWW9XLD4,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 \
    -drive file='/root/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-6.5-32-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=off,format=qcow2,aio=native \
    -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,bootindex=0 \
    -device virtio-net-pci,netdev=idXm8ieA,mac='9a:82:83:84:85:86',bus=pci.0,addr=0x3,id='idiItSQu' \
    -netdev tap,id=idXm8ieA,vhost=on \
    -m 4096 \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
    -cpu host \
    -M pc \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga cirrus \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -no-kvm-pit-reinjection \
    -enable-kvm \
    -monitor stdio


Host info:
3.10.0-31.el7.x86_64
qemu-kvm-1.5.3-7.el7.x86_64

processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 67
model name	: Dual-Core AMD Opteron(tm) Processor 1216
stepping	: 3
cpu MHz		: 1000.000
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips	: 2009.41
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc


guest info:
2.6.32-422.el6.i686

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 67
model name	: Dual-Core AMD Opteron(tm) Processor 1216
stepping	: 3
cpu MHz		: 2400.000
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up extd_apicid unfair_spinlock pni cx16 x2apic hypervisor lahf_lm cmp_legacy svm cr8_legacy
bogomips	: 4800.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:
Comment 21 ShupingCui 2013-10-09 04:06:33 EDT
Created attachment 809733
dmesg_host
Comment 22 ShupingCui 2013-10-09 04:07:34 EDT
Created attachment 809734
dmesg_guest
Comment 23 Luiz Capitulino 2013-10-09 08:31:47 EDT
Interesting, you do have a hardware PMU in the host. Could you give me remote access to it?
Comment 27 Luiz Capitulino 2013-12-03 16:22:00 EST
I logged into the machine where the problem happens and did some debugging. The guest fails to interact with the PMU hardware during boot, and this is printed:

"""
Performance Events: Broken PMU hardware detected, using software events only.
"""

It looks like a problem specific to that machine, and it could be either a QEMU bug or a RHEL6 kernel bug.

Scui, could you please try to reproduce this with a RHEL6.5 x86_64 guest and also with a RHEL7 x86_64 guest? I'd also appreciate it if you wrote 6.5_x8664_host and 7.0_x8664_host scripts.

Thanks!
Comment 28 Luiz Capitulino 2013-12-03 16:36:36 EST
(I promise this is going to be the last summary change for today)
Comment 29 juzhang 2013-12-03 22:41:50 EST
(In reply to Luiz Capitulino from comment #27)

Hi Scui,

Could you take a look and post your test results in the BZ?

Best Regards,
Junyi
Comment 30 ShupingCui 2013-12-04 00:50:46 EST
(In reply to Luiz Capitulino from comment #27)

Hi Luiz,

I tried with a RHEL6.5 x86_64 guest and a RHEL7 x86_64 guest, and it still doesn't work. The results are as follows:
==============================
RHEL6.5_x86_64, dmesg printed:
"""
Performance Events: Broken PMU hardware detected, using software events only.
NMI watchdog disabled (cpu0): hardware events not enabled
"""

RHEL7.0_x86_64, dmesg printed:
"""
[    0.267000] Performance Events: Broken PMU hardware detected, using software events only.
[    0.268002] Failed to access perfctr msr (MSR c0010001 is ffffffffffffffff)
[    0.271220] Brought up 1 CPUs
[    0.272004] smpboot: Total of 1 processors activated (4800.00 BogoMIPS)
[    0.273852] NMI watchdog: disabled (cpu0): hardware events not enabled
"""

I also wrote two scripts (6.5_x86_64_host, 7.0_x86_64_host) on that machine that you can use.
Comment 31 Karen Noel 2013-12-04 13:47:57 EST
Luiz and Scui,

AMD does not support an "architectural PMU" like Intel systems do. This means that the vPMU feature does not work on AMD host systems. Unfortunately, the error you see is "Broken PMU hardware detected", which doesn't make that clear. Maybe the error message could be improved?
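As a quick host-side hint of which case applies, a minimal sketch that reads /proc/cpuinfo and looks at `vendor_id` and the `arch_perfmon` flag. Linux reports `arch_perfmon` on Intel CPUs that enumerate architectural performance monitoring through CPUID leaf 0xA; the AMD Opteron 1216 host in this report has no such flag. Treating this flag as the indicator is an assumption made for illustration, not how KVM or QEMU actually decide whether to expose a vPMU, and `cpuinfo_fields` and `has_intel_arch_perfmon` are hypothetical helpers.

```python
def cpuinfo_fields(cpuinfo: str) -> dict:
    """Parse the first processor block of /proc/cpuinfo into a dict."""
    fields = {}
    for line in cpuinfo.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_intel_arch_perfmon(cpuinfo: str) -> bool:
    """True for an Intel CPU whose flags include Linux's "arch_perfmon" bit."""
    fields = cpuinfo_fields(cpuinfo)
    flags = fields.get("flags", "").split()
    return fields.get("vendor_id") == "GenuineIntel" and "arch_perfmon" in flags
```

On the AMD host above, `has_intel_arch_perfmon(open("/proc/cpuinfo").read())` returns False.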

The documentation should also clearly state that vPMU is tech preview and only works on Intel host systems.

Similar BZ: https://bugzilla.redhat.com/show_bug.cgi?id=928284

Thanks, Karen
Comment 32 Luiz Capitulino 2013-12-04 15:07:38 EST
Oh, thank you, Karen. In my debugging I saw that some MSRs aren't emulated, and I was wondering whether there was a technical reason for that or whether it was just a matter of implementing the code. I was looking at the AMD64 programmer's manual and was going to write to Paolo next.

Yes, this is the same as bug 928284. The difference is that recent kernels have an additional debug message:

"Failed to access perfctr msr (MSR c0010001 is ffffffffffffffff) "

This message doesn't exist in RHEL6, but it's the same problem.
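Because only the newer kernel prints that line, a test harness might want to pick it out of a RHEL7 guest's dmesg. A minimal sketch, assuming only the message format quoted in this bug, that extracts the MSR number and the value read back; in this report that value is all ones (0xffffffffffffffff). `perfctr_msr_failure` and `reads_all_ones` are hypothetical helpers.

```python
import re

_MSR_FAIL_RE = re.compile(
    r"Failed to access perfctr msr \(MSR ([0-9a-fA-F]+) is ([0-9a-fA-F]+)\)"
)


def perfctr_msr_failure(dmesg: str):
    """Return (msr, value) from a 'Failed to access perfctr msr' line, or None."""
    m = _MSR_FAIL_RE.search(dmesg)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def reads_all_ones(value: int, width: int = 64) -> bool:
    """True if every bit of a width-bit value is set."""
    return value == (1 << width) - 1
```

For the RHEL7 log in comment 30 this yields MSR 0xc0010001 with an all-ones value; on a RHEL6 guest `perfctr_msr_failure` simply returns None.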

*** This bug has been marked as a duplicate of bug 928284 ***
