Bug 1638289 - Qemu/Kvm using CPU even when VM is paused, polling KVM_IRQ_LINE_STATUS
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-11 09:25 UTC by Patrick O'Callaghan
Modified: 2023-08-02 23:28 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-28 23:53:08 UTC
Type: Bug
Embargoed:


Attachments
XML dump of VM (7.56 KB, text/plain)
2018-10-11 21:30 UTC, Patrick O'Callaghan

Description Patrick O'Callaghan 2018-10-11 09:25:45 UTC
Description of problem:
When running QEMU/KVM on a Fedora host with a Windows 10 guest, the qemu-system-x86 process uses about 10% of my CPU even when the guest is paused.

Version-Release number of selected component (if applicable):
qemu-system-x86-2.11.2-4.fc28.x86_64 (also on earlier versions going back at least a year)

How reproducible:
100%

Steps to Reproduce:
1. Run a guest under QEMU/KVM
2. Pause the guest
3. Observe the CPU usage of the qemu-system-x86 process (e.g. in 'top')

Actual results:
qemu-system-x86 process is using 10% of CPU (according to 'top') and is usually the busiest process on the system.

Expected results:
Process should be using little or no CPU while guest is paused.

Additional info:
Guest is a Windows 10 VM using VFIO (additional GPU passthrough).
Host is an i7-3770 with 16GB RAM and 4 cores (8 threads)
VM is using 2 pinned cores (4 threads)
VM is using 8GB of locked hugepages
VM has a dedicated NTFS hard disk, not shared with Linux

Running strace on the qemu-system-x86 process shows a constant stream of:
ioctl(12, KVM_IRQ_LINE_STATUS, 0x7fff88be75c0) = 0
ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}, {fd=42, events=POLLIN}, {fd=43, events=POLLIN}, {fd=45, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}, {fd=52, events=POLLIN}, {fd=54, events=POLLIN}, {fd=55, events=POLLOUT}, {fd=133, events=POLLIN}, {fd=134, events=POLLIN}, {fd=135, events=POLLIN}, {fd=136, events=POLLIN}, {fd=137, events=POLLIN}, {fd=140, events=POLLIN}], 20, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}, {fd=42, events=POLLIN}, {fd=43, events=POLLIN}, {fd=45, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}, {fd=52, events=POLLIN}, {fd=54, events=POLLIN}, {fd=55, events=POLLOUT}, {fd=133, events=POLLIN}, {fd=134, events=POLLIN}, {fd=135, events=POLLIN}, {fd=136, events=POLLIN}, {fd=137, events=POLLIN}, {fd=140, events=POLLIN}], 20, {tv_sec=0, tv_nsec=170140}, NULL, 8) = 0 (Timeout)
ioctl(12, KVM_IRQ_LINE_STATUS, 0x7fff88be75c0) = 0
ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}, {fd=42, events=POLLIN}, {fd=43, events=POLLIN}, {fd=45, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}, {fd=52, events=POLLIN}, {fd=54, events=POLLIN}, {fd=55, events=POLLOUT}, {fd=133, events=POLLIN}, {fd=134, events=POLLIN}, {fd=135, events=POLLIN}, {fd=136, events=POLLIN}, {fd=137, events=POLLIN}, {fd=140, events=POLLIN}], 20, {tv_sec=0, tv_nsec=28006}, NULL, 8) = 0 (Timeout)

However, event polling shouldn't consume this much CPU.
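
For reference, a trace like the one above can be captured with something along these lines (a sketch only; the process-name match and the assumption that a single VM is running are mine, not part of the original report):

QEMU_PID=$(pgrep -f qemu-system-x86 | head -n1)   # assumes only one QEMU instance is running
sudo strace -f -e trace=ioctl,ppoll -p "$QEMU_PID"   # follow all threads, log only ioctl/ppoll calls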

Comment 1 Cole Robinson 2018-10-11 19:58:06 UTC
There's a thread here asking the same thing, but it doesn't really reach a conclusion that I can see: https://www.spinics.net/lists/kvm/msg157148.html

Can you provide your VM XML? sudo virsh dumpxml $vmname

Comment 2 Patrick O'Callaghan 2018-10-11 21:28:19 UTC
(In reply to Cole Robinson from comment #1)
> There's a thread here asking the same thing, but it doesn't really reach a
> conclusion that I can see: https://www.spinics.net/lists/kvm/msg157148.html
> 
> Can you provide your VM XML? sudo virsh dumpxml $vmname

OK, see attached file.

Comment 3 Patrick O'Callaghan 2018-10-11 21:30:58 UTC
Created attachment 1493060 [details]
XML dump of VM

Output of 'sudo virsh dumpxml'

Comment 4 Cole Robinson 2018-10-12 01:04:28 UTC
It might be a stretch, but USB 3 is supposed to be nicer for CPU usage than USB 2. Can you try 'sudo virsh edit $vmname', delete all the <controller type='usb'> blocks, and add a single

<controller type="usb" model="qemu-xhci" ports="8"/>

then fully power off the VM, restart it, and see if it makes any difference?
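
For reference, the full edit-and-restart cycle might look roughly like this sketch ($vmname is a placeholder as above; 'shutdown' followed by 'start' gives the full power cycle the comment asks for, as opposed to a guest-side reboot):

sudo virsh edit $vmname       # remove the existing <controller type='usb'> blocks, add the qemu-xhci one
sudo virsh shutdown $vmname   # graceful guest shutdown; 'virsh destroy' forces an immediate power-off instead
sudo virsh start $vmname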

Comment 5 Patrick O'Callaghan 2018-10-12 11:25:03 UTC
(In reply to Cole Robinson from comment #4)
> It might be a stretch, but USB 3 is supposed to be nicer for CPU usage than
> USB 2. Can you try 'sudo virsh edit $vmname', delete all the <controller
> type='usb'> blocks, and add a single
> 
> <controller type="usb" model="qemu-xhci" ports="8"/>
> 
> then fully power off the VM, restart it, and see if it makes any difference?

Tried it but it made no difference.

I should say that after my original report I noticed that the CPU usage reported by 'top' actually means 10% of one of the 4 CPUs in the system, not 10% of the total. This is rather misleading and may mean there isn't a serious problem.

However, the constant stream of poll timeouts does indicate a bug of some sort.
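
(As a rough sanity check of the scale involved: since top's default %CPU is relative to a single logical CPU, the host-wide share can be estimated with something like the line below; the nproc/bc one-liner is my own suggestion, not from the original report.)

echo "scale=2; 10 / $(nproc)" | bc   # 10% of one logical CPU expressed as a percentage of total machine capacity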

Comment 6 Cole Robinson 2018-10-16 14:27:40 UTC
Paolo, I see you responded to a thread last year about this issue; any thoughts here? https://www.spinics.net/lists/kvm/msg157148.html

Comment 7 Paolo Bonzini 2019-04-05 12:40:18 UTC
I think this is caused by Windows using a very frequent periodic timer to update the clock.  There is no solution, really.  The timeouts are not an issue; it simply means that the event loop is executing timers rather than detecting activity on file descriptors.
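
(One way to confirm where the cycles are going, i.e. timer processing rather than file-descriptor activity, would be to sample the QEMU process with a profiler; a minimal sketch, assuming perf is installed and the process-name match picks the right VM:)

sudo perf top -p "$(pgrep -o -f qemu-system-x86)"   # live profile of the QEMU process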

Comment 8 Patrick O'Callaghan 2019-04-05 15:17:02 UTC
(In reply to Paolo Bonzini from comment #7)
> I think this is caused by Windows using a very frequent periodic timer to
> update the clock.  There is no solution, really.  The timeouts are not an
> issue; it simply means that the event loop is executing timers rather than
> detecting activity on file descriptors.

As I said, the Windows guest is paused when this is happening, so it can't be an event loop in the guest itself. Or does "paused" not mean what it appears to mean?
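
(For what it's worth, the guest's run state can be double-checked from the host; these commands are my suggestion rather than part of the original exchange, with $vmname as the usual placeholder:)

sudo virsh domstate $vmname                                   # libvirt's view; should report "paused"
sudo virsh qemu-monitor-command $vmname --hmp 'info status'   # QEMU's own view, e.g. "VM status: paused"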

Comment 9 Ben Cotton 2019-05-02 19:43:02 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 28 reaches end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Ben Cotton 2019-05-28 23:53:08 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 11 Patrick O'Callaghan 2019-05-29 10:39:31 UTC
Bug still present in F30.
