Bug 1259380 - Windows guests consume a full CPU core when the host side of virtio-serial is closed
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.1
Hardware: x86_64 Windows
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Gal Hammer
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2015-09-02 09:40 EDT by Nat Meo
Modified: 2016-08-22 15:33 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-22 15:33:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
vmc-cat tool and other information (43.44 KB, application/x-gzip)
2015-09-02 09:40 EDT, Nat Meo
vmc-cat V2 to demonstrate host connected flag (26.87 KB, application/x-gzip)
2015-09-08 08:45 EDT, Nat Meo

Description Nat Meo 2015-09-02 09:40:07 EDT
Created attachment 1069446
vmc-cat tool and other information

Description of problem:
When using a virtio-serial device in qemu-kvm to communicate between a Windows guest and the Linux host, closing the device on the host side will cause qemu-kvm to consume a full CPU core when any amount of data is written to the virtio-serial device on the guest side. This only happens for Windows guests and not for Linux guests. The version of Windows does not appear to matter as I have observed it on many different versions. For most situations I demonstrate this happening on Windows 7.

When the CPU spike occurs, there is no discernible change inside the guest itself as task manager does not report any increase in CPU usage despite the fact that qemu-kvm is consuming a full CPU core. If the guest is allocated only a single CPU core then it will become completely unresponsive. If the guest is allocated more than one core it will be responsive but run slower.

Profiling qemu-kvm when the CPU spike occurs (see opreport.txt inside attached archive) appears to show a significant amount of time spent inside the Linux kernel in polling and spinlock functions.

Attached is an archive file containing a simple tool I use on the Windows guest to demonstrate this problem. It includes both the source code and a pre-built binary compiled in Visual Studio 2010 for your convenience. Also in the archive are the XML of the Windows guest from libvirt and the output of pstack when the process has the CPU spike.

The virtio drivers used were obtained from the following location:

https://fedoraproject.org/wiki/Windows_Virtio_Drivers

Both the "stable" 102 drivers and the "latest" 109 driver versions produce this problem.

Version-Release number of selected component (if applicable):
kernel-3.10.0-229.11.1.el7.x86_64
qemu-kvm-1.5.3-86.el7_1.5.x86_64
libvirt-1.2.8-16.el7_1.3.x86_64
virtio-win 102 & 109

How reproducible:
100%

Steps to Reproduce:
1. Create a Windows guest with 2 CPU cores and a virtio-serial device with an assigned name.
2. Copy the attached vmc-cat tool into the Windows guest.
3. When the guest is running, open the host side of the virtio-serial device for reading by executing "cat /dev/pts/X" with X being the assigned number of the virtio-serial device.
4. Inside the Windows guest, launch the task manager and go to the "Performance" tab to monitor CPU usage inside the guest.
5. On the host, run "top" or if using virt-manager look at the "Performance" area of the guest to monitor CPU usage.
6. Inside the Windows guest, open a command prompt and run "vmc-cat w XXX" where XXX is the assigned name of the virtio-serial device defined in step 1.
7. With vmc-cat running, type something simple like "Hello" and press ENTER. Make sure that you see the message on the host side where the "cat" command is being executed. Notice no change in the CPU usage.
8. On the host, press CTRL-C to kill the "cat" command to close the host side of the virtio-serial device.
9. Inside the Windows guest, type another message and press ENTER. Observe task manager and notice that there is no discernible change in CPU usage.
10. On the host observe the CPU usage of qemu-kvm. You will see that it is consuming 100% of a single CPU core.
11. On the host execute "cat /dev/pts/X" again. This will display the last message entered on the guest.
12. Observe the CPU usage of qemu-kvm and notice that it no longer consumes a full CPU core.
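
For anyone who would rather script the host-side check in steps 5, 10 and 12 than watch "top", here is a minimal Python sketch. It assumes a Linux host and that you already know the PID of the qemu-kvm process; the function names are made up for illustration and are not part of the attached tool.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' before splitting the other fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, hz: int) -> float:
    """Convert a tick delta over interval_s seconds to percent of one core."""
    return (ticks_after - ticks_before) / hz / interval_s * 100.0


def sample_pid(pid: int, interval_s: float = 2.0) -> float:
    """Sample a live process twice (Linux only) and return its CPU percent."""
    hz = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, hz)
```

Calling sample_pid() with the qemu-kvm PID before step 8 and again after step 9 should give two numbers that can be compared directly. The value is relative to a single core, so it can exceed 100 for a multi-threaded process.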

Actual results:
qemu-kvm will consume a full CPU core and freeze a guest with only one CPU core allocated.

Expected results:
qemu-kvm does not consume a full CPU core and guests can still be responsive when the host side of the virtio-serial device is closed.

Additional info:
To work around this problem, I tried to find a way to detect from within the Windows guest that the host side was closed, so that I could avoid the writes to the virtio-serial device that trigger it. Unfortunately the virtio-serial driver for Windows does not appear to provide any event to report this. The only way we know for sure that the problem has occurred is that a write to the virtio-serial device after the host side has been closed returns an error code of 554. By that point the data has already been written and the CPU spike has started, so there is no apparent way to get around the problem.
Comment 2 Gal Hammer 2015-09-08 03:29:08 EDT
The described behavior is by (lack of) design. When the host side is not connected, all read/write requests to/from the driver are completed with an error. If the guest runs in a loop and keeps retrying without a delay, then you'll see 100% CPU usage on the core that runs that process.

In order to avoid this you can either:

1. Use IOCTL_GET_INFORMATION and check the HostConnected field's value.

2. Register to the GUID_VIOSERIAL_PORT_CHANGE_STATUS notification (similar to a CD change notification).
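
Option 1 amounts to a check-before-write guard. Below is a minimal Python model of that logic only, not real driver code. Both callables are hypothetical stand-ins: in actual guest code the check would wrap the IOCTL_GET_INFORMATION call and read the HostConnected field, and the write would be the WriteFile call on the port handle. The model assumes the flag is accurate, and the check followed by the write is not atomic.

```python
from typing import Callable


def guarded_write(is_host_connected: Callable[[], bool],
                  write: Callable[[bytes], int],
                  data: bytes) -> bool:
    """Write data only if the host end reports as connected.

    Returns True if the write was attempted, False if it was skipped.
    Both callables are placeholders for the real port operations.
    """
    if not is_host_connected():
        # Host side is closed: skip the write instead of letting it fail.
        return False
    write(data)
    return True
```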
Comment 3 Nat Meo 2015-09-08 08:45:50 EDT
Created attachment 1071336
vmc-cat V2 to demonstrate host connected flag
Comment 4 Nat Meo 2015-09-08 09:03:47 EDT
Unfortunately those options don't really work out. As for #1, the HostConnected status does not correctly report when the host side has disconnected. I have attached an updated version of the vmc-cat tool to demonstrate this. The code has been updated so that before each call to WriteFile it will check the HostConnected status with IOCTL_GET_INFORMATION. The following steps are performed:

1. Execute "cat /dev/pts/1" on the host side.
2. Execute "vmc-cat w test" on the guest side.
3. Type a message as follows and observe the output of the status:

Test1
Host Status: Connected

4. Observe the "Test1" is received on the host side.
5. Press CTRL-C in "cat" on the host side to close the host side of the virtio-serial device.
6. Type a second message as follows and observe the output of the status:

Test2
Host Status: Connected

7. CPU spike occurs

So the HostConnected status is not reporting correctly that it is disconnected and the guest is unable to actually determine that it should not be writing data.

Using GUID_VIOSERIAL_PORT_CHANGE_STATUS doesn't appear to be a viable option either. Ignoring the potential race condition that a write could be occurring in another thread when the event is received, it does not appear the virtio-serial driver actually reports when the host side has closed. I have looked through the source code at the following link and don't see anything that would be doing this:

https://github.com/YanVugenfirer/kvm-guest-drivers-windows/tree/master/vioserial

There are events for when the device is physically added and removed as well as when it is opened, but nothing apparent for when it is closed:

https://github.com/YanVugenfirer/kvm-guest-drivers-windows/blob/master/vioserial/sys/Control.c#L122

If I am misreading the source code for virtio-serial then please let me know. The observed behavior of the CPU spike appears to be inside qemu-kvm itself though and not the Windows driver as task manager does not report any increase in CPU usage when this happens. This behavior is not observed on Linux guests though which is the rather strange part. Disconnecting the host side of a virtio-serial device for a Linux guest does not cause the CPU to spike when data is written on the guest side.
Comment 5 Gal Hammer 2015-09-09 06:34:00 EDT
(In reply to Nat Meo from comment #4)
> Unfortunately those options don't really work out. As for #1, the
> HostConnected status does not correctly report when the host side has
> disconnected. I have attached an updated version of the vmc-cat tool to
> demonstrate this. The code has been updated so that before each call to
> WriteFile it will check the HostConnected status with IOCTL_GET_INFORMATION.
> The following steps are performed:
> 
> 1. Execute "cat /dev/pts/1" on the host side.
> 2. Execute "vmc-cat w test" on the guest side.
> 3. Type a message as follows and observe the output of the status:
> 
> Test1
> Host Status: Connected
> 
> 4. Observe the "Test1" is received on the host side.
> 5. Press CTRL-C in "cat" on the host side to close the host side of the
> virtio-serial device.
> 6. Type a second message as follows and observe the output of the status:
> 
> Test2
> Host Status: Connected

If the host status field is not updated then it might be a bug. Either in the driver or in qemu. I need to check it.
 
> 7. CPU spike occurs
> 
> So the HostConnected status is not reporting correctly that it is
> disconnected and the guest is unable to actually determine that it should
> not be writing data.
> 
> Using GUID_VIOSERIAL_PORT_CHANGE_STATUS doesn't appear to be a viable option
> either. Ignoring the potential race condition that a write could be
> occurring in another thread when the event is received, it does not appear
> the virtio-serial driver actually reports when the host side has closed. I
> have looked through the source code at the following link and don't see
> anything that would be doing this:
> 
> https://github.com/YanVugenfirer/kvm-guest-drivers-windows/tree/master/
> vioserial
> 
> There are events for when the device is physically added and removed as well
> as when it is opened, but nothing apparent for when it is closed:
> 
> https://github.com/YanVugenfirer/kvm-guest-drivers-windows/blob/master/
> vioserial/sys/Control.c#L122

The VIRTIO_CONSOLE_PORT_OPEN event is expected whenever the host side is either opened or closed.
 
> If I am misreading the source code for virtio-serial then please let me
> know. The observed behavior of the CPU spike appears to be inside qemu-kvm
> itself though and not the Windows driver as task manager does not report any
> increase in CPU usage when this happens. This behavior is not observed on
> Linux guests though which is the rather strange part. Disconnecting the host
> side of a virtio-serial device for a Linux guest does not cause the CPU to
> spike when data is written on the guest side.

Are you sure that the CPU is not consumed by the vmc-cat program?

Can you try with a newer version of qemu-kvm? qemu-kvm-1.5.3-102 is the latest, I think.
Comment 6 Nat Meo 2015-09-09 08:05:25 EDT
The HostConnected status not being updated correctly appears to be a bug to me. You can use the attached source code to see the behavior yourself. It is not a small race condition either where it could be a microsecond between WriteFile and IOCTL_GET_INFORMATION since I can close the host side and then perform the IOCTL_GET_INFORMATION call 15 seconds later or something and it will still report back as connected.

The CPU is definitely not being consumed by the example vmc-cat program. I have attached the source code so you can see for yourself. It works by blocking on a ReadFile call from standard input and only performs a WriteFile call when anything is typed by the user. It only makes the WriteFile call once and does not retry on error. There is no repeated busy loop that would cause the CPU to thrash. As I have mentioned before, if you observe task manager inside the Windows guest it will show no CPU usage at all but on the host side the qemu-kvm process will consume a full CPU core.
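
To make the control flow concrete, here is a rough Python model of the loop described above. It is a sketch of the structure only, not the attached C source: relay_lines and write_once are names invented here, and write_once stands in for the single WriteFile call on the port.

```python
def relay_lines(lines, write_once):
    """Forward each input line with exactly one write attempt.

    lines      -- any iterable of str; iterating sys.stdin blocks until
                  the user presses ENTER, like the blocking ReadFile call
    write_once -- placeholder for the single write call on the port
    Returns a list of (line, errno_or_None) pairs.
    """
    outcomes = []
    for line in lines:
        try:
            write_once(line.encode())
            outcomes.append((line, None))
        except OSError as exc:
            # Record the failure and go back to waiting for input.
            # There is deliberately no retry, so no busy loop can form.
            outcomes.append((line, exc.errno))
    return outcomes
```

Each input line produces at most one write call, whether it succeeds or fails, so a failing write cannot keep the guest CPU busy on its own.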

Thanks for the info that VIRTIO_CONSOLE_PORT_OPEN is also raised when the host side is closed. That may remove the need to call IOCTL_GET_INFORMATION before every call to WriteFile. Since the same event is used for both port open and close, though, it sounds like I still need to check the HostConnected state when the event arrives, and that state currently does not report back correctly, so I am still stuck with the same problem.
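
The event-driven variant would look roughly like the Python model below. It is only a sketch under assumptions: PortState, on_port_status_event and the two callables are hypothetical names, and the query callable stands in for reading HostConnected through IOCTL_GET_INFORMATION. It can only be as accurate as that flag.

```python
class PortState:
    """Cache the host-connected flag and refresh it on status events."""

    def __init__(self, query_host_connected):
        # query_host_connected is a placeholder for the real status query.
        self._query = query_host_connected
        self.host_connected = query_host_connected()

    def on_port_status_event(self):
        # The same event is raised for open and for close, so the flag
        # is re-queried here rather than toggled.
        self.host_connected = self._query()

    def write(self, send, data):
        """Attempt one write if the cached flag says the host is open."""
        if not self.host_connected:
            return False
        send(data)
        return True
```

This replaces one status query per write with one query per event. A write running in another thread can still race with the event, as noted in comment #4.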

Where can I get qemu-kvm-1.5.3-102? Currently I am using qemu-kvm-1.5.3-86 which is showing up as the latest version through yum. I have observed this problem for over a year now though and it existed on EL6 as well. It just hasn't been too much trouble in what I have been working on until recently since I am depending more on using virtio-serial to communicate between guests and hosts.
Comment 8 Nat Meo 2015-12-05 13:35:01 EST
I tried making a build of QEMU 2.3.1 and with that version this problem no longer occurs. There is no CPU spike when data is written to the virtio serial device when the host side has been closed. I may go along with using QEMU 2.3.1 for my purposes, but this problem still exists in the EL7 QEMU 1.5.3 packages and may affect other people so it may be worth looking into backporting whatever fixes this problem from 2.3.1.
Comment 9 Gal Hammer 2016-08-22 15:33:08 EDT
The problem is solved in QEMU 2.3.1 (comment #8). No reason was given to backport the fix to the previous version.
