Bug 1296099

Summary: [virtio-win][vioser] Race condition in read and write cancellation logic
Product: Red Hat Enterprise Linux 7 Reporter: Ladi Prosek <lprosek>
Component: virtio-win Assignee: Ladi Prosek <lprosek>
virtio-win sub component: virtio-win-prewhql QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: unspecified CC: ailan, jen, juzhang, lijin, lprosek, michen, phou, vrozenfe, wyu
Version: 7.4   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-06 06:58:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Ladi Prosek 2016-01-06 10:27:02 UTC
Description of problem:
The vioser I/O cancellation logic has a race condition which may result in an I/O request never completing.

How reproducible:
- easy for write, see below
- hard for read, requires specific host side timing

Steps to Reproduce:
1. Issue an async write on a serial port with WriteFile()
2. Cancel the write immediately with CancelIoEx()

Actual results:
With roughly 1% probability on my system, the cancel never completes and the process hangs.

Expected results:
Writes and reads can always be cancelled.

Additional info:
The driver does not properly synchronize between cancellation and regular I/O completion. If the I/O is being completed at about the same time as the EvtRequestCancel is being delivered to the driver, it is possible that both code paths will skip the completion of the I/O request because both will think that the other party has already done it.
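
One common way to close this kind of window is to have both code paths compete for ownership of the request through a single atomic test-and-set, so that exactly one of them performs the completion. The C11 sketch below illustrates only that idea. The names request_t, request_init and try_complete are hypothetical, and this is neither the vioser code nor the actual fix; in a KMDF driver the equivalent arbitration would normally be built on WdfRequestMarkCancelable / WdfRequestUnmarkCancelable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending I/O request; NOT the vioser structure. */
typedef struct {
    atomic_flag claimed; /* set by the first path that takes ownership */
    int completions;     /* how many times the request was completed */
    int status;          /* status recorded by the path that completed it */
} request_t;

enum {
    STATUS_OK = 0,
    STATUS_ABORTED = 995 /* numeric value of Win32 ERROR_OPERATION_ABORTED */
};

/* Put the request into the "pending, owned by nobody" state. */
static void request_init(request_t *r) {
    assert(r != NULL);
    atomic_flag_clear(&r->claimed);
    r->completions = 0;
    r->status = -1;
}

/* Called by BOTH the regular completion path and the cancel path.
 * The atomic test-and-set guarantees that exactly one caller sees the
 * flag as previously clear; only that caller completes the request.
 * Returns true for the winner and false for the loser. */
static bool try_complete(request_t *r, int status) {
    assert(r != NULL);
    if (atomic_flag_test_and_set(&r->claimed)) {
        return false; /* the other path already owns the request */
    }
    r->status = status;
    r->completions++;
    return true;
}
```

If the completion path calls try_complete(&r, STATUS_OK) first, a later try_complete(&r, STATUS_ABORTED) from the cancel path returns false and the request is still completed exactly once; the reverse order is symmetric. The point is that neither path decides to skip the completion based on a guess about what the other path did.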

Related reading:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff544726(v=vs.85).aspx

Comment 3 Peixiu Hou 2016-03-23 10:22:46 UTC
Hi Ladi,

About this bug:
I need to reproduce this issue and verify the fix on the latest version.
1. Which vioser driver version was in use when the issue occurred?
2. What are the detailed steps to reproduce it? For example: how do I issue an async write on a serial port with WriteFile()? How do I cancel the write immediately with CancelIoEx()? Where can WriteFile() and CancelIoEx() be found?


Thanks~~
Peixiu Hou

Comment 4 Ladi Prosek 2016-03-28 14:58:28 UTC
Hi Peixiu Hou,

You should be able to reproduce this with version 112 and older. One approach to building a repro would be to start with the I/O benchmark (vioserial/benchmark in the source tree), which uses async I/O, so the only thing missing is the CancelIoEx (or CancelIo) call right after WriteFile. WriteFile, CancelIo, and CancelIoEx are Win32 API calls:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365747(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363791(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363792(v=vs.85).aspx

Calling Win32 directly from C/C++ may not be the only way, though. If the language and framework you're using support I/O cancellation, you should be able to hit the bug too.

Here's an article about I/O cancellation:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa363789(v=vs.85).aspx

Let me know if this helps. Thanks!

Comment 8 Peixiu Hou 2016-07-12 07:04:27 UTC
Hi Ladi,

I reproduced this bug with virtio-win-prewhql-112. 

Running benchmark.exe w com.redhat.rhevm.vdsm results in a hang: benchmark.exe stops producing output and cannot be killed.

Verified with virtio-win-prewhql-121. The behavior is the same as on your system: it prints 'CompleteIO failed with error 995' but does not get stuck.

Based on the above, the issue has been fixed.

Thank you so much~~


Best Regards~
Peixiu Hou

Comment 9 lijin 2016-08-01 02:58:33 UTC
Changing status to VERIFIED according to comment 8.

Comment 10 lijin 2017-01-06 06:58:40 UTC
Closing, as this issue has already been fixed in the RHEL 7.3 virtio-win package.