Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1296099

Summary: [virtio-win][vioser] Race condition in read and write cancellation logic
Product: Red Hat Enterprise Linux 7 Reporter: Ladi Prosek <lprosek>
Component: virtio-win Assignee: Ladi Prosek <lprosek>
virtio-win sub component: virtio-win-prewhql QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: unspecified CC: ailan, jen, juzhang, lijin, lprosek, michen, phou, vrozenfe, wyu
Version: 7.4   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-06 06:58:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ladi Prosek 2016-01-06 10:27:02 UTC
Description of problem:
The vioser I/O cancellation logic has a race condition which may result in an I/O request never completing.

How reproducible:
- easy for write, see below
- hard for read, requires specific host side timing

Steps to Reproduce:
1. Issue an async write on a serial port with WriteFile()
2. Cancel the write immediately with CancelIoEx()
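
The two steps above can be sketched in C roughly as follows. This is an illustrative sketch, not the reproducer used for this report: the function name write_then_cancel is hypothetical, and the device path is left as a parameter because this report does not say how the port handle is opened. A stub is included for non-Windows builds so the file still compiles there.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Issue one overlapped write and cancel it right away.
 * Returns 0 if the request finished (completed or cancelled), 1 on any
 * other error. device_path is an assumption supplied by the caller. */
static int write_then_cancel(const char *device_path)
{
    static char buf[4096];
    OVERLAPPED ov = {0};
    DWORD transferred = 0;
    int rc = 1;
    HANDLE h;

    assert(device_path != NULL);
    h = CreateFileA(device_path, GENERIC_WRITE, 0, NULL,
                    OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);
    if (ov.hEvent == NULL) {
        CloseHandle(h);
        return 1;
    }

    /* Step 1: async write. Step 2: cancel it immediately. */
    if (WriteFile(h, buf, (DWORD)sizeof(buf), NULL, &ov) ||
        GetLastError() == ERROR_IO_PENDING) {
        CancelIoEx(h, &ov); /* may fail if the write has already finished */

        /* Wait for the driver to complete the request. If the driver
         * never completes it, this call does not return. */
        if (GetOverlappedResult(h, &ov, &transferred, TRUE) ||
            GetLastError() == ERROR_OPERATION_ABORTED)
            rc = 0;
    }

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return rc;
}
#else
/* Non-Windows stub: the Win32 calls above are unavailable here. */
static int write_then_cancel(const char *device_path)
{
    assert(device_path != NULL);
    return 1;
}
#endif
```

Since the hang is intermittent, calling such a function in a loop would presumably be needed to hit it. A cancelled write is reported as ERROR_OPERATION_ABORTED (995), which the sketch treats as a normal outcome.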

Actual results:
The cancel never completes and the process hangs with ~1% probability on my system.

Expected results:
Writes and reads can always be cancelled.

Additional info:
The driver does not properly synchronize between cancellation and regular I/O completion. If the I/O is being completed at about the same time as the EvtRequestCancel is being delivered to the driver, it is possible that both code paths will skip the completion of the I/O request because both will think that the other party has already done it.
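
One common way to close this kind of race is to make the two code paths compete for a single atomic "claim" on the request, so that exactly one of them completes it. The portable C sketch below models only that idea. It is not the vioser code or the actual fix, and every name in it (model_request, model_try_complete, the MODEL_STATUS_* constants) is hypothetical. In a KMDF driver the same decision is normally based on the return value of WdfRequestUnmarkCancelable rather than on a hand-rolled flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STATUS_PENDING (-1)
#define MODEL_STATUS_SUCCESS 0
#define MODEL_STATUS_ABORTED 995 /* value of Win32 ERROR_OPERATION_ABORTED */

/* Hypothetical model of one pending I/O request. */
typedef struct {
    atomic_bool claimed; /* true once a path owns the completion */
    int completions;     /* how many times the request was completed */
    int status;          /* final status, MODEL_STATUS_PENDING until then */
} model_request;

static void model_request_init(model_request *r)
{
    assert(r != NULL);
    atomic_init(&r->claimed, false);
    r->completions = 0;
    r->status = MODEL_STATUS_PENDING;
}

/* Called by both the regular completion path and the cancel path.
 * The atomic exchange lets exactly one caller see "not yet claimed";
 * that caller completes the request, the other one must back off. */
static bool model_try_complete(model_request *r, int status)
{
    assert(r != NULL);
    if (atomic_exchange(&r->claimed, true))
        return false; /* the other path already owns the completion */
    r->status = status;
    r->completions++;
    return true;
}
```

With this shape, neither path assumes the other one has already done the work: the loser of the exchange knows for certain that the winner will complete the request, so the request is completed exactly once.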

Related reading:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff544726(v=vs.85).aspx

Comment 3 Peixiu Hou 2016-03-23 10:22:46 UTC
Hi Ladi,

About this bug:
I need to reproduce this issue and then verify the fix on the latest version.
1. Which vioser driver version was in use when the issue happened?
2. What are the detailed steps to reproduce it? For example, how do I issue an async write on a serial port with WriteFile()? How do I cancel the write immediately with CancelIoEx()? Where can the WriteFile() and CancelIoEx() files be found?


Thanks~~
Peixiu Hou

Comment 4 Ladi Prosek 2016-03-28 14:58:28 UTC
Hi Peixiu Hou,

You should be able to reproduce this with version 112 and older. One approach to building a repro would be to start with the I/O benchmark (vioserial/benchmark in the source tree) which uses async I/O so the only thing missing is the CancelIoEx (or CancelIo) call right after WriteFile. WriteFile, CancelIo/Ex are Win32 API calls:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365747(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363791(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363792(v=vs.85).aspx

Calling Win32 directly using C/C++ may not be the only way though. If the language and framework you're using support I/O cancellation, you should be able to hit the bug too.

Here's an article about I/O cancellation:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa363789(v=vs.85).aspx

Let me know if this helps. Thanks!

Comment 8 Peixiu Hou 2016-07-12 07:04:27 UTC
Hi Ladi,

I reproduced this bug with virtio-win-prewhql-112. 

Running "benchmark.exe w com.redhat.rhevm.vdsm" results in a hang: benchmark.exe stops producing output and cannot be killed.

Verified with virtio-win-prewhql-121. The behavior is the same as on your system: it prints 'CompleteIO failed with error 995' but does not get stuck.

Based on the above, the issue has been fixed.

Thank you so much~~


Best Regards~
Peixiu Hou

Comment 9 lijin 2016-08-01 02:58:33 UTC
Changing status to VERIFIED according to comment 8.

Comment 10 lijin 2017-01-06 06:58:40 UTC
Closing, as this issue has already been fixed in the RHEL 7.3 virtio-win package.