Bug 1169055

Summary: Race condition in qemuGetProcessInfo()
Product: [Community] Virtualization Tools Reporter: Eduardo Costa <eduardobmc>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: unspecifiedCC: eblake, eduardobmc, rbalakri
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-12-01 22:44:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Test program describing the bug
none
Proposed patch none

Description Eduardo Costa 2014-11-29 15:35:25 UTC
Created attachment 962749 [details]
Test program describing the bug

Description of problem:
There is a race condition in qemuGetProcessInfo() (qemu_driver.c). It occurs when the /proc file for the qemu process can be opened (fopen() succeeds), but when one tries to read from it, it is gone (fscanf() fails). The behavior is not consistent when both fopen and fscanf succeed and when only fopen succeeds.

Version-Release number of selected component (if applicable):
1.2.2

How reproducible:
Occasionaly. Has been observed in some OpenStack CI builds.

Steps to Reproduce:
Can be simulated with the test program attached.
The steps are:
1. Create a subprocess.
2. Open /proc/{pid}/stat.
3. Wait for subprocess to exit.
4. Try to read from the descriptor opened in step #2.

Actual results:
qemuGetProcessInfo() returns -1 and sets errno = -EINVAL.

Expected results:
qemuGetProcessInfo() should return 0 and set output arguments to 0,
as it does when fopen() fails.


Additional info:
Logs from OpenStack CI builds: http://logstash.openstack.org/#eyJmaWVsZHMiOiBbXSwgInNlYXJjaCI6ICJtZXNzYWdlOlwib3BlcmF0aW9uIGZhaWxlZDogY2Fubm90IHJlYWQgY3B1dGltZSBmb3IgZG9tYWluXCIgQU5EIHRhZ3M6XCJzY3JlZW4tbi1jcHUudHh0XCJcbiIsICJ0aW1lZnJhbWUiOiAiODY0MDAwIiwgImdyYXBobW9kZSI6ICJjb3VudCIsICJvZmZzZXQiOiAwfQ==
sample libvirt log from one of the failed jobs: http://logs.openstack.org/17/135917/3/check/gate-tempest-dsvm-neutron-src-python-neutronclient/1880f95/logs/libvirt/libvirtd.txt.gz#_2014-11-27_06_46_39_818

Comment 1 Eduardo Costa 2014-11-29 15:38:12 UTC
Created attachment 962752 [details]
Proposed patch

Comment 2 Eric Blake 2014-12-01 19:03:41 UTC
Please send the patch upstream to libvir-list. Patches in attachments are harder to apply and get fewer eyes looking at them, so they tend to be forgotten in comparison to patches directly on the mailing list.

Comment 3 Eduardo Costa 2014-12-01 20:31:32 UTC
Patch sent to mailing list, as requested.

Comment 4 Eric Blake 2014-12-01 22:43:45 UTC
Fixed in git:
commit ff018e686a8a412255bc34d3dc558a1bcf74fac5
Author: Eduardo Costa <eduardobmc>
Date:   Mon Dec 1 18:24:20 2014 -0200

    Fix race condition in qemuGetProcessInfo
    
    There is a race condition between the fopen and fscanf calls
    in qemuGetProcessInfo. If fopen succeeds, there is a small
    possibility that the file no longer exists before reading from it.
    Now, if either fopen or fscanf calls fail, the function will behave
    just as only fopen had failed.
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1169055
    
    Signed-off-by: Eric Blake <eblake>