Bug 790721

Summary: multiprovider build error: RuntimeError: link: /tmp/.guestfs-0/kernel /tmp/.guestfs-0/kernel.10139: File exists
Product: [Community] Virtualization Tools
Component: libguestfs
Version: unspecified
Status: CLOSED UPSTREAM
Severity: high
Priority: unspecified
Reporter: Richard W.M. Jones <rjones>
Assignee: Richard W.M. Jones <rjones>
CC: akarol, brad, dajohnso, deltacloud-maint, dgao, mbooth, rjones, ssachdev, virt-maint, whayutin
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Clone Of: 790528
Last Closed: 2012-02-17 15:59:12 UTC
Bug Blocks: 790528, 790958    

Description Richard W.M. Jones 2012-02-15 09:21:31 UTC
+++ This bug was initially created as a clone of Bug #790528 +++

So, this error seems to originate in the code that creates hard links for launching the appliance VM that libguestfs uses to do its job.  The relevant code is here:

https://github.com/libguestfs/libguestfs/blob/master/src/appliance.c

The comments at the top make it sound as if this activity _should_ be thread/concurrency safe.

However, maybe not.  

Guestfs is using a read lock on the checksum file to avoid walking on itself when creating the links.  Oz did something similar to avoid having two concurrent Oz instances download the same ISO file into the same canonical location.

The issue with Oz, which may come into play here, is that we (Factory) are a single-process, multi-threaded application, and multiple threads of the same process seem to be able to acquire a lock on the same file at the same time without error.
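To make the locking behaviour above concrete, here is a minimal sketch. It assumes the lock in question is a POSIX fcntl() record lock, which is owned by the process rather than by a thread or file descriptor; whether libguestfs or Oz actually use that kind of lock is an assumption, not something confirmed in this report. The function name is made up for illustration.

```python
import fcntl
import os
import tempfile


def same_process_can_lock_twice() -> bool:
    """Take an exclusive fcntl() lock on one file through two descriptors."""
    first_fd, path = tempfile.mkstemp()
    second_fd = os.open(path, os.O_RDWR)
    try:
        # First lock: always expected to succeed on a fresh temporary file.
        fcntl.lockf(first_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            # A different process would be refused here. Within the same
            # process the request is not treated as a conflict.
            fcntl.lockf(second_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        return True
    finally:
        os.close(second_fd)
        os.close(first_fd)
        os.unlink(path)
```

On Linux this is expected to return True, which matches what Factory saw with Oz: the lock keeps other processes out but does nothing to serialize threads inside one process. Locks taken with flock() follow different rules, so the result does not carry over to them.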

So, perhaps we have two threads doing g.launch() at the same time.  Both libguestfs instances try to acquire a read lock on the checksum file, and both succeed (as they are both part of the same process).  Since they also share a PID, both want the same kernel.$PID link name: one creates the hard link first and the second one fails because the link is already there.
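The failure itself is easy to reproduce outside libguestfs. The sketch below is not the appliance code; it only mimics the naming scheme visible in the error message (a "kernel" file hard-linked to "kernel.<PID>") in a throwaway directory, and the function name is hypothetical.

```python
import os
import tempfile


def link_twice_with_same_pid_suffix() -> list:
    """Hard-link the same file to a PID-suffixed name twice in one process."""
    results = []
    with tempfile.TemporaryDirectory() as cache_dir:
        kernel = os.path.join(cache_dir, "kernel")
        with open(kernel, "w") as f:
            f.write("placeholder")
        # Every thread of a process sees the same PID, so the name collides.
        link_name = f"{kernel}.{os.getpid()}"
        for _ in range(2):
            try:
                os.link(kernel, link_name)
                results.append("ok")
            except FileExistsError:
                # EEXIST, reported as "File exists" in the bug summary.
                results.append("File exists")
    return results
```

The first call succeeds and the second fails with EEXIST, so a PID suffix only gives a unique name if at most one launch per process is in progress at a time.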

Am going to ask Mr. Jones to weigh in on my largely uninformed speculation above to see if there's any sense in it.

If this is indeed the case we may be able to work around the race in the short term by launching and then stopping a dummy libguestfs instance during Factory startup, before we start the server and "go multithreaded".
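A rough sketch of that workaround follows. The function name and the make_handle, worker and n_workers parameters are hypothetical; make_handle is anything that returns an object with add_drive_ro(), launch() and close() methods, which is how the libguestfs Python bindings are normally used. Whether a warm-up launch really avoids the race is still the open question above.

```python
import threading


def warm_up_then_serve(make_handle, worker, n_workers):
    """Launch one throwaway handle, then start the worker threads."""
    # Warm-up: runs while the process is still single-threaded.
    handle = make_handle()
    handle.add_drive_ro("/dev/null")  # dummy drive, nothing is modified
    handle.launch()
    handle.close()

    # Only now "go multithreaded".
    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the real bindings this would be called as something like warm_up_then_serve(guestfs.GuestFS, build_image, 4), where build_image stands in for whatever Factory runs per thread.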

Comment 1 Richard W.M. Jones 2012-02-15 19:34:17 UTC
Patch posted:
https://www.redhat.com/archives/libguestfs/2012-February/msg00068.html

Comment 2 Richard W.M. Jones 2012-02-17 15:59:12 UTC
Pushed upstream:
https://github.com/libguestfs/libguestfs/commit/afed7e493dcd594620f19b93e9fb73e58553f60a
and available in libguestfs >= 1.17.8.

I can't build libguestfs for F17/Rawhide at the moment
because of bug 791032.