Bug 880855

Summary: race in createRepo (_link_rpms trying to link non-existent rpm)
Product: [Retired] Beaker
Reporter: Dan Callaghan <dcallagh>
Component: scheduler
Assignee: Nick Coghlan <ncoghlan>
Status: CLOSED CURRENTRELEASE
QA Contact: tools-bugs <tools-bugs>
Severity: unspecified
Docs Contact:
Priority: high
Version: 0.10
CC: asaha, dcallagh, llim, pbunyan, qwan, rglasz, rmancy, xjia
Target Milestone: 0.13
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: Scheduler
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-25 06:27:45 UTC
Type: Bug

Description Dan Callaghan 2012-11-27 22:49:22 UTC
This shows up occasionally in beakerd.log:

2012-11-27 04:51:02,064 beakerd ERROR Failed to commit in queued_recipes
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 399, in queued_recipes
    recipe.createRepo()
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4957, in createRepo
    self._link_rpms(directory)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4933, in _link_rpms
    os.link(srcname, dstname)
OSError: [Errno 2] No such file or directory

which suggests a race in _link_rpms between reading the directory and linking the rpms. We were supposed to be avoiding that by holding a flock on the directory whenever it is manipulated. But perhaps there is something touching the directory without acquiring the flock.
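
A minimal sketch of the locking discipline described above, assuming an exclusive advisory flock taken on the directory itself. The helper names `locked_dir` and `link_rpms` are hypothetical and are not Beaker's actual code; the point is only that the directory listing and the `os.link` calls happen under one lock, so a file cannot disappear between the two steps unless some other code path skips the lock.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def locked_dir(path):
    """Hold an exclusive advisory flock on a directory for the duration of the block."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is available
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def link_rpms(src_dir, dst_dir):
    """Hard-link every .rpm in src_dir into dst_dir while holding the lock."""
    linked = []
    with locked_dir(src_dir):
        # Listing and linking both happen under the same lock.
        for name in sorted(os.listdir(src_dir)):
            if name.endswith('.rpm'):
                os.link(os.path.join(src_dir, name),
                        os.path.join(dst_dir, name))
                linked.append(name)
    return linked
```

An advisory lock only protects against other code that also takes it, which is why a single unlocked writer would be enough to produce the OSError above.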

This is in the improved createrepo code in 0.10 (bug 835367) so it is essentially a regression.

Comment 1 Dan Callaghan 2012-11-27 22:54:49 UTC
This is not too serious since the recipe will be retried (and should proceed) on the next iteration of queued_recipes. And it has only happened 11 times in our production Beaker environment since 0.10 was released. However we should fix it since it might be indicative of other worse problems.

Also not giving this devel_ack+ yet since I'm not sure where the missing flock acquisition is (nor whether that is even the problem).

Comment 2 Dan Callaghan 2013-04-18 22:41:30 UTC
*** Bug 953209 has been marked as a duplicate of this bug. ***

Comment 3 Nick Coghlan 2013-04-19 08:15:29 UTC
Need to reconsider this for 1.0, since it appears the symptoms have changed in 0.12 (to abort rather than retry)

Comment 4 Nick Coghlan 2013-04-22 05:04:06 UTC
Unfortunately, we still don't know how this is getting triggered, so we can't commit to having it fixed in 1.0 :(

Comment 5 Nick Coghlan 2013-04-22 05:41:16 UTC
Then again... Task.disable unlinks RPMs without holding the flock [1], which could definitely cause these symptoms.

The other Task XML-RPC APIs (upload and save) only work with new tasks, so couldn't cause any problems, but a disable operation while a createRepo() call was running could definitely cause these symptoms.

[1] http://git.beaker-project.org/cgit/beaker/tree/Server/bkr/server/model.py?h=develop#n6477
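
To illustrate why an unlocked unlink matters, here is a small sketch of a removal that respects the same directory flock. The function `try_unlink_rpm` is hypothetical and not the code at [1]; it uses a non-blocking lock attempt purely so the exclusion is easy to observe, whereas a real fix would more likely block until the lock is free.

```python
import errno
import fcntl
import os


def try_unlink_rpm(library_dir, rpm_name):
    """Remove an RPM only if the directory flock can be taken.

    Returns True if the file was removed, False if another holder
    (for example a running snapshot) currently owns the lock.
    """
    fd = os.open(library_dir, os.O_RDONLY)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except (IOError, OSError) as exc:
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
                return False  # lock is held elsewhere, leave the file alone
            raise
        os.unlink(os.path.join(library_dir, rpm_name))
        return True
    finally:
        os.close(fd)  # releases the lock if it was acquired
```

While another descriptor holds the lock, the call returns False and the RPM stays in place; an unlink that never asks for the lock has no such protection.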

Comment 6 Nick Coghlan 2013-05-07 08:55:59 UTC
Initial patch: http://gerrit.beaker-project.org/1926

Comment 7 Nick Coghlan 2013-05-09 00:00:28 UTC
Based on feedback on the patch, I'm going to rework this to create a clear TaskLibrary abstraction that localises all responsibility for manipulation of the RPM library in one place.
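
A rough sketch of what such an abstraction could look like, assuming one object owns the RPM directory and every operation on it goes through the same flock. The class name `TaskLibrarySketch`, the `rpmspath` attribute, and `unlink_rpm` are hypothetical; `make_snapshot_repo` and `_all_rpms` borrow names that appear in the tracebacks in this report, but the bodies here are illustrative only and are not the patch under review.

```python
import contextlib
import fcntl
import os


class TaskLibrarySketch(object):
    """Hypothetical single owner of an RPM directory.

    Every method that reads or changes the directory takes the flock,
    so callers cannot forget to acquire it.
    """

    def __init__(self, rpmspath):
        self.rpmspath = rpmspath

    @contextlib.contextmanager
    def _locked(self):
        fd = os.open(self.rpmspath, os.O_RDONLY)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            yield
        finally:
            os.close(fd)  # closing the descriptor releases the lock

    def _all_rpms(self):
        """Yield the full path of each RPM file; callers must hold the lock."""
        for name in sorted(os.listdir(self.rpmspath)):
            srcpath = os.path.join(self.rpmspath, name)
            if name.endswith('.rpm') and not os.path.isdir(srcpath):
                yield srcpath

    def unlink_rpm(self, rpm_name):
        with self._locked():
            os.unlink(os.path.join(self.rpmspath, rpm_name))

    def make_snapshot_repo(self, repo_dir):
        with self._locked():
            for srcpath in self._all_rpms():
                dstpath = os.path.join(repo_dir, os.path.basename(srcpath))
                os.link(srcpath, dstpath)
```

Keeping the lock acquisition inside the class means a new operation on the library cannot bypass it by accident, which is the failure mode suspected for Task.disable.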

Comment 10 Nick Coghlan 2013-05-21 07:24:40 UTC
dcallagh found a couple of errors in this patch:

2013-05-21 16:55:19,267 beakerd ERROR Error in schedule_queued_recipe(541)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 397, in schedule_queued_recipes
    schedule_queued_recipe(recipe_id)
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 496, in schedule_queued_recipe
    recipe.createRepo()
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 5278, in createRepo
    Task.make_snapshot_repo(snapshot_repo)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6587, in make_snapshot_repo
    return cls.library.make_snapshot_repo(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6451, in make_snapshot_repo
    self._link_rpms(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6436, in _link_rpms
    for srcpath in self._all_rpms():
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6429, in _all_rpms
    if os.path.isdir(srcname):
NameError: global name 'srcname' is not defined

2013-05-21 16:59:15,453 beakerd ERROR Error in schedule_queued_recipe(542)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 397, in schedule_queued_recipes
    schedule_queued_recipe(recipe_id)
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 496, in schedule_queued_recipe
    recipe.createRepo()
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 5278, in createRepo
    Task.make_snapshot_repo(snapshot_repo)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6587, in make_snapshot_repo
    return cls.library.make_snapshot_repo(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6451, in make_snapshot_repo
    self._link_rpms(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6437, in _link_rpms
    dstname = os.path.join(dst, name)
NameError: global name 'name' is not defined

Comment 11 Nick Coghlan 2013-05-21 08:22:46 UTC
I tracked down the gap in the test suite: those two code paths are only hit when there are tasks in the task library, and the test suite doesn't add any before it checks that provisioning works.

/me bumps "getting basic coverage data for the current test suite" further up the wish list...
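
The gap described above can be reproduced in miniature. In this hypothetical function (not the Beaker code), the undefined name sits inside a loop body, so Python only raises NameError when the loop actually iterates; an empty library passes silently.

```python
def link_all(paths, dst):
    """Build destination paths; deliberately contains an undefined name."""
    linked = []
    for srcpath in paths:
        # Bug reproduced on purpose: 'name' is never assigned anywhere,
        # but Python only notices when this line actually executes.
        linked.append(dst + '/' + name)
    return linked
```

Calling `link_all([], 'repo')` returns an empty list, while `link_all(['a.rpm'], 'repo')` raises NameError, so a test needs at least one task in the library before it can catch this class of mistake.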

Comment 12 Nick Coghlan 2013-05-21 09:12:05 UTC
Updated with fixes and test suite enhancements: http://gerrit.beaker-project.org/#/c/1958/

Comment 17 Amit Saha 2013-06-25 06:27:45 UTC
Beaker 0.13.1 has been released.