Bug 880855 - race in createRepo (_link_rpms trying to link non-existent rpm)
Status: CLOSED CURRENTRELEASE
Product: Beaker
Classification: Community
Component: scheduler
Version: 0.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: 0.13
Target Release: ---
Assigned To: Nick Coghlan
tools-bugs
Scheduler
Keywords: Regression
Duplicates: 953209
Depends On:
Blocks:
Reported: 2012-11-27 17:49 EST by Dan Callaghan
Modified: 2013-06-25 02:27 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-25 02:27:45 EDT
Type: Bug


Description Dan Callaghan 2012-11-27 17:49:22 EST
This shows up occasionally in beakerd.log:

2012-11-27 04:51:02,064 beakerd ERROR Failed to commit in queued_recipes
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 399, in queued_recipes
    recipe.createRepo()
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4957, in createRepo
    self._link_rpms(directory)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4933, in _link_rpms
    os.link(srcname, dstname)
OSError: [Errno 2] No such file or directory

which suggests a race in _link_rpms between reading the directory and linking the rpms. We were supposed to be avoiding that by holding a flock on the directory whenever it is manipulated. But perhaps there is something touching the directory without acquiring the flock.

This is in the improved createrepo code in 0.10 (bug 835367) so it is essentially a regression.
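The suspected window can be illustrated with a minimal sketch. This is not Beaker's code: `link_rpms_unlocked` and its `between` hook are hypothetical names used only to stand in for a second process that removes an RPM after the directory has been read but before the link is made.

```python
import errno
import os
import tempfile


def link_rpms_unlocked(src, dst, between=None):
    """Hard-link every .rpm in src into dst, with no locking at all.

    `between` is a hypothetical hook that runs after the directory has been
    read but before any link is made; it simulates a concurrent process.
    """
    names = [n for n in os.listdir(src) if n.endswith('.rpm')]
    if between is not None:
        between()
    for name in names:
        os.link(os.path.join(src, name), os.path.join(dst, name))


def demo():
    """Return the errno raised when an RPM vanishes inside the window."""
    src = tempfile.mkdtemp()
    dst = tempfile.mkdtemp()
    victim = os.path.join(src, 'example-task-1.0-1.noarch.rpm')
    open(victim, 'w').close()
    try:
        # The "other process" unlinks the file after it has been listed.
        link_rpms_unlocked(src, dst, between=lambda: os.unlink(victim))
    except OSError as e:
        return e.errno
    return None
```

Here `demo()` returns `errno.ENOENT`, the same "No such file or directory" error seen in the traceback, which is consistent with (but does not prove) something modifying the directory outside the flock.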
Comment 1 Dan Callaghan 2012-11-27 17:54:49 EST
This is not too serious, since the recipe will be retried (and should proceed) on the next iteration of queued_recipes. It has also only happened 11 times in our production Beaker environment since 0.10 was released. However, we should fix it, since it might indicate other, worse problems.

Also not giving this devel_ack+ yet since I'm not sure where the missing flock acquisition is (nor whether that is even the problem).
Comment 2 Dan Callaghan 2013-04-18 18:41:30 EDT
*** Bug 953209 has been marked as a duplicate of this bug. ***
Comment 3 Nick Coghlan 2013-04-19 04:15:29 EDT
Need to reconsider this for 1.0, since it appears the symptoms have changed in 0.12 (to abort rather than retry).
Comment 4 Nick Coghlan 2013-04-22 01:04:06 EDT
Unfortunately, we still don't know how this is getting triggered, so we can't commit to having it fixed in 1.0 :(
Comment 5 Nick Coghlan 2013-04-22 01:41:16 EDT
Then again... Task.disable unlinks RPMs without holding the flock [1], which could definitely cause these symptoms.

The other Task XML-RPC APIs (upload and save) only work with new tasks, so couldn't cause any problems, but a disable operation while a createRepo() call was running could definitely cause these symptoms.

[1] http://git.beaker-project.org/cgit/beaker/tree/Server/bkr/server/model.py?h=develop#n6477
Comment 6 Nick Coghlan 2013-05-07 04:55:59 EDT
Initial patch: http://gerrit.beaker-project.org/1926
Comment 7 Nick Coghlan 2013-05-08 20:00:28 EDT
Based on feedback on the patch, I'm going to rework this to create a clear TaskLibrary abstraction that localises all responsibility for manipulation of the RPM library in one place.
Comment 10 Nick Coghlan 2013-05-21 03:24:40 EDT
dcallagh found a couple of errors in this patch:

2013-05-21 16:55:19,267 beakerd ERROR Error in schedule_queued_recipe(541)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 397, in schedule_queued_recipes
    schedule_queued_recipe(recipe_id)
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 496, in schedule_queued_recipe
    recipe.createRepo()
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 5278, in createRepo
    Task.make_snapshot_repo(snapshot_repo)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6587, in make_snapshot_repo
    return cls.library.make_snapshot_repo(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6451, in make_snapshot_repo
    self._link_rpms(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6436, in _link_rpms
    for srcpath in self._all_rpms():
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6429, in _all_rpms
    if os.path.isdir(srcname):
NameError: global name 'srcname' is not defined

2013-05-21 16:59:15,453 beakerd ERROR Error in schedule_queued_recipe(542)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 397, in schedule_queued_recipes
    schedule_queued_recipe(recipe_id)
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 496, in schedule_queued_recipe
    recipe.createRepo()
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 5278, in createRepo
    Task.make_snapshot_repo(snapshot_repo)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6587, in make_snapshot_repo
    return cls.library.make_snapshot_repo(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6451, in make_snapshot_repo
    self._link_rpms(repo_dir)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6437, in _link_rpms
    dstname = os.path.join(dst, name)
NameError: global name 'name' is not defined
Comment 11 Nick Coghlan 2013-05-21 04:22:46 EDT
I tracked down the gap in the test suite: those two code paths are only hit when there are tasks in the task library, and the test suite doesn't add any before it checks that provisioning works.

/me bumps "getting basic coverage data for the current test suite" further up the wish list...
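The gap is easy to reproduce in isolation. In the sketch below (hypothetical code, not taken from model.py), the loop body refers to a variable that no longer exists after a refactoring. Python only raises NameError when that line actually executes, so a test that runs against an empty library passes.

```python
import os
import tempfile


def link_rpms_buggy(src, dst):
    for entry in os.listdir(src):
        srcpath = os.path.join(src, entry)
        # Bug: 'name' is a leftover from before a refactoring and is never
        # defined. Python only notices when this line actually runs.
        os.link(srcpath, os.path.join(dst, name))


def run(populate):
    """Call the buggy function against an empty or populated library."""
    src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
    if populate:
        open(os.path.join(src, 'example-task-1.0-1.noarch.rpm'), 'w').close()
    try:
        link_rpms_buggy(src, dst)
    except NameError:
        return 'NameError'
    return 'ok'
```

`run(False)` returns `'ok'` because the loop body never runs, while `run(True)` returns `'NameError'`. A test that adds at least one task RPM before checking provisioning would exercise the second path.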
Comment 12 Nick Coghlan 2013-05-21 05:12:05 EDT
Updated with fixes and test suite enhancements: http://gerrit.beaker-project.org/#/c/1958/
Comment 17 Amit Saha 2013-06-25 02:27:45 EDT
Beaker 0.13.1 has been released.
