Bug 855735 - Race between beaker-transfer, recipeset finish time, and job delete
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Beaker
Classification: Retired
Component: lab controller
Version: 0.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact:
URL:
Whiteboard: LogStorage
Depends On:
Blocks:
Reported: 2012-09-10 06:54 UTC by Raymond Mancy
Modified: 2018-05-08 03:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-08 03:11:52 UTC
Embargoed:



Description Raymond Mancy 2012-09-10 06:54:00 UTC
I've been having a problem on my lab controller where it's trying to transfer logs of a deleted recipe. The exception thrown is:

Fault: <Fault 1: "<type 'exceptions.UnboundLocalError'>:local variable 'myserver' referenced before assignment">

This is because of the following block on the server:

  for mylog in recipe.all_logs:
    myserver = '%s/%s' % (server, mylog['filepath'])
    mybasepath = '%s/%s' % (basepath, mylog['filepath'])
    self.change_file(mylog['tid'], myserver, mybasepath)
  recipe.log_server = urlparse.urlparse(myserver)[1]
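
The failure mode can be shown in isolation: when recipe.all_logs is empty (for example because the logs were deleted along with the job), the loop body never runs and myserver is never bound. The following is a minimal Python 3 sketch, not Beaker's actual code and not necessarily what any hotfix does. The function names change_files_unguarded and change_files_guarded are hypothetical, and the guarded variant simply assumes the host can be derived from server itself.

```python
from urllib.parse import urlparse


def change_files_unguarded(all_logs, server, basepath):
    """Same shape as the block above: myserver is only bound inside the loop."""
    for mylog in all_logs:
        myserver = '%s/%s' % (server, mylog['filepath'])
    # With an empty all_logs, myserver was never assigned, so this line
    # raises UnboundLocalError.
    return urlparse(myserver)[1]


def change_files_guarded(all_logs, server, basepath):
    """Hypothetical guard: take the host from `server`, skip if nothing to update."""
    updated = []
    for mylog in all_logs:
        myserver = '%s/%s' % (server, mylog['filepath'])
        mybasepath = '%s/%s' % (basepath, mylog['filepath'])
        updated.append((mylog['tid'], myserver, mybasepath))
    if not updated:
        # No logs left for this recipe: report that nothing was changed
        # instead of touching an unbound variable.
        return None, updated
    return urlparse(server)[1], updated
```

Note that the guard only removes the traceback; it does not address what happens to files that have already been copied elsewhere.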

This is happening to me because my rsync config was wrong, but I think the problem can be reproduced as follows:

1) Set 'CACHE = True'
2) Create new job
3) Wait until at least one log has been at least partially uploaded
4) Stop the beaker-transfer process
5) Wait until the recipeset has finished
6) Delete job
7) Start the beaker-transfer process

Recipe.log_server will still point at the LC, so beaker-transfer will make a call to get details of the logs it needs to transfer. However, the logs will have been deleted, and it will get stuck.

Comment 1 Dan Callaghan 2012-09-10 06:56:58 UTC
Is this fixed by the recent 0.9.3-7 hotfix?

http://git.beaker-project.org/cgit/beaker/commit/?h=release-0.9.3&id=6a149822706ef40cda17edd228ba3b2be75e3c16

Comment 2 Raymond Mancy 2012-09-10 07:15:19 UTC
Yes, that traceback will be fixed, but the behaviour still won't be right.

I think what will happen is that we will end up with logs on our archive server which are undeletable. This is because we can't delete the same recipe twice, and if we're hitting this problem it's because the recipe has already been deleted, but its logs still live on disk.

Ideally we should make the process more robust so this scenario cannot come about. If that's too hard, then we should at least be able to signal to the LC that this recipe is in fact deleted, and any references it has to its logs locally should also be deleted.

As it stands, even with that hotfix, the rsync process has already synced the logs to the archive server.

Comment 3 Bill Peck 2012-09-10 15:13:06 UTC
(In reply to comment #2)
> Yes that traceback will be, but it still won't be right.
> 
> I think what will happen, is we will end up with logs on our archive server
> which are undeletable. This is because we can't delete the same recipe
> twice, and if we're hitting this problem it's because the recipe has already
> been deleted, but its logs still live on disk.

We don't have webdav configured for our lab controllers, do we?  So if beaker-transfer hasn't run in a long time for some reason, then log-delete could try to delete some expired logs from the lab controller.  If webdav is configured they will be deleted from there; if not, they should still be in the DB, right?  We don't remove log entries for logs we haven't been able to delete, right?  Once they are moved from the lab controller to the archive server, log-delete should be able to complete the delete via webdav.

Am I missing something?

> 
> Ideally we should make the process more robust so this scenario cannot come
> about. If that's too hard, then we should at least be able to signal to the
> LC that this recipe is in fact deleted, and any references it has to its
> logs locally should also be deleted.
> 
> As it stands with that hotfix, the rsync process has already synced it to
> the archive server.

Comment 4 Raymond Mancy 2012-09-11 03:43:10 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Yes that traceback will be, but it still won't be right.
> > 
> > I think what will happen, is we will end up with logs on our archive server
> > which are undeletable. This is because we can't delete the same recipe
> > twice, and if we're hitting this problem it's because the recipe has already
> > been deleted, but its logs still live on disk.
> 
> We don't have webdav configured for our lab controllers do we?  So if
> beaker-transfer hasn't run in a long time for some reason then log-delete
> could try and delete some logs from the lab controller which have expired. 
> If webdav is configured they will be deleted from there, if not then they
> should still be in the DB right?  We don't remove log entries for logs we
> haven't been able to delete right?  Once they are moved from the lab
> controller to the archive server log-delete should be able to complete the
> delete via webdav.
> 
> Am I missing something?
> 

This is all correct if webdav is configured for the LC (which I've just checked with mgrigull, and it is).

However, this is kind of beside the point here. What I mean to say is this: the traceback originates from recipes.change_files(), which is called _after_ we have synced the logs to the archive server:

  rc = self.rsync('%s/' % tmpdir, '%s' % self.conf.get("ARCHIVE_RSYNC"))
  logger.debug("rsync rc=%s", rc)
  if rc == 0:
      # if the logs have been transferred then tell the server the new location
      self.hub.recipes.change_files(recipe_id,
                                    self.conf.get("ARCHIVE_SERVER"),
                                    self.conf.get("ARCHIVE_BASEPATH"))

So this means that we've just synced the files of a deleted recipe to the archive server. This recipe will never be picked up again by a log-delete run, so those files will live on the archive server forever.

This is how I read the situation; perhaps I'm missing something, though?
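
One way to avoid archiving the files of a deleted recipe would be to ask the server about the recipe before running rsync, and to drop the local copy if the recipe is gone. The following is only a sketch under that assumption. transfer_recipe_logs and its is_deleted, rsync and change_files parameters are hypothetical stand-ins (for a server query, the rsync wrapper and hub.recipes.change_files respectively), not existing Beaker APIs.

```python
import shutil
from pathlib import Path


def transfer_recipe_logs(recipe_id, local_dir, is_deleted, rsync, change_files):
    """Hypothetical transfer step that checks for deletion before syncing.

    is_deleted(recipe_id) -> bool, rsync(path) -> int return code and
    change_files(recipe_id) are injected callables, so nothing here
    depends on real Beaker internals.
    """
    local_dir = Path(local_dir)
    if is_deleted(recipe_id):
        # The recipe no longer exists on the server: remove the local
        # logs instead of copying them to the archive server.
        shutil.rmtree(local_dir, ignore_errors=True)
        return 'purged'
    rc = rsync(local_dir)
    if rc != 0:
        # Leave the local logs in place so a later run can retry.
        return 'rsync-failed'
    # Only tell the server about the new location after a successful sync.
    change_files(recipe_id)
    return 'transferred'
```

This narrows the window but does not close it: a job could still be deleted between the is_deleted check and the rsync call, so some server-side cleanup of orphaned archive directories might still be needed.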


> > 
> > Ideally we should make the process more robust so this scenario cannot come
> > about. If that's too hard, then we should at least be able to signal to the
> > LC that this recipe is in fact deleted, and any references it has to its
> > logs locally should also be deleted.
> > 
> > As it stands with that hotfix, the rsync process has already synced it to
> > the archive server.

Comment 5 Dan Callaghan 2012-09-11 03:50:33 UTC
(In reply to comment #4)

Did the rsync actually sync any files across? Or did it just create an empty directory?

Comment 6 Nick Coghlan 2012-10-17 04:38:13 UTC
Bulk reassignment of issues as Bill has moved to another team.

Comment 9 Roman Joost 2018-05-08 03:11:52 UTC
Guess we'll never know. I'm closing this bug, since we won't be able to reproduce it with the information here.

