Bug 636574

Summary: [Beaker] System fails to reboot from webUI if distro profile has been removed.
Product: [Fedora] Fedora Reporter: Jeff Burke <jburke>
Component: cobbler Assignee: Dan Callaghan <dcallagh>
Status: CLOSED DEFERRED QA Contact:
Severity: medium Docs Contact:
Priority: low    
Version: 15 CC: awood, bpeck, davids, dcallagh, jimi, kbaker, mcsontos, mganisin, rmancy, scott, vanmeeuwen+fedora
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-26 06:40:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 Dan Callaghan 2010-09-29 05:47:33 UTC
Beaker will be getting this exception in the save_system call to cobbler at bkr/server/model.py:1326 in the power method.

It wouldn't happen when Beaker provisions a system, since in that case we will have set a (hopefully) valid profile just before the call to power(). But when someone uses the manual power controls on the system page, we never bother updating the profile and so cobbler will barf if it's still set to an old profile which has been cleaned out. Same deal when Beaker powers off a system after the user returns it (bug 634247).

I'm disinclined to catch the exception inside the power method and just reset the profile somehow, because if we called power() as part of a provision we really do want to know if the profile we are using has disappeared. On the other hand, if we're just using cobbler for power control then I don't think it matters what profile we are using.
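The distinction drawn above could be expressed as a caller-supplied flag. The following is only a sketch under stated assumptions, not Beaker's actual code: `FakeCobbler`, the `provisioning` flag, and the `fallback_profile` argument are all hypothetical, and the fault text is copied from the message cobbler returns for a missing profile. It uses Python 3's `xmlrpc.client`, the successor of `xmlrpclib`.

```python
from xmlrpc.client import Fault

MISSING_PROFILE = "references a missing profile"


class FakeCobbler:
    """In-memory stand-in for the cobbler XML-RPC proxy (hypothetical)."""

    def __init__(self, profiles):
        self.profiles = set(profiles)
        self.saved = []

    def save_system(self, system, token):
        # Mimic cobbler rejecting a system whose profile no longer exists.
        if system["profile"] not in self.profiles:
            raise Fault(1, "cobbler.cexceptions.CX:'system %s %s %s'" % (
                system["name"], MISSING_PROFILE, system["profile"]))
        self.saved.append(dict(system))
        return True


def power(cobbler, system, token, provisioning, fallback_profile):
    """Save the system record ahead of a power action.

    During a provision, a missing profile is a real error and is re-raised.
    For plain power control the profile is irrelevant, so the record is
    pointed at a fallback profile and saved again.
    """
    try:
        return cobbler.save_system(system, token)
    except Fault as fault:
        if provisioning or MISSING_PROFILE not in fault.faultString:
            raise
        system["profile"] = fallback_profile
        return cobbler.save_system(system, token)
```

With this shape, the provisioning path still surfaces a vanished profile, while the manual power buttons and the power-off-on-return path would not depend on the old profile at all.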

Comment 2 Dan Callaghan 2010-09-29 06:44:21 UTC
I think maybe the right fix here is to fix cobbler so that it removes its record of the system when the distro and its profile are removed.

In my testing with the cobbler CLI (which is just a wrapper for the XMLRPC API I think?) this does actually work properly -- if I create a system, referencing a profile, referencing a distro, then delete that distro, all three are removed. So I need to figure out how the old distros are being removed from cobbler on our lab controllers and why it isn't cleaning up system records properly. Maybe it's a bug in a newer version of cobbler? If so, I think we're better off getting it fixed in cobbler rather than adding code to Beaker to work around it.
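For reference, the cascading removal described above (system references profile, profile references distro, deleting the distro removes all three) can be modelled as follows. This is a toy in-memory model, not cobbler's implementation; the `Registry` class and its method names are made up for illustration.

```python
class Registry:
    """Toy model of distro -> profile -> system references (hypothetical)."""

    def __init__(self):
        self.distros = set()
        self.profiles = {}  # profile name -> distro name
        self.systems = {}   # system name -> profile name

    def add(self, distro, profile, system):
        """Register a system that uses a profile built on a distro."""
        self.distros.add(distro)
        self.profiles[profile] = distro
        self.systems[system] = profile

    def remove_distro(self, distro):
        """Remove a distro plus every profile and system depending on it."""
        dead_profiles = {p for p, d in self.profiles.items() if d == distro}
        dead_systems = {s for s, p in self.systems.items()
                        if p in dead_profiles}
        for system in dead_systems:
            del self.systems[system]
        for profile in dead_profiles:
            del self.profiles[profile]
        self.distros.discard(distro)
        return sorted(dead_profiles), sorted(dead_systems)
```

The bug under discussion is the case where the last step does not hold in practice: the distro and profile are gone but a system record that references the profile is still present.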

Comment 3 Bill Peck 2010-09-29 13:03:41 UTC
Hi Dan,

Thanks for investigating this. Ultimately this is a bug in cobbler, but I don't have a lot of hope for getting it fixed there.

Any other thoughts on how to fix it then?

Comment 4 Bill Peck 2010-09-29 13:05:47 UTC
I should read all your comments before adding mine. :-)

If it works on your local version of cobbler then it should be fixable.  

What version of cobbler are you using? And can you try it on lab-devel or lab-stage?

Comment 5 Dan Callaghan 2010-09-29 22:09:59 UTC
I've got cobbler-2.0.3.1-3.el5 on my VM, which is slightly older than what we have on beaker-devel. It could also be something to do with our usage of cobbler in the labs. I will look into this further today to try and pin down exactly why the system records aren't being removed.

Comment 6 Dan Callaghan 2010-10-01 00:58:23 UTC
I think this might actually be a race of sorts. So far the only way I've succeeded in reproducing it is as follows:

>>> import xmlrpclib
>>> c = xmlrpclib.ServerProxy('http://localhost/cobbler_api')
>>> tok = c.login('testing', 'testing')
>>> s = c.new_system(tok)
>>> c.modify_system(s, 'name', 'testing.example.com', tok)
True
>>> c.modify_system(s, 'profile', 'Fedora11-x86_64', tok)
True
>>> c.save_system(s, tok)
True
>>> # in another window: cobbler distro remove --name=Fedora11-x86_64
>>> # now cobbler list shows the distro, profile, and system have been removed
>>> c.modify_system(s, 'power_address', 'notexist.example.com', tok)
True
>>> # at this point, cobbler has somehow resurrected the deleted system record
>>> c.save_system(s, tok)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1147, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1286, in _parse_response
    return u.close()
  File "/usr/lib64/python2.4/xmlrpclib.py", line 744, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 1: "cobbler.cexceptions.CX:'system testing.example.com references a missing profile Fedora11-x86_64'">

But I would expect this race to be really hard to hit, and yet we've seen it at least three times recently (bug 634247, this bug, and it happened to me the other week). So I must be missing something here still.

Comment 7 Dan Callaghan 2010-10-01 01:16:02 UTC
Maybe the race window is not so small as I thought. If my theory is right then it would happen if, at the moment of deleting a distro, there is some system which had used that distro and was in the middle of being powered on or provisioned. Given how busy Beaker can be, maybe this is not so unlikely?
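The theory can be illustrated with a toy model of a server that keeps pending edits per handle, separately from saved records. This is purely an assumption about the mechanism, not cobbler's actual internals; `ToyServer` and all of its methods are hypothetical, and a plain `ValueError` stands in for the XML-RPC fault.

```python
class ToyServer:
    """Toy server: edits live on a handle until save (hypothetical model)."""

    def __init__(self):
        self.profiles = {"Fedora11-x86_64"}
        self.systems = {}   # saved records, keyed by system name
        self.handles = {}   # handle -> pending, unsaved fields
        self._count = 0

    def new_system(self):
        self._count += 1
        handle = "system::%d" % self._count
        self.handles[handle] = {}
        return handle

    def modify_system(self, handle, key, value):
        # Succeeds even if the saved record was deleted in the meantime.
        self.handles[handle][key] = value
        return True

    def save_system(self, handle):
        record = self.handles[handle]
        if record.get("profile") not in self.profiles:
            raise ValueError("system %s references a missing profile %s"
                             % (record.get("name"), record.get("profile")))
        self.systems[record["name"]] = dict(record)
        return True

    def remove_profile(self, profile):
        # Cascade to saved records only; open handles are left untouched.
        self.profiles.discard(profile)
        self.systems = {name: rec for name, rec in self.systems.items()
                        if rec["profile"] != profile}


def replay():
    """Replay the reproduction steps against the toy server."""
    server = ToyServer()
    handle = server.new_system()
    server.modify_system(handle, "name", "testing.example.com")
    server.modify_system(handle, "profile", "Fedora11-x86_64")
    server.save_system(handle)
    server.remove_profile("Fedora11-x86_64")  # the "other window"
    server.modify_system(handle, "power_address", "notexist.example.com")
    try:
        server.save_system(handle)
    except ValueError as exc:
        return str(exc)
    return None
```

In this model, any client holding a handle across the deletion hits the error on its next save, so the window is as long as the power or provision operation rather than a single call.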

Comment 8 Jeff Burke 2010-10-01 12:36:31 UTC
What happens is developers run a reserve task and it gets installed with a nightly tree (I think the nightly trees are only kept for 7 days; that's a rel-eng question). If they happen to install a nightly that is due to expire in the next 24 hours and keep the system reserved for several more days, then once that distro is deleted they can no longer use the webUI to reboot.

Look for the oldest nightly tree; it will have a .n extension (RHEL5.6-Server-20100927.n). That tree will be the next to expire. Run a reserve task/provision, wait for that tree to be deleted, then try to click the reboot button. You will see the error.
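Picking the oldest nightly from a list of tree names could be scripted roughly as below. This is a sketch that assumes every nightly name ends in `-YYYYMMDD.n`, which is generalised from the single example name above; the `oldest_nightly` helper is hypothetical and other naming schemes would need a different pattern.

```python
import re
from datetime import date

# Assumed naming scheme: <anything>-YYYYMMDD.n
NIGHTLY = re.compile(r"-(\d{4})(\d{2})(\d{2})\.n$")


def oldest_nightly(tree_names):
    """Return the nightly tree with the earliest date stamp, or None."""
    dated = []
    for name in tree_names:
        match = NIGHTLY.search(name)
        if match:
            year, month, day = (int(part) for part in match.groups())
            dated.append((date(year, month, day), name))
    # Tuples sort by date first, so min() yields the oldest tree.
    return min(dated)[1] if dated else None
```

Names that do not match the pattern (for example released, non-nightly trees) are ignored rather than treated as errors.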

Comment 9 Dan Callaghan 2010-10-06 00:18:50 UTC
Jeff, I tried reproducing on beaker-stage using those steps, but it didn't trigger the bug. The cobbler system record was removed (as expected) and when I tried to reboot the system Beaker recreated the system record with a dummy profile (as expected). bpeck pointed out that the cobbler version in production is different though, so maybe it behaves differently.

I'm dropping this from 0.5.59 because it's low priority and (so far as I know) users don't hit this very often.

Comment 11 Dan Callaghan 2010-10-07 00:17:19 UTC
As a workaround you should be able to provision the machine with a new distro (assuming you've manually reserved it). After that, power controls should work again.

Comment 13 Jeff Burke 2012-05-18 16:00:15 UTC
Has this issue been resolved? I have not seen it in quite a while.

Best,
Jeff

Comment 14 Dan Callaghan 2012-05-20 22:28:40 UTC
We haven't seen it happening lately.

This bug will finally be solved in Beaker 0.9.0 with the removal of Cobbler.

Comment 15 Dan Callaghan 2012-05-23 05:27:51 UTC
*** Bug 752330 has been marked as a duplicate of this bug. ***

Comment 16 Dan Callaghan 2012-06-26 06:40:23 UTC
Beaker 0.9.0 has been released.