Bug 636574
Summary: | [Beaker] System fails to reboot from webUI if distro profile has been removed. | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jeff Burke <jburke> |
Component: | cobbler | Assignee: | Dan Callaghan <dcallagh> |
Status: | CLOSED DEFERRED | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | low | | |
Version: | 15 | CC: | awood, bpeck, davids, dcallagh, jimi, kbaker, mcsontos, mganisin, rmancy, scott, vanmeeuwen+fedora |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2012-06-26 06:40:23 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Comment 1
Dan Callaghan
2010-09-29 05:47:33 UTC
I think maybe the right fix here is to fix cobbler so that it removes its record of the system when the distro and its profile are removed. In my testing with the cobbler CLI (which is just a wrapper for the XMLRPC API, I think?) this does actually work properly -- if I create a system, referencing a profile, referencing a distro, then delete that distro, all three are removed. So I need to figure out how the old distros are being removed from cobbler on our lab controllers and why it isn't cleaning up system records properly. Maybe it's a bug in a newer version of cobbler? If so, I think we're better off getting it fixed in cobbler rather than adding code to Beaker to work around it.

Hi Dan, Thanks for investigating this. Ultimately this is a bug in cobbler, but I don't have a lot of hope for getting it fixed there. Any other thoughts on how to fix it then?

I should read all your comments before adding mine. :-) If it works on your local version of cobbler then it should be fixable. What version of cobbler are you using, and can you try it on lab-devel or lab-stage?

I've got cobbler-2.0.3.1-3.el5 on my VM, which is slightly older than what we have on beaker-devel. It could also be something to do with our usage of cobbler in the labs. I will look into this further today to try and pin down exactly why the system records aren't being removed.

I think this might actually be a race of sorts.
So far the only way I've succeeded in reproducing it is as follows:

>>> import xmlrpclib
>>> c = xmlrpclib.ServerProxy('http://localhost/cobbler_api')
>>> tok = c.login('testing', 'testing')
>>> s = c.new_system(tok)
>>> c.modify_system(s, 'name', 'testing.example.com', tok)
True
>>> c.modify_system(s, 'profile', 'Fedora11-x86_64', tok)
True
>>> c.save_system(s, tok)
True
>>> # in another window: cobbler distro remove --name=Fedora11-x86_64
>>> # now cobbler list shows the distro, profile, and system have been removed
>>> c.modify_system(s, 'power_address', 'notexist.example.com', tok)
True
>>> # at this point, cobbler has somehow resurrected the deleted system record
>>> c.save_system(s, tok)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1147, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1286, in _parse_response
    return u.close()
  File "/usr/lib64/python2.4/xmlrpclib.py", line 744, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 1: "cobbler.cexceptions.CX:'system testing.example.com references a missing profile Fedora11-x86_64'">

But I would expect this race to be really hard to hit, and yet we've seen it at least three times recently (bug 63247, this bug, and it happened to me the other week). So I must be missing something here still.

Maybe the race window is not so small as I thought. If my theory is right then it would happen if, at the moment of deleting a distro, there is some system which had used that distro and was in the middle of being powered on or provisioned. Given how busy Beaker can be, maybe this is not so unlikely?
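For illustration only, here is a minimal sketch of how a caller might tolerate the fault shown in the transcript above. It is not Beaker's or cobbler's actual code. The helper name `save_system_with_fallback` and the `fallback_profile` argument are hypothetical, it uses Python 3's `xmlrpc.client` rather than the Python 2.4 `xmlrpclib` seen in the traceback, and it assumes (as the transcript suggests) that `modify_system` still accepts the stale handle. It papers over the symptom and does not close the race itself.

```python
import xmlrpc.client

# Substring of the fault text seen in the traceback above.
MISSING_PROFILE_MARKER = "references a missing profile"


def save_system_with_fallback(proxy, handle, token, fallback_profile):
    """Save a system record; if its profile is gone, repoint it and retry once.

    `proxy` is any object exposing save_system/modify_system like the
    cobbler XMLRPC proxy in the transcript. `fallback_profile` is a
    hypothetical name of a profile the caller knows still exists.
    Returns the fallback profile name if it had to be used, else None.
    """
    try:
        proxy.save_system(handle, token)
        return None
    except xmlrpc.client.Fault as fault:
        # Only handle the specific failure from this bug; re-raise the rest.
        if MISSING_PROFILE_MARKER not in fault.faultString:
            raise
    # Assumption: the stale handle can still be modified, as in the transcript.
    proxy.modify_system(handle, "profile", fallback_profile, token)
    proxy.save_system(handle, token)
    return fallback_profile
```

With the session from the transcript, a call might look like `save_system_with_fallback(c, s, tok, 'some-existing-profile')`, where the profile name is a placeholder for one that is known to exist on the lab controller.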
What happens is that developers run a reserve task and the system gets installed with a nightly tree (I think the nightly trees are only kept for 7 days; that's a rel-eng question). If they happen to install a nightly that is due to expire in the next 24 hours and keep the system reserved for several more days, then once that distro is deleted they can no longer use the webUI to reboot.

Look for the oldest nightly tree; it will have a .n extension (RHEL5.6-Server-20100927.n). That tree will be the next one to expire. Run a reserve task/provision and wait for that tree to be deleted. Then try to click the reboot button. You will see the error.

Jeff, I tried reproducing on beaker-stage using those steps, but it didn't trigger the bug. The cobbler system record was removed (as expected), and when I tried to reboot the system Beaker recreated the system record with a dummy profile (as expected). bpeck pointed out that the cobbler version in production is different though, so maybe it behaves differently.

I'm dropping this from 0.5.59 because it's low priority and (so far as I know) users don't hit this very often.

As a workaround you should be able to provision the machine with a new distro (assuming you've manually reserved it). After that, power controls should work again.

Has this issue been resolved? I have not seen it in quite a while.

Best,
Jeff

We haven't seen it happening lately.

This bug will finally be solved in Beaker 0.9.0 with the removal of Cobbler.

*** Bug 752330 has been marked as a duplicate of this bug. ***

Beaker 0.9.0 has been released.