Created attachment 360058 [details]
Trivial fix to make rename-restart work again
Description of problem:
First, the "rename-restart" behavior is completely broken because the
preserveForRestart() method of the XendDomainInfo class calls
self._removeVm() instead of self.removeVm(), which results in
AttributeError: XendDomainInfo instance has no attribute '_removeVm'
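For illustration only, the failure mode can be reproduced with a minimal sketch. The DomainInfo class and its preserve_broken()/preserve_fixed() methods below are hypothetical stand-ins, not the real xend code; the only thing taken from this report is that removeVm() exists and _removeVm() does not.

```python
class DomainInfo:
    """Hypothetical stand-in for XendDomainInfo; not the real xend class."""

    def __init__(self, name):
        self.name = name
        self.vm_removed = False

    def removeVm(self):
        # Only this name is defined, as described in the report.
        self.vm_removed = True

    def preserve_broken(self):
        # Mirrors the reported typo: the underscore-prefixed method
        # does not exist, so this raises AttributeError.
        self._removeVm()

    def preserve_fixed(self):
        # Calls the method that actually exists.
        self.removeVm()


def try_preserve(method):
    """Call method() and report whether it hit AttributeError."""
    try:
        method()
        return "ok"
    except AttributeError as exc:
        return "AttributeError: %s" % exc
```

Calling try_preserve(dom.preserve_broken) on an instance returns a string starting with "AttributeError" and leaves vm_removed unset, while try_preserve(dom.preserve_fixed) returns "ok".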
Second, when a domain is configured for "rename-restart" behavior instead of
"restart", the old instance of the domain is renamed and preserved during
restarts. There is a bug in our xend code which breaks restart_count for
those domains. The counter is incremented for the old instance instead
of the new one. That is, the running instance would seem like it was
never restarted and older instances would have restart_count set to 1.
It works only by accident for domains with "restart" behavior,
because both the old instance and the just created domain share the same
path in xenstore.
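A toy model of the counter problem, assuming nothing about the real xend internals: xenstore is reduced to a plain dict, and the function names and the OLD-UUID/NEW-UUID paths are made up for this sketch. It only shows why bumping the counter through the old instance goes unnoticed when both instances share one path and goes wrong when they do not.

```python
def bump_restart_count(store, vm_path):
    """Increment restart_count under the given (toy) xenstore path."""
    key = vm_path + "/xend/restart_count"
    store[key] = store.get(key, 0) + 1


def restart(store, old_path, new_path, bump_old):
    """Simulate one restart and return the new instance's restart_count.

    bump_old=True mimics the reported bug (counter written through the
    old instance); bump_old=False mimics the intended behavior.
    """
    bump_restart_count(store, old_path if bump_old else new_path)
    return store.get(new_path + "/xend/restart_count", 0)


# "restart": old and new instance share one path, so the bug is hidden.
shared = restart({}, "/vm/UUID", "/vm/UUID", bump_old=True)

# "rename-restart": paths differ, so the new instance never sees the bump.
buggy = restart({}, "/vm/OLD-UUID", "/vm/NEW-UUID", bump_old=True)
fixed = restart({}, "/vm/OLD-UUID", "/vm/NEW-UUID", bump_old=False)
```

Here shared and fixed come out as 1, while buggy stays at 0, matching the symptom that the running instance looks like it was never restarted.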
The bug was found during code inspection when fixing another bug.
Version-Release number of selected component (if applicable):
xen-3.0.3-94 (most likely every version since xen-3.0.3-76)
Steps to Reproduce:
1. create a domain with on_reboot = "rename-restart"
2. restart the domain
3. check /vm/UUID/xend/restart_count of the restarted domain
Actual results:
The domain remains shut down.
Expected results:
The domain is successfully restarted and restart_count is incremented on each restart.
Both parts of this bug were introduced by an inaccurate backport of upstream cs 16977 in xen-3.0.3-76. Trivial fix attached.
Yes, that's correct, xenpv-1-1 is the renamed guest and it should be in state s. However, you should also see a new running xenpv-1 guest. The value of restart_count should only be increased in the new guest.
Since you see the increased value of restart_count, I guess you just issued the second xm li too soon after xm reboot finished. You should wait a bit to give the new domain a chance to appear.
Hmm, funny that virsh can see it but xm doesn't. Is it missing in xm output even when you run xm list after you see the guest in virsh list?
(In reply to comment #9)
> Hmm, funny that virsh can see it but xm doesn't. Is it missing in xm output
> even when you run xm list after you see the guest in virsh list?
Yes, it is only shown in "virsh list".
(In reply to comment #10)
> Yes, it is only shown in "virsh list".
That's weird. Anyway if the guest is running after rebooting and you can see its console and ssh to it, it's most likely some unrelated issue.
To be sure, could you attach output of xenstore-ls and /var/log/xen/xend.log after you tried listing running guests with virsh list and xm list?
xenstore database doesn't seem to be in the best shape. Are you sure you rebooted the host after trying to reproduce this bug on older package version? To be 100% sure there are no leftovers in xenstore, please reboot the host and try it again.
(In reply to comment #13)
> xenstore database doesn't seem to be in the best shape. Are you sure you
> rebooted the host after trying to reproduce this bug on older package version?
> To be 100% sure there are no leftovers in xenstore, please reboot the host and
> try it again.
Yes, I'm sure I rebooted xend. And I tried to reboot the host just now, but the result is the same: only virsh list can show the 2 (old and new) domains.
Restarting xend is not enough in this case as it doesn't clean xenstore. Rebooting does, that's why I wanted you to reboot the host. Thanks for doing that.
Can you attach xenstore-ls and the output of xm list and virsh list now that you have rebooted the host?
There seems to be a race in the rename-restart path similar to the one we fixed in bug 358693 for normal restart. I'll respin the patch for this bug.
Ah, that was the attachment ID. I wanted to mention bug 513604 instead...
Created attachment 381996 [details]
Fix a race in rename-restart
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).