Bug 513604
| Summary: | Domain goes missing from xm list when rebooted | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Sachin Prabhu <sprabhu> | ||||
| Component: | xen | Assignee: | Jiri Denemark <jdenemar> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 5.4 | CC: | berrange, bmason, clalance, drjones, jdenemar, jplans, llim, mmilgram, mshao, pbonzini, riek, tao, xen-maint | ||||
| Target Milestone: | rc | Keywords: | Regression, ZStream | ||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | xen-3.0.3-95.el5 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2010-03-30 08:57:29 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 499522, 525143 | ||||||
Description
Sachin Prabhu
2009-07-24 13:06:35 UTC
I believe this must have always been there. When a guest is restarted, xend first destroys the guest and then creates and boots a new one. It's just a matter of luck whether you manage to list running domains during the short time in between.

Yes, this is a fundamental limitation of this XenD version. It does not have any concept of 'inactive' guests, so if the guest is not running, XenD won't report it. During reboot there is a small window between the guest shutting down and the new one booting, and because inactive guests are not managed, the guest can briefly disappear.

If it goes missing "briefly", as written in the summary and in the issue tracker, then this resembles what Jiri and Daniel said: during reboot there is a small window between the guest shutting down and the new one being created, so the guest can disappear. This matches the observation that it is reproducible only 5% of the time. It is implicit in the behavior of Xen, and libvirt (virsh) fixes it. If it never reappears, it is a different problem. The summary should be updated and the xend-debug.log and xend.log files should be attached.

There is not enough information in this bug report to further diagnose the problem. Please provide:

- /var/log/xen/xend.log and xend-debug.log from the point in time immediately after doing an 'xm reboot' that exhibits the missing domain problem
- Output of 'xm list --long'
- Output of 'xenstore-ls'
- The /etc/xen/$GUEST config file for the guest showing problems
- The 'virsh dumpxml GUEST' output

Hmm, I wasn't able to reproduce it even after 200 reboots of an HVM guest.

Could you please try to reproduce the bug with packages from http://people.redhat.com/jdenemar/xen/bz513604/ and send xend.log after running 'xm list --long'? Thanks.

Thanks a lot. So the error is caused by a missing entry for one of the block devices in /vm/UUID/device/vbd/:
vbd = ""
5632 = ""
frontend = "/local/domain/11/device/vbd/5632"
frontend-id = "11"
backend-id = "0"
backend = "/local/domain/0/backend/vbd/11/5632"
768 is missing in there.
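
To make the device numbers above easier to read: a Xen vbd ID for an IDE disk is the Linux device number, major * 256 + minor, so 768 is major 3, minor 0 (hda) and 5632 is major 22, minor 0 (hdc). The following is only a minimal sketch for comparing the devices a guest is configured with against the entries actually found under /vm/UUID/device/vbd/. The helper names `decode_vbd_id` and `missing_vbds` are hypothetical and are not part of xend or xm.

```python
from __future__ import annotations


def decode_vbd_id(devid: int) -> tuple[int, int]:
    """Split a vbd ID into its Linux (major, minor) device numbers."""
    return devid >> 8, devid & 0xFF


def missing_vbds(configured: list[int], present: list[int]) -> list[int]:
    """Return the configured vbd IDs that have no entry in the store listing."""
    return sorted(set(configured) - set(present))


if __name__ == "__main__":
    # Values taken from the listing above: only 5632 is present.
    for devid in missing_vbds([768, 5632], [5632]):
        major, minor = decode_vbd_id(devid)
        print(f"vbd {devid} (major {major}, minor {minor}) is missing")
```

The list of configured IDs has to come from the guest config file; the sketch does not read xenstore itself.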
In the previous report, it was the cdrom (5632) which was missing.

Oops, and the vif got lost during restarts, which looks almost like bug #509099.
I might have an idea why this happens... I'll report once I know it's (in)correct.
Could you try the new package from http://people.redhat.com/jdenemar/xen/bz513604/ to see if that fixes the error? And report the results and logs even if it does, please.

Thanks for the testing. Could you try yet another version of the package? http://people.redhat.com/jdenemar/xen/bz513604/ Thanks a lot.

OK, another round... Could you follow these steps, please?

- install the new packages from http://people.redhat.com/jdenemar/xen/bz513604/ and restart xend or the whole machine
- turn on logging in the xen hotplug scripts:
    # echo 'SYSLOG=yes' >>/etc/sysconfig/xenhotplug
- let udev log debugging messages:
    # udevcontrol log_priority=debug
- let syslog write all (including debugging) messages into /var/log/debug:
    # echo '*.* /var/log/debug' >>/etc/syslog.conf
    # service syslog reload
- reproduce the bug
- send me everything you normally do, together with /var/log/debug

Thanks a lot.

So the race condition is confirmed. As usual, the race is between the hotplug scripts and xend. Under very lucky conditions the hotplug-cleanup script runs early enough to see /local/domain/ID/vm and is then delayed, so that it actually removes /vm/UUID/device/CLASS/ID from the newly created domain instead of the old one. It looks like IA64 is a very lucky platform :-) By injecting some sleeps at the right places, I'm able to reproduce it locally, which should speed things up quite a bit.

Hi, could you try with the latest packages from http://people.redhat.com/jdenemar/xen/bz513604/ (xen-3.0.3-94.el5.bz513604.7)? Thanks.

Great, thank you very much for the testing.

Created attachment 358693
Fix race condition on domain restart
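
The interleaving described in the race analysis above can be illustrated with a small deterministic model. This is only a sketch: a plain dictionary stands in for xenstore, the function names are hypothetical, and it assumes (as the comment implies) that the old and the new domain share the same /vm/UUID path. It does not reproduce the actual hotplug scripts or the attached fix.

```python
def make_store():
    """State before the restart: domain 10 owns two block devices."""
    return {
        "/local/domain/10/vm": "/vm/UUID",
        "/vm/UUID/device/vbd/768": "",
        "/vm/UUID/device/vbd/5632": "",
    }


def cleanup_read(store, domid):
    """Step 1 of the cleanup script: look up the domain's /vm path."""
    return store.get(f"/local/domain/{domid}/vm")


def cleanup_remove(store, vm_path, cls, devid):
    """Step 2 of the cleanup script: remove the device entry."""
    if vm_path is not None:
        store.pop(f"{vm_path}/device/{cls}/{devid}", None)


def restart_domain(store, old_id, new_id, devices):
    """Model of xend destroying the old domain and creating the new one."""
    vm_path = store.pop(f"/local/domain/{old_id}/vm")
    store[f"/local/domain/{new_id}/vm"] = vm_path
    for cls, devid in devices:
        store[f"{vm_path}/device/{cls}/{devid}"] = ""


def vbds_after_restart(racy):
    """Return the vbd IDs left under /vm/UUID after one restart."""
    store = make_store()
    devices = [("vbd", 768), ("vbd", 5632)]
    vm_path = cleanup_read(store, 10)  # script sees the old domain
    if racy:
        restart_domain(store, 10, 11, devices)      # xend gets there first
        cleanup_remove(store, vm_path, "vbd", 768)  # delayed removal hits the new domain
    else:
        cleanup_remove(store, vm_path, "vbd", 768)  # removal hits the old domain
        restart_domain(store, 10, 11, devices)
    prefix = "/vm/UUID/device/vbd/"
    return sorted(int(key[len(prefix):]) for key in store if key.startswith(prefix))


if __name__ == "__main__":
    print("expected order:", vbds_after_restart(racy=False))
    print("racy order:    ", vbds_after_restart(racy=True))
```

In the racy ordering the model ends with only 5632 under /vm/UUID/device/vbd/ for domain 11, which matches the listing earlier in this report where 768 was missing.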
*** Bug 513265 has been marked as a duplicate of this bug. ***

Fix built into xen-3.0.3-95.el5

I verified this bug with the following steps:

(1) Create an HVM domain
(2) Reboot the domain with `xm reboot'
(3) Run `xm list'

I tried this about 30 times and found that the domain no longer goes missing from xm list when rebooted. So this bug is verified in xen-3.0.3-102.el5.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0294.html

This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).
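
The verification steps in the comment above (reboot with xm reboot, then check xm list, repeated about 30 times) could be automated roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the domain name is taken to be the first column of the xm list output, and a single check right after the reboot cannot tell a brief disappearance from a permanent one, so a real run would need to wait and poll again.

```python
import subprocess


def domain_listed(xm_list_output, name):
    """Return True if `name` appears in the first column, skipping the header."""
    for line in xm_list_output.splitlines()[1:]:
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def xm_reboot(name):
    """Reboot the guest with the xm command used in this report."""
    subprocess.run(["xm", "reboot", name], check=True)


def xm_list():
    """Return the text printed by xm list."""
    result = subprocess.run(["xm", "list"], check=True,
                            capture_output=True, text=True)
    return result.stdout


def count_missing(name, rounds, reboot=xm_reboot, list_domains=xm_list):
    """Reboot `name` repeatedly and count how often it is absent from the list."""
    missing = 0
    for _ in range(rounds):
        reboot(name)
        if not domain_listed(list_domains(), name):
            missing += 1
    return missing
```

Calling `count_missing("GUEST", 30)` on a Xen host would mirror the 30 rounds mentioned above; the `reboot` and `list_domains` parameters exist only so the loop can be exercised without a hypervisor.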