Bug 513604 - Domain goes missing from xm list when rebooted
Summary: Domain goes missing from xm list when rebooted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 513265
Depends On:
Blocks: 499522 525143
Reported: 2009-07-24 13:06 UTC by Sachin Prabhu
Modified: 2018-10-20 04:15 UTC

Fixed In Version: xen-3.0.3-95.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 08:57:29 UTC
Target Upstream Version:
Embargoed:


Attachments
Fix race condition on domain restart (3.74 KB, patch)
2009-08-26 09:59 UTC, Jiri Denemark


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2010:0294 | 0 | normal | SHIPPED_LIVE | xen bug fix and enhancement update | 2010-03-29 14:20:32 UTC

Description Sachin Prabhu 2009-07-24 13:06:35 UTC
Customer reports a regression where the domain briefly goes missing from the list of domains in xm list when undergoing a reboot initiated by xm reboot. This is not seen on RHEL 5.3 kernels.
-----

Description of Problem:
 This is a regression: the issue does not occur on RHEL 5.3 GA.

 The `xm list' command may not output HVM domains if they are
 rebooted by `xm reboot'.

 Note that this issue was not observed on PV domains.

Version-Release number of selected component:

 Red Hat Enterprise Linux Version Number: 5
 Release Number: 4 Beta
 Architecture: ia64
 Kernel Version: 2.6.18-155.el5xen
 Related Package Version: xen-3.0.3-88.el5
 Related Middleware / Application: None

Drivers or hardware or architecture dependency:
 None

How reproducible:
 1/20

Steps to Reproduce:
 1. Create an HVM domain
 2. Reboot the domain with `xm reboot'
 3. Run `xm list'

Actual Results:
 Name                                      ID Mem(MiB) VCPUs State   Time(s)
 Domain-0                                   0      837     1 r-----    807.4

Expected Results:
 Name                                      ID Mem(MiB) VCPUs State   Time(s)
 Domain-0                                   0      837     1 r-----    807.4
 rhel54_74                                 25     1047     1 r-----     17.4

Summary of actions taken to resolve issue:
 None

Location of diagnostic data:
 None

Comment 3 Jiri Denemark 2009-07-27 10:33:12 UTC
I believe this must have always been there. When a guest is restarted, xend first destroys the guest and then creates and boots a new one. It's just a matter of luck if you manage to list running domains during the short time in between.

Comment 4 Daniel Berrangé 2009-07-27 10:52:41 UTC
Yes this is a fundamental limitation of this XenD version. It does not have any concept of 'inactive' guests, so if the guest is not running, XenD won't report it. During reboot you have a small window between the guest shutting down & new one booting, and thus thanks to lack of inactive guest mgmt, the guest can briefly disappear.

Comment 7 Paolo Bonzini 2009-07-30 11:57:23 UTC
If it goes missing "briefly" as written in the summary and in the issue tracker, then this resembles what Jiri and Daniel said: during reboot you have a small window between the guest shutting down & new one being created, and thus the guest can disappear.  This matches the observation that it is reproducible 5% of the time only.   It is implicit in the behavior of Xen, and libvirt (virsh) fixes it.

If it never reappears, it is a different problem. In that case the summary should be updated and the xend-debug.log and xend.log files should be attached.

Comment 14 Daniel Berrangé 2009-08-04 10:35:50 UTC
There is not enough information in this bug report to further diagnose the problem. Please provide

- /var/log/xen/xend.log & xend-debug.log  from the point in time immediately after doing a 'xm reboot' that exhibits the missing domain problem
- Output of 'xm list --long'
- Output of 'xenstore-ls'
- The /etc/xen/$GUEST   config file for the guest showing problems
- The 'virsh dumpxml GUEST'  output

Comment 23 Jiri Denemark 2009-08-06 11:45:57 UTC
Hmm, I wasn't able to reproduce it even after 200 reboots of an hvm guest. Could you please try to reproduce the bug with packages from http://people.redhat.com/jdenemar/xen/bz513604/ and send xend.log after running xm list --long?

Thanks.

Comment 25 Jiri Denemark 2009-08-07 13:35:55 UTC
Thanks a lot. So the error is caused by a missing entry for one of the block devices in /vm/UUID/device/vbd/:

   vbd = ""
    5632 = ""
     frontend = "/local/domain/11/device/vbd/5632"
     frontend-id = "11"
     backend-id = "0"
     backend = "/local/domain/0/backend/vbd/11/5632"

768 is missing in there.

In the previous report, it was the cdrom (5632) which was missing. Oops, and the vif also got lost during restarts, which looks almost like bug #509099.

I might have an idea why this happens... I'll report once I know it's (in)correct.
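
The kind of check behind this observation can be sketched in Python. This is only an illustration, assuming the excerpt above is in the indented `key = "value"` layout with one extra space per nesting level; `parse_xenstore_ls` and `missing_devices` are hypothetical helper names and are not part of xend or the Xen tools. The expected device IDs (768 and 5632) are the ones mentioned in this comment.

```python
def parse_xenstore_ls(text):
    """Parse indented 'key = "value"' lines into a nested dict.

    Assumption: nesting is expressed purely by the number of leading spaces.
    """
    root = {}
    stack = [(-1, root)]  # (indent of the node, dict holding its children)
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = line.strip().partition(" = ")
        node = {"value": value.strip('"'), "children": {}}
        # Climb back up until the top of the stack is the parent of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1][key] = node
        stack.append((indent, node["children"]))
    return root


def missing_devices(tree, dev_class, expected_ids):
    """Return the expected device IDs that have no entry under dev_class."""
    present = tree.get(dev_class, {}).get("children", {})
    return sorted(set(expected_ids) - set(present))


# The excerpt quoted in this comment.
EXCERPT = '''\
   vbd = ""
    5632 = ""
     frontend = "/local/domain/11/device/vbd/5632"
     frontend-id = "11"
     backend-id = "0"
     backend = "/local/domain/0/backend/vbd/11/5632"
'''

if __name__ == "__main__":
    tree = parse_xenstore_ls(EXCERPT)
    print(missing_devices(tree, "vbd", ["768", "5632"]))  # prints ['768']
```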

Comment 26 Jiri Denemark 2009-08-08 07:18:52 UTC
Could you try the new package from http://people.redhat.com/jdenemar/xen/bz513604/ to see if that fixes the error? And report the results and logs even if it does, please.

Comment 29 Jiri Denemark 2009-08-10 17:55:46 UTC
Thanks for the testing. Could you try yet another version of the package? http://people.redhat.com/jdenemar/xen/bz513604/

Thanks a lot.

Comment 31 Jiri Denemark 2009-08-11 16:08:28 UTC
OK, another round... Could you follow these steps, please?

- install the new packages from http://people.redhat.com/jdenemar/xen/bz513604/ and restart xend or the whole machine

- turn on logging in xen hotplug scripts:
  # echo 'SYSLOG=yes' >>/etc/sysconfig/xenhotplug

- let udev log debugging messages:
  # udevcontrol log_priority=debug

- let syslog write all (including debugging) messages into /var/log/debug:
  # echo '*.* /var/log/debug' >>/etc/syslog.conf
  # service syslog reload

- reproduce the bug

- send me everything you normally do together with /var/log/debug

Thanks a lot

Comment 39 Jiri Denemark 2009-08-19 06:50:18 UTC
So the race condition is confirmed. As usual, the race is between hotplug scripts and xend. Under very lucky conditions the hotplug-cleanup script runs early enough to see /local/domain/ID/vm of the old domain, and then it's delayed so that it actually removes /vm/UUID/device/CLASS/ID from the newly created domain instead of the old one.

It looks like IA64 is a very lucky platform :-)

By injecting some sleeps at right places, I'm able to reproduce it locally, which should speed up things quite a bit.
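
The interleaving described in this comment can be illustrated with a small deterministic model in Python. This is a toy sketch, not xend or hotplug script code: the store is a plain dict, all function names are hypothetical, and the domain IDs (11 and 12) and device IDs (768 and 5632) are picked for illustration. It assumes, as the comment implies, that the restarted domain reuses the same /vm/UUID path.

```python
VM_PATH = "/vm/UUID"
DEVICES = [("vbd", "768"), ("vbd", "5632")]


def new_store():
    """Toy store holding the old domain (ID 11) and its device entries."""
    store = {"/local/domain/11/vm": VM_PATH}
    for dev_class, dev_id in DEVICES:
        store[f"{VM_PATH}/device/{dev_class}/{dev_id}"] = ""
    return store


def cleanup_lookup(store, domid):
    """Cleanup step 1: resolve the /vm path while the old domain still exists."""
    return store.get(f"/local/domain/{domid}/vm")


def cleanup_remove(store, vm_path, dev_class, dev_id):
    """Cleanup step 2: remove the device entry under the resolved /vm path."""
    store.pop(f"{vm_path}/device/{dev_class}/{dev_id}", None)


def restart_domain(store, old_domid, new_domid):
    """Model of a restart: drop the old domain, create the new one."""
    store.pop(f"/local/domain/{old_domid}/vm", None)
    store[f"/local/domain/{new_domid}/vm"] = VM_PATH
    for dev_class, dev_id in DEVICES:
        store[f"{VM_PATH}/device/{dev_class}/{dev_id}"] = ""


def run_scenario(cleanup_delayed):
    """Return the device entries left after a restart plus cleanup of vbd 768."""
    store = new_store()
    vm_path = cleanup_lookup(store, 11)
    if cleanup_delayed:
        # Racy order: the new domain is set up before the cleanup finishes.
        restart_domain(store, 11, 12)
        cleanup_remove(store, vm_path, "vbd", "768")
    else:
        # Benign order: the cleanup finishes before the new domain is created.
        cleanup_remove(store, vm_path, "vbd", "768")
        restart_domain(store, 11, 12)
    return sorted(path for path in store if "/device/" in path)


if __name__ == "__main__":
    print(run_scenario(cleanup_delayed=False))
    print(run_scenario(cleanup_delayed=True))
```

In the delayed case only the 5632 entry survives, which matches the missing 768 entry reported earlier in this bug.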

Comment 43 Jiri Denemark 2009-08-20 14:17:43 UTC
Hi, could you try with the latest packages from http://people.redhat.com/jdenemar/xen/bz513604/ (xen-3.0.3-94.el5.bz513604.7)?

Thanks

Comment 45 Jiri Denemark 2009-08-24 06:41:27 UTC
Great, thank you very much for the testing.

Comment 46 Jiri Denemark 2009-08-26 09:59:42 UTC
Created attachment 358693 [details]
Fix race condition on domain restart

Comment 49 Chris Lalancette 2009-09-18 06:45:09 UTC
*** Bug 513265 has been marked as a duplicate of this bug. ***

Comment 50 Jiri Denemark 2009-09-22 09:32:10 UTC
Fix built into xen-3.0.3-95.el5

Comment 55 Yewei Shao 2009-12-28 03:27:00 UTC
I verified this bug with the following steps:
(1) Create an HVM domain
(2) Reboot the domain with `xm reboot'
(3) Run `xm list'

I tried this about 30 times and found that the domain no longer goes missing from xm list when rebooted. So this bug is verified in xen-3.0.3-102.el5.
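
The steps above could be automated along these lines. This is a rough Python sketch under stated assumptions: `listed_domains` and `count_missing` are hypothetical helpers, the `xm list` layout is taken from the output quoted in the description, and the reboot and list actions are passed in as callables so the loop can be tried without a Xen host.

```python
def listed_domains(xm_list_output):
    """Return the domain names found in `xm list` output."""
    names = []
    for line in xm_list_output.splitlines():
        fields = line.split()
        # A domain row has six columns with a numeric ID in the second one;
        # this skips the header row and any wrapped header fragment.
        if len(fields) >= 6 and fields[1].isdigit():
            names.append(fields[0])
    return names


def count_missing(name, attempts, reboot, list_output):
    """Reboot `name` repeatedly and count how often it is absent afterwards.

    reboot(name) and list_output() are supplied by the caller.
    """
    missing = 0
    for _ in range(attempts):
        reboot(name)
        if name not in listed_domains(list_output()):
            missing += 1
    return missing


# Sample outputs copied from the description of this bug.
SAMPLE_FULL = """\
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      837     1 r-----    807.4
rhel54_74                                 25     1047     1 r-----     17.4
"""
SAMPLE_MISSING = """\
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      837     1 r-----    807.4
"""

if __name__ == "__main__":
    # Dry run with canned output instead of a real host.
    outputs = iter([SAMPLE_FULL, SAMPLE_MISSING, SAMPLE_FULL])
    print(count_missing("rhel54_74", 3, lambda n: None, lambda: next(outputs)))
```

On a real host the callables might wrap `subprocess.run(["xm", "reboot", name])` and `subprocess.run(["xm", "list"], capture_output=True, text=True).stdout`. A single check right after the reboot command is crude, since the guest takes some time to shut down and come back, so a real script would probably need to poll and wait between attempts.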

Comment 57 errata-xmlrpc 2010-03-30 08:57:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0294.html

Comment 58 Paolo Bonzini 2010-04-08 15:41:23 UTC
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).

