
Bug 513604

Summary: Domain goes missing from xm list when rebooted
Product: Red Hat Enterprise Linux 5
Reporter: Sachin Prabhu <sprabhu>
Component: xen
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: urgent
Version: 5.4
CC: berrange, bmason, clalance, drjones, jdenemar, jplans, llim, mmilgram, mshao, pbonzini, riek, tao, xen-maint
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: xen-3.0.3-95.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 08:57:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 499522, 525143    
Attachments:
Description: Fix race condition on domain restart
Flags: none

Description Sachin Prabhu 2009-07-24 13:06:35 UTC
Customer reports a regression where the domain briefly goes missing from the list of domains in xm list when undergoing a reboot initiated by xm reboot. This is not seen on RHEL 5.3 kernels.
-----

Description of Problem:
 This is a regression. The issue does not occur on RHEL5.3GA.

 The `xm list' command may not output HVM domains if they are
 rebooted by `xm reboot'.

 Note that this issue does not occur on PV domains.

Version-Release number of selected component:

 Red Hat Enterprise Linux Version Number: 5
 Release Number: 4 Beta
 Architecture: ia64
 Kernel Version: 2.6.18-155.el5xen
 Related Package Version: xen-3.0.3-88.el5
 Related Middleware / Application: None

Drivers or hardware or architecture dependency:
 None

How reproducible:
 1/20

Steps to Reproduce:
 1. Create an HVM domain
 2. Reboot the domain by `xm reboot'
 3. Run `xm list'

Actual Results:
 Name                                      ID Mem(MiB) VCPUs State   Time(s)
 Domain-0                                   0      837     1 r-----    807.4

Expected Results:
 Name                                      ID Mem(MiB) VCPUs State   Time(s)
 Domain-0                                   0      837     1 r-----    807.4
 rhel54_74                                 25     1047     1 r-----     17.4
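
Since the failure only shows up in roughly 1 of 20 reboots, it helps to check the `xm list' output mechanically rather than by eye. The following is a minimal sketch, assuming the column layout shown in the expected results with the header on a single line; the function name and the sample text are illustrative and not part of any Xen tool.

```python
def listed_domains(xm_list_output):
    """Return the domain names found in `xm list` output text."""
    names = []
    for line in xm_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, which starts with "Name".
        if not fields or fields[0] == "Name":
            continue
        # The first column of every remaining row is the domain name.
        names.append(fields[0])
    return names


# Sample text copied from the expected results of this report.
SAMPLE = """\
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      837     1 r-----    807.4
rhel54_74                                 25     1047     1 r-----     17.4
"""

if __name__ == "__main__":
    print("rhel54_74" in listed_domains(SAMPLE))
```

In a reproduction loop, the text passed in would be the captured output of `xm list' taken after each `xm reboot'; a guest name missing from the returned list marks a failing iteration.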

Summary of actions taken to resolve issue:
 None

Location of diagnostic data:
 None

Comment 3 Jiri Denemark 2009-07-27 10:33:12 UTC
I believe this must have always been there. When a guest is restarted, xend first destroys the guest and then creates and boots a new one. It's just a matter of luck if you manage to list running domains during the short time in between.

Comment 4 Daniel Berrangé 2009-07-27 10:52:41 UTC
Yes this is a fundamental limitation of this XenD version. It does not have any concept of 'inactive' guests, so if the guest is not running, XenD won't report it. During reboot you have a small window between the guest shutting down & new one booting, and thus thanks to lack of inactive guest mgmt, the guest can briefly disappear.

Comment 7 Paolo Bonzini 2009-07-30 11:57:23 UTC
If it goes missing "briefly" as written in the summary and in the issue tracker, then this resembles what Jiri and Daniel said: during reboot you have a small window between the guest shutting down & new one being created, and thus the guest can disappear.  This matches the observation that it is reproducible 5% of the time only.   It is implicit in the behavior of Xen, and libvirt (virsh) fixes it.

If it never reappears, it is a different problem.  The summary should be upgraded and the xend-debug.log and xend.log files should be attached.

Comment 14 Daniel Berrangé 2009-08-04 10:35:50 UTC
There is not enough information in this bug report to further diagnose the problem. Please provide

- /var/log/xen/xend.log & xend-debug.log  from the point in time immediately after doing a 'xm reboot' that exhibits the missing domain problem
- Output of 'xm list --long'
- Output of 'xenstore-ls'
- The /etc/xen/$GUEST   config file for the guest showing problems
- The 'virsh dumpxml GUEST'  output

Comment 23 Jiri Denemark 2009-08-06 11:45:57 UTC
Hmm, I wasn't able to reproduce it even after 200 reboots of an hvm guest. Could you please try to reproduce the bug with packages from http://people.redhat.com/jdenemar/xen/bz513604/ and send xend.log after running xm list --long?

Thanks.

Comment 25 Jiri Denemark 2009-08-07 13:35:55 UTC
Thanks a lot. So the error is caused by missing entry for one of the block devices in /vm/UUID/device/vbd/:

   vbd = ""
    5632 = ""
     frontend = "/local/domain/11/device/vbd/5632"
     frontend-id = "11"
     backend-id = "0"
     backend = "/local/domain/0/backend/vbd/11/5632"

768 is missing in there.

In the previous report, it was the cdrom (5632) which was missing. Oops, and the vif got lost during restarts too, which looks almost like bug #509099.
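
For readers unfamiliar with the numbers 768 and 5632: legacy Xen virtual block device IDs pack the Linux major and minor device numbers as (major << 8) | minor. The sketch below decodes the two IDs from this report. The helper name is made up for illustration, and the mapping to hda/hdc assumes the guest uses the usual IDE device names, which the report does not state explicitly.

```python
def decode_vbd(devid):
    """Split a legacy Xen virtual block device number into (major, minor)."""
    return devid >> 8, devid & 0xFF


# Linux major numbers for the first two IDE channels; minor 0 is the whole disk.
# Assumption: the guest config uses the conventional hda/hdc names.
IDE_NAMES = {(3, 0): "hda", (22, 0): "hdc"}

if __name__ == "__main__":
    for devid in (768, 5632):
        print(devid, decode_vbd(devid), IDE_NAMES.get(decode_vbd(devid), "unknown"))
```

Under that assumption, 768 would be the first IDE disk and 5632 the device on the second IDE channel, which matches the cdrom mentioned above.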

I might have an idea why this happens... I'll report once I know it's (in)correct.

Comment 26 Jiri Denemark 2009-08-08 07:18:52 UTC
Could you try the new package from http://people.redhat.com/jdenemar/xen/bz513604/ to see if that fixes the error? And report the results and logs even if it does, please.

Comment 29 Jiri Denemark 2009-08-10 17:55:46 UTC
Thanks for the testing. Could you try yet another version of the package? http://people.redhat.com/jdenemar/xen/bz513604/

Thanks a lot.

Comment 31 Jiri Denemark 2009-08-11 16:08:28 UTC
OK, another round... Could you follow these steps, please?

- install the new packages from http://people.redhat.com/jdenemar/xen/bz513604/ and restart xend or the whole machine

- turn on logging in xen hotplug scripts:
  # echo 'SYSLOG=yes' >>/etc/sysconfig/xenhotplug

- let udev log debugging messages:
  # udevcontrol log_priority=debug

- let syslog write all (including debugging) messages into /var/log/debug:
  # echo '*.* /var/log/debug' >>/etc/syslog.conf
  # service syslog reload

- reproduce the bug

- send me everything you normally do together with /var/log/debug

Thanks a lot

Comment 39 Jiri Denemark 2009-08-19 06:50:18 UTC
So the race condition is confirmed. As usual, the race is between the hotplug scripts and xend. Under very lucky conditions, the hotplug-cleanup script runs early enough to see /local/domain/ID/vm and is then delayed, so that it actually removes /vm/UUID/device/CLASS/ID from the newly created domain instead of the old one.

It looks like IA64 is a very lucky platform :-)

By injecting some sleeps at the right places, I'm able to reproduce it locally, which should speed things up quite a bit.
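
The ordering problem described in this comment can be sketched as a toy model. This is not xend or hotplug script code: the dictionary stands in for xenstore, the domain IDs 11 and 12 and the device IDs are illustrative, and the two helper functions are hypothetical stand-ins for the cleanup script and for xend recreating the guest under the same /vm/UUID path.

```python
def simulate_restart(cleanup_delayed):
    """Toy model of the race; returns the vbd IDs left under /vm/UUID."""
    store = {
        "/local/domain/11/vm": "/vm/UUID",
        "/vm/UUID/device/vbd/768": "old",
        "/vm/UUID/device/vbd/5632": "old",
    }

    # The cleanup for device 768 of the old domain (ID 11) resolves the VM
    # path early, while /local/domain/11/vm still exists.
    vm_path = store["/local/domain/11/vm"]

    def cleanup():
        # Removes whatever entry currently lives at the resolved path.
        store.pop(vm_path + "/device/vbd/768", None)

    def xend_recreate():
        # xend destroys domain 11 and creates domain 12 under the same path.
        del store["/local/domain/11/vm"]
        store["/local/domain/12/vm"] = "/vm/UUID"
        store["/vm/UUID/device/vbd/768"] = "new"
        store["/vm/UUID/device/vbd/5632"] = "new"

    if cleanup_delayed:
        xend_recreate()
        cleanup()  # too late: this deletes the new domain's entry
    else:
        cleanup()
        xend_recreate()

    prefix = "/vm/UUID/device/vbd/"
    return sorted(key[len(prefix):] for key in store if key.startswith(prefix))


if __name__ == "__main__":
    print(simulate_restart(cleanup_delayed=False))
    print(simulate_restart(cleanup_delayed=True))
```

With the cleanup running on time, both device entries survive the restart; with the cleanup delayed past the recreation, only 5632 is left, which is the same shape as the xenstore dump in comment 25 where 768 was missing.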

Comment 43 Jiri Denemark 2009-08-20 14:17:43 UTC
Hi, could you try with the latest packages from http://people.redhat.com/jdenemar/xen/bz513604/ (xen-3.0.3-94.el5.bz513604.7)?

Thanks

Comment 45 Jiri Denemark 2009-08-24 06:41:27 UTC
Great, thank you very much for the testing.

Comment 46 Jiri Denemark 2009-08-26 09:59:42 UTC
Created attachment 358693
Fix race condition on domain restart

Comment 49 Chris Lalancette 2009-09-18 06:45:09 UTC
*** Bug 513265 has been marked as a duplicate of this bug. ***

Comment 50 Jiri Denemark 2009-09-22 09:32:10 UTC
Fix built into xen-3.0.3-95.el5

Comment 55 Yewei Shao 2009-12-28 03:27:00 UTC
I verified this bug with the following steps:
(1) Create an HVM domain
(2) Reboot the domain by `xm reboot'
(3) Run `xm list'

I tried this about 30 times and found that the domain no longer goes missing from xm list when rebooted. So this bug is verified in xen-3.0.3-102.el5.

Comment 57 errata-xmlrpc 2010-03-30 08:57:29 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0294.html

Comment 58 Paolo Bonzini 2010-04-08 15:41:23 UTC
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).