Bug 593120 - Discrepancy between xm and virsh output when listing active Xen domains
Summary: Discrepancy between xm and virsh output when listing active Xen domains
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 514902
Depends On:
Blocks: 618200
Reported: 2010-05-17 21:03 UTC by Nandini Chandra
Modified: 2018-10-27 12:15 UTC
CC List: 15 users

Fixed In Version: libvirt-0.8.2-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 23:12:37 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHEA-2011:0060
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2011-01-12 17:22:30 UTC

Description Nandini Chandra 2010-05-17 21:03:37 UTC
Description of problem:
There's a discrepancy between xm and virsh output when listing active Xen domains: 'xm list' lists all the active domains, but 'virsh list' does not.

Version-Release number of selected component (if applicable):
    * Red Hat Enterprise Linux 5.4 (2.6.18-164.2.1.el5xen)
    * xen-3.0.3-94.el5_4.2
    * libvirt-0.6.3-20.1.el5_4


How reproducible:
Consistently


Steps to Reproduce:
1. Intentionally misconfigure a domU to panic on boot (I did this by removing the last character of the LV name in the 'root' arg of the domU's kernel command line).
2. Make sure the domain config file for the domU is set to reboot after a crash.
3. Start the domU.

This should create an endless reboot/crash cycle. I was able to reproduce this by having 3 guests crashing roughly every 20 seconds.

The domU may restart so quickly that dom0 refuses to restart it. I modified the init file in the initrd so that it would pause for 20 seconds before trying to scan for LVM volumes, and also removed the LVM bits so that it would panic when trying to do the scan. That prevents the VM from restarting too quickly.

[2010-03-18 09:59:18 xend.XendDomainInfo 7826] ERROR (XendDomainInfo:2654) VM rhel5u4 restarting too fast (5.831434 seconds since the last restart).  Refusing to restart to avoid loops.


  
Actual results:
[root@sun-x4600-1 ~]# virsh list; xm list
Id Name                 State
----------------------------------
 0 Domain-0             running
 3 it493693a            idle
 4 it493693b            idle
 5 it493693c            idle
45 it493693g            idle

Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    28045    16 r-----   1334.2
it493693a                                  3      511     1 -b----     15.6
it493693b                                  4      511     1 -b----     16.2
it493693c                                  5      511     1 -b----     17.5
it493693d                                 79      512     1 -b----      0.3
it493693e                                 80      512     1 -b----      0.3
it493693f                                 78      512     1 -b----      0.3
it493693g                                 45      511     1 -b----     21.5

Domains it493693{d,e,f} are missing from the 'virsh list' output, but are present in the 'xm list' output.


Expected results:
There should be no discrepancy between xm and virsh output when listing active Xen domains.


Additional info:
The following patch seems to have resolved the issue:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=7c34bb2681f48e2a4e76de7f54c58b630a861777

 From xenStoreDoListDomains() in xs_internal.c:

       /* Sometimes xenstore has stale domain IDs, so filter
          against the hypervisor's info */
       if (xenHypervisorHasDomain(conn, (int)id))
           ids[ret++] = (int) id;

Comment 6 Daniel Berrangé 2010-07-12 16:50:23 UTC
I am not convinced that this quoted changeset:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=7c34bb2681f48e2a4e76de7f54c58b630a861777


can actually fix the issue described in the initial report, on its own. That changeset would generally make libvirt report *fewer* guests, making it even less likely to match 'xm' output.  I believe it probably needs to be combined with 

commit 2659b3f5aab0e64c150a8fb4e656aa7f4f4a91ed
Author: Jonas Eriksson <jonas.j.eriksson>
Date:   Fri Oct 9 10:23:23 2009 +0100

    Fix logic in xenUnifiedNumOfDomains to match xenUnifiedListDomains
    
    The xenUnifiedNumOfDomains and xenUnifiedListDomains methods work
    together as a pair, so it is critical they both apply the same
    logic. With the current mis-matched logic it is possible to sometimes
    get into a state when you miss certain active guests.
    
    * src/xen/xen_driver.c: Change xenUnifiedNumOfDomains ordering to
      match xenUnifiedListDomains.

Comment 7 Jiri Denemark 2010-07-23 14:56:56 UTC
Good, I was able to reproduce this issue. Luckily, once a difference between virsh list and xm list appears, it doesn't go away, so one can just install different versions of libvirt (no need to even restart libvirtd) to test whether they fix the issue. If a version does, the discrepancy disappears, and after installing an older libvirt without the fix, the issue can be observed again.

The funny thing is that both change sets (the one mentioned in bug report and the one mentioned by Dan) fix the issue...

Comment 9 Jiri Denemark 2010-09-02 11:59:13 UTC
Fixed in libvirt-0.8.2-1.el5

Comment 13 Jiri Denemark 2010-10-19 10:15:51 UTC
*** Bug 514902 has been marked as a duplicate of this bug. ***

Comment 14 dyuan 2010-10-20 08:54:14 UTC
For reproduction steps, please refer to https://bugzilla.redhat.com/show_bug.cgi?id=618200#c9

Comment 15 dyuan 2010-10-26 11:14:52 UTC
Verified the bug PASSED with libvirt-0.8.2-8.el5 on Server-x86_64-xen, Client-i386-xen, Server-ia64-xen.

1. Turn on the dump switch in /etc/xen/xend-config.sxp and restart the xend service:
(enable-dump yes)
2. On a PV guest, remove the last character of the LV name in the 'root' arg of the domU's kernel command line.
3. In the guest's domain XML, set <on_crash>restart</on_crash>.
4. # virsh list; xm list

 Id Name                 State
----------------------------------
  0 Domain-0             running
 41 crash2               idle
 42 crash1               idle

Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     4425     4 r-----   2885.2
crash1                                    42     1024     1 -b----      0.6
crash2                                    41     1024     1 -b----      0.6

Comment 16 xhu 2010-10-29 09:04:18 UTC
Verified on RHEL5u6-Client-i386-xen and it passed:
kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5

Comment 18 errata-xmlrpc 2011-01-13 23:12:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html

