Bug 593120

Summary: Discrepancy between xm and virsh output when listing active Xen domains
Product: Red Hat Enterprise Linux 5 Reporter: Nandini Chandra <nachandr>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: urgent    
Version: 5.4 CC: berrange, dallan, dyuan, herrold, jdenemar, jwest, llim, samuel.kielek, smayhew, tao, virt-maint, weizhan, xen-maint, xhu, yoyzhang
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.8.2-1.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-13 23:12:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 618200    

Description Nandini Chandra 2010-05-17 21:03:37 UTC
Description of problem:
There is a discrepancy between xm and virsh output when listing active Xen domains: 'xm list' lists all the active domains, but 'virsh list' does not.

Version-Release number of selected component (if applicable):
    * Red Hat Enterprise Linux 5.4 (2.6.18-164.2.1.el5xen)
    * xen-3.0.3-94.el5_4.2
    * libvirt-0.6.3-20.1.el5_4


How reproducible:
Consistently


Steps to Reproduce:
1. Intentionally misconfigure a domU to panic on boot (I did this by removing the last character of the LV name in the 'root' arg of the domU's kernel command line).
2. Make sure the domain config file for the domU is set to reboot after a crash.
3. Start the domU.

This should create an endless reboot/crash cycle. I was able to reproduce this by having 3 guests crash roughly every 20 seconds.

The domU may restart so fast that the dom0 refuses to restart it. To avoid this, I modified the init file in the initrd so that it pauses for 20 seconds before trying to scan for LVM volumes, and also removed the LVM bits so that it panics when trying to do the scan. That prevents the VM from trying to restart too quickly.

[2010-03-18 09:59:18 xend.XendDomainInfo 7826] ERROR (XendDomainInfo:2654) VM rhel5u4 restarting too fast (5.831434 seconds since the last restart).  Refusing to restart to avoid loops.


  
Actual results:
[root@sun-x4600-1 ~]# virsh list; xm list
Id Name                 State
----------------------------------
 0 Domain-0             running
 3 it493693a            idle
 4 it493693b            idle
 5 it493693c            idle
45 it493693g            idle

Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    28045    16 r-----   1334.2
it493693a                                  3      511     1 -b----     15.6
it493693b                                  4      511     1 -b----     16.2
it493693c                                  5      511     1 -b----     17.5
it493693d                                 79      512     1 -b----      0.3
it493693e                                 80      512     1 -b----      0.3
it493693f                                 78      512     1 -b----      0.3
it493693g                                 45      511     1 -b----     21.5

Domains it493693{d,e,f} are missing from the 'virsh list' output, but are present in the 'xm list' output.
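
A comparison like the one above can also be done mechanically once the domain names from both tools have been collected. The following is a minimal sketch under that assumption; `missing_names` is a hypothetical helper, not part of libvirt or Xen, and it does not parse command output itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store in `missing` (capacity `maxmissing`) pointers to the names that
 * appear in `reference` but not in `observed`.
 * Returns how many names were stored. */
size_t missing_names(const char *const *reference, size_t nreference,
                     const char *const *observed, size_t nobserved,
                     const char **missing, size_t maxmissing)
{
    size_t count = 0;

    assert(missing != NULL || maxmissing == 0);

    for (size_t i = 0; i < nreference && count < maxmissing; i++) {
        int found = 0;

        /* Linear search is fine for the handful of domains on one host. */
        for (size_t j = 0; j < nobserved; j++) {
            if (strcmp(reference[i], observed[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            missing[count++] = reference[i];
    }
    return count;
}
```

Called with the names from 'xm list' as `reference` and the names from 'virsh list' as `observed`, it would report the three domains noted above.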


Expected results:
There should be no discrepancy between xm and virsh output when listing active Xen domains.


Additional info:
The following patch seems to have resolved the issue:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=7c34bb2681f48e2a4e76de7f54c58b630a861777

 From xenStoreDoListDomains() in xs_internal.c:

       /* Sometimes xenstore has stale domain IDs, so filter
          against the hypervisor's info */
       if (xenHypervisorHasDomain(conn, (int)id))
           ids[ret++] = (int) id;
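
The filtering idea in that excerpt can be illustrated in isolation. What follows is a minimal, self-contained sketch and not libvirt's code: `filter_live_ids`, the `has_domain_fn` callback type, and `id_below_threshold` are hypothetical stand-ins for the xenstore ID loop and for xenHypervisorHasDomain().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns nonzero if the hypervisor still
 * knows about the given domain ID (stand-in for xenHypervisorHasDomain). */
typedef int (*has_domain_fn)(int id, void *opaque);

/* Copy into `out` (capacity `maxout`) only those candidate IDs that the
 * callback confirms. Returns the number of IDs written. */
size_t filter_live_ids(const int *candidates, size_t ncandidates,
                       int *out, size_t maxout,
                       has_domain_fn has_domain, void *opaque)
{
    size_t ret = 0;

    assert(candidates != NULL || ncandidates == 0);
    assert(out != NULL || maxout == 0);
    assert(has_domain != NULL);

    for (size_t i = 0; i < ncandidates && ret < maxout; i++) {
        /* Skip stale IDs the hypervisor no longer reports. */
        if (has_domain(candidates[i], opaque))
            out[ret++] = candidates[i];
    }
    return ret;
}

/* Example predicate for illustration only: treats IDs below the
 * threshold passed via `opaque` as live. */
int id_below_threshold(int id, void *opaque)
{
    return id < *(const int *)opaque;
}
```

Note that a filter of this kind can only shrink the candidate list; it never adds IDs that the first source did not report.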

Comment 6 Daniel Berrangé 2010-07-12 16:50:23 UTC
I am not convinced that this quoted changeset:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=7c34bb2681f48e2a4e76de7f54c58b630a861777


can actually fix the issue described in the initial report on its own. That changeset would generally make libvirt report *fewer* guests, making it even less likely to match 'xm' output. I believe it probably needs to be combined with:

commit 2659b3f5aab0e64c150a8fb4e656aa7f4f4a91ed
Author: Jonas Eriksson <jonas.j.eriksson>
Date:   Fri Oct 9 10:23:23 2009 +0100

    Fix logic in xenUnifiedNumOfDomains to match xenUnifiedListDomains
    
    The xenUnifiedNumOfDomains and xenUnifiedListDomains methods work
    together as a pair, so it is critical they both apply the same
    logic. With the current mis-matched logic it is possible to sometimes
    get into a state when you miss certain active guests.
    
    * src/xen/xen_driver.c: Change xenUnifiedNumOfDomains ordering to
      match xenUnifiedListDomains.
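
To see why mismatched logic in such a pair can hide active guests, consider a caller that sizes its buffer from the count call and then fills it with the list call. The sketch below only illustrates that general pattern and is not code from src/xen/xen_driver.c: `struct backend`, `num_of_domains`, `list_domains`, and `fetch_ids` are hypothetical names, and the two backends stand for two sources of domain information that may disagree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical source of domain IDs (e.g. one of several sub-drivers). */
struct backend {
    const int *ids;
    size_t nids;
};

/* Count call: reports how many domains the given backend knows about. */
size_t num_of_domains(const struct backend *b)
{
    return b->nids;
}

/* List call: writes at most `maxids` IDs and returns how many it wrote. */
size_t list_domains(const struct backend *b, int *ids, size_t maxids)
{
    size_t n = b->nids < maxids ? b->nids : maxids;

    for (size_t i = 0; i < n; i++)
        ids[i] = b->ids[i];
    return n;
}

/* Caller pattern: size the buffer from the count call, then fill it
 * with the list call. If the two calls consult different backends,
 * the list is silently truncated to the smaller count.
 * Returns the number of IDs stored in *out; the caller frees *out. */
size_t fetch_ids(const struct backend *count_src,
                 const struct backend *list_src, int **out)
{
    size_t max = num_of_domains(count_src);

    assert(out != NULL);
    *out = malloc((max ? max : 1) * sizeof(**out));
    if (*out == NULL)
        return 0;
    return list_domains(list_src, *out, max);
}
```

With a count source that knows 5 domains and a list source that knows 8, `fetch_ids` returns only 5 entries; using the same source for both calls returns all 8. The numbers are illustrative only.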

Comment 7 Jiri Denemark 2010-07-23 14:56:56 UTC
Good, I was able to reproduce this issue. Luckily, once a difference between virsh list and xm list appears, it persists, so one can simply install different versions of libvirt (without even restarting libvirtd) to test whether they fix the issue. With a fixed version the discrepancy disappears, and after installing an older libvirt without the fix, the issue can be observed again.

The funny thing is that both change sets (the one mentioned in bug report and the one mentioned by Dan) fix the issue...

Comment 9 Jiri Denemark 2010-09-02 11:59:13 UTC
Fixed in libvirt-0.8.2-1.el5

Comment 13 Jiri Denemark 2010-10-19 10:15:51 UTC
*** Bug 514902 has been marked as a duplicate of this bug. ***

Comment 14 dyuan 2010-10-20 08:54:14 UTC
For reproduction steps, please refer to https://bugzilla.redhat.com/show_bug.cgi?id=618200#c9

Comment 15 dyuan 2010-10-26 11:14:52 UTC
Verified as PASSED with libvirt-0.8.2-8.el5 on Server-x86_64-xen, Client-i386-xen, and Server-ia64-xen.

1. Turn on the dump switch in /etc/xen/xend-config.sxp and restart the xend service:
(enable-dump yes)
2. In the PV guest, remove the last character of the LV name in the 'root' arg of the domU's kernel command line.
3. In the domain XML, set <on_crash>restart</on_crash>.
4. # virsh list; xm list

 Id Name                 State
----------------------------------
  0 Domain-0             running
 41 crash2               idle
 42 crash1               idle

Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     4425     4 r-----   2885.2
crash1                                    42     1024     1 -b----      0.6
crash2                                    41     1024     1 -b----      0.6

Comment 16 xhu 2010-10-29 09:04:18 UTC
Verified on RHEL5u6-Client-i386-xen and it passed:
kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5

Comment 18 errata-xmlrpc 2011-01-13 23:12:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html