Description of problem:
There's a discrepancy between xm and virsh output when listing active Xen domains. 'xm list' lists all the active domains, but 'virsh list' does not.

Version-Release number of selected component (if applicable):
* Red Hat Enterprise Linux 5.4 (2.6.18-164.2.1.el5xen)
* xen-3.0.3-94.el5_4.2
* libvirt-0.6.3-20.1.el5_4

How reproducible:
Consistently

Steps to Reproduce:
1. Intentionally misconfigure a domU to panic on boot (I did this by removing the last character of the LV name in the 'root' arg of the domU's kernel command line).
2. Make sure the domain config file for the domU is set to reboot after a crash.
3. Start the domU. This should create an endless reboot/crash cycle.

I was able to reproduce this by having 3 guests crashing roughly every 20 seconds.

The domU may restart so fast that the dom0 may refuse to restart it. I modified the init file in the initrd so that it would pause for 20 seconds before trying to scan for lvm volumes, and also removed the lvm bits so that it would panic when trying to do the scan. That prevents the VM from trying to restart too quickly.

[2010-03-18 09:59:18 xend.XendDomainInfo 7826] ERROR (XendDomainInfo:2654) VM rhel5u4 restarting too fast (5.831434 seconds since the last restart). Refusing to restart to avoid loops.

Actual results:
[root@sun-x4600-1 ~]# virsh list; xm list
 Id Name                 State
----------------------------------
  0 Domain-0             running
  3 it493693a            idle
  4 it493693b            idle
  5 it493693c            idle
 45 it493693g            idle

Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    28045    16 r-----   1334.2
it493693a                                  3      511     1 -b----     15.6
it493693b                                  4      511     1 -b----     16.2
it493693c                                  5      511     1 -b----     17.5
it493693d                                 79      512     1 -b----      0.3
it493693e                                 80      512     1 -b----      0.3
it493693f                                 78      512     1 -b----      0.3
it493693g                                 45      511     1 -b----     21.5

Domains it493693{d,e,f} are missing from the 'virsh list' output, but are present in the 'xm list' output.

Expected results:
There should be no discrepancy between xm and virsh output when listing active Xen domains.
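To make the comparison in "Actual results" repeatable, something like the following could be used to diff the two listings. This is only a rough sketch: the function names are made up for illustration, the parsing assumes the column layouts shown in this report (numeric ID first for virsh, second for xm), and the sample text is copied from the output above rather than captured live.

```python
def virsh_names(text):
    """Collect domain names from `virsh list` output (columns: Id, Name, State)."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric domain ID; header and ruler rows do not.
        if len(fields) >= 3 and fields[0].isdigit():
            names.add(fields[1])
    return names


def xm_names(text):
    """Collect domain names from `xm list` output (columns: Name, ID, ...)."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows carry the numeric domain ID in the second column.
        if len(fields) >= 2 and fields[1].isdigit():
            names.add(fields[0])
    return names


def missing_from_virsh(virsh_text, xm_text):
    """Return, sorted, the domains that xm reports but virsh does not."""
    return sorted(xm_names(xm_text) - virsh_names(virsh_text))


# Sample text copied from the "Actual results" section of this report.
VIRSH_SAMPLE = """\
 Id Name                 State
----------------------------------
  0 Domain-0             running
  3 it493693a            idle
  4 it493693b            idle
  5 it493693c            idle
 45 it493693g            idle
"""

XM_SAMPLE = """\
Name       ID Mem(MiB) VCPUs State  Time(s)
Domain-0    0    28045    16 r----- 1334.2
it493693a   3      511     1 -b----   15.6
it493693b   4      511     1 -b----   16.2
it493693c   5      511     1 -b----   17.5
it493693d  79      512     1 -b----    0.3
it493693e  80      512     1 -b----    0.3
it493693f  78      512     1 -b----    0.3
it493693g  45      511     1 -b----   21.5
"""

if __name__ == "__main__":
    print(missing_from_virsh(VIRSH_SAMPLE, XM_SAMPLE))
```

On a live dom0 the two strings would come from running the commands themselves; an empty result would mean the listings agree.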
Additional info:
The following patch seems to have resolved the issue:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=7c34bb2681f48e2a4e76de7f54c58b630a861777

From xenStoreDoListDomains() in xs_internal.c:

    /* Sometimes xenstore has stale domain IDs, so filter
       against the hypervisor's info */
    if (xenHypervisorHasDomain(conn, (int)id))
        ids[ret++] = (int) id;
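For anyone not familiar with the libvirt source, the effect of that check can be modelled roughly as below. This is a Python stand-in, not libvirt code: filter_stale_ids and the callback are hypothetical names, and the IDs in the example are invented.

```python
def filter_stale_ids(xenstore_ids, hypervisor_has_domain):
    """Keep only the IDs that the hypervisor also knows about.

    xenstore_ids: IDs read from xenstore, possibly including stale ones.
    hypervisor_has_domain: callable returning True if the ID is live
    (a stand-in for the xenHypervisorHasDomain() check quoted above).
    """
    return [dom_id for dom_id in xenstore_ids if hypervisor_has_domain(dom_id)]


if __name__ == "__main__":
    live = {0, 3, 4}                # invented: IDs the hypervisor reports
    from_xenstore = [0, 3, 4, 99]   # invented: 99 is a stale leftover
    print(filter_stale_ids(from_xenstore, live.__contains__))
```

Note that a filter like this can only shrink the list; it never adds a domain that xenstore did not return.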
I am not convinced that this quoted changeset:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=7c34bb2681f48e2a4e76de7f54c58b630a861777

can actually fix the issue described in the initial report, on its own. That changeset would generally make libvirt report *fewer* guests, making it even less likely to match 'xm' output. I believe it probably needs to be combined with

commit 2659b3f5aab0e64c150a8fb4e656aa7f4f4a91ed
Author: Jonas Eriksson <jonas.j.eriksson>
Date:   Fri Oct 9 10:23:23 2009 +0100

    Fix logic in xenUnifiedNumOfDomains to match xenUnifiedListDomains

    The xenUnifiedNumOfDomains and xenUnifiedListDomains methods work
    together as a pair, so it is critical they both apply the same
    logic. With the current mis-matched logic it is possible to sometimes
    get into a state when you miss certain active guests.

    * src/xen/xen_driver.c: Change xenUnifiedNumOfDomains ordering
      to match xenUnifiedListDomains.
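To illustrate why the count and list calls have to agree, here is a toy model in Python. It assumes the usual caller pattern of asking for the number of domains first and then requesting at most that many IDs. The helper names and the two ID sources are hypothetical, and which source actually lagged in the reported case is not established here.

```python
def list_active(num_of_domains, list_domains):
    """Caller pattern: size the request from the count, then fetch the IDs."""
    maxids = num_of_domains()
    return list_domains(maxids)


def make_counter(ids):
    """Build a count function backed by one ID source."""
    return lambda: len(ids)


def make_lister(ids):
    """Build a list function that returns at most maxids entries."""
    return lambda maxids: list(ids)[:maxids]


# Hypothetical sources: one has not yet seen the three newest guests.
LAGGING_SOURCE = [0, 3, 4, 5, 45]
CURRENT_SOURCE = [0, 3, 4, 5, 45, 78, 79, 80]

if __name__ == "__main__":
    # Count taken from one source, list from another: the result is truncated.
    mismatched = list_active(make_counter(LAGGING_SOURCE), make_lister(CURRENT_SOURCE))
    # Count and list taken from the same source: nothing is dropped.
    matched = list_active(make_counter(CURRENT_SOURCE), make_lister(CURRENT_SOURCE))
    print("mismatched:", mismatched)
    print("matched:   ", matched)
```

In this model the mismatched pair silently drops every guest beyond the undercounted total, which is the kind of "miss certain active guests" state the commit message describes.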
Good, I was able to reproduce this issue. Luckily, once there is a difference between virsh list and xm list, it doesn't disappear, so one can simply install different versions of libvirt (no need to even restart libvirtd) to test whether they fix the issue. With a fixed version the discrepancy disappears, and after installing an older libvirt without the fix, the issue can be observed again. The funny thing is that both changesets (the one mentioned in the bug report and the one mentioned by Dan) fix the issue...
Fixed in libvirt-0.8.2-1.el5
*** Bug 514902 has been marked as a duplicate of this bug. ***
For the reproduction steps, please refer to https://bugzilla.redhat.com/show_bug.cgi?id=618200#c9
Verified the bug PASSED with libvirt-0.8.2-8.el5 on Server-x86_64-xen, Client-i386-xen, Server-ia64-xen.

1. Turn on the dump switch in /etc/xen/xend-config.sxp and restart the xend service.
   (enable-dump yes)
2. On the PV guest, remove the last character of the LV name in the 'root' arg of the domU's kernel command line.
3. In the domain XML, set <on_crash>restart</on_crash>.
4. # virsh list; xm list
 Id Name                 State
----------------------------------
  0 Domain-0             running
 41 crash2               idle
 42 crash1               idle

Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     4425     4 r-----   2885.2
crash1                                    42     1024     1 -b----      0.6
crash2                                    41     1024     1 -b----      0.6
Verified on RHEL5u6-Client-i386-xen and it passed:
kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html