This is still happening with EL5.1 beta. When I reboot my dom0 machine, or simply run "service xendomains stop", the output shown while saving the domains is confusing and ugly: it indicates that processes have been terminated and that some devices aren't connected, while in fact things are (usually) working fine. The terminated processes seem to be the watchdog processes.

Sample output:

[root@x195 ~]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1660     8 r-----     36.7
b01                                        1      255     2 -b----      5.3
emd01                                      2     2047     4 -b----      6.0
mc03                                       3     2047     4 -b----      5.7
w03                                        5     2047     4 -b----      7.3
[root@x195 ~]# service xendomains stop
Shutting down Xen domains: b01(save).......Error: Device 0 not connected
./etc/init.d/xendomains: line 181: 6152 Terminated watchdog_xm save
emd01(save)/etc/init.d/xendomains: line 299: 6152 Terminated watchdog_xm save
......................................................Error: Device 0 not connected
./etc/init.d/xendomains: line 181: 6661 Terminated watchdog_xm save
mc03(save)/etc/init.d/xendomains: line 299: 6661 Terminated watchdog_xm save
......................................................../etc/init.d/xendomains: line 181: 8674 Terminated watchdog_xm save
w03(save)/etc/init.d/xendomains: line 299: 8674 Terminated watchdog_xm save
...........................................................Error: Device 0 not connected
../etc/init.d/xendomains: line 181: 10304 Terminated watchdog_xm save
/etc/init.d/xendomains: line 299: 10304 Terminated watchdog_xm save
[root@x195 ~]# [ OK ]

Restoring saved domains isn't much better:

[root@x195 ~]# service xendomains start
Restoring Xen domains: b01 emd01 lost+foundError: Restore failed
Usage: xm restore <CheckpointFile>
Restore a domain from a saved state.
! mc03 w03.
[root@x195 ~]# en domains: b01(skip) emd01(skip) mc03(skip)[ OK ]p)

Not to mention that it doesn't even try to skip "lost+found", which is a directory. It is there because I have /var/lib/xen/save on its own partition, to make sure I'll always have enough free space to save all of the running domains. (Should I open another bug report for it?)
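For illustration only, the "lost+found" case could be avoided by restoring only regular files from the save directory. The sketch below is not taken from the shipped xendomains script or from any of the attached patches; the function name list_restorable is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the real xendomains script): print the
# names of the regular files in a save directory, one per line, and skip
# anything else, such as the lost+found directory of a dedicated partition.
list_restorable() {
    for entry in "$1"/*; do
        # -f is false for directories, and also for the unexpanded
        # pattern that is left over when the directory is empty.
        [ -f "$entry" ] || continue
        printf '%s\n' "${entry##*/}"
    done
}
```

A restore loop could then iterate over the output of list_restorable /var/lib/xen/save instead of over every entry in that directory.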
Seeing the same with final 5.1.

# rpm -q xen
xen-3.0.3-41.el5
Created attachment 332019 [details]
Fix Xendomains output formatting

This is a patch that formats the output of the xendomains script so that users get information about the entire progress.

Michal
A test package which fixes this issue (and several others as well) has been made available at: http://people.redhat.com/jdenemar/xen/ Could the reporter try it out and report if it fixes the problem or not? Thank you for your cooperation.
Looking much better now, but in the test I ran, the "OK" was printed on the same line as my shell prompt:

Shutting down Xen domains: genvm01(save)........ mcp01(save)............................. mcp04(save)............................... mcp07(save)................................ scanvm02(save)......... scanvm03(save).................
[root@x133 tmp]# [ OK ]

The above is only 2 lines: the "Shutting down..." one, then the prompt+OK one. Seems like the fix might now be as simple as calling "echo" after success instead of before.

Thanks for looking at this!
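A minimal sketch of the ordering suggested in the comment above, assuming the status helper prints its marker without a trailing newline. The success function here is a stand-in written for this example, not the real init-functions helper, and stop_report is a hypothetical name.

```shell
#!/bin/sh
# Stand-in for the init script status helper (assumption: it prints the
# marker and leaves the cursor on the same line).
success() {
    printf '[  OK  ]'
}

# Hypothetical reporting function: progress text, then the status marker,
# and only then the newline, so the marker stays on the progress line
# instead of landing after the next shell prompt.
stop_report() {
    printf 'Shutting down Xen domains:'
    for name in "$@"; do
        printf ' %s(save)' "$name"
    done
    success
    echo
}
```

Calling stop_report genvm01 mcp01 prints a single line that ends with the status marker, and the prompt then starts on a fresh line.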
Created attachment 333514 [details]
New version of xendomain patch

OK, I've created a new version of this patch; hopefully Matthias will be happy ;)

I've also found some strange behaviour when shutting down all domains (the SHUTDOWN_ALL code): the shutdown took a very long time and finally ended without reporting an error, but the return code was not 0. This looks like an `xm shutdown` issue, because when I ran `xm shutdown` by itself the domain was still shown in `xm list` even though it seemed to be dead (no output, not accessible via ssh, and only the line "Domain has shutdown: name=RHEL53 id=6 reason=suspend." in /var/log/xen/xend.log, nothing more). I don't know the cause yet, so I am still investigating.
Created attachment 357401 [details]
New version of this formatting patch

Right, this is the new version of the xendomains output formatting patch, updated to match upstream.

Michal
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0294.html
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).