Created attachment 1072371 (virsh command log with debugging). Dan's log file.
I've not figured out why this is broken, but I have narrowed it down to the Xen XM driver code. The guest is not running, so Xend and XenStore report no info, and the name lookup moves on to the XM driver code. I can see the guest being loaded initially, but when it comes to the lookup, it fails non-deterministically. I can't tell what's broken in the XM driver code, though.
This was indeed a nasty issue, only apparent because this test machine has a hell of a lot of guests defined in /etc/xen and is quite slow at loading them.

https://www.redhat.com/archives/libvir-list/2015-September/msg00432.html

commit 4e7028a83d9932e89fb552b40221ecd844cbd690
Author: Daniel P. Berrange <berrange>
Date:   Fri Sep 11 14:15:50 2015 +0100

    xen: fix race in refresh of config cache

    The xenXMConfigCacheRefresh method scans /etc/xen and loads all config files it finds. It then scans its internal hash table and purges any (previously) loaded config files whose refresh timestamp does not match the timestamp recorded at the start of xenXMConfigCacheRefresh().

    There is unfortunately a subtle flaw in this, because if loading the config files takes longer than 1 second, some of the config files will have a refresh timestamp that is 1 or more seconds different (newer) than is checked for. So we immediately purge a bunch of valid config files we just loaded.

    To avoid this flaw, we must pass the timestamp we record at the start of xenXMConfigCacheRefresh() into the xenXMConfigCacheAddFile() method, instead of letting the latter call time(NULL) again.

    Signed-off-by: Daniel P. Berrange <berrange redhat com>
This bug would affect virt-v2v users on RHEL 7.2 who are converting guests from legacy RHEL 5 Xen systems to modern KVM. It only affects slower systems with lots of Xen guests, and can usually be worked around by repeating the v2v command. I think we should wait to see if a customer hits this bug. So far we have only seen it on our own machines, and to my knowledge no customer has hit it.
Red Hat Enterprise Linux 5 shipped its last minor release, 5.11, on September 14th, 2014. On March 31st, 2017, RHEL 5 exited Production Phase 3 and entered Extended Life Phase. For RHEL releases in the Extended Life Phase, Red Hat will provide limited ongoing technical support. No bug fixes, security fixes, hardware enablement or root-cause analysis will be available during this phase, and support will be provided on existing installations only. If the customer purchases the Extended Life-cycle Support (ELS), certain critical-impact security fixes and selected urgent priority bug fixes for the last minor release will be provided. For more details please consult the Red Hat Enterprise Linux Life Cycle Page: https://access.redhat.com/support/policy/updates/errata This BZ does not appear to meet ELS criteria so is being closed WONTFIX. If this BZ is critical for your environment and you have an Extended Life-cycle Support Add-on entitlement, please open a case in the Red Hat Customer Portal, https://access.redhat.com, provide a thorough business justification and ask that the BZ be re-opened for consideration of an errata. Please note, only certain critical-impact security fixes and selected urgent priority bug fixes for the last minor release can be considered.