Created attachment 1072371
virsh command log with debugging
Dan's log file.
I haven't figured out why this is broken, but I have narrowed it down to the Xen XM driver code. The guest is not running, so Xend and XenStore report no info and the name lookup moves on to the XM driver code. I can see the guest being loaded initially, but when it comes to doing the lookup, it fails non-deterministically. I can't tell what's broken in the XM driver code though.
This was indeed a nasty issue, only apparent because this test machine has a hell of a lot of guests defined in /etc/xen and is quite slow at loading them.
Author: Daniel P. Berrange <email@example.com>
Date: Fri Sep 11 14:15:50 2015 +0100
xen: fix race in refresh of config cache
The xenXMConfigCacheRefresh method scans /etc/xen and loads
all config files it finds. It then scans its internal hash
table and purges any (previously) loaded config files whose
refresh timestamp does not match the timestamp recorded at
the start of xenXMConfigCacheRefresh(). There is unfortunately
a subtle flaw in this, because if loading the config files
takes longer than 1 second, some of the config files will
have a refresh timestamp that is 1 or more seconds different
(newer) than is checked for. So we immediately purge a bunch
of valid config files we just loaded.
To avoid this flaw, we must pass the timestamp we record at
the start of xenXMConfigCacheRefresh() into the
xenXMConfigCacheAddFile() method, instead of letting the
latter call time(NULL) again.
Signed-off-by: Daniel P. Berrange <berrange redhat com>
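The race described in the commit message above can be reproduced with a small model. What follows is only an illustrative Python sketch, not libvirt's C code: the `ConfigCache` class, the `pass_timestamp` switch, and the `scripted_clock` helper are hypothetical names invented for this example. The scripted clock stands in for `time(NULL)` and returns whole seconds that tick over partway through a slow scan.

```python
class ConfigCache:
    """Toy model of a config cache keyed by file name (hypothetical)."""

    def __init__(self, clock, pass_timestamp):
        self.clock = clock                    # callable returning whole seconds
        self.pass_timestamp = pass_timestamp  # True models the fixed behaviour
        self.entries = {}                     # file name -> refresh timestamp

    def add_file(self, name, now):
        # Buggy variant: read the clock again for every file loaded.
        # Fixed variant: reuse the timestamp recorded by refresh().
        stamp = now if self.pass_timestamp else self.clock()
        self.entries[name] = stamp

    def refresh(self, names):
        now = self.clock()                    # recorded once, at the start
        for name in names:
            self.add_file(name, now)
        # Purge every entry whose timestamp differs from the start time.
        self.entries = {n: t for n, t in self.entries.items() if t == now}
        return sorted(self.entries)


def scripted_clock(readings):
    """Return a clock that yields the given readings, one per call."""
    it = iter(readings)
    return lambda: next(it)


names = ["a", "b", "c", "d"]

# The scan is slow: the clock ticks from 100 to 101 after two files.
buggy = ConfigCache(scripted_clock([100, 100, 100, 101, 101]),
                    pass_timestamp=False)
buggy_result = buggy.refresh(names)   # "c" and "d" are purged right away

# With the start timestamp passed through, the clock is read only once.
fixed = ConfigCache(scripted_clock([100]), pass_timestamp=True)
fixed_result = fixed.refresh(names)   # all four entries survive

print(buggy_result)
print(fixed_result)
```

In this model the buggy variant keeps only the files loaded within the same second as the start of the refresh, which matches the symptom of a guest that is loaded and then cannot be found by name.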
This bug would affect virt-v2v users on RHEL 7.2 who are
converting guests from RHEL 5 Xen legacy systems (to modern KVM).
It only affects slower systems with lots of Xen guests, and can
usually be worked around by repeating the v2v command.
I think we should wait to see if a customer hits this bug. So
far we have only seen it on our own machines, and to my knowledge
no customer has hit it.
Red Hat Enterprise Linux 5 shipped its last minor release, 5.11, on September 14th, 2014. On March 31st, 2017, RHEL 5 exited Production Phase 3 and entered Extended Life Phase. For RHEL releases in the Extended Life Phase, Red Hat will provide limited ongoing technical support. No bug fixes, security fixes, hardware enablement or root-cause analysis will be available during this phase, and support will be provided on existing installations only. If the customer purchases the Extended Life-cycle Support (ELS), certain critical-impact security fixes and selected urgent priority bug fixes for the last minor release will be provided. For more details please consult the Red Hat Enterprise Linux Life Cycle Page:
This BZ does not appear to meet ELS criteria, so it is being closed WONTFIX. If this BZ is critical for your environment and you have an Extended Life-cycle Support Add-on entitlement, please open a case in the Red Hat Customer Portal, https://access.redhat.com, provide a thorough business justification, and ask that the BZ be re-opened for consideration of an errata. Please note, only certain critical-impact security fixes and selected urgent priority bug fixes for the last minor release can be considered.