Bug 220036 - PV guest install console broken with "KeyError" python traceback
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: virt-manager
Version: 5.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Daniel Berrange
Duplicates: 211624 215638
Depends On:
Blocks:
Reported: 2006-12-18 11:21 EST by Stephen Tweedie
Modified: 2007-11-30 17:07 EST (History)

See Also:
Fixed In Version: RC
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-02-07 20:51:20 EST


Attachments
virt-manager log for that sequence, I created 3 guests one after the other (5.53 KB, text/plain)
2006-12-18 12:32 EST, Daniel Veillard
associated xend log for the 3 creations (35.57 KB, text/plain)
2006-12-18 12:34 EST, Daniel Veillard
Explicitly drop reference to virDomainPtr object (1.37 KB, patch)
2006-12-18 17:17 EST, Daniel Berrange

Description Stephen Tweedie 2006-12-18 11:21:24 EST
Description of problem:
PV installs of either FC6 or RHEL5 guests on RHEL5 hosts are broken from
virt-manager.  The install appears to work OK, but virt-manager doesn't realise.

Version-Release number of selected component (if applicable):
virt-manager-0.2.6-4.el5
libvirt-0.1.8-10.el5
python-virtinst-0.99.0-1.el5

How reproducible:
Seems to be about 75% failure rate

Steps to Reproduce:
1. Install PV guest from virt-manager.  Doesn't seem to matter which: I've tried
both FC6 and RHEL5 20061218 (ie. both old and new PV FB code).
  
Actual results:

It attempts to bring up the console when the guest starts, but fails, leaving
the install wizard stuck on the last page, with a python error on stdout:

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/create.py", line 425, in finish
    vm = self.connection.get_vm(guest.uuid)
  File "/usr/share/virt-manager/virtManager/connection.py", line 74, in get_vm
    return self.vms[uuid]
KeyError: '67a48830-f0a6-77fa-32e7-b3e32675fd71'

However, the domain is created, and its console can be activated by hand once it
is started.

Expected results:

Console is brought up.
Comment 1 Daniel Berrange 2006-12-18 11:47:45 EST
Can you provide:

/root/.virt-manager/virt-manager.log
/var/log/xen/xend.log

dating from a time immediately after the install failed? I want to try to
correlate the sequence of events from the two files based on timestamp to figure
out where the race condition is occurring...
Comment 2 Daniel Veillard 2006-12-18 12:22:40 EST
This happens if you try to create a new domain after a domain with the same name
has been stopped. This is relatively frequent: say you gave the wrong installer
information and want to restart the install, so you stop the domain using the
"Shutdown" button and then, once it has disappeared, restart the creation
process with the same name. You will see that the UUID in the KeyError is the
UUID that the previous domain with the same name had.
One plausible explanation could be that:
   - virt-manager uses libvirt to see the current list of IDs
   - libvirt uses a hypervisor call to get the ID list
   - the new ID is seen by the hypervisor but the data is not fully set up
     in xend (and for some reason it keeps the old UUID somewhere)

Daniel
Comment 3 Daniel Veillard 2006-12-18 12:30:24 EST
Okay, I was on slightly older versions, so the fact that I had the problem was
'normal':
virt-manager-0.2.5-1.el5
python-virtinst-0.98.0-1.el5.1pvfb
Uploading my logs anyway.

Daniel
Comment 4 Daniel Veillard 2006-12-18 12:32:20 EST
Created attachment 143915 [details]
virt-manager log for that sequence, I created 3 guests one after the other
Comment 5 Daniel Veillard 2006-12-18 12:34:07 EST
Created attachment 143916 [details]
associated xend log for the 3 creations
Comment 6 Stephen Tweedie 2006-12-18 14:05:43 EST
Also, I seem to be able to reproduce this reliably now, ONLY if there is already
a domU running when we try to install the new domain.  If only the dom0 is
running, install seems to proceed normally.
Comment 7 Daniel Berrange 2006-12-18 14:33:30 EST
OK, this is dependent on the config settings. You need to have 'Automatically
open consoles' set to 'For new domains'. If it is set to 'Never' or 'For all
domains', then the bug won't be visible.

Second, you have to create a domain with the same name twice during the lifetime
of the virt-manager process. The second time it's created, virt-manager will see
the old UUID. So this is definitely a bug somewhere with the virDomainPtr object
being cached for too long (forever?)
Comment 8 Stephen Tweedie 2006-12-18 16:12:11 EST
Creating it twice within the virt-manager session is not necessary: it only has
to be seen once by virt-manager to cause the problem.  If the named domain is
already running when virt-manager starts, and you then kill it and try to
reinstall, you'll see the same effect.

This seems consistent with what I've observed so far.  All instances of the bug
have occurred for me while testing installs of different guests, which meant I
was running repeated installs into the same domU name.
Comment 9 Daniel Berrange 2006-12-18 17:15:30 EST
So, after much investigation I've discovered that the problem is basically that
the Python wrapper around the underlying virDomainPtr object never gets released
in the Python layer. Since libvirt maintains a cache of virDomainPtr objects
indexed on domain name, this means that the next time you create a guest with
the same name, libvirt gives back the old cached virDomainPtr object. This has
the original domain's UUID fixed in it, rather than the new one. Thus we end up
seeing the KeyError mentioned in the description.
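
The stale-UUID effect can be illustrated with a toy model. This is only a sketch under stated assumptions: FakeDomain and NameCache are hypothetical stand-ins invented for illustration, not libvirt's actual cache or API. It shows how a cache keyed on domain name hands back the old object, with the old UUID, for as long as any reference to it is still held.

```python
import uuid


class FakeDomain:
    """Hypothetical stand-in for a domain handle; NOT the real virDomainPtr."""

    def __init__(self, name):
        self.name = name
        self.uuid = str(uuid.uuid4())  # fixed when the handle is created
        self.refs = 0


class NameCache:
    """Hypothetical cache keyed on domain name (an assumption for this sketch)."""

    def __init__(self):
        self._by_name = {}

    def get(self, name):
        # Reuse the cached handle if one exists for this name.
        dom = self._by_name.get(name)
        if dom is None:
            dom = FakeDomain(name)
            self._by_name[name] = dom
        dom.refs += 1
        return dom

    def release(self, dom):
        # Evict the handle once nobody holds it any more.
        dom.refs -= 1
        if dom.refs == 0:
            del self._by_name[dom.name]


cache = NameCache()

# First install: the handle is obtained and never released.
first = cache.get("guest1")
old_uuid = first.uuid

# Second install under the same name: the cache returns the old handle.
second = cache.get("guest1")
stale_uuid_seen = second.uuid == old_uuid

# Once every holder releases the handle, a new one (new UUID) is created.
cache.release(second)
cache.release(first)
third = cache.get("guest1")
fresh_uuid_seen = third.uuid != old_uuid
```

In this model, any lookup table keyed on the new guest's UUID would miss when queried with the UUID carried by the stale handle, which is the kind of mismatch that surfaces as a KeyError.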

The question is why the Python layer is not releasing its wrapper around the
virDomainPtr object. Well, this wrapper is in turn held in virt-manager's
vmmDomain class. Best debugging efforts thus far indicate there is some circular
reference which is preventing Python's garbage collector from releasing the
vmmDomain instance, which in turn means the virDomainPtr instance is never
released.

Since I have been unable to find the cause of this circular reference, I've come
up with a workaround: explicitly set the 'vm' property of the vmmDomain object
to None. This causes the underlying virDomainPtr object to be released, even
though our higher-level vmmDomain object is stuck with a circular reference.
Testing with this workaround shows the KeyError problem goes away.
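
A minimal sketch of why the workaround helps, assuming CPython reference counting. Handle and VMWrapper are hypothetical stand-ins for the libvirt Python wrapper and vmmDomain, and the cycle here is created deliberately, since the real one was not identified. The cycle collector is switched off in the sketch to imitate an object that never gets collected.

```python
import gc
import weakref


class Handle:
    """Hypothetical stand-in for the Python wrapper around virDomainPtr."""


class VMWrapper:
    """Hypothetical stand-in for vmmDomain; keeps the handle in 'vm'."""

    def __init__(self, handle):
        self.vm = handle
        self.me = self  # deliberate reference cycle keeps the wrapper alive


def handle_survives(drop_explicitly):
    """Return True if the handle is still alive after the wrapper is dropped."""
    was_enabled = gc.isenabled()
    gc.disable()  # imitate a cycle that the collector never breaks
    try:
        handle = Handle()
        probe = weakref.ref(handle)  # lets us observe the handle's lifetime
        wrapper = VMWrapper(handle)
        del handle
        if drop_explicitly:
            wrapper.vm = None  # the workaround: drop the inner reference
        del wrapper  # the wrapper itself stays alive because of its cycle
        return probe() is not None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up the leftover cyclic wrapper


leaked_without_workaround = handle_survives(drop_explicitly=False)
leaked_with_workaround = handle_survives(drop_explicitly=True)
```

Under these assumptions the handle stays alive when only the wrapper is dropped, and is freed immediately once 'vm' is set to None, even though the wrapper itself remains stuck in its cycle.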
Comment 10 Daniel Berrange 2006-12-18 17:17:07 EST
Created attachment 143951 [details]
Explicitly drop reference to virDomainPtr object

Nasty hack / workaround to ensure the virDomainPtr object is released.
Comment 11 Daniel Berrange 2006-12-19 16:22:32 EST
*** Bug 211624 has been marked as a duplicate of this bug. ***
Comment 12 RHEL Product and Program Management 2007-01-02 03:40:47 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 13 Daniel Berrange 2007-01-05 09:07:40 EST
*** Bug 215638 has been marked as a duplicate of this bug. ***
Comment 18 Daniel Berrange 2007-01-09 14:11:18 EST
Built with brew:

$ brew latest-pkg dist-5E virt-manager
Build                                     Tag                   Built by
----------------------------------------  --------------------  ----------------
virt-manager-0.2.6-7.el5                  dist-5E               berrange



* Tue Jan  9 2007 Daniel P. Berrange <berrange@redhat.com> - 0.2.6-7.el5
- Explicitly drop the libvirt virDomainPtr object when a guest shuts down
  to avoid hanging onto objects in libvirt cache (bz 220036)
Comment 19 RHEL Product and Program Management 2007-02-07 20:51:20 EST
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
