Bug 1035484

Summary: Manager VM disappears unexpectedly during install
Product: Red Hat Enterprise Virtualization Manager
Reporter: Andrew Dingman <adingman>
Component: ovirt-hosted-engine-setup
Assignee: Sandro Bonazzola <sbonazzo>
Status: CLOSED ERRATA
QA Contact: movciari
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: acathrow, bazulay, danken, dfediuck, iheim, lpeer, ofrenkel, pstehlik, sbonazzo, yeylon
Target Milestone: ---
Target Release: 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: integration
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
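Cause: the cgconfig service was not started, so the cgroup controllers that libvirt needs (for example cpuacct) were not mounted; libvirt does not declare a dependency on cgconfig. Consequence: libvirt is unusable. Fix: start cgconfig, if it exists, before starting libvirtd. Result: libvirt is usable.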
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 16:57:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andrew Dingman 2013-11-27 21:40:23 UTC
Description of problem:

When installing the OS and the RHEV Manager, the virtual machine terminates unexpectedly. This can happen in either the OS install phase or the engine install phase.

If it happens during the OS install phase, retrying with the hosted-engine commands that setup prompts for works. If it happens during the engine install phase, the VM state appears to become inconsistent: both 'hosted-engine --vm-start' and 'hosted-engine --vm-poweroff' fail with errors.


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.0.0-0.9.beta4.el6ev.noarch from is24.2

How reproducible:
About 50% during the OS install phase; 100% during 'yum install rhevm'.

Steps to Reproduce:
1. Run 'hosted-engine --deploy'.
2. Choose a VNC console and a CD-ROM install.
3. Attempt the install. If the VM disappears, use ps to confirm that no kvm or qemu process is running.
4. Follow the prompts from 'hosted-engine --deploy' to restart the install.
5. If the OS install succeeds, proceed to install 'rhevm'. None of my four attempts at this step have succeeded.
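
Step 3 relies on ps to confirm that the VM process is gone. A rough Python equivalent is sketched below, assuming a Linux /proc filesystem and that the VM runs under an executable whose name contains "qemu" or "kvm" (on RHEL 6 this is typically /usr/libexec/qemu-kvm). qemu_processes is a hypothetical helper name, not part of hosted-engine-setup.

```python
import os


def qemu_processes(proc_root="/proc"):
    """Return sorted (pid, argv0) pairs for running qemu/kvm processes.

    Rough equivalent of 'ps -ef | grep -E "qemu|kvm"', reading
    /proc/<pid>/cmdline directly instead of shelling out to ps.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/mounts
        path = os.path.join(proc_root, entry, "cmdline")
        try:
            with open(path, "rb") as f:
                raw = f.read()
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        if not raw:
            continue  # kernel threads (e.g. kvm-pit) have an empty cmdline
        # cmdline is NUL-separated; the first field is the executable.
        argv0 = raw.split(b"\0")[0].decode("utf-8", "replace")
        name = os.path.basename(argv0)
        if "qemu" in name or "kvm" in name:
            found.append((int(entry), argv0))
    return sorted(found)
```

Calling qemu_processes() on the host right after the VM disappears should return an empty list if the process really died, rather than the VM merely losing its console.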

Actual results:

The VM process dies unexpectedly.

Expected results:

The VM keeps running while I install the engine.

Additional info:

The VM always seems to die during I/O-intensive operations: the package install phase of the OS installation, or about halfway through the "installing" phase of 'yum install rhevm'.

Comment 3 Dan Kenigsberg 2013-12-05 14:53:38 UTC
Sandro, why do you consider the VM as ever being Up?
I do see that vmSetTicket was sent before Vdsm was notified by libvirt about the new VM being up. Vdsm must not report Up before _domDependentInit has finished.

Thread-131::DEBUG::2013-11-27 12:57:31,964::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmSetTicket with ('8843865f-f656-4db2-9751-8c2ed203e781', '2136oXkr', '10800', 'disconnect', {}) {}
Thread-131::ERROR::2013-11-27 12:57:31,965::BindingXMLRPC::1003::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 989, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 240, in vmSetTicket
    return vm.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/API.py", line 592, in setTicket
    return v.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/vm.py", line 4303, in setTicket
    graphics = _domParseStr(self._dom.XMLDesc(0)).childNodes[0]. \
AttributeError: 'NoneType' object has no attribute 'XMLDesc'
libvirtEventLoop::DEBUG::2013-11-27 12:50:27,985::vm::4918::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`8843865f-f656-4db2-9751-8c2ed203e781`::event Started detail 0 opaque None

I suspect there's a bug in the client (you should wait for Up before you set a ticket).

Comment 4 Sandro Bonazzola 2013-12-05 14:59:35 UTC
(In reply to Dan Kenigsberg from comment #3)
> Sandro, why do you consider the VM as ever being Up?
> I do see that vmSetTicket was sent before Vdsm was notified by libvirt about
> the new VM being up. Vdsm must not report Up before _domDependentInit has
> finished.

Well, the client calls setTicket repeatedly until it returns successfully (up to a maximum number of tries, 10 IIRC).

Maybe it would be better to poll the VM status first and only then send setTicket.

Can you open a BZ specifying the correct sequence? Thanks!
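
Dan's suggestion (wait for Up before setting the ticket) can be sketched as a small client-side helper. This is a hypothetical sketch, not the hosted-engine-setup code: get_status and set_ticket stand in for whatever calls the client uses to query Vdsm and to invoke vmSetTicket, and the status strings are assumptions based on the states Vdsm reports.

```python
import time


class VmNotUpError(Exception):
    """Raised when the VM does not reach the Up state in time."""


def wait_for_up(get_status, timeout=300.0, interval=2.0,
                clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns 'Up' or `timeout` seconds pass.

    get_status is a hypothetical callable returning the Vdsm status string
    for the VM (assumed values such as 'WaitForLaunch', 'Powering up',
    'Up', 'Down').
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "Up":
            return
        if status == "Down":
            raise VmNotUpError("VM went Down while waiting for Up")
        if clock() >= deadline:
            raise VmNotUpError("VM still %r after %.0fs" % (status, timeout))
        sleep(interval)


def set_ticket_when_up(get_status, set_ticket, password, ttl=10800,
                       **wait_options):
    """Call set_ticket only after the VM is Up.

    Waiting first avoids racing Vdsm's domain initialisation, which appears
    to be what produced the AttributeError on self._dom in the log above.
    """
    wait_for_up(get_status, **wait_options)
    # Same argument order as the logged vmSetTicket call, minus the VM ID.
    return set_ticket(password, ttl, "disconnect", {})
```

Unlike a blind retry loop around setTicket, this fails fast when the VM goes Down instead of burning through all retries.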

However, this bug is about a different issue:

Thread-55::DEBUG::2013-11-27 12:50:43,106::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-55::ERROR::2013-11-27 12:50:43,106::sampling::355::vm.Vm::(collect) vmId=`8843865f-f656-4db2-9751-8c2ed203e781`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x19f6df8>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 522, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/vm.py", line 824, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1868, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
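
The libvirt error above says that no cgroup hierarchy with the cpuacct controller was mounted; on RHEL 6 the cgconfig service (from libcgroup) mounts these hierarchies as configured in /etc/cgconfig.conf. A minimal sketch for checking that condition on the host follows, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options, ...). cgroup_controller_mounted and read_proc_mounts are hypothetical helper names, not part of Vdsm or libvirt.

```python
def cgroup_controller_mounted(controller, mounts_text):
    """Return True if a cgroup (v1) hierarchy in mounts_text has `controller`.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields device, mount point, fs type, options, ...
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        # Controllers bound to a hierarchy appear among the mount options.
        if controller in fields[3].split(","):
            return True
    return False


def read_proc_mounts(path="/proc/mounts"):
    with open(path) as f:
        return f.read()
```

On the affected host, cgroup_controller_mounted("cpuacct", read_proc_mounts()) returning False would be consistent with the error above.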

Comment 5 Omer Frenkel 2013-12-10 07:54:44 UTC
(In reply to Sandro Bonazzola from comment #4)
> (In reply to Dan Kenigsberg from comment #3)
> > Sandro, why do you consider the VM as ever being Up?
> > I do see that vmSetTicket was sent before Vdsm was notified by libvirt about
> > the new VM being up. Vdsm must not report Up before _domDependentInit has
> > finished.
> 
> Well, the client calls setTicket repeatedly until it returns successfully (up
> to a maximum number of tries, 10 IIRC).
> 
> Maybe it would be better to poll the VM status first and only then send setTicket.
> 

Moving back to SLA, as this originates in the hosted-engine scripts.

Comment 6 Sandro Bonazzola 2013-12-16 07:54:37 UTC
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted

It seems that libvirtd is still missing its dependency on the cgconfig service.
I will fix this by adding cgconfig as a dependency and starting that service before starting libvirtd.
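
A minimal sketch of the ordering described in the fix, assuming RHEL 6 SysV init scripts under /etc/init.d and the 'service' command. This is not the actual ovirt-hosted-engine-setup patch; start_libvirtd and service_exists are hypothetical names.

```python
import os
import subprocess

INIT_DIR = "/etc/init.d"  # assumption: SysV init scripts, as on RHEL 6


def service_exists(name, init_dir=INIT_DIR):
    """Return True if an init script for `name` is installed."""
    return os.path.exists(os.path.join(init_dir, name))


def start_libvirtd(run=subprocess.check_call, exists=service_exists):
    """Start cgconfig (only if installed), then libvirtd.

    Order matters: cgconfig mounts the cgroup controllers, such as cpuacct,
    that libvirt expects. `run` and `exists` are injectable for testing.
    Returns the names of the services that were started, in order.
    """
    started = []
    if exists("cgconfig"):
        run(["service", "cgconfig", "start"])
        started.append("cgconfig")
    run(["service", "libvirtd", "start"])
    started.append("libvirtd")
    return started
```

This only covers the ordering at deploy time; keeping cgconfig enabled across reboots (for example with 'chkconfig cgconfig on') is a separate concern and is not shown.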




(In reply to Dan Kenigsberg from comment #3)
> Sandro, why do you consider the VM as ever being Up?
> I do see that vmSetTicket was sent before Vdsm was notified by libvirt about
> the new VM being up. Vdsm must not report Up before _domDependentInit has
> finished.
> 
> I suspect there's a bug in the client (you should wait for Up before you set
> a ticket).

Please open a separate bug for this issue.

Comment 7 Sandro Bonazzola 2013-12-16 13:24:22 UTC
Patch merged on the upstream master and 1.0 branches.

Comment 9 errata-xmlrpc 2014-01-21 16:57:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0083.html