Bug 716705 - VDSM: 'Running'/'Paused' VMs failed to recover after vdsmd service restart when OS_Name='UNKNOWN'.
Summary: VDSM: 'Running'/'Paused' VMs failed to recover after vdsmd service restart when OS_Name='UNKNOWN'.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: Dafna Ron
URL:
Whiteboard:
Depends On: 735816
Blocks:
 
Reported: 2011-06-26 14:28 UTC by Omri Hochman
Modified: 2014-09-04 10:24 UTC
CC List: 8 users

Fixed In Version: vdsm-4.9-96.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 07:25:01 UTC
Target Upstream Version:


Attachments
full VDSM.log (654.50 KB, application/octet-stream)
2011-06-26 14:30 UTC, Omri Hochman


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:1782 0 normal SHIPPED_LIVE new packages: vdsm 2011-12-06 11:55:51 UTC

Description Omri Hochman 2011-06-26 14:28:30 UTC
VDSM: 'Running'/'Paused' VMs failed to recover after vdsmd service restart when OS_Name='UNKNOWN'.

Description: 
*************
- VDSM restarts itself when the SPM loses its connection to the master storage domain.
- If there is a problem with redhat-release and the OSNAME is not RHEL or RHEV, running VMs that are displayed in 'virsh -r list' will fail to recover on the vdsm side after a vdsmd service restart, and there will be no way to re-initiate them.

Note:
******
I had a problem with two instances of redhat-release-server installed, which caused the OSNAME to return 'UNKNOWN'.
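
For context, vdsm apparently derives the OS name from the installed redhat-release package. The sketch below is a hypothetical illustration (not vdsm's actual code) of how such a lookup can fall back to 'UNKNOWN' when the RPM query does not return exactly one recognizable package, e.g. with a duplicate redhat-release-server:

# Hypothetical illustration only -- not vdsm's actual implementation.
import subprocess

def guess_os_name():
    """Map the package owning /etc/redhat-release to an OS name.

    Returns 'UNKNOWN' when the query is ambiguous, e.g. when two
    redhat-release-server packages are installed at once.
    """
    try:
        p = subprocess.Popen(['rpm', '-q', '--qf', '%{NAME}\n',
                              '-f', '/etc/redhat-release'],
                             stdout=subprocess.PIPE)
        out = p.communicate()[0]
    except OSError:
        return 'UNKNOWN'
    names = out.strip().splitlines()
    if len(names) != 1:                 # duplicate or missing package
        return 'UNKNOWN'
    if names[0].startswith('redhat-release'):
        return 'RHEL'
    return 'UNKNOWN'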

[root@red-vds3 /]# virsh -r list
 Id Name                 State
----------------------------------
  1 basic_xp             paused
  2 igor_test            running


[root@red-vds3 /]# vdsClient -s 0 list  
{RETURNS EMPTY ... }


VDSM.log:

When attempting to re-initiate the VMs from the RHEVM GUI, the operation failed:
*****************************************************************************


        </features>
        <cpu match="exact">
                <model>Conroe</model>
                <topology cores="1" sockets="1" threads="1"/>
        </cpu>
</domain>

Thread-198::DEBUG::2011-06-26 16:07:10,959::vm::359::vm.Vm::(_startUnderlyingVm) vmId=`6028cc60-341a-4dd9-910c-85e804bc9d35`::_ongoingCreations released
Thread-198::INFO::2011-06-26 16:07:10,959::vm::383::vm.Vm::(_startUnderlyingVm) vmId=`6028cc60-341a-4dd9-910c-85e804bc9d35`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 349, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 939, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/share/vdsm/libvirtconnection.py", line 59, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1353, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Requested operation is not valid: domain is already active as 'igor_test'
Thread-198::DEBUG::2011-06-26 16:07:10,964::vm::777::vm.Vm::(setDownStatus) vmId=`6028cc60-341a-4dd9-910c-85e804bc9d35`::Changed state to Down: Requested operation is not valid: domain is already active as 'igor_test'
Thread-201::DEBUG::2011-06-26 16:07:11,148::clientIF::55::vds::(wrapper) [10.35.64.12]::call getVmStats with ('6028cc60-341a-4dd9-910c-85e804bc9d35',) {}
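
For reference, the traceback shows vdsm calling createXML() for a guest that libvirt still has active; a recovery path would re-attach to the existing domain instead of trying to create it again. A minimal sketch of that idea against the libvirt Python bindings (illustrative only, not vdsm's actual recovery code; 'vm_name' and 'domxml' are placeholders):

# Illustration only -- not vdsm's actual recovery logic.
import libvirt

def create_or_reattach(conn, vm_name, domxml):
    """Re-attach to a still-active libvirt domain instead of failing createXML()."""
    try:
        return conn.lookupByName(vm_name)   # VM survived the vdsmd restart
    except libvirt.libvirtError:
        pass                                # no such domain: really create it
    return conn.createXML(domxml, 0)

# e.g.: dom = create_or_reattach(libvirt.open('qemu:///system'), 'igor_test', domxml)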

Comment 1 Omri Hochman 2011-06-26 14:30:13 UTC
Created attachment 509965 [details]
full VDSM.log

Comment 2 Dan Kenigsberg 2011-06-27 20:59:17 UTC
An UNKNOWN operating system is a misconfiguration. Let's block starting VMs when that is the case.

http://git.fedorahosted.org/git/?p=vdsm.git
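
As a rough sketch of such a guard (illustrative names only, not vdsm's real API; the actual change is in the patch linked in this bug):

# Sketch of the idea only -- see the linked patch for the real change.
def start_vm_checked(start_vm, vm_params, os_name):
    """Refuse to start a VM on a misconfigured host."""
    if os_name == 'UNKNOWN':
        # e.g. duplicate redhat-release packages: a VM started here may be
        # lost by vdsm across the next vdsmd restart.
        raise RuntimeError('refusing to start VM: host OS name is UNKNOWN')
    return start_vm(vm_params)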

Comment 3 Itamar Heim 2011-06-27 21:10:58 UTC
can rhev-m pass this information, or only host-guest level info?

Comment 4 Dan Kenigsberg 2011-06-28 06:26:23 UTC
Oops, I pasted the wrong link to the patch.

http://gerrit.usersys.redhat.com/623

Itamar, yeah, rhev-m could send vdsm a bit of information about the host vdsm is running on. But I do not see how this helps to solve possible confusions here - it only adds another point of failure. I'd like to keep this on the host-guest level only.

Comment 7 Tomas Dosek 2011-07-15 09:49:53 UTC
Verified on vdsm-4.9-81.el6 - Omri's scenario no longer reproduces; the VMs successfully recover after a vdsmd restart.

Comment 8 Dan Kenigsberg 2011-09-05 12:58:05 UTC
A patch relating to this bug was mistakenly included in build 96, and will be reverted in the next build. Sorry for the noise.

Comment 9 Omri Hochman 2011-09-06 09:38:19 UTC
Cannot be verified; currently blocked by bug 735816.

Comment 10 Daniel Paikov 2011-09-25 09:57:13 UTC
Could not reproduce on 4.9-104. Closing as verified.

Comment 11 errata-xmlrpc 2011-12-06 07:25:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html

