Bug 783746

Summary: [ovirt] [vdsm] race during vm-recovery flow (machines are stuck on wait for launch)
Product: [Retired] oVirt Reporter: Haim <hateya>
Component: vdsm Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED NOTABUG QA Contact: Haim <hateya>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: unspecified CC: abaron, acathrow, bazulay, danken, iheim, mgoldboi, ohochman, yeylon, ykaul
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2012-05-10 09:55:22 UTC Type: ---

Description Haim 2012-01-22 07:36:18 UTC
Description of problem:


- run a script that starts and stops 50 VMs
- while the script is running, restart the vdsm service

Some of the machines get stuck in wait-for-launch:

I get the following error: 

Thread-14::INFO::2012-01-22 02:19:37,571::vm::538::vm.Vm::(_startUnderlyingVm) vmId=`30a01eab-f584-436b-aa74-0bb2a61f51b8`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 531, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1255, in _run
    self._domDependentInit()
  File "/usr/share/vdsm/libvirtvm.py", line 1156, in _domDependentInit
    self._getUnderlyingVmDevicesInfo()
  File "/usr/share/vdsm/libvirtvm.py", line 1137, in _getUnderlyingVmDevicesInfo
    self._getUnderlyingDriveInfo()
  File "/usr/share/vdsm/libvirtvm.py", line 1662, in _getUnderlyingDriveInfo
    if d.path == devPath:
AttributeError: 'Drive' object has no attribute 'path'

Comment 1 Haim 2012-01-22 08:05:28 UTC
another instance of the problem:

Thread-15::INFO::2012-01-22 03:00:07,088::vm::538::vm.Vm::(_startUnderlyingVm) vmId=`30a01eab-f584-436b-aa74-0bb2a61f51b8`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 531, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1214, in _run
    self._devices[devType].append(devClass(self.conf, self.log, **dev))
  File "/usr/share/vdsm/libvirtvm.py", line 976, in __init__
    self.name = self._makeName()
  File "/usr/share/vdsm/libvirtvm.py", line 993, in _makeName
    i = int(self.index)
AttributeError: 'Drive' object has no attribute 'index'
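
Both tracebacks are consistent with a Drive object whose attributes come directly from the keyword arguments of the `devClass(self.conf, self.log, **dev)` call: if the recovered device dict lacks a key, the attribute is never set. The following is a minimal sketch of that failure mode, not vdsm code. `FakeDrive`, `REQUIRED_KEYS`, and `missing_keys` are hypothetical names, and the assumption that attributes are assigned from kwargs is inferred from the traceback only.

```python
class FakeDrive(object):
    """Hypothetical stand-in for a device class built from a conf dict."""

    def __init__(self, **kwargs):
        # Every key of the device dict becomes an attribute; a missing key
        # means the attribute never exists on the object.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def make_name(self):
        # Mirrors the failing pattern: reads self.index unconditionally.
        return "vd" + chr(ord("a") + int(self.index))


REQUIRED_KEYS = ("index", "path")  # assumed for illustration only


def missing_keys(dev):
    """Return the required keys absent from a recovered device dict."""
    return [k for k in REQUIRED_KEYS if k not in dev]


complete = {"index": "0", "path": "/dev/example"}
partial = {"iface": "virtio"}  # e.g. a conf saved before it was fully populated

name_ok = FakeDrive(**complete).make_name()
try:
    FakeDrive(**partial).make_name()
    failed = False
except AttributeError:
    failed = True
```

Checking the dict with something like `missing_keys` before constructing the device would turn the late AttributeError into an explicit recovery error that names the absent keys.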

Comment 2 Omri Hochman 2012-05-09 11:38:48 UTC
Issue reproduced on rhev-m 3.1 downstream build (si3), vdsm-4.9.6-8.

After restarting the vdsmd service, none of the VMs remained in status 'wait-for-launch'; some of the VMs were stuck in status 'Paused' and could not be re-initiated.

The following error occurred when attempting to re-initiate a Paused VM:

Thread-429::DEBUG::2012-05-09 14:34:17,938::BindingXMLRPC::864::vds::(wrapper) client [10.35.3.233]::call vmCont with ('e44571a6-2f26-4f29-801d-1c534884ec20',) {} flowID [1efcc1d8]
Thread-429::ERROR::2012-05-09 14:34:17,944::BindingXMLRPC::879::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 869, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 210, in vmCont
    return vm.cont()
  File "/usr/share/vdsm/API.py", line 121, in cont
    return v.cont()
  File "/usr/share/vdsm/vm.py", line 820, in cont
    self._underlyingCont()
  File "/usr/share/vdsm/libvirtvm.py", line 1594, in _underlyingCont
    hooks.before_vm_cont(self._dom.XMLDesc(0), self.conf)
AttributeError: 'NoneType' object has no attribute 'XMLDesc'
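
The traceback shows `_underlyingCont` dereferencing `self._dom` while it is still `None`, which is what one would expect if the continue request arrives before recovery has attached the libvirt domain. Below is a minimal sketch of a guard for that window. `FakeVm`, `FakeDom`, and the status codes and messages are hypothetical and chosen for illustration; this is not the vdsm fix.

```python
class FakeVm(object):
    """Hypothetical VM wrapper whose domain handle is attached later."""

    def __init__(self):
        self._dom = None  # set once recovery has looked up the domain

    def attach(self, dom):
        self._dom = dom

    def cont(self):
        # Refuse early instead of raising AttributeError on None.
        if self._dom is None:
            return {"status": {"code": 1, "message": "VM is still recovering"}}
        self._dom.resume()
        return {"status": {"code": 0, "message": "Done"}}


class FakeDom(object):
    """Stand-in for a domain handle; records whether resume() was called."""

    def __init__(self):
        self.resumed = False

    def resume(self):
        self.resumed = True


vm = FakeVm()
early = vm.cont()   # request arrives before the domain is attached
dom = FakeDom()
vm.attach(dom)
late = vm.cont()    # request arrives after recovery completed
```

With such a guard the caller receives an explicit error during the recovery window and can retry, instead of hitting an unexpected exception in the XML-RPC wrapper.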

Comment 3 Dan Kenigsberg 2012-05-10 09:55:22 UTC
Haim Ateya: 783746 can be closed and a new bug should be opened, as we now get a different symptom.