Bug 707264

Summary: [vdsm][storage] race condition on _recoverExistingVms
Product: Red Hat Enterprise Linux 6
Reporter: Moran Goldboim <mgoldboi>
Component: vdsm
Assignee: Igor Lvovsky <ilvovsky>
Status: CLOSED ERRATA
QA Contact: Moran Goldboim <mgoldboi>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 6.2
CC: abaron, bazulay, iheim, lpeer, tdosek, ykaul
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: vdsm-4.9-72.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-06 07:19:15 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
vdsm log (flags: none)

Description Moran Goldboim 2011-05-24 14:27:07 UTC
Created attachment 500614 [details]
vdsm log

Description of problem:
During _recoverExistingVms there is a 5-second sleep; if a VM is migrated or destroyed during this sleep, an unhandled exception is raised, which prevents all the other VMs from being recovered.
            while self._enabled and self.vmContainer and \
                  not self.irs.getConnectedStoragePoolsList()['poollist']:
                time.sleep(5)

            for vmId in self.vmContainer.keys():
                # Do not prepare volumes when system goes down
                if self._enabled:
                    self.vmContainer[vmId].preparePaths()

Thread-662::DEBUG::2011-05-24 16:15:18,221::clientIF::49::vds::(wrapper) return getVmStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'timeOffset': '0', 'vmId': '7a61d6c8-3df2-4551-b4c4-69260affc29f', 'exitMessage': 'Migration succeeded', 'exitCode': 0}]}
Thread-551::INFO::2011-05-24 16:15:18,224::libvirtvm::228::vm.Vm::(run) vmId=`18c4eaa5-ef78-44f8-b31b-f4a64a332d45`::Migration Progress: 10 seconds elapsed, 63% of data processed, 63% of mem processed
Thread-551::INFO::2011-05-24 16:15:18,236::libvirtvm::228::vm.Vm::(run) vmId=`18c4eaa5-ef78-44f8-b31b-f4a64a332d45`::Migration Progress: 10 seconds elapsed, 63% of data processed, 63% of mem processed
Thread-663::DEBUG::2011-05-24 16:15:18,241::clientIF::44::vds::(wrapper) [10.16.144.114]::call destroy with ('7a61d6c8-3df2-4551-b4c4-69260affc29f',) {}
Thread-663::INFO::2011-05-24 16:15:18,242::clientIF::443::vds::(destroy) vmContainerLock aquired by vm 7a61d6c8-3df2-4551-b4c4-69260affc29f
Thread-663::DEBUG::2011-05-24 16:15:18,243::libvirtvm::1160::vm.Vm::(destroy) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::destroy Called
Thread-663::INFO::2011-05-24 16:15:18,244::libvirtvm::1123::vm.Vm::(releaseVm) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::Release VM resources
Thread-663::WARNING::2011-05-24 16:15:18,244::vm::552::vm.Vm::(_set_lastStatus) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::trying to set state to Powering down when already Down
Thread-663::DEBUG::2011-05-24 16:15:18,245::utils::471::vm.Vm::(stop) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::Stop statistics collection

clientIFinit::ERROR::2011-05-24 16:20:06,951::clientIF::1183::vds::(_recoverExistingVms) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 1181, in _recoverExistingVms
    self.vmContainer[vmId].preparePaths()
KeyError: '7a61d6c8-3df2-4551-b4c4-69260affc29f'
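
As an illustration only (a minimal sketch, not necessarily the actual fix), a guard of the following shape would tolerate this race: snapshot the container keys and skip any VM that was destroyed or migrated away during the sleep window, instead of letting the KeyError abort recovery of the remaining VMs.

            for vmId in list(self.vmContainer.keys()):
                # Do not prepare volumes when system goes down
                if not self._enabled:
                    break
                vm = self.vmContainer.get(vmId)
                if vm is None:
                    # The VM was destroyed or migrated while we slept;
                    # skip it rather than raising KeyError.
                    continue
                vm.preparePaths()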



Version-Release number of selected component (if applicable):
vdsm-4.9-67.el6.x86_64

How reproducible:


Steps to Reproduce:
1. Restart vdsm and migrate or destroy a VM during the 5-second sleep (hard to reproduce).
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 3 Igor Lvovsky 2011-05-26 07:55:17 UTC
http://gerrit.usersys.redhat.com/#change,485

Comment 5 Tomas Dosek 2011-07-18 06:57:54 UTC
Verified - vdsm-4.9-81.el6 - the exception raised in Moran's scenario is now handled correctly: the user gets a warning about a network exception when communicating with the host, and VMs do not try to migrate/destroy. The scenario described above therefore no longer reproduces.

Comment 6 errata-xmlrpc 2011-12-06 07:19:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html