Bug 707264 - [vdsm][storage]race condition on _recoverExistingVms
Summary: [vdsm][storage]race condition on _recoverExistingVms
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
: ---
Assignee: Igor Lvovsky
QA Contact: Moran Goldboim
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-24 14:27 UTC by Moran Goldboim
Modified: 2013-03-01 04:53 UTC (History)
6 users

Fixed In Version: vdsm-4.9-72.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 07:19:15 UTC
Target Upstream Version:


Attachments (Terms of Use)
vdsm log (1.38 MB, application/x-gzip)
2011-05-24 14:27 UTC, Moran Goldboim


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:1782 0 normal SHIPPED_LIVE new packages: vdsm 2011-12-06 11:55:51 UTC

Description Moran Goldboim 2011-05-24 14:27:07 UTC
Created attachment 500614 [details]
vdsm log

Description of problem:
During _recoverExistingVms there is a 5-second sleep; if a VM is migrated or destroyed during that window, an unhandled exception is raised which prevents all the other VMs from being connected.
while self._enabled and self.vmContainer and \
        not self.irs.getConnectedStoragePoolsList()['poollist']:
    time.sleep(5)

for vmId in self.vmContainer.keys():
    # Do not prepare volumes when system goes down
    if self._enabled:
        self.vmContainer[vmId].preparePaths()
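One defensive way to close the race is to snapshot the keys and tolerate entries that disappear before they are used. This is only a sketch of that pattern, not the actual fix merged in gerrit change 485; the function and the `FakeVm` class are illustrative, not vdsm API:

```python
def recover_existing_vms(vm_container, enabled=lambda: True):
    """Prepare paths for each VM, tolerating entries that are
    destroyed or migrated away concurrently (sketch only)."""
    recovered = []
    for vm_id in list(vm_container.keys()):  # snapshot of the keys
        if not enabled():
            break                            # system is going down
        vm = vm_container.get(vm_id)         # entry may be gone by now
        if vm is None:
            continue                         # destroyed/migrated: skip it
        vm.preparePaths()
        recovered.append(vm_id)
    return recovered

class FakeVm:
    """Minimal VM stand-in for demonstration."""
    def __init__(self):
        self.prepared = False
    def preparePaths(self):
        self.prepared = True
```

The key difference from the loop above is `vm_container.get(vm_id)` instead of `vm_container[vm_id]`: a concurrent destroy then yields `None` and the loop skips that VM rather than aborting recovery for every remaining VM.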

Thread-662::DEBUG::2011-05-24 16:15:18,221::clientIF::49::vds::(wrapper) return getVmStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'timeOffset': '0', 'vmId': '7a61d6c8-3df2-4551-b4c4-69260affc29f', 'exitMessage': 'Migration succeeded', 'exitCode': 0}]}
Thread-551::INFO::2011-05-24 16:15:18,224::libvirtvm::228::vm.Vm::(run) vmId=`18c4eaa5-ef78-44f8-b31b-f4a64a332d45`::Migration Progress: 10 seconds elapsed, 63% of data processed, 63% of mem processed
Thread-551::INFO::2011-05-24 16:15:18,236::libvirtvm::228::vm.Vm::(run) vmId=`18c4eaa5-ef78-44f8-b31b-f4a64a332d45`::Migration Progress: 10 seconds elapsed, 63% of data processed, 63% of mem processed
Thread-663::DEBUG::2011-05-24 16:15:18,241::clientIF::44::vds::(wrapper) [10.16.144.114]::call destroy with ('7a61d6c8-3df2-4551-b4c4-69260affc29f',) {}
Thread-663::INFO::2011-05-24 16:15:18,242::clientIF::443::vds::(destroy) vmContainerLock aquired by vm 7a61d6c8-3df2-4551-b4c4-69260affc29f
Thread-663::DEBUG::2011-05-24 16:15:18,243::libvirtvm::1160::vm.Vm::(destroy) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::destroy Called
Thread-663::INFO::2011-05-24 16:15:18,244::libvirtvm::1123::vm.Vm::(releaseVm) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::Release VM resources
Thread-663::WARNING::2011-05-24 16:15:18,244::vm::552::vm.Vm::(_set_lastStatus) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::trying to set state to Powering down when already Down
Thread-663::DEBUG::2011-05-24 16:15:18,245::utils::471::vm.Vm::(stop) vmId=`7a61d6c8-3df2-4551-b4c4-69260affc29f`::Stop statistics collection

clientIFinit::ERROR::2011-05-24 16:20:06,951::clientIF::1183::vds::(_recoverExistingVms) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 1181, in _recoverExistingVms
    self.vmContainer[vmId].preparePaths()
KeyError: '7a61d6c8-3df2-4551-b4c4-69260affc29f'
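The KeyError follows directly from the shape of the loop: the key list is materialized once (before or during the sleep), but the container is indexed later, so a concurrent destroy removes the entry in between. A minimal stand-alone reproduction of that mechanism (no vdsm code involved; `DummyVm` and the IDs are illustrative):

```python
class DummyVm:
    """Stand-in for a vdsm VM object (illustrative only)."""
    def preparePaths(self):
        pass

# Same shape as the loop in _recoverExistingVms: the key list is
# materialized once, then the dict is indexed later.
container = {'vm-1': DummyVm(), 'vm-2': DummyVm()}
vm_ids = list(container.keys())     # snapshot taken before the sleep
del container['vm-2']               # a destroy() races with recovery
try:
    for vm_id in vm_ids:
        container[vm_id].preparePaths()   # KeyError on 'vm-2'
except KeyError as exc:
    print('recovery aborted by', exc)     # remaining VMs never recover
```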



Version-Release number of selected component (if applicable):
vdsm-4.9-67.el6.x86_64

How reproducible:


Steps to Reproduce:
1. Restart vdsm and migrate or destroy a VM during the 5-second sleep (hard to reproduce).
  
Actual results:


Expected results:


Additional info:

Comment 3 Igor Lvovsky 2011-05-26 07:55:17 UTC
http://gerrit.usersys.redhat.com/#change,485

Comment 5 Tomas Dosek 2011-07-18 06:57:54 UTC
Verified - vdsm-4.9-81.el6 - the exception raised in Moran's scenario is now handled correctly: the user gets a warning about a network exception in communicating with the host, and VMs don't try to migrate/destroy. The scenario described above therefore no longer reproduces.

Comment 6 errata-xmlrpc 2011-12-06 07:19:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html

