This is the vdsm part of the fix to avoid the "RuntimeError: dictionary changed size during iteration" exception.

+++ This bug was initially created as a clone of Bug #923194 +++

--- Additional comment from Federico Simoncelli on 2013-04-03 07:42:17 EDT ---

I found multiple issues in the attached logs. With regard to:

Thread-449::ERROR::2013-04-03 10:56:24,726::BindingXMLRPC::932::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 918, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 345, in vmDiskReplicateStart
    return vm.diskReplicateStart(srcDisk, dstDisk)
  File "/usr/share/vdsm/API.py", line 520, in diskReplicateStart
    return v.diskReplicateStart(srcDisk, dstDisk)
  File "/usr/share/vdsm/libvirtvm.py", line 2271, in diskReplicateStart
    self._setDiskReplica(srcDrive, dstDisk)
  File "/usr/share/vdsm/libvirtvm.py", line 2241, in _setDiskReplica
    self.saveState()
  File "/usr/share/vdsm/libvirtvm.py", line 2509, in saveState
    vm.Vm.saveState(self)
  File "/usr/share/vdsm/vm.py", line 761, in saveState
    toSave = deepcopy(self.status())
  File "/usr/lib64/python2.6/copy.py", line 162, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python2.6/copy.py", line 255, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib64/python2.6/copy.py", line 162, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python2.6/copy.py", line 228, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/usr/lib64/python2.6/copy.py", line 162, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python2.6/copy.py", line 254, in _deepcopy_dict
    for key, value in x.iteritems():
RuntimeError: dictionary changed size during iteration

VDSM failed to update (save) the state of a VM because two concurrent vmDiskReplicateStart calls modified the VM configuration at the same time:

Thread-449::DEBUG::2013-04-03 10:56:24,717::BindingXMLRPC::913::vds::(wrapper) client [10.35.161.131]::call vmDiskReplicateStart with ('c3cdb482-8472-4f8a-b2ee-332118d467d1', {'device': 'disk', 'domainID': '6829602d-352a-40a4-af70-376f6e498f85', 'volumeID': '7ec4cbcc-1f0a-48b4-8ba3-000b42a0701f', 'poolID': '574d2c32-013c-4210-ab82-334188bd6171', 'imageID': '88929489-c525-49f9-9b1b-4efe28c4a706'}, {'device': 'disk', 'domainID': 'a3282596-8f78-4930-bb76-bebeb657babf', 'volumeID': '7ec4cbcc-1f0a-48b4-8ba3-000b42a0701f', 'poolID': '574d2c32-013c-4210-ab82-334188bd6171', 'imageID': '88929489-c525-49f9-9b1b-4efe28c4a706'}) {} flowID [2904eb87]

Thread-450::DEBUG::2013-04-03 10:56:24,724::BindingXMLRPC::913::vds::(wrapper) client [10.35.161.131]::call vmDiskReplicateStart with ('c3cdb482-8472-4f8a-b2ee-332118d467d1', {'device': 'disk', 'domainID': '6829602d-352a-40a4-af70-376f6e498f85', 'volumeID': 'b0463b99-16a0-4ca7-b9b3-ff370dc200e4', 'poolID': '574d2c32-013c-4210-ab82-334188bd6171', 'imageID': 'f4eca5b2-1f0f-4d6e-9da7-fbee2d9e532e'}, {'device': 'disk', 'domainID': 'a3282596-8f78-4930-bb76-bebeb657babf', 'volumeID': 'b0463b99-16a0-4ca7-b9b3-ff370dc200e4', 'poolID': '574d2c32-013c-4210-ab82-334188bd6171', 'imageID': 'f4eca5b2-1f0f-4d6e-9da7-fbee2d9e532e'}) {} flowID [2904eb87]

This can easily be resolved in VDSM, but I suggest opening a bug on the engine too, as this exception should have been handled there (one of the vmDiskReplicateStart calls failed => retry or roll back to the source).

If needed, I can provide a custom VDSM that triggers the exception without having to reproduce the entire scenario.
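For reference, below is a minimal standalone sketch of the race behind the traceback above. It is not vdsm code and all names in it are illustrative: one thread keeps deep-copying a status-like dict, the way Vm.saveState() does, while a second thread, standing in for the concurrent _setDiskReplica() calls, adds and removes a key on one of the nested device dicts. Python 3 is shown; the same race exists with the Python 2.6 copy module seen in the traceback.

import sys
import threading
from copy import deepcopy

sys.setswitchinterval(1e-5)  # switch threads very often so the race shows up quickly

# Shared state shaped roughly like a VM status dict: a dict holding a list of
# per-device dicts (names and sizes are made up for the demo).
drive = {str(k): {'v': k} for k in range(500)}
conf = {'vmId': 'c3cdb482', 'devices': [drive]}
stop = threading.Event()


def save_state():
    """The buggy pattern: deep-copy a dict that another thread may be mutating."""
    try:
        for _ in range(20000):        # typically fails long before this
            deepcopy(conf)
        print('no race hit this run')
    except RuntimeError as e:
        print('saveState failed:', e)  # "dictionary changed size during iteration"
    finally:
        stop.set()


def set_disk_replica():
    """Keep adding and removing a key on the drive dict while the copy runs."""
    while not stop.is_set():
        drive['diskReplicate'] = {'volumeID': 'b0463b99'}
        del drive['diskReplicate']


if __name__ == '__main__':
    threads = [threading.Thread(target=save_state),
               threading.Thread(target=set_disk_replica)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

In the sketch the failure disappears once the deepcopy and the mutation are serialized behind a shared lock (or the copy is taken from a snapshot made while holding such a lock); the vdsm patch attached to this bug addresses the same race on the vdsm side.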
Besides that, I found traces of storage overload again:

MainThread::INFO::2013-04-03 11:03:14,647::logUtils::37::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
...
6bbc5822-3e98-4d13-8c7c-92e62d4006a6::WARNING::2013-04-03 11:03:53,802::task::579::TaskManager.Task::(_updateState) Task=`6bbc5822-3e98-4d13-8c7c-92e62d4006a6`::Task._updateState: failed persisting task 6bbc5822-3e98-4d13-8c7c-92e62d4006a6
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 576, in _updateState
    self.persist()
  File "/usr/share/vdsm/storage/task.py", line 1098, in persist
    self._save(self.store)
  File "/usr/share/vdsm/storage/task.py", line 717, in _save
    raise se.TaskDirError("_save: no such task dir '%s'" % origTaskDir)
TaskDirError: can't find/access task dir: ("_save: no such task dir '/rhev/data-center/574d2c32-013c-4210-ab82-334188bd6171/mastersd/master/tasks/6bbc5822-3e98-4d13-8c7c-92e62d4006a6'",)
...
MainThread::INFO::2013-04-03 11:03:54,328::vdsm::89::vds::(run) I am the actual vdsm 4.10-12.0 cougar01.scl.lab.tlv.redhat.com (2.6.32-358.2.1.el6.x86_64)

2013-04-03 11:03:12+0300 1114 [6957]: s1 check_our_lease warning 78 last_success 1036
2013-04-03 11:03:13+0300 1115 [6957]: s1 check_our_lease warning 79 last_success 1036
2013-04-03 11:03:14+0300 1116 [6957]: s1 check_our_lease failed 80
2013-04-03 11:03:14+0300 1116 [6957]: s1 kill 10381 sig 15 count 1
2013-04-03 11:03:15+0300 1117 [6957]: s1 kill 10381 sig 15 count 2
...
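The sanlock lines above show the host lease renewal stalling: warnings at 78 and 79 seconds since the last successful renewal, failure at 80 seconds, then SIGTERM (sig 15) sent to the lease holder, vdsm (pid 10381 in the log). Purely to illustrate what those lines mean, here is a toy watchdog loop with the same observable behaviour; it is not sanlock code, and the warning threshold and all names are assumptions.

import os
import signal
import time

FAIL_AFTER = 80    # seconds without a renewed lease before it is treated as lost (matches the log)
WARN_AFTER = 60    # when to start logging warnings (assumed value)


def watch_lease(renew_lease, holder_pid):
    """Toy renewal loop: warn while renewals stall, kill the holder once the lease is lost."""
    last_success = time.monotonic()
    kill_count = 0
    while True:
        if renew_lease():                      # e.g. an I/O to the lease area on storage
            last_success = time.monotonic()
            kill_count = 0
        age = int(time.monotonic() - last_success)
        if age >= FAIL_AFTER:
            kill_count += 1
            print("check_our_lease failed %d; kill %d sig 15 count %d" % (age, holder_pid, kill_count))
            os.kill(holder_pid, signal.SIGTERM)  # SIGTERM is signal 15, as in the log
        elif age >= WARN_AFTER:
            print("check_our_lease warning %d" % age)
        time.sleep(1)


# Example (would terminate the current process once renewals fail for 80s):
# watch_lease(lambda: False, os.getpid())

In the logs the renewals stall because the storage is overloaded or unreachable, so vdsm ends up being killed while tasks are still being persisted, which matches the TaskDirError during prepareForShutdown above.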
Verified on RHEVM-3.2-SF16:

vdsm-4.10.2-18.0.el6ev.x86_64
rhevm-3.2.0-10.25.beta3.el6ev.noarch
libvirt-0.10.2-18.el6_4.4.x86_64

Concurrent storage migration of multiple disks succeeded.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html