Description of problem:
The Start VM action sometimes fails and its host becomes non-responsive. This is also relevant for the VM migration action.

Version-Release number of selected component (if applicable):
oVirt engine version: 4.1.0-0.2.master.20161201131309.git6c02a32.el7.centos
Vdsm version: vdsm-4.18.999-1028.gitd5c00b9.el7.centos

How reproducible:
Sometimes.

Steps to Reproduce:
1. Start a VM on a host.

Actual results:
The VM start fails and the host becomes non-responsive.

Expected results:
The VM should start normally and the host should stay responsive.

Additional info:
Run VM command started on: 2016-12-05 11:34

vdsm.log:

2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [storage.TaskManager.Task] (Task='435710e8-2bc2-4789-aec2-08e72b4f1b21') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3115, in prepareImage
    self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 326, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('a0ff0f8f-3675-4166-85f5-128bc3424160',)
2016-12-05 11:34:26,929 INFO (vm/ccc27397) [storage.TaskManager.Task] (Task='435710e8-2bc2-4789-aec2-08e72b4f1b21') aborting: Task is aborted: 'Unknown pool id, pool not connected' - code 309 (task:1175)
2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [storage.Dispatcher] {'status': {'message': "Unknown pool id, pool not connected: ('a0ff0f8f-3675-4166-85f5-128bc3424160',)", 'code': 309}} (dispatcher:77)
2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [virt.vm] (vmId='ccc27397-1d68-4883-a8be-5542ba077b26') The vm start process failed (vm:613)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 549, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1899, in _run
    self._preparePathsForDrives(dev_spec_map[hwclass.DISK])
  File "/usr/share/vdsm/virt/vm.py", line 639, in _preparePathsForDrives
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 320, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'index': 0, 'domainID': '39de3668-5565-4901-888e-5294729ed509', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'address': {'function': '0x0', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'slot': '0x07'}, 'volumeID': '6ac83627-74d3-4647-9f80-9dc7af5b493b', 'apparentsize': '18612224', 'imageID': '43f9a986-0afb-4ca0-8a16-65b704e636ee', 'discard': False, 'specParams': {}, 'readonly': 'false', 'iface': 'virtio', 'optional': 'false', 'deviceId': '43f9a986-0afb-4ca0-8a16-65b704e636ee', 'truesize': '18554880', 'poolID': 'a0ff0f8f-3675-4166-85f5-128bc3424160', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}

engine.log:

2016-12-05 11:34:24,144+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM golden_env_mixed_virtio_2 is down with error.
Exit message: Bad volume specification {'index': 0, 'domainID': '39de3668-5565-4901-888e-5294729ed509', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'address': {'function': '0x0', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'slot': '0x07'}, 'volumeID': '9b9de755-9280-43ee-8cb5-a0e06d064f8f', 'apparentsize': '18677760', 'imageID': 'd3470036-e997-48e1-bcf2-5624cb5e0a7b', 'discard': False, 'specParams': {}, 'readonly': 'false', 'iface': 'virtio', 'optional': 'false', 'deviceId': 'd3470036-e997-48e1-bcf2-5624cb5e0a7b', 'truesize': '18620416', 'poolID': 'a0ff0f8f-3675-4166-85f5-128bc3424160', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}.
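For anyone triaging this, here is a minimal, purely illustrative Python sketch of the failure path implied by the two tracebacks above. The class and function names mirror the log, but the dict-based pool cache and the direct exception wrapping are simplifications and assumptions, not the actual vdsm implementation (the real path goes through the task dispatcher and status codes). The key point is that prepareImage requires the storage pool to already be connected on the host; when getPool cannot find the pool it raises StoragePoolUnknown (code 309), and the VM start path surfaces that as VolumeError ("Bad volume specification").

# Illustrative sketch only (not actual vdsm code): the shape of the failure
# shown in the tracebacks above. Pool membership is modeled as a plain dict;
# the real HSM bookkeeping and dispatcher plumbing are elided.

class StoragePoolUnknown(Exception):
    """Raised when the requested storage pool is not connected (code 309)."""

class VolumeError(Exception):
    """Raised by prepareVolumePath when a drive cannot be prepared."""

class HSM(object):
    def __init__(self):
        # Filled by connectStoragePool; empty if the host never connected
        # (or lost) the pool, which is the situation in this bug.
        self.pools = {}  # spUUID -> pool object

    def getPool(self, spUUID):
        # Mirrors the hsm.py getPool frame: unknown pool -> exception.
        if spUUID not in self.pools:
            raise StoragePoolUnknown(spUUID)
        return self.pools[spUUID]

    def prepareImage(self, spUUID, sdUUID, imgUUID, volUUID):
        # Mirrors the prepareImage frame: the pool must already be
        # connected, otherwise StoragePoolUnknown propagates up.
        self.getPool(spUUID)
        # ... activate the volume chain and return its path (omitted) ...

def prepareVolumePath(hsm, drive):
    # In the VM start path, any storage failure for the drive ends up as
    # VolumeError, which the engine reports as "Bad volume specification".
    try:
        return hsm.prepareImage(drive['poolID'], drive['domainID'],
                                drive['imageID'], drive['volumeID'])
    except StoragePoolUnknown:
        raise VolumeError(drive)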
Created attachment 1228066 [details] vdsm, supervdsm and engine logs
*** Bug 1400512 has been marked as a duplicate of this bug. ***
Update on findings: if the problem occurs, the host can be put into maintenance mode and then reactivated; after that the VM starts normally.
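For reference, that maintenance/reactivate workaround can be scripted. Below is a minimal, hypothetical sketch assuming the ovirt-engine-sdk4 Python SDK; the engine URL, credentials, CA file path and host name are placeholders, and putting a non-responsive host into maintenance may still require manual confirmation depending on its state.

# Hypothetical workaround automation using the ovirt-engine-sdk4 Python SDK.
# All connection details and the host name below are placeholders.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)
try:
    hosts_service = connection.system_service().hosts_service()
    host = hosts_service.list(search='name=host_mixed_1')[0]
    host_service = hosts_service.host_service(host.id)

    # Put the affected host into maintenance mode...
    host_service.deactivate()
    while host_service.get().status != types.HostStatus.MAINTENANCE:
        time.sleep(5)

    # ...then reactivate it; after that the VM started normally in my case.
    host_service.activate()
    while host_service.get().status != types.HostStatus.UP:
        time.sleep(5)
finally:
    connection.close()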
I could not reproduce this on master. I asked Mor to update the bug if it happens again on his setup.
Mor, did you encounter this issue again?
(In reply to Fred Rolland from comment #5)
> Mor, did you encounter this issue again?

No, it seems to be resolved.
Closing; this seems to have been resolved between the builds.