Created attachment 1513030 [details]
logs

Description of problem:
An automation test case failed to resize a disk on an iSCSI storage domain. This looks like the same issue as here: https://access.redhat.com/solutions/3423481

From the engine log:

2018-12-05 02:59:39,250+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM 'bdc67223-59c9-4b53-96f4-7aeb5ca8852f'(vm_TestCase5063_0502555271) moved from 'Up' --> 'Paused'
2018-12-05 02:59:39,309+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_PAUSED(1,025), VM vm_TestCase5063_0502555271 has been paused.
2018-12-05 02:59:39,320+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_PAUSED_ENOSPC(138), VM vm_TestCase5063_0502555271 has been paused due to no Storage space error.
2018-12-05 02:59:55,130+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] VM 'bdc67223-59c9-4b53-96f4-7aeb5ca8852f' was reported as Down on VDS 'a3ebe986-710b-4d72-849a-8a8bd2f16738'(host_mixed_2)
2018-12-05 02:59:55,133+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='a3ebe986-710b-4d72-849a-8a8bd2f16738', vmId='bdc67223-59c9-4b53-96f4-7aeb5ca8852f', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 22ff82b0
2018-12-05 02:59:55,138+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] FINISH, DestroyVDSCommand, log id: 22ff82b0
2018-12-05 02:59:55,138+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] VM 'bdc67223-59c9-4b53-96f4-7aeb5ca8852f'(vm_TestCase5063_0502555271) moved from 'Paused' --> 'Down'
2018-12-05 02:59:55,153+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] EVENT_ID: VM_DOWN_ERROR(119), VM vm_TestCase5063_0502555271 is down with error. Exit message: Lost connection with qemu process.

From the VDSM log (the same log block appears multiple times):

2018-12-05 02:59:40,953+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='67f91f50-e668-4335-8939-68a8a03014df') moving from state init -> state preparing (task:602)
2018-12-05 02:59:40,953+0200 INFO (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='c6dc7fd0-295a-4f1e-bc4b-ac3c2c7e767f', volDict={'newSize': 2147483648, 'domainID': 'e45546fb-005c-4c1c-b9be-4bde6fe7bbb9', 'name': 'vdb', 'poolID': 'c6dc7fd0-295a-4f1e-bc4b-ac3c2c7e767f', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': False, 'volumeID': '83e6d3db-9410-4af2-b657-fc6640d49b99', 'imageID': 'b6cdbca9-10f4-4df3-aad7-005f1820d783'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7f25d6dd8ed0>>) from=internal, task_id=67f91f50-e668-4335-8939-68a8a03014df (api:46)
2018-12-05 02:59:40,953+0200 DEBUG (libvirt/events) [storage.SPM.Messages.Extend] new extend msg created: domain: e45546fb-005c-4c1c-b9be-4bde6fe7bbb9, volume: 83e6d3db-9410-4af2-b657-fc6640d49b99 (mailbox:123)
2018-12-05 02:59:40,954+0200 INFO (libvirt/events) [vdsm.api] FINISH sendExtendMsg return=None from=internal, task_id=67f91f50-e668-4335-8939-68a8a03014df (api:52)
2018-12-05 02:59:40,954+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='67f91f50-e668-4335-8939-68a8a03014df') finished: None (task:1201)
2018-12-05 02:59:40,954+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='67f91f50-e668-4335-8939-68a8a03014df') moving from state preparing -> state finished (task:602)
2018-12-05 02:59:40,955+0200 DEBUG (libvirt/events) [storage.ResourceManager.Owner] Owner.releaseAll requests {} resources {} (resourceManager:910)
2018-12-05 02:59:40,955+0200 DEBUG (libvirt/events) [storage.ResourceManager.Owner] Owner.cancelAll requests {} (resourceManager:947)
2018-12-05 02:59:40,955+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='67f91f50-e668-4335-8939-68a8a03014df') ref 0 aborting False (task:1002)
2018-12-05 02:59:40,963+0200 INFO (libvirt/events) [virt.vm] (vmId='bdc67223-59c9-4b53-96f4-7aeb5ca8852f') abnormal vm stop device ua-b6cdbca9-10f4-4df3-aad7-005f1820d783 error enospc (vm:5158)
2018-12-05 02:59:40,963+0200 INFO (libvirt/events) [virt.vm] (vmId='bdc67223-59c9-4b53-96f4-7aeb5ca8852f') CPU stopped: onIOError (vm:6199)
2018-12-05 02:59:40,968+0200 INFO (libvirt/events) [virt.vm] (vmId='bdc67223-59c9-4b53-96f4-7aeb5ca8852f') Requesting extension for volume 83e6d3db-9410-4af2-b657-fc6640d49b99 on domain e45546fb-005c-4c1c-b9be-4bde6fe7bbb9 (apparent: 1073741824, capacity: 2147483648, allocated: 1078329344, physical: 1073741824 threshold_state: exceeded) (vm:1277)
.....
2018-12-05 02:59:46,014+0200 INFO (mailbox-hsm/3) [virt.vm] (vmId='bdc67223-59c9-4b53-96f4-7aeb5ca8852f') Extend volume 83e6d3db-9410-4af2-b657-fc6640d49b99 completed <Clock(total=4.59, extend-volume=4.03, refresh-volume=0.38)> (vm:1400)
2018-12-05 02:59:46,015+0200 INFO (mailbox-hsm/3) [virt.vm] (vmId='bdc67223-59c9-4b53-96f4-7aeb5ca8852f') setting block threshold to 1610612736 bytes for drive 'vdb' (apparentsize 2147483648) (drivemonitor:108)
2018-12-05 02:59:46,015+0200 ERROR (mailbox-hsm/3) [storage.TaskManager.Task] (Task='e9d2f78f-bb41-48e8-81e9-0296c9903c1e') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1406, in __afterVolumeExtension
    self._update_drive_volume_size(drive, volSize)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1422, in _update_drive_volume_size
    self.drive_monitor.set_threshold(drive, volsize.apparentsize)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/drivemonitor.py", line 113, in set_threshold
    self._vm._dom.setBlockThreshold(drive.name, threshold)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)
NotConnectedError: VM 'bdc67223-59c9-4b53-96f4-7aeb5ca8852f' was not defined yet or was undefined
2018-12-05 02:59:46,114+0200 DEBUG (mailbox-hsm/3) [storage.TaskManager.Task] (Task='e9d2f78f-bb41-48e8-81e9-0296c9903c1e') Task._run: e9d2f78f-bb41-48e8-81e9-0296c9903c1e ({'newSize': 2147483648, 'domainID': 'e45546fb-005c-4c1c-b9be-4bde6fe7bbb9', 'name': 'vdb', 'poolID': 'c6dc7fd0-295a-4f1e-bc4b-ac3c2c7e767f', 'clock': <Clock(total=4.59, extend-volume=4.03, refresh-volume=0.38)>, 'internal': False, 'volumeID': '83e6d3db-9410-4af2-b657-fc6640d49b99', 'imageID': 'b6cdbca9-10f4-4df3-aad7-005f1820d783'},) {} failed - stopping task (task:894)
2018-12-05 02:59:46,114+0200 DEBUG (mailbox-hsm/3) [storage.TaskManager.Task] (Task='e9d2f78f-bb41-48e8-81e9-0296c9903c1e') stopping in state preparing (force False) (task:1256)
2018-12-05 02:59:46,114+0200 DEBUG (mailbox-hsm/3) [storage.TaskManager.Task] (Task='e9d2f78f-bb41-48e8-81e9-0296c9903c1e') ref 1 aborting True (task:1002)
2018-12-05 02:59:46,115+0200 INFO (mailbox-hsm/3) [storage.TaskManager.Task] (Task='e9d2f78f-bb41-48e8-81e9-0296c9903c1e') aborting: Task is aborted: u"VM 'bdc67223-59c9-4b53-96f4-7aeb5ca8852f' was not defined yet or was undefined" - code 100 (task:1181)
...
2018-12-05 02:59:46,117+0200 ERROR (mailbox-hsm/3) [storage.ThreadPool.WorkerThread] FINISH task 83439718-dd74-475e-8e91-9fd22ded39cd failed (callback=None, cmd=<function runTask at 0x7f260970d7d0>, args=(<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7f25d6dd8ed0>>, {'newSize': 2147483648, 'domainID': 'e45546fb-005c-4c1c-b9be-4bde6fe7bbb9', 'name': 'vdb', 'poolID': 'c6dc7fd0-295a-4f1e-bc4b-ac3c2c7e767f', 'clock': <Clock(total=4.59, extend-volume=4.03, refresh-volume=0.38)>, 'internal': False, 'volumeID': '83e6d3db-9410-4af2-b657-fc6640d49b99', 'imageID': 'b6cdbca9-10f4-4df3-aad7-005f1820d783'})) (threadPool:222)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/threadPool.py", line 209, in _processNextTask
    cmd(args)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line 80, in runTask
    ctask.prepare(cmd, *args)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
NotConnectedError: VM 'bdc67223-59c9-4b53-96f4-7aeb5ca8852f' was not defined yet or was undefined

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.8-0.1.el7ev.noarch
vdsm-4.20.44-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
qemu-img-rhev-2.12.0-18.el7_6.1.x86_64

How reproducible:
Unknown; I failed to reproduce it both manually and using automation.

Steps to Reproduce:
1.
2.
3.

Actual results:
The resize failed.

Expected results:
The disk should have been resized successfully.

Additional info:
Attached relevant logs.
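To summarize the sequence in the logs above: the volume extend completes on the mailbox-hsm thread, but by then the libvirt events thread has already handled the ENOSPC pause and the qemu process has exited, so the after-extend callback calls setBlockThreshold on a domain wrapper that is no longer connected. The following is a minimal sketch of that race for illustration only; NotConnectedError, DomWrapper, and after_volume_extension are simplified stand-ins, not VDSM's actual classes or the actual fix:

```python
class NotConnectedError(Exception):
    """Raised when the libvirt domain wrapper is used after teardown."""

class DomWrapper:
    # Stand-in for VDSM's disconnect-on-undefine domain wrapper: once the
    # qemu process is gone, any libvirt call through the wrapper raises.
    def __init__(self, vmid):
        self.vmid = vmid
        self._connected = True

    def disconnect(self):
        self._connected = False

    def setBlockThreshold(self, drive, threshold):
        if not self._connected:
            raise NotConnectedError(
                "VM %r was not defined yet or was undefined" % self.vmid)

def after_volume_extension(dom, drive, threshold):
    # The extend-completion callback runs on the mailbox-hsm thread; by the
    # time it fires, the events thread may already have torn the VM down.
    try:
        dom.setBlockThreshold(drive, threshold)
        return "threshold set"
    except NotConnectedError:
        # The extend itself succeeded; if the VM is already gone there is
        # nothing left to re-arm, so don't fail the whole task.
        return "vm gone, skipping threshold"

dom = DomWrapper('bdc67223-59c9-4b53-96f4-7aeb5ca8852f')
dom.disconnect()  # events thread: qemu exited, VM was undefined
print(after_volume_extension(dom, 'vdb', 1610612736))
```

In the logged failure the callback does not guard this call, so the NotConnectedError propagates and the storage task is aborted with code 100.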
The same happened again in another test case when extending the disk. The errors are a bit different, but this looks related to the same issue.

From the engine log:

2018-12-10 13:45:40,367+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM '460e606c-fb09-4a21-aab8-6165490ebeb9'(vm_TestCase5061_1013432638) moved from 'Up' --> 'Paused'
2018-12-10 13:45:40,386+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: VM_PAUSED(1,025), VM vm_TestCase5061_1013432638 has been paused.

From the VDSM log:

2018-12-10 13:45:55,425+0200 ERROR (jsonrpc/2) [jsonrpc.JsonRpcServer] Internal server error (__init__:611)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in _dynamicMethod
    result = fn(*methodArgs)
  File "<string>", line 2, in dumpxmls
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1430, in dumpxmls
    for vmId in vmList}
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1430, in <dictcomp>
    for vmId in vmList}
KeyError: u'460e606c-fb09-4a21-aab8-6165490ebeb9'

Attaching logs for this TC execution (logs_2.zip).
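The KeyError in the dumpxmls traceback above is consistent with the VM disappearing between the moment the list of requested VM IDs is built and the moment the dict comprehension looks each one up. The following toy sketch reproduces that failure mode and shows a defensive variant; the container, IDs, and XML strings here are illustrative stand-ins, not VDSM's actual dumpxmls code:

```python
# vm_container maps vmId -> domain XML; another thread can remove a VM
# between snapshotting the requested IDs and indexing into the container.
vm_container = {'460e606c': '<domxml/>', 'aaaa1111': '<domxml/>'}
vm_list = list(vm_container)      # snapshot of the requested vmIds

vm_container.pop('460e606c')      # VM torn down concurrently (qemu died)

try:
    # Shape of the failing comprehension in the traceback above.
    xmls = {vm_id: vm_container[vm_id] for vm_id in vm_list}
except KeyError as err:
    print('KeyError:', err)       # same error class as in the VDSM log

# A defensive variant simply skips VMs that vanished mid-request:
xmls = {vm_id: vm_container[vm_id] for vm_id in vm_list
        if vm_id in vm_container}
print(sorted(xmls))
```

Whether skipping is the right behavior for the real API (versus returning an explicit "no such VM" error to the engine) is a design decision for the VDSM side; the sketch only isolates the race.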
Created attachment 1513679 [details]
logs_2
Steps to reproduce according to the test cases:

TestCase5063 (the 1st test case - bug Description):
1. Create a VM and start it
2. Add a thin-provisioned qcow disk of size 1G and attach it to the VM
3. Extend the disk by 1G
4. Start to write using dd
5. Check that the disk is actually growing

TestCase5061 (2nd test case - comment #1):
1. Create a VM and start it
2. Add multiple disk permutations of size 1G, attach them to the VM and activate them
3. Create a snapshot (with memory)
4. Extend each disk by 1G (synchronized)
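Steps 3-5 of TestCase5063 line up with the numbers in the comment 0 VDSM log: the volume starts at 1 GiB apparent size, the dd writes push allocation past the armed block threshold (triggering the extend request), and after the extend VDSM re-arms the threshold at 1610612736 bytes for the 2 GiB volume, which works out to 75% of the new apparent size. A quick arithmetic check of those logged values (the 75% ratio is inferred from these two log lines, not taken from VDSM's code):

```python
GiB = 1024 ** 3

apparent_before = 1 * GiB   # "apparent: 1073741824" in the extend request
capacity = 2 * GiB          # "capacity: 2147483648" (size after the 1G extend)
allocated = 1078329344      # allocation reported when the extend was requested

# Allocation had already passed the 1 GiB apparent size, so the armed
# threshold was certainly crossed ("threshold_state: exceeded" in the log).
print(allocated > apparent_before)

apparent_after = 2 * GiB
new_threshold = 1610612736  # value VDSM re-arms after the extend completes
print(new_threshold / apparent_after)  # -> 0.75
```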
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Another reproduction of this issue on rhv-4.2.8-7 (ovirt-engine-4.2.8.3-0.1.el7ev.noarch), automation tier1, TestCase5063. Logs are attached.

Engine log:

2019-02-10 06:57:34,272+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_PAUSED_ENOSPC(138), VM vm_TestCase5063_1006540117 has been paused due to no Storage space error.
2019-02-10 06:57:50,637+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] VM '66596aa9-4775-4416-94ed-abf01f7dd178' was reported as Down on VDS '950c516f-2c3e-4943-8525-b0a519997293'(host_mixed_2)
2019-02-10 06:57:50,638+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='950c516f-2c3e-4943-8525-b0a519997293', vmId='66596aa9-4775-4416-94ed-abf01f7dd178', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 69d79ebd
2019-02-10 06:57:50,644+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] FINISH, DestroyVDSCommand, log id: 69d79ebd
2019-02-10 06:57:50,644+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] VM '66596aa9-4775-4416-94ed-abf01f7dd178'(vm_TestCase5063_1006540117) moved from 'Paused' --> 'Down'
2019-02-10 06:57:50,661+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] EVENT_ID: VM_DOWN_ERROR(119), VM vm_TestCase5063_1006540117 is down with error. Exit message: Lost connection with qemu process.
Created attachment 1535639 [details] rhv-4.2.8.7_TestCase5063_logs
This bug/RFE is more than 2 years old, it didn't get enough attention so far, and it is now flagged as pending close. Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention in the next oVirt release.
This bug didn't get any attention in a long time, and it's not planned for the foreseeable future. The oVirt development team has no plans to work on it. Please feel free to reopen if you have a plan for how to contribute this feature/bug fix.