Description of problem:

Creating a thick-provisioned VM from a Glance image imported as a template fails on block storage: the destination LV is too small. To make things worse, the engine fails to properly roll back the operation; it deletes the destination image, but then keeps trying to delete it again, failing with an NPE on rollback.

Version-Release number of selected component (if applicable):
vdsm-4.30.17-1.el7.x86_64
ovirt-engine-4.3.4.3-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Import an image from Glance as a template
2. Create a thick-provisioned VM from the template

Actual results:
Fail

Expected results:
Succeed

Additional info:

VDSM:
=====

The destination LV is 128 MiB, and is never extended:

2019-07-08 11:44:49,014+1000 INFO (jsonrpc/0) [vdsm.api] START createVolume(sdUUID=u'0772f1a4-c22e-47bd-af53-915f2f2110a8', spUUID=u'7f6bbc5e-9eca-11e9-b288-52540019c1ff', imgUUID=u'5dfec071-4b81-41c2-80fd-d377ff9549a0', size=u'4294967296', volFormat=4, preallocate=2, diskType=u'DATA', volUUID=u'7be20bde-1333-43ba-b226-84c707cf7570', desc=None, srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=u'366048350') from=::ffff:192.168.150.253,42354, flow_id=6fcb97db-4cc0-4a95-853f-10c84923e8eb, task_id=9b1fd80c-58ae-4535-974b-a797684e8d0e (api:48)

2019-07-08 11:44:50,086+1000 INFO (tasks/4) [storage.Volume] Request to create COW volume /rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/5dfec071-4b81-41c2-80fd-d377ff9549a0/7be20bde-1333-43ba-b226-84c707cf7570 with size = 8388608 blocks (blockVolume:511)

2019-07-08 11:44:55,218+1000 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '0772f1a4-c22e-47bd-af53-915f2f2110a8', 'voltype': 'LEAF', 'description': 'None', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': '5dfec071-4b81-41c2-80fd-d377ff9549a0', 'ctime': '1562550291', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '402653184', 'children': [], 'pool': '', 'capacity': '4294967296', 'uuid': u'7be20bde-1333-43ba-b226-84c707cf7570', 'truesize': '402653184', 'type': 'SPARSE', 'lease': {'path': '/dev/0772f1a4-c22e-47bd-af53-915f2f2110a8/leases', 'owners': [], 'version': None, 'offset': 113246208}}} from=::ffff:192.168.150.253,41276, flow_id=6fcb97db-4cc0-4a95-853f-10c84923e8eb, task_id=9d035d45-b5b0-44ce-b5d0-190c676258d5 (api:54)

2019-07-08 11:44:55,370+1000 INFO (jsonrpc/3) [vdsm.api] START sdm_copy_data(job_id=u'd4ac62a3-174c-4470-bf94-b1954ed5e4a8', source={u'img_id': u'bc2a0459-0ac4-4c1c-ab6b-2883af763d8d', u'sd_id': u'0772f1a4-c22e-47bd-af53-915f2f2110a8', u'endpoint_type': u'div', u'vol_id': u'3482add2-853f-431b-8e67-0b6be14fcb0e'}, destination={u'generation': 0, u'img_id': u'5dfec071-4b81-41c2-80fd-d377ff9549a0', u'sd_id': u'0772f1a4-c22e-47bd-af53-915f2f2110a8', u'endpoint_type': u'div', u'vol_id': u'7be20bde-1333-43ba-b226-84c707cf7570'}) from=::ffff:192.168.150.253,41276, flow_id=6fcb97db-4cc0-4a95-853f-10c84923e8eb, task_id=7642dff7-12a4-4009-bb53-3462e2208f43 (api:48)

2019-07-08 11:45:12,043+1000 ERROR (tasks/5) [root] Job u'd4ac62a3-174c-4470-bf94-b1954ed5e4a8' failed (jobs:221)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/copy_data.py", line 86, in _run
    self._operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 343, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 106, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 179, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/bc2a0459-0ac4-4c1c-ab6b-2883af763d8d/3482add2-853f-431b-8e67-0b6be14fcb0e', '-O', 'qcow2', '-o', 'compat=1.1', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/5dfec071-4b81-41c2-80fd-d377ff9549a0/7be20bde-1333-43ba-b226-84c707cf7570'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while writing sector 3742336: No space left on device\n')

Template:
  3482add2-853f-431b-8e67-0b6be14fcb0e 0772f1a4-c22e-47bd-af53-915f2f2110a8 -wi------- 384.00m IU_bc2a0459-0ac4-4c1c-ab6b-2883af763d8d,MD_1,PU_00000000-0000-0000-0000-000000000000

New VM:
  a8037db8-c9c2-4d93-b710-125ec739ce95 0772f1a4-c22e-47bd-af53-915f2f2110a8 -wi------- 128.00m IU_d5e1bf8c-44d7-4df3-a607-61d346807184,MD_7,PU_00000000-0000-0000-0000-000000000000

# dd if=/dev/0772f1a4-c22e-47bd-af53-915f2f2110a8/metadata bs=8k skip=129 count=1
CAP=4294967296
CTIME=1562302926
DESCRIPTION={"DiskAlias":"GlanceDisk-ba2dd4d","DiskDescription":"Fedora 30 Cloud Base Image v1.2 for x86_64 (ba2dd4d)"}
DISKTYPE=DATA
DOMAIN=0772f1a4-c22e-47bd-af53-915f2f2110a8
FORMAT=COW
GEN=0
IMAGE=bc2a0459-0ac4-4c1c-ab6b-2883af763d8d
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=LEAF
EOF

# qemu-img convert /dev/0772f1a4-c22e-47bd-af53-915f2f2110a8/3482add2-853f-431b-8e67-0b6be14fcb0e -O qcow2 -o compat=1.1 /root/f30.qcow2
# qemu-img info /root/f30.qcow2
image: /root/f30.qcow2
file format: qcow2
virtual size: 4.0G (4294967296 bytes)
disk size: 898M  <-------
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

ENGINE:
=======

The engine sees the failure and deletes the disk (correct), but then hits an NPE and the VM remains locked.

1.
Sees the failed copy:

2019-07-08 11:45:21,188+10 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [6fcb97db-4cc0-4a95-853f-10c84923e8eb] FINISH, GetHostJobsVDSCommand, return: {d4ac62a3-174c-4470-bf94-b1954ed5e4a8=HostJobInfo:{id='d4ac62a3-174c-4470-bf94-b1954ed5e4a8', type='storage', description='copy_data', status='failed', progress='42', error='VDSError:{code='GeneralException', message='General Exception: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/bc2a0459-0ac4-4c1c-ab6b-2883af763d8d/3482add2-853f-431b-8e67-0b6be14fcb0e', '-O', 'qcow2', '-o', 'compat=1.1', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/5dfec071-4b81-41c2-80fd-d377ff9549a0/7be20bde-1333-43ba-b226-84c707cf7570'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while writing sector 3742336: No space left on device\\n')",)'}'}}, log id: 3905ea5a

2. Deletes the image:

2019-07-08 11:45:25,750+1000 INFO (jsonrpc/0) [vdsm.api] START deleteImage(sdUUID=u'0772f1a4-c22e-47bd-af53-915f2f2110a8', spUUID=u'7f6bbc5e-9eca-11e9-b288-52540019c1ff', imgUUID=u'5dfec071-4b81-41c2-80fd-d377ff9549a0', postZero=u'false', force=u'false', discard=False) from=::ffff:192.168.150.253,42354, flow_id=1b6d96d, task_id=e5dd8a00-3e70-4394-ad79-ee5528f9ce88 (api:48)
2019-07-08 11:45:25,897+1000 INFO (jsonrpc/4) [root] /usr/libexec/vdsm/hooks/after_get_caps/50_openstacknet: rc=0 err= (hooks:114)
2019-07-08 11:45:26,558+1000 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Image.delete succeeded in 0.80 seconds (__init__:312)

3. Tries to delete it again:

2019-07-08 11:45:26,824+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [6fcb97db-4cc0-4a95-853f-10c84923e8eb] Failed to delete image 00000000-0000-0000-0000-000000000000/3482add2-853f-431b-8e67-0b6be14fcb0e: org.ovirt.engine.core.common.errors.EngineException: EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)
	at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2075) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand.executeCommand(DestroyImageCommand.java:56) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1157) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1315) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1964) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:139) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:105) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1375) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:419) [bll.jar:]
	at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:451) [bll.jar:]

2019-07-08 11:59:13,826+10 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [0ea7d026-fcf8-4d79-8cb3-f42fac463f54] Exception:
javax.ejb.EJBTransactionRolledbackException: Error retrieving database metadata; nested exception is org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLException: IJ031070: Transaction cannot proceed: STATUS_MARKED_ROLLBACK
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:203) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:420) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:148) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
	at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
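The stack traces above suggest the rollback step is not idempotent: the first deleteImage succeeds, but the engine retries DestroyImageCommand (now with a zeroed image ID) and hits an NPE instead of treating "already gone" as success. As a minimal illustration of the missing pattern only — this is not engine code, and `ImageNotFound`, `delete_image`, and the in-memory `storage` dict are hypothetical stand-ins:

```python
# Sketch of an idempotent rollback step. Hypothetical names throughout;
# the real fix would live in the engine's Java DestroyImageCommand.

class ImageNotFound(Exception):
    """Raised when the image to delete does not exist."""


def delete_image(storage, img_id):
    """Delete an image from a storage mapping, or raise ImageNotFound."""
    if img_id not in storage:
        raise ImageNotFound(img_id)
    del storage[img_id]


def rollback_delete(storage, img_id):
    """Idempotent rollback: already-deleted (or missing) images are success."""
    if img_id is None:
        # Guard against the NPE-style failure seen above: nothing to delete.
        return True
    try:
        delete_image(storage, img_id)
    except ImageNotFound:
        # Already gone -- the rollback goal is met, so do not retry forever.
        pass
    return True


storage = {"5dfec071-4b81-41c2-80fd-d377ff9549a0": "destination image"}
rollback_delete(storage, "5dfec071-4b81-41c2-80fd-d377ff9549a0")  # deletes it
rollback_delete(storage, "5dfec071-4b81-41c2-80fd-d377ff9549a0")  # no-op, no error
```

The point is simply that a second (or third) invocation of the rollback returns cleanly rather than looping on an exception.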
> a8037db8-c9c2-4d93-b710-125ec739ce95 0772f1a4-c22e-47bd-af53-915f2f2110a8 -wi------- 128.00m IU_d5e1bf8c-44d7-4df3-a607-61d346807184,MD_7,PU_00000000-0000-0000-0000-000000000000

This is actually the wrong LV, but I don't see it getting bigger than 128 MiB anyway. There is also this in the logs:

2019-07-08 11:45:29,295+1000 ERROR (jsonrpc/3) [storage.HSM] Empty or not found image 5dfec071-4b81-41c2-80fd-d377ff9549a0 in SD 0772f1a4-c22e-47bd-af53-915f2f2110a8. {'643390a5-e705-47a7-a044-29beb4ef8008': ImgsPar(imgs=['c3605567-62e1-4ae9-89a3-44890b31d516'], parent='00000000-0000-0000-0000-000000000000'), '57f7ad5a-73f3-461a-ba74-59ccb1bb935b': ImgsPar(imgs=['c50e27ae-03d2-49af-959e-960db83b0684'], parent='00000000-0000-0000-0000-000000000000'), '1821313a-12e5-401f-a3eb-d6441cc9494f': ImgsPar(imgs=['345967c9-7e06-4bc1-880d-0269d34c8a73'], parent='00000000-0000-0000-0000-000000000000'), '3482add2-853f-431b-8e67-0b6be14fcb0e': ImgsPar(imgs=['bc2a0459-0ac4-4c1c-ab6b-2883af763d8d'], parent='00000000-0000-0000-0000-000000000000'), '9d8a3686-bca3-47bc-afa2-47250ed924e2': ImgsPar(imgs=['f62f69f6-3bc2-4f26-8ed3-77497ff72def'], parent='00000000-0000-0000-0000-000000000000'), 'a8037db8-c9c2-4d93-b710-125ec739ce95': ImgsPar(imgs=['d5e1bf8c-44d7-4df3-a607-61d346807184'], parent='00000000-0000-0000-0000-000000000000'), 'f803f664-3562-481f-8811-6c69bac96d5b': ImgsPar(imgs=['b248ca9c-4a9b-4de6-90c0-5cb23b2865e4'], parent='00000000-0000-0000-0000-000000000000')} (hsm:1501)

Let me know if you need logs (mine are ugly because 3 VMs failed at the same time). It's very easy to reproduce anyway.
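For what it's worth, the size mismatch is visible from the numbers alone: createVolume was asked for initialSize=366048350 bytes (~349 MiB), and oVirt block-domain LVs grow in 128 MiB extents, so the destination LV should have come out at 384 MiB (three extents, matching the template's 384.00m LV) rather than a single 128 MiB extent. A quick sketch of that arithmetic (the rounding helper is mine, not actual VDSM code):

```python
# Size math from the logs above; round_up_to_extents is a hypothetical
# helper illustrating the expected allocation, not VDSM's implementation.

EXTENT = 128 * 1024 * 1024  # oVirt block storage domain LV extent size, 128 MiB


def round_up_to_extents(size_bytes, extent=EXTENT):
    """Round a byte size up to a whole number of LV extents."""
    return (size_bytes + extent - 1) // extent * extent


initial_size = 366048350  # initialSize from the createVolume call above

needed = round_up_to_extents(initial_size)
print(needed // (1024 * 1024))        # 384 -> the LV should be >= 384 MiB
print(EXTENT < initial_size)          # True -> a 128 MiB LV cannot hold the data
```

So the copy was guaranteed to hit ENOSPC the moment the LV was left at its first extent.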
This bug/RFE is more than 2 years old, hasn't received enough attention so far, and is now flagged as pending close. Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.
This bug didn't get any attention in a long time, and it's not planned for the foreseeable future. The oVirt development team has no plans to work on it. Please feel free to reopen if you have a plan for how to contribute this feature/bug fix.