Bug 1727678 - Create VM from template from glance fails with 'enospc', engine hits NPE and VM remains locked
Summary: Create VM from template from glance fails with 'enospc', engine hits NPE and ...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.30.17
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: ---
Assignee: Nobody
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-07-08 02:10 UTC by Germano Veit Michel
Modified: 2022-01-18 12:19 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-09-29 11:33:06 UTC
oVirt Team: Storage
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1712832 0 medium CLOSED Storage migration of a compressed image is failing with error "No space left on device" in block storage domain 2023-10-06 18:19:54 UTC
Red Hat Issue Tracker RHV-37776 0 None None None 2021-08-24 12:22:52 UTC

Internal Links: 1712832

Description Germano Veit Michel 2019-07-08 02:10:54 UTC
Description of problem:

Creating a thick VM from a Glance Image imported as template fails in block storage. The destination LV is too small. 
To make things worse, the engine fails to properly rollback the operation, it deletes the destination image, but then keeps doing it again, failing with NPE on rollback.

Version-Release number of selected component (if applicable):
vdsm-4.30.17-1.el7.x86_64
ovirt-engine-4.3.4.3-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Import image from Glance as Template
2. Create thick provisioned VM from Template

Actual results:
Fail

Expected results:
Succeed

Additional info:

VDSM:
=====

The destination LV is 128mb, and never extended:

2019-07-08 11:44:49,014+1000 INFO  (jsonrpc/0) [vdsm.api] START createVolume(sdUUID=u'0772f1a4-c22e-47bd-af53-915f2f2110a8', spUUID=u'7f6bbc5e-9eca-11e9-b288-52540019c1ff', imgUUID=u'5dfec071-4b81-41c2-80fd-d377ff9549a0', size=u'4294967296', volFormat=4, preallocate=2, diskType=u'DATA', volUUID=u'7be20bde-1333-43ba-b226-84c707cf7570', desc=None, srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=u'366048350') from=::ffff:192.168.150.253,42354, flow_id=6fcb97db-4cc0-4a95-853f-10c84923e8eb, task_id=9b1fd80c-58ae-4535-974b-a797684e8d0e (api:48)

2019-07-08 11:44:50,086+1000 INFO  (tasks/4) [storage.Volume] Request to create COW volume /rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/5dfec071-4b81-41c2-80fd-d377ff9549a0/7be20bde-1333-43ba-b226-84c707cf7570 with size = 8388608 blocks (blockVolume:511)

2019-07-08 11:44:55,218+1000 INFO  (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '0772f1a4-c22e-47bd-af53-915f2f2110a8', 'voltype': 'LEAF', 'description': 'None', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': '5dfec071-4b81-41c2-80fd-d377ff9549a0', 'ctime': '1562550291', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '402653184', 'children': [], 'pool': '', 'capacity': '4294967296', 'uuid': u'7be20bde-1333-43ba-b226-84c707cf7570', 'truesize': '402653184', 'type': 'SPARSE', 'lease': {'path': '/dev/0772f1a4-c22e-47bd-af53-915f2f2110a8/leases', 'owners': [], 'version': None, 'offset': 113246208}}} from=::ffff:192.168.150.253,41276, flow_id=6fcb97db-4cc0-4a95-853f-10c84923e8eb, task_id=9d035d45-b5b0-44ce-b5d0-190c676258d5 (api:54)

2019-07-08 11:44:55,370+1000 INFO  (jsonrpc/3) [vdsm.api] START sdm_copy_data(job_id=u'd4ac62a3-174c-4470-bf94-b1954ed5e4a8', source={u'img_id': u'bc2a0459-0ac4-4c1c-ab6b-2883af763d8d', u'sd_id': u'0772f1a4-c22e-47bd-af53-915f2f2110a8', u'endpoint_type': u'div', u'vol_id': u'3482add2-853f-431b-8e67-0b6be14fcb0e'}, destination={u'generation': 0, u'img_id': u'5dfec071-4b81-41c2-80fd-d377ff9549a0', u'sd_id': u'0772f1a4-c22e-47bd-af53-915f2f2110a8', u'endpoint_type': u'div', u'vol_id': u'7be20bde-1333-43ba-b226-84c707cf7570'}) from=::ffff:192.168.150.253,41276, flow_id=6fcb97db-4cc0-4a95-853f-10c84923e8eb, task_id=7642dff7-12a4-4009-bb53-3462e2208f43 (api:48)

2019-07-08 11:45:12,043+1000 ERROR (tasks/5) [root] Job u'd4ac62a3-174c-4470-bf94-b1954ed5e4a8' failed (jobs:221)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/copy_data.py", line 86, in _run
    self._operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 343, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 106, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 179, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/bc2a0459-0ac4-4c1c-ab6b-2883af763d8d/3482add2-853f-431b-8e67-0b6be14fcb0e', '-O', 'qcow2', '-o', 'compat=1.1', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/5dfec071-4b81-41c2-80fd-d377ff9549a0/7be20bde-1333-43ba-b226-84c707cf7570'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while writing sector 3742336: No space left on device\n')

Template:
  3482add2-853f-431b-8e67-0b6be14fcb0e 0772f1a4-c22e-47bd-af53-915f2f2110a8 -wi------- 384.00m                                                     IU_bc2a0459-0ac4-4c1c-ab6b-2883af763d8d,MD_1,PU_00000000-0000-0000-0000-000000000000
New VM:
  a8037db8-c9c2-4d93-b710-125ec739ce95 0772f1a4-c22e-47bd-af53-915f2f2110a8 -wi------- 128.00m                                                     IU_d5e1bf8c-44d7-4df3-a607-61d346807184,MD_7,PU_00000000-0000-0000-0000-000000000000

# dd if=/dev/0772f1a4-c22e-47bd-af53-915f2f2110a8/metadata bs=8k skip=129 count=1
CAP=4294967296
CTIME=1562302926
DESCRIPTION={"DiskAlias":"GlanceDisk-ba2dd4d","DiskDescription":"Fedora 30 Cloud Base Image v1.2 for x86_64 (ba2dd4d)"}
DISKTYPE=DATA
DOMAIN=0772f1a4-c22e-47bd-af53-915f2f2110a8
FORMAT=COW
GEN=0
IMAGE=bc2a0459-0ac4-4c1c-ab6b-2883af763d8d
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=LEAF
EOF


# qemu-img convert /dev/0772f1a4-c22e-47bd-af53-915f2f2110a8/3482add2-853f-431b-8e67-0b6be14fcb0e -O qcow2 -o compat=1.1 /root/f30.qcow2
# qemu-img info /root/f30.qcow2 
image: /root/f30.qcow2
file format: qcow2
virtual size: 4.0G (4294967296 bytes)
disk size: 898M                            <-------
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

ENGINE:
=======

The engine sees the failure, deletes the disk (correct) but then hits an NPE and the VM remains locked.

1. Sees the failed copy

2019-07-08 11:45:21,188+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [6fcb97db-4cc0-4a95-853f-10c84923e8eb] FINISH, GetHostJobsVDSCommand, return: {d4ac62a3-174c-4470-bf94-b1954ed5e4a8=HostJobInfo:{id='d4ac62a3-174c-4470-bf94-b1954ed5e4a8', type='storage', description='copy_data', status='failed', progress='42', error='VDSError:{code='GeneralException', message='General Exception: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/bc2a0459-0ac4-4c1c-ab6b-2883af763d8d/3482add2-853f-431b-8e67-0b6be14fcb0e', '-O', 'qcow2', '-o', 'compat=1.1', '/rhev/data-center/mnt/blockSD/0772f1a4-c22e-47bd-af53-915f2f2110a8/images/5dfec071-4b81-41c2-80fd-d377ff9549a0/7be20bde-1333-43ba-b226-84c707cf7570'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while writing sector 3742336: No space left on device\\n')",)'}'}}, log id: 3905ea5a

2. Deletes the image

2019-07-08 11:45:25,750+1000 INFO  (jsonrpc/0) [vdsm.api] START deleteImage(sdUUID=u'0772f1a4-c22e-47bd-af53-915f2f2110a8', spUUID=u'7f6bbc5e-9eca-11e9-b288-52540019c1ff', imgUUID=u'5dfec071-4b81-41c2-80fd-d377ff9549a0', postZero=u'false', force=u'false', discard=False) from=::ffff:192.168.150.253,42354, flow_id=1b6d96d, task_id=e5dd8a00-3e70-4394-ad79-ee5528f9ce88 (api:48)
2019-07-08 11:45:25,897+1000 INFO  (jsonrpc/4) [root] /usr/libexec/vdsm/hooks/after_get_caps/50_openstacknet: rc=0 err= (hooks:114)

2019-07-08 11:45:26,558+1000 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Image.delete succeeded in 0.80 seconds (__init__:312)

3. Tries to delete it again

2019-07-08 11:45:26,824+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [6fcb97db-4cc0-4a95-853f-10c84923e8eb] Failed to delete image 00000000-0000-0000-0000-000000000000/3482add2-853f-431b-8e67-0b6be14fcb0e: org.ovirt.engine.core.common.errors.EngineException: EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2075) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand.executeCommand(DestroyImageCommand.java:56) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1157) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1315) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1964) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:139) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:105) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1375) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:419) [bll.jar:]
        at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:451) [bll.jar:]


2019-07-08 11:59:13,826+10 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [0ea7d026-fcf8-4d79-8cb3-f42fac463f54] Exception: javax.ejb.EJBTransactionRolledbackException: Error retrieving database metadata; nested exception is org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLException: IJ031070: Transaction cannot proceed: STATUS_MARKED_ROLLBACK
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:203) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:420) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:148) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)

Comment 1 Germano Veit Michel 2019-07-08 02:22:59 UTC
>   a8037db8-c9c2-4d93-b710-125ec739ce95 0772f1a4-c22e-47bd-af53-915f2f2110a8 -wi------- 128.00m                                                     IU_d5e1bf8c-44d7-4df3-a607-61d346807184,MD_7,PU_00000000-0000-0000-0000-000000000000

This is actually the wrong LV, but I don't see it getting bigger than 128mb anyway.

There is also this in the logs:

2019-07-08 11:45:29,295+1000 ERROR (jsonrpc/3) [storage.HSM] Empty or not found image 5dfec071-4b81-41c2-80fd-d377ff9549a0 in SD 0772f1a4-c22e-47bd-af53-915f2f2110a8. {'643390a5-e705-47a7-a044-29beb4ef8008': ImgsPar(imgs=['c3605567-62e1-4ae9-89a3-44890b31d516'], parent='00000000-0000-0000-0000-000000000000'), '57f7ad5a-73f3-461a-ba74-59ccb1bb935b': ImgsPar(imgs=['c50e27ae-03d2-49af-959e-960db83b0684'], parent='00000000-0000-0000-0000-000000000000'), '1821313a-12e5-401f-a3eb-d6441cc9494f': ImgsPar(imgs=['345967c9-7e06-4bc1-880d-0269d34c8a73'], parent='00000000-0000-0000-0000-000000000000'), '3482add2-853f-431b-8e67-0b6be14fcb0e': ImgsPar(imgs=['bc2a0459-0ac4-4c1c-ab6b-2883af763d8d'], parent='00000000-0000-0000-0000-000000000000'), '9d8a3686-bca3-47bc-afa2-47250ed924e2': ImgsPar(imgs=['f62f69f6-3bc2-4f26-8ed3-77497ff72def'], parent='00000000-0000-0000-0000-000000000000'), 'a8037db8-c9c2-4d93-b710-125ec739ce95': ImgsPar(imgs=['d5e1bf8c-44d7-4df3-a607-61d346807184'], parent='00000000-0000-0000-0000-000000000000'), 'f803f664-3562-481f-8811-6c69bac96d5b': ImgsPar(imgs=['b248ca9c-4a9b-4de6-90c0-5cb23b2865e4'], parent='00000000-0000-0000-0000-000000000000')} (hsm:1501)

Let me know if you need logs (mine are ugly because 3 VMs failed at the same time). Its very easy to reproduce anyway.

Comment 2 Eyal Shenitzky 2021-08-24 12:20:23 UTC
This bug/RFE is more than 2 years old and it didn't get enough attention so far, and is now flagged as pending close. 
Please review if it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.

Comment 3 Michal Skrivanek 2021-09-29 11:33:06 UTC
This bug didn't get any attention in a long time, and it's not planned in foreseeable future. oVirt development team has no plans to work on it.
Please feel free to reopen if you have a plan how to contribute this feature/bug fix.


Note You need to log in before you can comment on or make changes to this bug.