Bug 2069670

Summary: NPE when converting ISCSI disk during the copy_data action
Product: [oVirt] ovirt-engine
Reporter: sshmulev
Component: BLL.Storage
Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED CURRENTRELEASE
QA Contact: Ilia Markelov <imarkelo>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.5.0
CC: ahadas, bugs, eshames
Target Milestone: ovirt-4.5.0
Keywords: ZStream
Target Release: 4.5.0.1
Flags: pm-rhel: ovirt-4.5?
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.5.0.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 977778

Description sshmulev 2022-03-29 12:41:27 UTC
Description of problem:
When trying to convert an iSCSI disk from raw/preallocated/incremental-disabled to cow/thin/incremental-enabled, the operation fails with the following exception:

2022-03-29 15:07:23,653+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [1bb46e63-6368-46f4-9e4b-7913e23bb992] FINISH, GetHostJobsVDSCommand, return: {e7d84242-b6c6-409a-bd4c-7758aef34d21=HostJobInfo:{id='e7d84242-b6c6-409a-bd4c-7758aef34d21', type='storage', description='copy_data', status='failed', progress='100', error='VDSError:{code='GeneralException', message='General Exception: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '-O', 'qcow2', '-n', '/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/images/34738192-a5db-4292-bf74-dc7090181d94/d7c7a4e4-a1ff-43b4-8ee1-b737bacd0bd1', '/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/images/34738192-a5db-4292-bf74-dc7090181d94/2ce0dd86-5bb9-4784-8a88-4bb56eb6c313'] failed with rc=1 out=b'' err=bytearray(b'qemu-img: error while writing at byte 1071644672: No space left on device\\n')",)'}'}}, log id: 36f3b5b3
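A quick arithmetic check on the failure offset (using the 1 GiB provisioned size from the reproduction steps below) shows the write stopped 2 MiB short of the disk's virtual size, consistent with the target volume having no headroom left:

```python
# Figures taken from this report: the disk's provisioned size (step 1 below)
# and the byte offset in the qemu-img ENOSPC error above.
provisioned_size = 1073741824  # 1 GiB, from the disk-create request
failure_offset = 1071644672    # "error while writing at byte 1071644672"

shortfall = provisioned_size - failure_offset
print(shortfall, "bytes =", shortfall // (1024 * 1024), "MiB")  # 2097152 bytes = 2 MiB
```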

2022-03-29 15:07:35,329+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-167945) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:357)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	at org.glassfish.javax.enterprise.concurrent.1.redhat-00001//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)

Versions:
engine-4.5.0-0.237.el8ev
vdsm-4.50.0.10-1.el8ev


How reproducible:
100%

Steps to Reproduce:
1. Create ISCSI_Raw-Preallocated-Disabled disk:
post {{ENV_Hostname}}/ovirt-engine/api/disks
<disk>
  <storage_domains>
    <storage_domain id="{{ISCSI_SD}}"/>
  </storage_domains>
  <name>ISCSI_Raw-Preallocated-Disabled</name>
  <provisioned_size>1073741824</provisioned_size>
  <format>raw</format>
  <sparse>false</sparse>
  <backup>None</backup>
</disk>

2. Wait until the disk's status is OK, then try to convert its allocation policy to thin and its format to cow:
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>
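For scripting the reproduction, the two payloads above can be generated with a short helper. This is only a sketch that mirrors the XML bodies shown in steps 1 and 2 verbatim; the endpoint, credentials, and the actual HTTP POSTs are left out:

```python
import xml.etree.ElementTree as ET

def disk_create_payload(name, sd_id, size_bytes):
    # Mirrors the <disk> body from step 1 (raw, preallocated, backup disabled).
    disk = ET.Element("disk")
    sds = ET.SubElement(disk, "storage_domains")
    ET.SubElement(sds, "storage_domain", id=sd_id)
    ET.SubElement(disk, "name").text = name
    ET.SubElement(disk, "provisioned_size").text = str(size_bytes)
    ET.SubElement(disk, "format").text = "raw"
    ET.SubElement(disk, "sparse").text = "false"
    ET.SubElement(disk, "backup").text = "None"
    return ET.tostring(disk, encoding="unicode")

def convert_action_payload():
    # Mirrors the <action> body from step 2 (cow, thin, incremental backup).
    action = ET.Element("action")
    disk = ET.SubElement(action, "disk")
    ET.SubElement(disk, "format").text = "cow"
    ET.SubElement(disk, "sparse").text = "true"
    ET.SubElement(disk, "backup").text = "incremental"
    return ET.tostring(action, encoding="unicode")

print(disk_create_payload("ISCSI_Raw-Preallocated-Disabled", "SD_UUID", 1 << 30))
print(convert_action_payload())
```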

Actual results:
The disk is not converted and remains as it was created.

Expected results:
The disk should be converted successfully to format cow and thin allocation.

Additional info:
Tried to perform other conversions on the same SD and they passed successfully, so it doesn't appear to be a NetApp issue.
The disk can be removed if needed, so it is not left locked after the failed conversion.

Comment 2 Arik 2022-03-30 07:12:51 UTC
(In reply to sshmulev from comment #0)
> Expected results:
> The disk should be converted successfully to format cow and thin allocation.

This expectation doesn't sound right - if qemu-img failed due to lack of space then the best we can do is to see that we have a proper validation to prevent the conversion from starting I guess
 
> Additional info:
> Tried to perform other converts on the same SD and it passed successfully,
> so it doesn't appear to be a Netapp issue.
> The disk can be removed if needed, so it is not locked after it fails the
> convert.

Were these disks smaller than the one it failed on?

Comment 3 Benny Zlotnik 2022-03-30 10:24:22 UTC
(In reply to Arik from comment #2)
> (In reply to sshmulev from comment #0)
> > Expected results:
> > The disk should be converted successfully to format cow and thin allocation.
> 
> This expectation doesn't sound right - if qemu-img failed due to lack of
> space then the best we can do is to see that we have a proper validation to
> prevent the conversion from starting I guess
>  
> > Additional info:
> > Tried to perform other converts on the same SD and it passed successfully,
> > so it doesn't appear to be a Netapp issue.
> > The disk can be removed if needed, so it is not locked after it fails the
> > convert.
> 
> Were these disks smaller than the one it failed on?

I'm not sure why it fails only on NetApp, but I'm pretty sure the issue is there was not enough space on the target for the qcow metadata
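As a rough illustration of that hypothesis: qcow2 metadata for a fully allocated image is not free. With the default 64 KiB cluster size, every data cluster needs an 8-byte L2 entry, plus L1 and refcount structures. The sketch below is a simplified back-of-the-envelope estimate, not vdsm's actual formula:

```python
CLUSTER = 64 * 1024  # qcow2 default cluster size
L2_ENTRY = 8         # bytes per L2 table entry

def approx_qcow2_overhead(virtual_size):
    # Simplified estimate: header + L1 table + L2 tables + refcount blocks.
    clusters = -(-virtual_size // CLUSTER)          # ceil division
    l2_tables = -(-clusters * L2_ENTRY // CLUSTER)  # each L2 table fills one cluster
    l1_bytes = l2_tables * 8                        # one 8-byte L1 entry per L2 table
    refcount_blocks = l2_tables                     # roughly mirrors the L2 footprint
    return CLUSTER + l1_bytes + (l2_tables + refcount_blocks) * CLUSTER

print(approx_qcow2_overhead(1 << 30))  # a few hundred KiB for a 1 GiB image
```

On block storage the target volume is sized to the disk's virtual size, so even this small overhead can push a fully written qcow2 image past the volume's capacity.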

Comment 4 Arik 2022-04-03 06:43:34 UTC
(In reply to Benny Zlotnik from comment #3)
> (In reply to Arik from comment #2)
> > (In reply to sshmulev from comment #0)
> > > Expected results:
> > > The disk should be converted successfully to format cow and thin allocation.
> > 
> > This expectation doesn't sound right - if qemu-img failed due to lack of
> > space then the best we can do is to see that we have a proper validation to
> > prevent the conversion from starting I guess
> >  
> > > Additional info:
> > > Tried to perform other converts on the same SD and it passed successfully,
> > > so it doesn't appear to be a Netapp issue.
> > > The disk can be removed if needed, so it is not locked after it fails the
> > > convert.
> > 
> > Were these disks smaller than the one it failed on?
> 
> I'm not sure why it fails only on NetApp, but I'm pretty sure the issue is
> there was not enough space on the target for the qcow metadata

Yeah, that indeed makes more sense
So it's not a corner case as I suspected above, thus changing to 4.5

Comment 5 Ilia Markelov 2022-04-14 17:12:01 UTC
Verified.

Disks are converted successfully and no NPE found.

Versions:
engine-4.5.0.2-0.7.el8ev
vdsm-4.50.0.12-1.el8ev.x86_64

Comment 6 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.