Bug 2069670 - NPE when converting ISCSI disk during the copy_data action
Summary: NPE when converting ISCSI disk during the copy_data action
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.0
Target Release: 4.5.0.1
Assignee: Benny Zlotnik
QA Contact: Ilia Markelov
URL:
Whiteboard:
Depends On:
Blocks: RHEV_thin_to_preallocated_disks
 
Reported: 2022-03-29 12:41 UTC by sshmulev
Modified: 2022-04-20 06:33 UTC
CC List: 3 users

Fixed In Version: ovirt-engine-4.5.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?




Links
Github - oVirt ovirt-engine pull 203 (Draft) - core: measure volume before starting conversion - 2022-03-30 10:22:28 UTC
Red Hat Issue Tracker - RHV-45472 - 2022-03-29 15:37:12 UTC

Description sshmulev 2022-03-29 12:41:27 UTC
Description of problem:
When trying to convert an iSCSI disk from raw/preallocated/incremental-disabled to cow/thin/incremental-enabled, the operation fails with the following exception:

2022-03-29 15:07:23,653+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [1bb46e63-6368-46f4-9e4b-7913e23bb992] FINISH, GetHostJobsVDSCommand, return: {e7d84242-b6c6-409a-bd4c-7758aef34d21=HostJobInfo:{id='e7d84242-b6c6-409a-bd4c-7758aef34d21', type='storage', description='copy_data', status='failed', progress='100', error='VDSError:{code='GeneralException', message='General Exception: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '-O', 'qcow2', '-n', '/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/images/34738192-a5db-4292-bf74-dc7090181d94/d7c7a4e4-a1ff-43b4-8ee1-b737bacd0bd1', '/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/images/34738192-a5db-4292-bf74-dc7090181d94/2ce0dd86-5bb9-4784-8a88-4bb56eb6c313'] failed with rc=1 out=b'' err=bytearray(b'qemu-img: error while writing at byte 1071644672: No space left on device\\n')",)'}'}}, log id: 36f3b5b3

2022-03-29 15:07:35,329+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-167945) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:357)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	at org.glassfish.javax.enterprise.concurrent.1.redhat-00001//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)

Versions:
engine-4.5.0-0.237.el8ev
vdsm-4.50.0.10-1.el8ev


How reproducible:
100%

Steps to Reproduce:
1. Create an ISCSI_Raw-Preallocated-Disabled disk:
POST {{ENV_Hostname}}/ovirt-engine/api/disks
<disk>
  <storage_domains>
    <storage_domain id="{{ISCSI_SD}}"/>
  </storage_domains>
  <name>ISCSI_Raw-Preallocated-Disabled</name>
  <provisioned_size>1073741824</provisioned_size>
  <format>raw</format>
  <sparse>false</sparse>
  <backup>None</backup>
</disk>

2. Wait until the disk's status is OK, then try to convert its allocation policy to thin and its format to cow (see the request sketch after the XML body below):
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>
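
For reference, a minimal sketch of posting the conversion action with Python and the requests library. This assumes the action is sent to a /disks/{id}/convert action endpoint; the endpoint path, host name, credentials, and CA-bundle path are illustrative assumptions, not values taken from this report:

# Hedged sketch: POST the convert action for a disk via the oVirt REST API.
# The convert action path, engine host, credentials and CA bundle below are
# assumptions for illustration; adjust them to the actual environment.
import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"   # hypothetical engine API root
DISK_ID = "34738192-a5db-4292-bf74-dc7090181d94"         # image/disk UUID from the log above

ACTION_XML = """
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>
"""

resp = requests.post(
    f"{ENGINE}/disks/{DISK_ID}/convert",                 # assumed convert action path
    data=ACTION_XML,
    headers={"Content-Type": "application/xml"},
    auth=("admin@internal", "password"),                 # placeholder credentials
    verify="/etc/pki/ovirt-engine/ca.pem",               # placeholder CA path
)
resp.raise_for_status()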

Actual results:
The disk is not converted and remains as it was created.

Expected results:
The disk should be converted successfully to format cow and thin allocation.

Additional info:
Tried to perform other converts on the same SD and it passed successfully, so it doesn't appear to be a Netapp issue.
The disk can be removed if needed, so it is not locked after it fails the convert.

Comment 2 Arik 2022-03-30 07:12:51 UTC
(In reply to sshmulev from comment #0)
> Expected results:
> The disk should be converted successfully to format cow and thin allocation.

This expectation doesn't sound right - if qemu-img failed due to lack of space then the best we can do is to see that we have a proper validation to prevent the conversion from starting I guess
 
> Additional info:
> Tried to perform other converts on the same SD and it passed successfully,
> so it doesn't appear to be a Netapp issue.
> The disk can be removed if needed, so it is not locked after it fails the
> convert.

Were these disks smaller than the one it failed on?

Comment 3 Benny Zlotnik 2022-03-30 10:24:22 UTC
(In reply to Arik from comment #2)
> (In reply to sshmulev from comment #0)
> > Expected results:
> > The disk should be converted successfully to format cow and thin allocation.
> 
> This expectation doesn't sound right - if qemu-img failed due to lack of
> space then the best we can do is to see that we have a proper validation to
> prevent the conversion from starting I guess
>  
> > Additional info:
> > Tried to perform other converts on the same SD and it passed successfully,
> > so it doesn't appear to be a Netapp issue.
> > The disk can be removed if needed, so it is not locked after it fails the
> > convert.
> 
> Were these disks smaller than the one it failed on?

I'm not sure why it fails only on NetApp, but I'm pretty sure the issue is there was not enough space on the target for the qcow metadata
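
For context, the linked pull request ("core: measure volume before starting conversion") points at sizing the target before the copy. Below is a minimal sketch of that idea using qemu-img measure, assuming qemu-img is available on the host; the helper function and the example path are illustrative, not the actual engine/vdsm code:

# Hedged sketch: estimate the bytes a qcow2 copy of a raw volume requires
# (data plus qcow2 metadata), so the destination volume can be created or
# extended large enough before running 'qemu-img convert'. Illustrative only.
import json
import subprocess

def required_qcow2_size(src_path: str) -> int:
    """Return the 'required' size reported by qemu-img measure, in bytes."""
    out = subprocess.run(
        ["qemu-img", "measure", "--output=json", "-f", "raw", "-O", "qcow2", src_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["required"]

# Example (source path taken from the failing convert in this report):
# required_qcow2_size("/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/"
#                     "images/34738192-a5db-4292-bf74-dc7090181d94/d7c7a4e4-a1ff-43b4-8ee1-b737bacd0bd1")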

Comment 4 Arik 2022-04-03 06:43:34 UTC
(In reply to Benny Zlotnik from comment #3)
> (In reply to Arik from comment #2)
> > (In reply to sshmulev from comment #0)
> > > Expected results:
> > > The disk should be converted successfully to format cow and thin allocation.
> > 
> > This expectation doesn't sound right - if qemu-img failed due to lack of
> > space then the best we can do is to see that we have a proper validation to
> > prevent the conversion from starting I guess
> >  
> > > Additional info:
> > > Tried to perform other converts on the same SD and it passed successfully,
> > > so it doesn't appear to be a Netapp issue.
> > > The disk can be removed if needed, so it is not locked after it fails the
> > > convert.
> > 
> > Were these disks smaller than the one it failed on?
> 
> I'm not sure why it fails only on NetApp, but I'm pretty sure the issue is
> there was not enough space on the target for the qcow metadata

Yeah, that indeed makes more sense.
So it's not a corner case as I suspected above; changing to 4.5.

Comment 5 Ilia Markelov 2022-04-14 17:12:01 UTC
Verified.

Disks are converted successfully and no NPE was found.

Versions:
engine-4.5.0.2-0.7.el8ev
vdsm-4.50.0.12-1.el8ev.x86_64

Comment 6 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bug is included in the oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

