Bug 2069670 - NPE when converting ISCSI disk during the copy_data action
Summary: NPE when converting ISCSI disk during the copy_data action
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.0
Target Release: 4.5.0.1
Assignee: Benny Zlotnik
QA Contact: Ilia Markelov
URL:
Whiteboard:
Depends On:
Blocks: RHEV_thin_to_preallocated_disks
 
Reported: 2022-03-29 12:41 UTC by sshmulev
Modified: 2022-04-20 06:33 UTC
CC List: 3 users

Fixed In Version: ovirt-engine-4.5.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?




Links
Github - oVirt ovirt-engine pull 203 (Draft) - core: measure volume before starting conversion - 2022-03-30 10:22:28 UTC
Red Hat Issue Tracker - RHV-45472 - 2022-03-29 15:37:12 UTC

Description sshmulev 2022-03-29 12:41:27 UTC
Description of problem:
When trying to convert an iSCSI disk from raw/preallocated/incremental-disabled to cow/thin/incremental-enabled, the operation fails with the following exception:

2022-03-29 15:07:23,653+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [1bb46e63-6368-46f4-9e4b-7913e23bb992] FINISH, GetHostJobsVDSCommand, return: {e7d84242-b6c6-409a-bd4c-7758aef34d21=HostJobInfo:{id='e7d84242-b6c6-409a-bd4c-7758aef34d21', type='storage', description='copy_data', status='failed', progress='100', error='VDSError:{code='GeneralException', message='General Exception: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '-O', 'qcow2', '-n', '/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/images/34738192-a5db-4292-bf74-dc7090181d94/d7c7a4e4-a1ff-43b4-8ee1-b737bacd0bd1', '/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/images/34738192-a5db-4292-bf74-dc7090181d94/2ce0dd86-5bb9-4784-8a88-4bb56eb6c313'] failed with rc=1 out=b'' err=bytearray(b'qemu-img: error while writing at byte 1071644672: No space left on device\\n')",)'}'}}, log id: 36f3b5b3

2022-03-29 15:07:35,329+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-167945) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:357)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	at org.glassfish.javax.enterprise.concurrent.1.redhat-00001//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)

Versions:
engine-4.5.0-0.237.el8ev
vdsm-4.50.0.10-1.el8ev


How reproducible:
100%

Steps to Reproduce:
1. Create an ISCSI_Raw-Preallocated-Disabled disk:
POST {{ENV_Hostname}}/ovirt-engine/api/disks
<disk>
  <storage_domains>
    <storage_domain id="{{ISCSI_SD}}"/>
  </storage_domains>
  <name>ISCSI_Raw-Preallocated-Disabled</name>
  <provisioned_size>1073741824</provisioned_size>
  <format>raw</format>
  <sparse>false</sparse>
  <backup>None</backup>
</disk>

2. Wait until the disk's status is OK, then try to convert its allocation policy to thin and its format to cow (see the request sketch after the XML body below):
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>
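
For reference, a minimal sketch of posting the conversion action with Python and the requests library. This assumes the action is sent to a /disks/{id}/convert action endpoint; the endpoint path, host name, credentials, and CA-bundle path are illustrative assumptions, not values taken from this report:

# Hedged sketch: POST the convert action for a disk via the oVirt REST API.
# The convert action path, engine host, credentials and CA bundle below are
# assumptions for illustration; adjust them to the actual environment.
import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"   # hypothetical engine API root
DISK_ID = "34738192-a5db-4292-bf74-dc7090181d94"         # image/disk UUID from the log above

ACTION_XML = """
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>
"""

resp = requests.post(
    f"{ENGINE}/disks/{DISK_ID}/convert",                 # assumed convert action path
    data=ACTION_XML,
    headers={"Content-Type": "application/xml"},
    auth=("admin@internal", "password"),                 # placeholder credentials
    verify="/etc/pki/ovirt-engine/ca.pem",               # placeholder CA path
)
resp.raise_for_status()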

Actual results:
The disk is not converted and remains as it was created.

Expected results:
The disk should be converted successfully to format cow and thin allocation.

Additional info:
Tried to perform other converts on the same SD and it passed successfully, so it doesn't appear to be a Netapp issue.
The disk can be removed if needed, so it is not locked after it fails the convert.

Comment 2 Arik 2022-03-30 07:12:51 UTC
(In reply to sshmulev from comment #0)
> Expected results:
> The disk should be converted successfully to format cow and thin allocation.

This expectation doesn't sound right - if qemu-img failed due to lack of space then the best we can do is to see that we have a proper validation to prevent the conversion from starting I guess
 
> Additional info:
> Tried to perform other converts on the same SD and it passed successfully,
> so it doesn't appear to be a Netapp issue.
> The disk can be removed if needed, so it is not locked after it fails the
> convert.

Were these disks smaller than the one it failed on?

Comment 3 Benny Zlotnik 2022-03-30 10:24:22 UTC
(In reply to Arik from comment #2)
> (In reply to sshmulev from comment #0)
> > Expected results:
> > The disk should be converted successfully to format cow and thin allocation.
> 
> This expectation doesn't sound right - if qemu-img failed due to lack of
> space then the best we can do is to see that we have a proper validation to
> prevent the conversion from starting I guess
>  
> > Additional info:
> > Tried to perform other converts on the same SD and it passed successfully,
> > so it doesn't appear to be a Netapp issue.
> > The disk can be removed if needed, so it is not locked after it fails the
> > convert.
> 
> Were these disks smaller than the one it failed on?

I'm not sure why it fails only on NetApp, but I'm pretty sure the issue is there was not enough space on the target for the qcow metadata
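
For context, the linked pull request ("core: measure volume before starting conversion") points at sizing the target before the copy. Below is a minimal sketch of that idea using qemu-img measure, assuming qemu-img is available on the host; the helper function and the example path are illustrative, not the actual engine/vdsm code:

# Hedged sketch: estimate the bytes a qcow2 copy of a raw volume requires
# (data plus qcow2 metadata), so the destination volume can be created or
# extended large enough before running 'qemu-img convert'. Illustrative only.
import json
import subprocess

def required_qcow2_size(src_path: str) -> int:
    """Return the 'required' size reported by qemu-img measure, in bytes."""
    out = subprocess.run(
        ["qemu-img", "measure", "--output=json", "-f", "raw", "-O", "qcow2", src_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["required"]

# Example (source path taken from the failing convert in this report):
# required_qcow2_size("/rhev/data-center/mnt/blockSD/239b3ea8-2f89-4193-ac88-40ce681c496d/"
#                     "images/34738192-a5db-4292-bf74-dc7090181d94/d7c7a4e4-a1ff-43b4-8ee1-b737bacd0bd1")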

Comment 4 Arik 2022-04-03 06:43:34 UTC
(In reply to Benny Zlotnik from comment #3)
> (In reply to Arik from comment #2)
> > (In reply to sshmulev from comment #0)
> > > Expected results:
> > > The disk should be converted successfully to format cow and thin allocation.
> > 
> > This expectation doesn't sound right - if qemu-img failed due to lack of
> > space then the best we can do is to see that we have a proper validation to
> > prevent the conversion from starting I guess
> >  
> > > Additional info:
> > > Tried to perform other converts on the same SD and it passed successfully,
> > > so it doesn't appear to be a Netapp issue.
> > > The disk can be removed if needed, so it is not locked after it fails the
> > > convert.
> > 
> > Were these disks smaller than the one it failed on?
> 
> I'm not sure why it fails only on NetApp, but I'm pretty sure the issue is
> there was not enough space on the target for the qcow metadata

Yeah, that indeed makes more sense.
So it's not a corner case as I suspected above; changing to 4.5.

Comment 5 Ilia Markelov 2022-04-14 17:12:01 UTC
Verified.

Disks are converted successfully and no NPE was found.

Versions:
engine-4.5.0.2-0.7.el8ev
vdsm-4.50.0.12-1.el8ev.x86_64

Comment 6 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bug is included in the oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

