Bug 1585013

Summary: [downstream clone - 4.2.4] ovirt-engine loses track of a cancelled disk
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-engine
Assignee: Daniel Erez <derez>
Status: CLOSED ERRATA
QA Contact: Natalie Gavrielov <ngavrilo>
Severity: high
Docs Contact:
Priority: high
Version: 4.2.2
CC: derez, ebenahar, gveitmic, ishaby, lsurette, lsvaty, mgoldboi, nsoffer, rbalakri, Rhev-m-bugs, srevivo, tnisan, ykaul, ylavi
Target Milestone: ovirt-4.2.4
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.2.4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1565673
Environment:
Last Closed: 2018-06-27 10:02:42 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1565673
Bug Blocks:

Description RHV bug bot 2018-06-01 06:59:41 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1565673 +++
======================================================================

Description of problem:

An upload through the API failed. In the UI I selected Upload->Cancel,
and the disk disappeared.

However, the disk space was not reclaimed, and oVirt appears
to have "forgotten" about the disk. It is clearly still present
on the NFS storage:

$ cd /mnt/ovirt-data/7b4ae32d-bc79-4368-b683-3e0b58b64bf6/images
$ du -sh *
1.1M	275d7f97-627c-438d-b5f4-75bb0d5159ac
1.1M	33793225-e2c2-4a68-aa9b-68e2a64a733f
1.1M	3f7dacef-ad54-4502-bf6d-ea71f805bb41
1.1M	45d8cf54-7b9e-4b2e-b32f-558dc8843a6b
1.1M	6654a958-931f-466f-819c-bcb76ccf1692
1.1M	850fb38f-ce8b-4d87-aa67-cd04a4704039
1.1M	9756749f-f003-42a4-8f16-52b3595106c9
1.1M	a27919ed-4e03-46c5-be18-8f242da4f15e
1.1M	aabacf8f-d13e-4dd5-bce6-69ca760ce771
1.1M	aacbb0f4-9624-46df-a8f9-ab0711551a04
1.1M	ad0fd425-983f-4bed-a8fe-d2e017ad786d
439M	bc122b0d-d0c9-4e39-8e36-87c3fc9f43bf
1.1M	c97a5a01-d30c-4540-a1d7-4cb6fdb806c6
1.1M	d3c58c0b-dee6-4fff-aa24-a3f239ef47bd
41G	e9900d00-9ce6-404c-919a-6f17fea0a1bb
1.1M	f4c04bfe-cc4a-4469-af52-a1eb8eec170b
1.1M	f56c6164-5ea2-40fc-91f4-33bf5678d1f7
1.1M	f91e9beb-6fe5-4396-8fbf-f3446b50cbc3
$ ls -l e9900d00-9ce6-404c-919a-6f17fea0a1bb
total 41944080
-rw-rw----. 1 36 kvm 42949672960 Apr 10 13:32 cc182999-de3b-4ac9-ac80-60475888a65a
-rw-rw----. 1 36 kvm     1048576 Apr 10 13:07 cc182999-de3b-4ac9-ac80-60475888a65a.lease
-rw-r--r--. 1 36 kvm         350 Apr 10 13:07 cc182999-de3b-4ac9-ac80-60475888a65a.meta

(Originally by Richard Jones)
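The manual check in the description (listing the `images` directory and spotting a large directory the engine no longer shows) can be sketched as a small script. This is a minimal sketch, not part of the fix: `find_untracked_images` and `dir_size` are hypothetical helper names, and the set of known disk IDs is assumed to have been collected separately (for example from the engine's disk list). It also assumes that, on a file-based storage domain, each directory under `images/` is named after the disk's ID. Sizes are apparent file sizes, so they can differ from what `du` reports for sparse files.

```python
import os
import re
from pathlib import Path

# Directory names under images/ are expected to be lowercase UUIDs.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def dir_size(path):
    """Return the apparent size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def find_untracked_images(images_dir, known_disk_ids):
    """List (directory name, size) for image directories not in known_disk_ids.

    known_disk_ids is an assumption of this sketch: the caller supplies
    the disk IDs that the engine still knows about.
    """
    known = {disk_id.lower() for disk_id in known_disk_ids}
    untracked = []
    for entry in sorted(Path(images_dir).iterdir()):
        if not entry.is_dir() or not UUID_RE.match(entry.name):
            continue
        if entry.name not in known:
            untracked.append((entry.name, dir_size(entry)))
    return untracked
```

For the storage domain in this report, the call would be `find_untracked_images("/mnt/ovirt-data/7b4ae32d-bc79-4368-b683-3e0b58b64bf6/images", known_ids)`, where `known_ids` is whatever set of disk IDs the engine reports. A directory such as `e9900d00-9ce6-404c-919a-6f17fea0a1bb` would be listed if its ID is missing from that set.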

Comment 1 RHV bug bot 2018-06-01 06:59:51 UTC
On engine:

ovirt-ansible-cluster-upgrade-1.1.4-1.el7.centos.noarch
ovirt-ansible-disaster-recovery-0.1-1.el7.centos.noarch
ovirt-ansible-engine-setup-1.1.0-1.el7.centos.noarch
ovirt-ansible-image-template-1.1.5-1.el7.centos.noarch
ovirt-ansible-infra-1.1.3-1.el7.centos.noarch
ovirt-ansible-manageiq-1.1.5-1.el7.centos.noarch
ovirt-ansible-repositories-1.1.0-1.el7.centos.noarch
ovirt-ansible-roles-1.1.3-1.el7.centos.noarch
ovirt-ansible-vm-infra-1.1.4-1.el7.centos.noarch
ovirt-cockpit-sso-0.0.4-1.el7.noarch
ovirt-engine-4.2.2.5-1.el7.centos.noarch
ovirt-engine-api-explorer-0.0.2-1.el7.centos.noarch
ovirt-engine-backend-4.2.2.5-1.el7.centos.noarch
ovirt-engine-cli-3.6.9.2-1.el7.centos.noarch
ovirt-engine-dashboard-1.2.2-3.el7.centos.noarch
ovirt-engine-dbscripts-4.2.2.5-1.el7.centos.noarch
ovirt-engine-dwh-4.2.2.2-1.el7.centos.noarch
ovirt-engine-dwh-setup-4.2.2.2-1.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.7-1.el7.centos.noarch
ovirt-engine-extensions-api-impl-4.2.2.5-1.el7.centos.noarch
ovirt-engine-lib-4.2.2.6-0.0.master.20180322134320.git2ef85b5.el7.centos.noarch
ovirt-engine-metrics-1.1.2.2-1.el7.centos.noarch
ovirt-engine-restapi-4.2.2.5-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ovirt-engine-setup-4.2.2.5-1.el7.centos.noarch
ovirt-engine-setup-base-4.2.2.5-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.2.2.5-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.2.2.5-1.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.2.2.5-1.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.2.2.5-1.el7.centos.noarch
ovirt-engine-tools-4.2.2.5-1.el7.centos.noarch
ovirt-engine-tools-backup-4.2.2.5-1.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-4.2.2.5-1.el7.centos.noarch
ovirt-engine-webadmin-portal-4.2.2.5-1.el7.centos.noarch
ovirt-engine-websocket-proxy-4.2.2.5-1.el7.centos.noarch
ovirt-engine-wildfly-11.0.0-1.el7.centos.x86_64
ovirt-engine-wildfly-overlay-11.0.1-1.el7.centos.noarch
ovirt-host-deploy-1.7.3-1.el7.centos.noarch
ovirt-host-deploy-java-1.7.3-1.el7.centos.noarch
ovirt-imageio-common-1.3.0-0.201804031158.git8c388d1.el7.centos.noarch
ovirt-imageio-proxy-1.3.0-0.201804031158.git8c388d1.el7.centos.noarch
ovirt-imageio-proxy-setup-1.3.0-0.201804031158.git8c388d1.el7.centos.noarch
ovirt-iso-uploader-4.2.0-1.el7.centos.noarch
ovirt-js-dependencies-1.2.0-3.1.el7.centos.noarch
ovirt-provider-ovn-1.2.5-1.el7.centos.noarch
ovirt-release42-pre-4.2.2-0.5.rc5.20180320231726.git716ab35.el7.centos.noarch
ovirt-setup-lib-1.1.4-1.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7.noarch
ovirt-web-ui-1.3.5-1.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.4-2.el7.centos.x86_64

On node:

cockpit-ovirt-dashboard-0.11.20-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.2-0.1.20180209.gite99bbd1.el7.centos.noarch
ovirt-host-4.2.3-0.0.master.20180314072625.gitb93bc6a.el7.centos.x86_64
ovirt-host-dependencies-4.2.3-0.0.master.20180314072625.gitb93bc6a.el7.centos.x86_64
ovirt-host-deploy-1.7.4-0.0.master.20180313171951.git3441821.el7.centos.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.16-1.el7.centos.noarch
ovirt-imageio-common-1.3.0-0.201804031158.git8c388d1.el7.centos.noarch
ovirt-imageio-daemon-1.3.0-0.201804031158.git8c388d1.el7.centos.noarch
ovirt-provider-ovn-driver-1.2.10-0.20180314082503.gitb7e43f0.el7.centos.noarch
ovirt-release42-pre-4.2.2-3.el7.centos.noarch
ovirt-setup-lib-1.1.5-0.0.master.20180219145311.gitdee3d31.el7.centos.noarch
ovirt-vmconsole-1.0.5-0.0.master.20180215132524.gitf24a817.el7.centos.noarch
ovirt-vmconsole-host-1.0.5-0.0.master.20180215132524.gitf24a817.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.4-2.20180316gita0f4e48.el7.centos.x86_64
vdsm-4.20.23-12.gited79797.el7.centos.x86_64
vdsm-api-4.20.23-12.gited79797.el7.centos.noarch
vdsm-client-4.20.23-12.gited79797.el7.centos.noarch
vdsm-common-4.20.23-12.gited79797.el7.centos.noarch
vdsm-hook-ethtool-options-4.20.23-12.gited79797.el7.centos.noarch
vdsm-hook-fcoe-4.20.23-12.gited79797.el7.centos.noarch
vdsm-hook-openstacknet-4.20.23-12.gited79797.el7.centos.noarch
vdsm-hook-vfio-mdev-4.20.23-12.gited79797.el7.centos.noarch
vdsm-hook-vhostmd-4.20.23-12.gited79797.el7.centos.noarch
vdsm-hook-vmfex-dev-4.20.23-12.gited79797.el7.centos.noarch
vdsm-http-4.20.23-12.gited79797.el7.centos.noarch
vdsm-jsonrpc-4.20.23-12.gited79797.el7.centos.noarch
vdsm-network-4.20.23-12.gited79797.el7.centos.x86_64
vdsm-python-4.20.23-12.gited79797.el7.centos.noarch
vdsm-yajsonrpc-4.20.23-12.gited79797.el7.centos.noarch

(Originally by Richard Jones)

Comment 3 RHV bug bot 2018-06-01 06:59:56 UTC
Created attachment 1419907
vdsm.log

(Originally by Richard Jones)

Comment 4 RHV bug bot 2018-06-01 07:00:01 UTC
Created attachment 1419908
engine.log

(Originally by Richard Jones)

Comment 5 RHV bug bot 2018-06-01 07:00:07 UTC
"Scan disks" does not help.

The lost disk is not visible anywhere in the UI.

(Originally by Richard Jones)

Comment 6 RHV bug bot 2018-06-01 07:00:12 UTC
Setting priority/severity to high, since this leaks an unlimited amount of
storage space and the user has no way to reclaim it.

(Originally by Nir Soffer)

Comment 7 RHV bug bot 2018-06-01 07:00:20 UTC
*** Bug 1516903 has been marked as a duplicate of this bug. ***

(Originally by Idan Shaby)

Comment 8 RHV bug bot 2018-06-01 07:00:24 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops

(Originally by rhv-bugzilla-bot)

Comment 10 Natalie Gavrielov 2018-06-10 10:47:02 UTC
Daniel,

Does the following scenario verify the fix?
1. Upload disk using python SDK
2. Fail the upload
3. Cancel the upload through UI

Should it be tested on both file and block storage types?
Was it reproducible/ consistent?

Comment 11 Daniel Erez 2018-06-10 11:21:13 UTC
(In reply to Natalie Gavrielov from comment #10)
> Daniel,
> 
> Does the following scenario verify the fix?
> 1. Upload disk using python SDK
> 2. Fail the upload
> 3. Cancel the upload through UI
> 
> Should it be tested on both file and block storage types?
> Was it reproducible/ consistent?

The issue here was that the disk wasn't being removed from storage upon failure. It was reproducible and consistent for all storage types.

Comment 12 Natalie Gavrielov 2018-06-10 11:53:38 UTC
(In reply to Daniel Erez from comment #11)
> (In reply to Natalie Gavrielov from comment #10)
> > Daniel,
> > 
> > Does the following scenario verify the fix?
> > 1. Upload disk using python SDK
> > 2. Fail the upload
> > 3. Cancel the upload through UI
> > 
> > Should it be tested on both file and block storage types?
> > Was it reproducible/ consistent?
> 
> The issue here was that the disk wasn't being removed from storage upon
> failure. It was reproducible and consistent for all storage types.

So steps 1-2 are sufficient?
(assuming that the expected result is for the disk to be removed from the storage)

Comment 13 Daniel Erez 2018-06-10 12:00:59 UTC
(In reply to Natalie Gavrielov from comment #12)
> (In reply to Daniel Erez from comment #11)
> > (In reply to Natalie Gavrielov from comment #10)
> > > Daniel,
> > > 
> > > Does the following scenario verify the fix?
> > > 1. Upload disk using python SDK
> > > 2. Fail the upload
> > > 3. Cancel the upload through UI
> > > 
> > > Should it be tested on both file and block storage types?
> > > Was it reproducible/ consistent?
> > 
> > The issue here was that the disk wasn't being removed from storage upon
> > failure. It was reproducible and consistent for all storage types.
> 
> So steps 1-2 are sufficient?
> (assuming that the expected result is for the disk to be removed from the
> storage)

IIUC, the disk is in a paused state after step 2, so you should also invoke cancel (from the API or the UI) to initiate the failure flow.
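The verification flow discussed above (cancel the paused transfer, then confirm the disk is gone) could be scripted roughly as follows. This is a sketch under stated assumptions, not the engine's or QA's actual procedure: `cancel_and_wait_for_removal` is a hypothetical helper, `transfer_service` is assumed to be the Python SDK's image transfer service for the paused transfer (exposing `cancel()`), `disk_service` is assumed to be the SDK's disk service for the uploaded disk (exposing `get()`), and `not_found_exc` is assumed to be the exception the SDK raises for a missing entity (believed to be `ovirtsdk4.NotFoundError`). The services are passed in, so the function itself does not import the SDK.

```python
import time


def cancel_and_wait_for_removal(transfer_service, disk_service, not_found_exc,
                                timeout=60.0, interval=1.0, sleep=time.sleep):
    """Cancel a paused transfer and poll until the disk no longer exists.

    Returns True if the disk lookup raises not_found_exc before the
    timeout expires, and False if the disk is still present at the deadline.
    """
    # Cancelling is what triggers the failure flow for a paused transfer.
    transfer_service.cancel()

    deadline = time.monotonic() + timeout
    while True:
        try:
            disk_service.get()
        except not_found_exc:
            return True  # the engine no longer knows the disk
        if time.monotonic() >= deadline:
            return False  # the disk is still registered
        sleep(interval)
```

A `True` result only shows that the engine has dropped the disk. Confirming that the space was actually reclaimed still requires checking the storage domain itself, as in the original description.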

Comment 14 Natalie Gavrielov 2018-06-12 07:54:55 UTC
Verified, ovirt-engine-4.2.4.2-0.1.el7_3.noarch
Performed scenario described in comment 10, now the disk is removed.

Comment 15 Natalie Gavrielov 2018-06-12 10:47:33 UTC
(In reply to Natalie Gavrielov from comment #14)
> Verified, ovirt-engine-4.2.4.2-0.1.el7_3.noarch
> Performed scenario described in comment 10, now the disk is removed.

Note: tested on both file and block storage types.

Comment 17 errata-xmlrpc 2018-06-27 10:02:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2071

Comment 18 Franta Kust 2019-05-16 13:04:03 UTC
BZ<2>Jira Resync