Bug 1455871

Summary:	[downstream clone - 4.2.0] [CodeChange] move vdsm calls (mostly removeImage) from transactional endAction()
Product:	Red Hat Enterprise Virtualization Manager	Reporter:	rhev-integ
Component:	ovirt-engine	Assignee:	Fred Rolland <frolland>
Status:	CLOSED ERRATA	QA Contact:	Kevin Alon Goldblatt <kgoldbla>
Severity:	high	Docs Contact:
Priority:	high
Version:	unspecified	CC:	acanan, aefrat, bugs, ebenahar, frolland, jentrena, lsurette, mperina, pdwyer, rbalakri, Rhev-m-bugs, srevivo, tnisan, ykaul, ylavi
Target Milestone:	ovirt-4.2.0	Keywords:	CodeChange, ZStream
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1390936	Environment:
Last Closed:	2018-05-15 17:42:49 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Storage	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1390936
Bug Blocks:

Description rhev-integ 2017-05-26 10:49:19 UTC

+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1390936 +++
======================================================================

Currently, some flows perform vdsm calls (mostly removeImage) in the endAction() method. If endAction() is executed within a transaction, we might get a transaction timeout in some scenarios.
This RFE is about moving those calls out of endAction(). In order to do that:
1. Our "COCO-Storage" infrastructure (serial callback) needs to be modified to support moving vdsm calls out of endWithFailure().
2. The relevant flows need to start using the COCO infrastructure instead of relying on the tasks infrastructure.

(Originally by laravot)
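
Illustration only, not code from ovirt-engine: a minimal Python sketch of why a slow vdsm call inside a transactional end step is a problem, and how the same call behaves when it is made before the transaction opens. Every name here (transaction, TransactionTimeout, end_action_in_tx, end_action_outside_tx) is hypothetical, and the "transaction" is a toy model that discards its changes when its body runs longer than the allowed time.

```python
import time
from contextlib import contextmanager


class TransactionTimeout(Exception):
    """Raised when the simulated transaction outlives its time budget."""


@contextmanager
def transaction(state, timeout_s):
    # Toy model of a transactional scope (an assumption, not the engine's API):
    # changes are made on a draft copy and applied only if the body finished
    # within timeout_s. Otherwise the draft is discarded (rollback).
    draft = dict(state)
    start = time.monotonic()
    yield draft
    if time.monotonic() - start > timeout_s:
        raise TransactionTimeout("body exceeded %.2fs" % timeout_s)
    state.update(draft)


def end_action_in_tx(disk, slow_call, timeout_s):
    # Problematic shape: the slow remote call runs inside the transaction.
    with transaction(disk, timeout_s) as draft:
        slow_call()
        draft["status"] = "OK"


def end_action_outside_tx(disk, slow_call, timeout_s):
    # Suggested shape: make the slow call first, then open a short
    # transaction that only updates the database state.
    slow_call()
    with transaction(disk, timeout_s) as draft:
        draft["status"] = "OK"


if __name__ == "__main__":
    slow = lambda: time.sleep(0.2)  # stands in for a slow removeImage call

    disk = {"status": "LOCKED"}
    try:
        end_action_in_tx(disk, slow, timeout_s=0.1)
    except TransactionTimeout as exc:
        print("in-transaction variant failed:", exc, "->", disk)

    end_action_outside_tx(disk, slow, timeout_s=0.1)
    print("out-of-transaction variant:", disk)
```

In this toy model the first variant rolls back and the disk stays LOCKED, while the second variant unlocks it because the transaction itself is short.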

Comment 1 rhev-integ 2017-05-26 10:49:32 UTC

Can you describe the functional impact?

(Originally by Yaniv Dary)

Comment 3 rhev-integ 2017-05-26 10:49:44 UTC

Sure,
The functional impact is that we may get transaction timeouts when executing vdsm calls within a transactional endAction().
In BZ 1372743 (see https://bugzilla.redhat.com/show_bug.cgi?id=1372743#c22) this left us with a locked disk.

Let me know if further info is needed.

Thanks,
Liron

(Originally by laravot)

Comment 4 rhev-integ 2017-05-26 10:49:51 UTC

Tal, I'm treating this as code change. Please decide on a target for it.

(Originally by Yaniv Dary)

Comment 6 Allon Mureinik 2017-09-28 11:35:32 UTC

With the recent work around both LSM and cold move, the issue in this ticket should be resolved. Setting to MODIFIED.

We'll keep the upstream tracking bug for other code improvements.

Comment 13 Kevin Alon Goldblatt 2017-12-04 10:38:02 UTC

Verified with the following code:
----------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.8-53.gitc3edfc0.el7.centos.x86_64

Verified with the following scenario:
----------------------------------
1. Created a VM with disks and OS installed on NFS
2. Moved the host to maintenance
3. Edited the file /usr/lib/python2.7/site-packages/vdsm/API.py on the host and added a sleep as follows:
-------------------------------------------------------------------
from time import sleep

class Image(APIBase):
    ctorArgs = ['imageID', 'storagepoolID', 'storagedomainID']

    BLANK_UUID = sc.BLANK_UUID

    class DiskTypes:
        UNKNOWN = image.UNKNOWN_DISK_TYPE
        SYSTEM = image.SYSTEM_DISK_TYPE
        DATA = image.DATA_DISK_TYPE
        SHARED = image.SHARED_DISK_TYPE
        SWAP = image.SWAP_DISK_TYPE
        TEMP = image.TEMP_DISK_TYPE

    def __init__(self, UUID, spUUID, sdUUID):
        APIBase.__init__(self)
        self._UUID = UUID
        self._spUUID = spUUID
        self._sdUUID = sdUUID

    def delete(self, postZero, force, discard=False):
        sleep(600)  # injected for verification: stall the delete call for 10 minutes
        return self._irs.deleteImage(self._sdUUID, self._spUUID, self._UUID,
                                     postZero, force, discard)
-----------------------------------------------------------------------
4. Restarted the vdsm on the host
5. Performed a cold move of the disk on the VM created in step 1  >>>>> The delete image operation times out and fails. The disk is left in OK state.
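
For readers who want to reason about the expected outcome without a test host, the scenario above can be modelled roughly as follows. This is a hedged sketch and not the engine's implementation: delete_image stands in for the stalled vdsm call, cold_move_cleanup and the disk dictionary are hypothetical, and the delays are scaled down from 600 seconds to fractions of a second.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as CallTimeout


def delete_image(delay_s):
    # Stand-in for the stalled delete call; delay_s plays the role of the
    # sleep injected into API.py in step 3.
    time.sleep(delay_s)
    return "deleted"


def cold_move_cleanup(disk, delay_s, call_timeout_s):
    # Hypothetical cleanup step: wait for the delete call only up to
    # call_timeout_s, then release the disk whether or not it completed.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(delete_image, delay_s)
    try:
        future.result(timeout=call_timeout_s)
        outcome = "source image removed"
    except CallTimeout:
        outcome = "delete image timed out"
    finally:
        pool.shutdown(wait=False)  # do not block on the stalled call
    disk["status"] = "OK"  # the disk is not left locked in either case
    return outcome


if __name__ == "__main__":
    disk = {"status": "LOCKED"}
    print(cold_move_cleanup(disk, delay_s=0.3, call_timeout_s=0.05), disk)
```

The point of the sketch is the last assignment: the disk state is updated independently of whether the slow call returned in time, which matches the result reported in step 5.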


Moving to VERIFIED

Comment 14 RHV bug bot 2017-12-06 16:16:07 UTC

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops

Comment 15 RHV bug bot 2017-12-12 21:14:43 UTC

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops

Comment 16 RHV bug bot 2017-12-18 17:05:07 UTC

INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[No relevant external trackers attached]

For more info please contact: rhv-devops

Comment 19 errata-xmlrpc 2018-05-15 17:42:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 20 Franta Kust 2019-05-16 13:07:35 UTC

BZ<2>Jira Resync