+++ This bug is an upstream to downstream clone. The original bug is: +++
+++ bug 1390936 +++
======================================================================

Currently, some flows perform vdsm calls (mostly removeImage) in the endAction() method. If endAction() is executed within a transaction, we might hit a transaction timeout in some scenarios.

This RFE is about moving those calls out of endAction(). In order to do that:

1. Our "COCO-Storage" infrastructure (serial callback) needs to be modified to support moving vdsm calls out of endWithFailure().
2. The relevant flows need to start using the COCO infrastructure instead of relying on the tasks infrastructure.

(Originally by laravot)
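For illustration, the difference between the two patterns can be sketched as follows. This is a minimal toy sketch, not the real engine/COCO API: the names transaction, remove_image, end_action_problematic, end_action_with_callback, and the timeout value are all hypothetical stand-ins.

```python
from contextlib import contextmanager
import time

TRANSACTION_TIMEOUT = 0.05  # seconds; illustrative only


@contextmanager
def transaction():
    # Toy transaction: raises if it is held open longer than the timeout.
    start = time.monotonic()
    yield
    if time.monotonic() - start > TRANSACTION_TIMEOUT:
        raise RuntimeError("transaction timeout")


def remove_image():
    # Stand-in for a slow vdsm call such as removeImage.
    time.sleep(0.1)


def end_action_problematic():
    # Anti-pattern described in the bug: the slow remote call runs
    # inside the transaction, so it can exceed the transaction timeout.
    with transaction():
        status = "ILLEGAL"   # fast DB work
        remove_image()       # slow remote call inside the transaction
    return status


def end_action_with_callback():
    # Intended pattern: only DB work happens inside the transaction;
    # the slow call is queued and executed afterwards by a callback.
    deferred = []
    with transaction():
        status = "ILLEGAL"
        deferred.append(remove_image)  # queue, do not execute yet
    for call in deferred:              # runs outside the transaction
        call()
    return status
```

Under this toy timeout, the first variant fails while the second completes, which mirrors the failure mode (and fix) described above.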
Can you describe the functional impact? (Originally by Yaniv Dary)
Sure. The functional impact is that we may get transaction timeouts when executing vdsm calls within a transactional endAction(). In BZ 1372743 (see https://bugzilla.redhat.com/show_bug.cgi?id=1372743#c22) it caused us to be left with a locked disk.

Let me know if further info is needed.

Thanks,
Liron

(Originally by laravot)
Tal, I'm treating this as a code change. Please decide on a target for it. (Originally by Yaniv Dary)
With the recent work on both LSM and cold move, the issue in this ticket should be resolved; setting to MODIFIED. We'll keep the upstream tracking bug for other code improvements.
Verified with the following code:
----------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.8-53.gitc3edfc0.el7.centos.x86_64

Verified with the following scenario:
----------------------------------
1. Created a VM with disks and an OS installed on NFS.
2. Moved the host to maintenance.
3. Edited the file /usr/lib/python2.7/site-packages/vdsm/API.py on the host and added a sleep as follows:

-------------------------------------------------------------------
from time import sleep


class Image(APIBase):
    ctorArgs = ['imageID', 'storagepoolID', 'storagedomainID']

    BLANK_UUID = sc.BLANK_UUID

    class DiskTypes:
        UNKNOWN = image.UNKNOWN_DISK_TYPE
        SYSTEM = image.SYSTEM_DISK_TYPE
        DATA = image.DATA_DISK_TYPE
        SHARED = image.SHARED_DISK_TYPE
        SWAP = image.SWAP_DISK_TYPE
        TEMP = image.TEMP_DISK_TYPE

    def __init__(self, UUID, spUUID, sdUUID):
        APIBase.__init__(self)
        self._UUID = UUID
        self._spUUID = spUUID
        self._sdUUID = sdUUID

    def delete(self, postZero, force, discard=False):
        sleep(600)
        return self._irs.deleteImage(self._sdUUID, self._spUUID,
                                     self._UUID, postZero, force, discard)
-----------------------------------------------------------------------

4. Restarted vdsm on the host.
5. Performed a cold move of the disk of the VM created in step 1.

>>>>> The delete image operation times out and fails. The disk is left in OK state.

Moving to VERIFY
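As an aside, the fault-injection technique in step 3 (delaying a method so a client-side timeout becomes reproducible without genuinely slow storage) can be sketched generically. The add_delay helper and the Image class below are illustrative stand-ins, not part of vdsm.

```python
import time


def add_delay(func, seconds):
    # Return a wrapper that sleeps before delegating to func,
    # simulating a slow backend operation.
    def delayed(*args, **kwargs):
        time.sleep(seconds)
        return func(*args, **kwargs)
    return delayed


class Image:
    # Toy stand-in for the patched API class.
    def delete(self):
        return "deleted"


# Monkeypatch the method, as done with Image.delete in step 3
# (600 s there; a short delay here keeps the example fast).
Image.delete = add_delay(Image.delete, 0.05)

img = Image()
start = time.monotonic()
result = img.delete()
elapsed = time.monotonic() - start
```

After the patch, every call still returns the original result but takes at least the injected delay, which is what makes the engine-side timeout fire during verification.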
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed: [No relevant external trackers attached] For more info please contact: rhv-devops
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1488