+++ This bug is an upstream to downstream clone. The original bug is: +++
+++ bug 1390936 +++
======================================================================

Currently, some flows perform vdsm calls (mostly removeImage) in the endAction() method. If endAction() is executed within a transaction, we might hit a transaction timeout in some scenarios.

This RFE is about moving those calls out of endAction(). In order to do that:

1. Our "COCO-Storage" infrastructure (serial callback) needs to be modified to support moving vdsm calls out of endWithFailure().
2. The relevant flows need to start using the COCO infrastructure instead of relying on the tasks infrastructure.

(Originally by laravot)
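For illustration, the difference between the two patterns can be sketched as follows. This is a minimal toy sketch, not the real engine/COCO API: the names transaction, remove_image, end_action_problematic, end_action_with_callback, and the timeout value are all hypothetical stand-ins.

```python
from contextlib import contextmanager
import time

TRANSACTION_TIMEOUT = 0.05  # seconds; illustrative only


@contextmanager
def transaction():
    # Toy transaction: raises if it is held open longer than the timeout.
    start = time.monotonic()
    yield
    if time.monotonic() - start > TRANSACTION_TIMEOUT:
        raise RuntimeError("transaction timeout")


def remove_image():
    # Stand-in for a slow vdsm call such as removeImage.
    time.sleep(0.1)


def end_action_problematic():
    # Anti-pattern described in the bug: the slow remote call runs
    # inside the transaction, so it can exceed the transaction timeout.
    with transaction():
        status = "ILLEGAL"   # fast DB work
        remove_image()       # slow remote call inside the transaction
    return status


def end_action_with_callback():
    # Intended pattern: only DB work happens inside the transaction;
    # the slow call is queued and executed afterwards by a callback.
    deferred = []
    with transaction():
        status = "ILLEGAL"
        deferred.append(remove_image)  # queue, do not execute yet
    for call in deferred:              # runs outside the transaction
        call()
    return status
```

Under this toy timeout, the first variant fails while the second completes, which mirrors the failure mode (and fix) described above.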
Can you describe the functional impact? (Originally by Yaniv Dary)
Sure. The functional impact is that we may get transaction timeouts when executing vdsm calls within a transactional endAction(). In BZ 1372743 (see https://bugzilla.redhat.com/show_bug.cgi?id=1372743#c22) it caused us to be left with a locked disk.

Let me know if further info is needed.

Thanks,
Liron

(Originally by laravot)
Tal, I'm treating this as a code change. Please decide on a target for it. (Originally by Yaniv Dary)
With the recent work on both LSM and cold move, the issue in this ticket should be resolved; setting to MODIFIED. We'll keep the upstream tracking bug for other code improvements.
Verified with the following code:
----------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.8-53.gitc3edfc0.el7.centos.x86_64

Verified with the following scenario:
----------------------------------
1. Created a VM with disks and an OS installed on NFS.
2. Moved the host to maintenance.
3. Edited the file /usr/lib/python2.7/site-packages/vdsm/API.py on the host and added a sleep as follows:

-------------------------------------------------------------------
from time import sleep


class Image(APIBase):
    ctorArgs = ['imageID', 'storagepoolID', 'storagedomainID']

    BLANK_UUID = sc.BLANK_UUID

    class DiskTypes:
        UNKNOWN = image.UNKNOWN_DISK_TYPE
        SYSTEM = image.SYSTEM_DISK_TYPE
        DATA = image.DATA_DISK_TYPE
        SHARED = image.SHARED_DISK_TYPE
        SWAP = image.SWAP_DISK_TYPE
        TEMP = image.TEMP_DISK_TYPE

    def __init__(self, UUID, spUUID, sdUUID):
        APIBase.__init__(self)
        self._UUID = UUID
        self._spUUID = spUUID
        self._sdUUID = sdUUID

    def delete(self, postZero, force, discard=False):
        sleep(600)
        return self._irs.deleteImage(self._sdUUID, self._spUUID,
                                     self._UUID, postZero, force, discard)
-----------------------------------------------------------------------

4. Restarted vdsm on the host.
5. Performed a cold move of the disk of the VM created in step 1.

>>>>> The delete image operation times out and fails. The disk is left in OK state.

Moving to VERIFY
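As an aside, the fault-injection technique in step 3 (delaying a method so a client-side timeout becomes reproducible without genuinely slow storage) can be sketched generically. The add_delay helper and the Image class below are illustrative stand-ins, not part of vdsm.

```python
import time


def add_delay(func, seconds):
    # Return a wrapper that sleeps before delegating to func,
    # simulating a slow backend operation.
    def delayed(*args, **kwargs):
        time.sleep(seconds)
        return func(*args, **kwargs)
    return delayed


class Image:
    # Toy stand-in for the patched API class.
    def delete(self):
        return "deleted"


# Monkeypatch the method, as done with Image.delete in step 3
# (600 s there; a short delay here keeps the example fast).
Image.delete = add_delay(Image.delete, 0.05)

img = Image()
start = time.monotonic()
result = img.delete()
elapsed = time.monotonic() - start
```

After the patch, every call still returns the original result but takes at least the injected delay, which is what makes the engine-side timeout fire during verification.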
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed: [No relevant external trackers attached] For more info please contact: rhv-devops
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1488