Bug 1637172

Summary: Live Merge hung in the volume deletion phase, leaving snapshot in a LOCKED state
Product: Red Hat Enterprise Virtualization Manager Reporter: Gordon Watson <gwatson>
Component: ovirt-engine Assignee: Eyal Shenitzky <eshenitz>
Status: CLOSED ERRATA QA Contact: Eyal Shenitzky <eshenitz>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2.5 CC: aefrat, gveitmic, gwatson, lsvaty, mkalinin, tnisan
Target Milestone: ovirt-4.4.0 Keywords: ZStream
Target Release: 4.3.0 Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1723794 Environment:
Last Closed: 2020-08-04 13:16:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1723794    

Description Gordon Watson 2018-10-08 20:15:20 UTC
Description of problem:

The volume deletion phase of a Live Merge (snapshot deletion) never completed, leaving the snapshot in a LOCKED state.

The merges for two disks completed on the host that the VM was running on. Both were active layer merges, and the pivots completed.

However, the volume deletions never completed. The engine reported a Java exception for both.

During this time the SPM switched twice.


Version-Release number of selected component (if applicable):

RHV 4.2.5
RHEL 7.5 hosts w/vdsm-4.20.35-1



Comment 5 Eyal Shenitzky 2018-10-23 04:18:26 UTC
Hey Gordon,

Did you manage to reproduce this bug?

Comment 10 Eyal Shenitzky 2018-11-07 10:48:16 UTC
Hey Gordon,

It is impossible to debug this log: the environment was in a huge mess and there were a lot of live merge attempts.

Please try to reproduce this issue with clear steps.

Comment 20 Sandro Bonazzola 2019-01-28 09:44:19 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 22 Eyal Shenitzky 2019-06-16 07:36:50 UTC
Reproducing this bug on a non-development environment may be difficult.


Steps to reproduce:

1. Create a VM with a disk
2. Create a snapshot for the VM that contains the disk
3. Run the VM
4. Delete the snapshot from step 2
5. While the snapshot is being deleted (live merge), after the 'MERGE_STATUS' step,
   block the communication between the engine and the SPM.
   This must be done before the engine sends the SPM the deleteImage command.

The expected result after the fix:

The attempt to delete the volume will fail and a second attempt will take place.
If the communication is still blocked, the second attempt will fail too -> snapshot deletion will fail, but the snapshot will not remain 'locked'.
If the communication is OK, the second attempt will succeed -> snapshot deletion will succeed
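
The expected behaviour above can be sketched as follows. This is only a minimal illustration, not the engine code: the class `VolumeDeletionRetrySketch`, its `Snapshot` and `Outcome` types, and the `deleteVolume` method are all hypothetical names, and the `BooleanSupplier` merely stands in for the call the engine sends to the SPM.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: class, method, and field names are illustrative
// and are not taken from the ovirt-engine sources.
final class VolumeDeletionRetrySketch {

    enum Outcome { SUCCEEDED, FAILED }

    // Minimal stand-in for a snapshot that is locked while it is being removed.
    static final class Snapshot {
        boolean locked;
        int attempts;
    }

    // Two attempts, as described in the expected result above.
    static final int MAX_ATTEMPTS = 2;

    static Outcome deleteVolume(Snapshot snapshot, BooleanSupplier destroyImage) {
        snapshot.locked = true;
        try {
            for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
                snapshot.attempts = attempt;
                if (attemptOnce(destroyImage)) {
                    return Outcome.SUCCEEDED;
                }
            }
            return Outcome.FAILED;
        } finally {
            // Success or failure, the snapshot must not stay locked.
            snapshot.locked = false;
        }
    }

    // A thrown exception (for example, the SPM is unreachable) counts as a failed attempt.
    private static boolean attemptOnce(BooleanSupplier destroyImage) {
        try {
            return destroyImage.getAsBoolean();
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Communication stays blocked: both attempts fail, the lock is still released.
        Snapshot blocked = new Snapshot();
        Outcome whenBlocked = deleteVolume(blocked, () -> {
            throw new IllegalStateException("engine cannot reach the SPM");
        });
        System.out.println(whenBlocked + ", locked=" + blocked.locked);

        // Communication is restored before the second attempt.
        int[] calls = {0};
        Snapshot restored = new Snapshot();
        Outcome whenRestored = deleteVolume(restored, () -> ++calls[0] == 2);
        System.out.println(whenRestored + ", attempts=" + restored.attempts);
    }
}
```

The `finally` block is the point of the sketch: whichever way the attempts end, the lock is released, which is the difference from the hang reported in this bug.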

Comment 23 Avihai 2019-06-19 08:33:04 UTC
As discussed in the mail thread and noted in the previous comment, this bug is almost impossible to verify in QE (a non-development environment).

Eyal S. will verify this bug once the fix is merged to master and a cherry-pick is created.

Reassigning QA Contact to Eyal to verify on the current target milestone (4.3.5).

Comment 25 Eyal Shenitzky 2019-08-06 05:52:54 UTC
For some reason, the verification modification wasn't saved.


Verified locally on my dev environment by throwing an exception.

steps:
1) Create a VM with a disk
2) Create a snapshot for the VM
3) Run the VM
4) Add an exception that is thrown in place of the DestroyImage VDS call -

    @Override
    protected void executeCommand() {
        getParameters().setEntityInfo(new EntityInfo(VdcObjectType.Disk, getParameters().getImageGroupId()));

        VDSReturnValue vdsReturnValue = null;
        try {
            // Injected for verification only: simulate a failed DestroyImage call.
            throw new EngineException();
            // vdsReturnValue = runVdsCommand(VDSCommandType.DestroyImage, createVDSParameters());
        } catch (EngineException e) {
            log.error("Failed to delete image {}/{}", getParameters().getImageGroupId(),
                    getParameters().getImageList().stream().findFirst().get(), e);
            if (!getParameters().isLiveMerge()) {
                throw e;
            }
        }
    ....

5) Remove the created snapshot -> DestroyImageCommand fails, and the live merge keeps retrying until it succeeds.
6) Remove the added exception -> DestroyImageCommand succeeds and the snapshot is removed.
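
The error handling exercised by steps 4 to 6 can be reproduced in isolation with a small stand-alone sketch. It is an assumption-laden illustration, not the engine class: `DestroyImageHandlingSketch`, `EngineFailure` (a stand-in for `EngineException`), and the `Runnable` standing in for the DestroyImage VDS call are hypothetical. It mirrors only what the snippet above shows: the failure is logged, and rethrown unless the command is part of a live merge.

```java
// Hypothetical sketch: EngineFailure stands in for the engine's EngineException,
// and vdsCall stands in for the DestroyImage VDS command.
final class DestroyImageHandlingSketch {

    static final class EngineFailure extends RuntimeException {
        EngineFailure(String message) {
            super(message);
        }
    }

    // Returns true when the call succeeded, false when it failed during a live merge.
    // Outside a live merge the failure is propagated to the caller.
    static boolean destroyImage(Runnable vdsCall, boolean liveMerge) {
        try {
            vdsCall.run();
            return true;
        } catch (EngineFailure e) {
            System.err.println("Failed to delete image: " + e.getMessage());
            if (!liveMerge) {
                throw e;
            }
            // Live merge: swallow the failure so the caller can try again later.
            return false;
        }
    }

    public static void main(String[] args) {
        // Step 4: inject the failure.
        boolean[] injectFailure = {true};
        Runnable vdsCall = () -> {
            if (injectFailure[0]) {
                throw new EngineFailure("injected for verification");
            }
        };

        // Step 5: the call fails but does not propagate during a live merge.
        System.out.println("with injected failure: " + destroyImage(vdsCall, true));

        // Step 6: remove the injected failure, the call succeeds.
        injectFailure[0] = false;
        System.out.println("after removing failure: " + destroyImage(vdsCall, true));
    }
}
```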


Moving to VERIFIED.

Comment 26 Daniel Gur 2019-08-28 13:13:14 UTC
sync2jira

Comment 29 RHEL Program Management 2019-12-05 07:02:14 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 30 RHV bug bot 2019-12-13 13:13:57 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 38 errata-xmlrpc 2020-08-04 13:16:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3247