Bug 1417458

Summary: Cold Merge: Use volume generation
Product: [oVirt] ovirt-engine
Reporter: Ala Hino <ahino>
Component: BLL.Storage
Assignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: bugs, tnisan
Target Milestone: ovirt-4.1.1
Flags: rule-engine: ovirt-4.1+
Target Release: 4.1.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Host becomes non-responsive.
Consequence: The engine is unable to determine the status of the merge job.
Fix: Use the volume generation.
Result: When a host becomes non-responsive, the engine attempts to fence the job, i.e. the host will fail to execute the job if it attempts to. When the engine then tries to execute the job on a different host, there are two possible outcomes:
- Job completed on the previous host. This attempt fails (as expected) because the generation differs between the engine and the host (the generation on the host was incremented because the job completed successfully).
- Job failed on the previous host. This attempt succeeds because the generation on the engine equals the generation on the host.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-21 09:48:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ala Hino 2017-01-29 15:10:26 UTC
Description of problem:
Generation support is used to improve error handling for jobs on
non-responsive hosts and to determine the job status: started, did not
start, failed, or completed. Based on the generation and the volume
lease, we can decide whether to fence the job.
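
A minimal sketch of the generation check, to illustrate the two retry
outcomes. This is not the engine or Vdsm code: Volume, run_merge,
retry_on_other_host and GenerationMismatch are hypothetical names, and
the sketch assumes only what is stated in this bug, namely that a job
runs when the expected generation matches the volume's generation and
that the generation is incremented when the job completes successfully.

```python
class GenerationMismatch(Exception):
    """Hypothetical error: expected generation differs from the volume's."""


class Volume:
    """Hypothetical stand-in for a volume that carries a generation counter."""

    def __init__(self, generation=0):
        self.generation = generation


def run_merge(volume, expected_generation, succeeds=True):
    """Run a simulated merge job guarded by the volume generation.

    The job refuses to run if the caller's expected generation does not
    match. The generation is incremented only on successful completion.
    """
    if volume.generation != expected_generation:
        raise GenerationMismatch(
            "expected %d, found %d" % (expected_generation, volume.generation)
        )
    if not succeeds:
        # Simulated failure: the generation is left unchanged.
        raise RuntimeError("merge failed")
    volume.generation += 1


def retry_on_other_host(volume, engine_generation):
    """Retry the job elsewhere, using the generation the engine remembers."""
    try:
        run_merge(volume, engine_generation)
    except GenerationMismatch:
        # The previous host already completed the job and bumped the generation.
        return "already completed"
    return "completed on retry"
```

If the first host completed the merge before becoming non-responsive,
the retry with the engine's stale generation is rejected. If the first
host failed, the generation is unchanged and the retry runs.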

Steps to Reproduce:
1. Start cold merge
2. Stop Vdsm during merge step (watch the log to see when merge starts)
3. Try again

Comment 1 Kevin Alon Goldblatt 2017-02-16 13:41:45 UTC
Tested with the following code:
-----------------------------------------------
ovirt-engine-4.1.1-0.1.el7.noarch
rhevm-4.1.1-0.1.el7.noarch
vdsm-4.19.5-1.el7ev.x86_64

Verified with the following scenario:
----------------------------------------------
Create VM with disks on system with 2 hosts
Stop the VM
Start a cold merge and stop the vdsm on the Performing HSM during the cold merge
The 2nd HSM continues the job successfully
Start the previously stopped vdsm on second host >>>>> no attempt is made to continue the previous job

Moving to VERIFIED!