Bug 1990231

Summary: Setting a host to maintenance shouldn't be blocked when having 'active' image transfer
Product: [oVirt] ovirt-engine
Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage
Assignee: Artiom Divak <adivak>
Status: CLOSED NEXTRELEASE
QA Contact: Evelina Shames <eshames>
Severity: medium
Docs Contact:
Priority: high
Version: 4.4.7
CC: ahadas, bugs, dfodor, michal.skrivanek, nsoffer, sfishbai
Target Milestone: ovirt-4.5.3
Keywords: ZStream
Target Release: ---
Flags: pm-rhel: ovirt-4.5?, pm-rhel: planning_ack?, pm-rhel: devel_ack+, pm-rhel: testing_ack?
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.5.3.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-10-03 19:01:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eyal Shenitzky 2021-08-05 05:58:01 UTC
Description of problem:

Currently, when setting a host to maintenance while an active image transfer is running on that host (any status other than paused/finished/failed), the operation is blocked until the transfer finishes or is paused.

We should never block this operation. We have a PreparingForMaintenance state specifically to handle tasks that need to finish before the final Maintenance state is set.

This needlessly blocks people from initiating maintenance requests.


Version-Release number of selected component (if applicable):
4.4.7

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk
2. Start downloading the disk using ImageIO
3. Set the host that handles the transfer to maintenance

Actual results:
Setting the host to maintenance is blocked with a warning that there is an image transfer process on it.

Expected results:
Setting the host to maintenance should succeed: the host should move to the PreparingForMaintenance state, with a proper warning telling the user that the running image transfer must end before the host reaches the Maintenance state.

Additional info:

Comment 1 Michal Skrivanek 2021-08-06 06:43:33 UTC
There are a few validations for similar ongoing tasks (e.g. jobs); if possible, those should be addressed as well.

Comment 2 Nir Soffer 2021-08-06 16:30:16 UTC
Before we remove the validation, we must ensure that the code handling the
preparing-for-maintenance state takes active image transfers into account.
Otherwise the host may be disconnected from storage while an image transfer
is active.

Previously we did not need to handle this case because we had the validation.

Same issue for other ongoing tasks (comment 1) that are likely not handled yet.

Comment 4 Sandro Bonazzola 2022-03-29 16:16:40 UTC
We are past 4.5.0 feature freeze, please re-target.

Comment 5 Michal Skrivanek 2022-04-20 12:09:09 UTC
This is supposedly not that hard, and still quite useful.

Comment 6 Arik 2022-05-24 14:47:37 UTC
To fix this we need to make two changes:
1. In VirtMonitoringStrategy#canMoveToMaintenance we also need to check if there's an ongoing transfer on the host
2. In TransferDiskImage we need to make sure not to start a transfer on a host that is in PreparingToMaintenance status
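
The two changes above can be sketched roughly as follows. This is a minimal, self-contained model and not the actual ovirt-engine code: `MaintenanceSketch`, `HostStatus`, `TransferPhase`, `Transfer` and every method below are hypothetical names invented for illustration, and the set of phases is simplified. "Active" follows the bug description (any phase other than paused, finished or failed). Treating only an `UP` host as eligible for new transfers is an assumption that goes slightly beyond item 2.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model; none of these types are real ovirt-engine classes.
final class MaintenanceSketch {

    enum HostStatus { UP, PREPARING_FOR_MAINTENANCE, MAINTENANCE }

    enum TransferPhase { INITIALIZING, TRANSFERRING, PAUSED, FINISHED, FAILED }

    static final class Transfer {
        final String hostId;
        final TransferPhase phase;

        Transfer(String hostId, TransferPhase phase) {
            this.hostId = hostId;
            this.phase = phase;
        }

        // Active means any phase other than paused, finished or failed.
        boolean isActive() {
            return phase != TransferPhase.PAUSED
                    && phase != TransferPhase.FINISHED
                    && phase != TransferPhase.FAILED;
        }
    }

    // Change 1: the host may reach the final maintenance state only when
    // no active transfer is running on it.
    static boolean canMoveToMaintenance(String hostId, List<Transfer> transfers) {
        return transfers.stream()
                .noneMatch(t -> t.hostId.equals(hostId) && t.isActive());
    }

    // Change 2 (assumption: only an UP host accepts new transfers, which
    // also excludes a host that is preparing for maintenance).
    static boolean canStartTransfer(HostStatus status) {
        return status == HostStatus.UP;
    }

    // The request itself is never rejected; the host waits in
    // PREPARING_FOR_MAINTENANCE while transfers are still active.
    static HostStatus requestMaintenance(String hostId, List<Transfer> transfers) {
        return canMoveToMaintenance(hostId, transfers)
                ? HostStatus.MAINTENANCE
                : HostStatus.PREPARING_FOR_MAINTENANCE;
    }

    public static void main(String[] args) {
        List<Transfer> transfers = Arrays.asList(
                new Transfer("host-1", TransferPhase.TRANSFERRING));
        // Active transfer on host-1: the request is accepted but the host waits.
        System.out.println(requestMaintenance("host-1", transfers));
        // No new transfer may start while the host is preparing for maintenance.
        System.out.println(canStartTransfer(HostStatus.PREPARING_FOR_MAINTENANCE));
        // Without active transfers the host can go straight to maintenance.
        System.out.println(requestMaintenance("host-1", Collections.<Transfer>emptyList()));
    }
}
```

In this model the maintenance request always succeeds, and the transfer check only decides whether the host stays in the intermediate state, which is the same pattern Comment 8 describes for VMs running on the host.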

Comment 7 Nir Soffer 2022-05-24 14:55:29 UTC
(In reply to Arik from comment #6)
> To fix this we need to make two changes:
> 1. In VirtMonitoringStrategy#canMoveToMaintenance we also need to check if
> there's an ongoing transfer on the host
> 2. In TransferDiskImage we need to make sure not to start a transfer on a
> host that is in PreparingToMaintenance status

Not starting a transfer on a host in PreparingToMaintenance sounds like a nice
improvement, but it is not enough.

If there are already active transfers, we need to either wait for them or cancel
them, but currently cancelling an image transfer is flaky and likely to end in a
stuck transfer when the user cancels the transfer after a failure (same issue we
had in backup).

Comment 8 Arik 2022-05-25 06:53:40 UTC
(In reply to Nir Soffer from comment #7)
> (In reply to Arik from comment #6)
> > To fix this we need to make two changes:
> > 1. In VirtMonitoringStrategy#canMoveToMaintenance we also need to check if
> > there's an ongoing transfer on the host
> > 2. In TransferDiskImage we need to make sure not to start a transfer on a
> > host that is in PreparingToMaintenance status
> 
> Not starting a transfer on a host in PreparingToMaintenance sounds like a nice
> improvement, but it is not enough.
> 
> If there are already active transfers, we need to either wait for them or
> cancel them, but currently cancelling an image transfer is flaky and likely
> to end in a stuck transfer when the user cancels the transfer after a failure
> (same issue we had in backup).

Right, that's covered by #1 above - just like we do it for VMs that run on the host

Comment 9 Shir Fishbain 2022-05-30 19:52:36 UTC
QE doesn't have the capacity to verify during 4.5.1

Comment 10 Casper (RHV QE bot) 2022-10-03 19:01:06 UTC
This bug has low overall severity and passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.