Bug 2008900

Summary: Eviction of not live migratable VMs due to virt-launcher upgrade can happen outside the upgrade window
Product: Container Native Virtualization (CNV)
Reporter: Simone Tiraboschi <stirabos>
Component: Installation
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Debarati Basu-Nag <dbasunag>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.9.0
CC: cnv-qe-bugs, oramraz, pelauter, stirabos
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.9.0-223
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-02 16:01:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Simone Tiraboschi 2021-09-29 13:03:07 UTC
Description of problem:

The default configuration on HCO CR is:

  workloadUpdateStrategy:
    workloadUpdateMethods:
      - LiveMigrate
      - Evict
    batchEvictionSize: 10
    batchEvictionInterval: "1m"


This means that *during* CNV upgrades, VMs are live migrated or, when live migration is not possible, evicted, so that every VM ends up running an up-to-date version of virt-launcher.

The issue is that virt-operator reports (through its conditions) that the upgrade is complete as soon as the upgrade of its control plane finishes, while the upgrade of virt-launcher proceeds completely asynchronously.

As a result, the user may see VMs being evicted because of a CNV upgrade after that upgrade has already been reported as completed.
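
The effect of the two update methods on a single VM can be illustrated with a small sketch. This is a simplified model written for this report, not KubeVirt's implementation: the function name `planned_action` and the `live_migratable` flag are hypothetical, and batching (size and interval) is ignored.

```python
from typing import List, Optional


def planned_action(methods: List[str], live_migratable: bool) -> Optional[str]:
    """Model which update method would be applied to a VM that still runs
    an outdated virt-launcher (hypothetical helper, simplified)."""
    # Live migration is used when it is enabled and the VM supports it.
    if live_migratable and "LiveMigrate" in methods:
        return "LiveMigrate"
    # Otherwise the VM is evicted if eviction is enabled.
    if "Evict" in methods:
        return "Evict"
    # No applicable method: the VM keeps running on the old virt-launcher.
    return None


# Old default: a VM that cannot be live migrated gets evicted.
print(planned_action(["LiveMigrate", "Evict"], live_migratable=False))
# Proposed default: the same VM is left untouched.
print(planned_action(["LiveMigrate"], live_migratable=False))
```

With the old default, the eviction in the first call is what can surface after the upgrade is already reported as completed.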


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Check the HCO CR for the default value of workloadUpdateMethods

Actual results:
    workloadUpdateMethods:
      - LiveMigrate
      - Evict

Expected results:
    workloadUpdateMethods:
      - LiveMigrate


Additional info:

Comment 1 Debarati Basu-Nag 2021-10-01 23:28:26 UTC
Validated: by default, hco.workloadUpdateStrategy.workloadUpdateMethods now contains only "LiveMigrate".
=========================
   progressTimeout: 150
    workloadUpdateStrategy:
      batchEvictionInterval: 1m0s
      batchEvictionSize: 10
      workloadUpdateMethods:
      - LiveMigrate
    workloads: {}
==========================
Build used:
Deployed: OCP-4.9.0-rc.4
Deployed: CNV-v4.9.0-223

Comment 2 Debarati Basu-Nag 2021-10-04 22:20:08 UTC
Waiting on a cluster to perform upgrade test to complete validation of this bug.

Comment 6 errata-xmlrpc 2021-11-02 16:01:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0
Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104