Bug 2008900 - Eviction of not live migratable VMs due to virt-launcher upgrade can happen outside the upgrade window
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Simone Tiraboschi
QA Contact: Debarati Basu-Nag
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-29 13:03 UTC by Simone Tiraboschi
Modified: 2021-11-02 16:01 UTC (History)

Fixed In Version: hco-bundle-registry-container-v4.9.0-223
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-02 16:01:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Github kubevirt hyperconverged-cluster-operator pull 1542 | 0 | None | Merged | Avoid setting Evict workloadUpdates strategy | 2021-09-29 15:38:29 UTC
Github kubevirt hyperconverged-cluster-operator pull 1543 | 0 | None | Merged | [release-1.5] Avoid setting Evict workloadUpdates strategy | 2021-09-29 16:34:23 UTC
Red Hat Product Errata RHSA-2021:4104 | 0 | None | None | None | 2021-11-02 16:01:30 UTC

Description Simone Tiraboschi 2021-09-29 13:03:07 UTC
Description of problem:

The default configuration on HCO CR is:

  workloadUpdateStrategy:
    workloadUpdateMethods:
      - LiveMigrate
      - Evict
    batchEvictionSize: 10
    batchEvictionInterval: "1m"


This means that *during* CNV upgrades, VMs are live migrated or, when they cannot be live migrated, evicted, so that every VM ends up running with an up-to-date version of virt-launcher.
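
As a rough illustration of how long batched eviction can keep running, the following Python sketch estimates when the last batch would start for a given number of VMs. It assumes a simplified model (one batch of at most `batchEvictionSize` VMs right away, then one more per `batchEvictionInterval`); the function name and the VM count of 250 are hypothetical and not taken from this report.

```python
import math


def eviction_schedule(vm_count, batch_size=10, interval_seconds=60):
    """Return (number of batches, seconds until the last batch starts).

    Simplified model (an assumption, not the actual virt-controller logic):
    the first batch starts immediately, and each following batch starts
    one interval after the previous one.
    """
    if vm_count <= 0:
        return 0, 0
    batches = math.ceil(vm_count / batch_size)
    return batches, (batches - 1) * interval_seconds


# Hypothetical cluster with 250 VMs that cannot be live migrated.
batches, delay = eviction_schedule(250)
print(batches, delay)  # prints "25 1440"
```

Under these assumptions, evictions would still be starting 24 minutes after the first batch, independently of what the operator conditions report.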

The issue is that virt-operator reports (through its conditions) that the upgrade is complete as soon as its control plane has been upgraded, while the upgrade of virt-launcher proceeds completely asynchronously.

As a result, the user can see VMs being evicted because of a CNV upgrade after that upgrade has already been reported as completed.
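
Since the reported upgrade status does not cover virt-launcher, one way to tell whether workload updates are still pending is to look for VMIs that KubeVirt has labeled as running an outdated launcher. The Python sketch below assumes the `kubevirt.io/outdatedLauncherImage` label and takes the parsed JSON of a VMI list as input (for example the output of `oc get vmi -A -o json`); the helper name and the sample data are hypothetical.

```python
OUTDATED_LABEL = "kubevirt.io/outdatedLauncherImage"  # assumed label key


def outdated_vmis(vmi_list):
    """Return "namespace/name" for every VMI carrying the outdated-launcher label."""
    names = []
    for vmi in vmi_list.get("items", []):
        meta = vmi.get("metadata", {})
        labels = meta.get("labels") or {}
        if OUTDATED_LABEL in labels:
            names.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return names


# Hypothetical sample: vm-a still runs an old virt-launcher, vm-b does not.
sample = {
    "items": [
        {"metadata": {"namespace": "default", "name": "vm-a",
                      "labels": {OUTDATED_LABEL: ""}}},
        {"metadata": {"namespace": "default", "name": "vm-b", "labels": {}}},
    ]
}
print(outdated_vmis(sample))  # prints "['default/vm-a']"
```

A non-empty result after the upgrade is reported as completed would mean that migrations or evictions may still follow.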


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Check the HCO CR for the default value of workloadUpdateMethods.

Actual results:
    workloadUpdateMethods:
      - LiveMigrate
      - Evict

Expected results:
    workloadUpdateMethods:
      - LiveMigrate


Additional info:

Comment 1 Debarati Basu-Nag 2021-10-01 23:28:26 UTC
Validated: by default, hco.workloadUpdateStrategy.workloadUpdateMethods now contains only "LiveMigrate".
=========================
   progressTimeout: 150
    workloadUpdateStrategy:
      batchEvictionInterval: 1m0s
      batchEvictionSize: 10
      workloadUpdateMethods:
      - LiveMigrate
    workloads: {}
==========================
Build used:
Deployed: OCP-4.9.0-rc.4
Deployed: CNV-v4.9.0-223

Comment 2 Debarati Basu-Nag 2021-10-04 22:20:08 UTC
Waiting on a cluster to perform upgrade test to complete validation of this bug.

Comment 6 errata-xmlrpc 2021-11-02 16:01:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104

