Bug 2019453

Summary: Stale PDBs do not get reconciled triggering continuous PDB alerts
Product: Container Native Virtualization (CNV)
Component: Virtualization
Version: 4.8.2
Status: CLOSED ERRATA
Severity: medium
Priority: urgent
Target Milestone: ---
Target Release: 4.8.4
Hardware: Unspecified
OS: Unspecified
Reporter: Antonio Cardace <acardace>
Assignee: Antonio Cardace <acardace>
QA Contact: zhe peng <zpeng>
Docs Contact:
CC: cnv-qe-bugs, fdeutsch, kbidarka, sgott, zpeng
Whiteboard:
Fixed In Version: virt-operator-container-v4.8.4-4 hco-bundle-registry-container-v4.8.4-11
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-20 17:21:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Antonio Cardace 2021-11-02 15:15:50 UTC
Description of problem:

When upgrading from CNV 4.8.0 to 4.8.2 with running VMIs that use 'EvictionStrategy: LiveMigrate', the associated PodDisruptionBudgets (PDBs) persist indefinitely. The VMI-PDB logic changed in 4.8.1, and PDBs created by the old logic are not reconciled. This can continuously trigger alerts about there being fewer pods than the PDB expects to protect, because until 4.8.1 the logic was to create a single PDB with 'MinAvailable: 2' at all times.

The quick workaround is to delete all PDBs associated with running VMIs so that virt-controller notices the deletions and re-creates the PDBs with the correct values. Note that '--all' deletes every PDB in the namespace, not only the ones created for VMIs. To apply the workaround, run:

(assuming the VMIs are in the default namespace)
kubectl delete pdb --all

otherwise

kubectl -n "$NAMESPACE" delete pdb --all
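
If other workloads in the namespace have their own PDBs, a narrower variant of the workaround may be preferable. The following is a minimal sketch, not a tested procedure: stale_pdbs is a hypothetical helper, and it assumes that VMI PDBs carry the 'kubevirt-disruption-budget-' name prefix (as seen in the verification output in comment 4) and that a stale PDB can be recognized by 'MinAvailable: 2'. Whether a value of 2 can also occur legitimately is not covered in this report, so review the list before deleting anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of CNV or kubectl): reads "NAME MINAVAILABLE"
# pairs on stdin and prints the names of KubeVirt PDBs whose minAvailable is 2.
# Assumption: VMI PDB names start with "kubevirt-disruption-budget-".
stale_pdbs() {
  awk '$1 ~ /^kubevirt-disruption-budget-/ && $2 == "2" { print $1 }'
}

# Example usage against a cluster (inspect the output before piping to delete):
#   kubectl -n "$NAMESPACE" get pdb --no-headers \
#     -o custom-columns=NAME:.metadata.name,MIN:.spec.minAvailable \
#     | stale_pdbs \
#     | xargs -r kubectl -n "$NAMESPACE" delete pdb
```

Running the pipeline without the final xargs stage only lists the candidate PDBs, which is a reasonable first step before deleting them.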


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Install CNV 4.8.0
2. Create a VMI with 'EvictionStrategy: LiveMigrate'
3. Upgrade to CNV 4.8.2

Actual results:
The PDB associated with the VMI has 'MinAvailable: 2'.


Expected results:
The PDB associated with the VMI should have 'MinAvailable: 1'.


Additional info:

Comment 1 Antonio Cardace 2021-11-03 13:48:13 UTC
Posted fix at https://github.com/kubevirt/kubevirt/pull/6723.

Comment 4 zhe peng 2021-12-15 09:28:01 UTC
Verified with build:
hco-bundle-registry-container-v4.8.4-20

Steps:
1. Deploy a CNV 4.8.3 cluster, then create and run a RHEL 8 VM.
The VM has 'EvictionStrategy: LiveMigrate'.
2. Check the status.
$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.3   OpenShift Virtualization   4.8.3     kubevirt-hyperconverged-operator.v4.8.2   Succeeded

$ oc get pdb
NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
kubevirt-disruption-budget-z7j9f   1               N/A               0                     8m32s

3. Upgrade CNV from 4.8.3 to 4.8.4-20.
$ oc get ip -A
NAMESPACE                 NAME            CSV                                         APPROVAL    APPROVED
openshift-cnv             install-8tjvk   kubevirt-hyperconverged-operator.v4.8.3     Manual      true
openshift-cnv             install-g7vpx   kubevirt-hyperconverged-operator.v4.8.4     Manual      false
openshift-local-storage   install-tzbms   local-storage-operator.4.8.0-202111041632   Automatic   true
openshift-storage         install-zfqjz   ocs-operator.v4.8.6                         Automatic   true

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.4   OpenShift Virtualization   4.8.4     kubevirt-hyperconverged-operator.v4.8.3   Succeeded

4. Check the PDB and the VMI.
$ oc get pdb
NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
kubevirt-disruption-budget-z7j9f   1               N/A               0                     110m
$ oc get vmi
NAME       AGE    PHASE     IP             NODENAME
vm-rhel8   110m   Running   10.129.2.159   virt03-9vn6t-worker-0-vcqwb

Comment 6 zhe peng 2021-12-15 09:33:25 UTC
Per comment 4, the PDB for the VMI has 'MinAvailable: 1' after the CNV upgrade; moving to verified.

Comment 8 Fabian Deutsch 2022-01-10 13:39:44 UTC
@zpeng have we also checked that no alerts are firing anymore?

Comment 9 Fabian Deutsch 2022-01-10 13:50:15 UTC
I was thinking of rhbz#2026733, but this is a different bug

Comment 10 zhe peng 2022-01-12 11:27:16 UTC
Hi Fabian,

No, I didn't check that; I just followed the description of the bug.

Comment 15 errata-xmlrpc 2022-01-20 17:21:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.8.4 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0213