Description of problem:

hostpath-provisioner-operator pods, among others, are failing randomly after cluster deployment.

Other pods also failing:
virt-controller-b5b88dd59-8pr8j
machine-api-controllers-7dc65b48df-tbpcs
cdi-deployment-844845fd6d-n2pz2
cluster-node-tuning-operator-6f78bdb995-2qg77
openshift-adp-controller-manager
virt-operator-64d8f997bf-r4q8f

Version-Release number of selected component (if applicable):
Deployed: OCP-4.14.0-ec.3
Deployed: CNV-v4.14.0.rhel9-1576

How reproducible:
Happened on 3 deployments, but each time different (random) pods are failing.

Steps to Reproduce:
1. Deploy a PSI env with 4.14.
2. After a while, sometimes more than an hour, hostpath-provisioner-operator and other pods start failing.

Actual results:
openshift-cnv pods are in CrashLoopBackOff status.

Expected results:
Pods should be in Running state.

Additional info:
oc get pods -A | grep -v Running | grep -v Completed
NAMESPACE                                NAME                                                READY   STATUS             RESTARTS          AGE
openshift-adp                            openshift-adp-controller-manager-5dbc95bc86-cxw84   0/1     CrashLoopBackOff   942 (2m51s ago)   3d19h
openshift-cluster-node-tuning-operator   cluster-node-tuning-operator-6f78bdb995-2qg77       0/1     CrashLoopBackOff   749 (16s ago)     3d20h
openshift-cnv                            cdi-deployment-844845fd6d-n2pz2                     0/1     CrashLoopBackOff   943 (3m8s ago)    3d19h
openshift-cnv                            cdi-operator-58b6766b45-mhmg2                       0/1     CrashLoopBackOff   943 (19s ago)     3d19h
openshift-cnv                            hostpath-provisioner-operator-b74bbd4ff-7x4bm       0/1     CrashLoopBackOff   943 (33s ago)     3d19h
openshift-cnv                            virt-controller-b5b88dd59-8pr8j                     0/1     CrashLoopBackOff   1050 (4m4s ago)   3d19h
openshift-cnv                            virt-controller-b5b88dd59-zmcx7                     0/1     CrashLoopBackOff   938 (2m54s ago)   3d19h
openshift-cnv                            virt-operator-64d8f997bf-r4q8f                      0/1     CrashLoopBackOff   993 (2m54s ago)   3d19h
openshift-cnv                            virt-operator-64d8f997bf-zfhgf                      0/1     CrashLoopBackOff   910 (109s ago)    3d19h
openshift-machine-api                    machine-api-controllers-7dc65b48df-tbpcs            6/7     CrashLoopBackOff   760 (36s ago)     3d20h
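The report does not include crash details for the affected containers. As a minimal sketch (not part of the original report; the pod name is taken from the listing above and any crash-looping pod can be substituted), the following commands would gather more data:

# Show container state, restart reasons, and recent events for one crash-looping pod
oc describe pod -n openshift-cnv hostpath-provisioner-operator-b74bbd4ff-7x4bm

# Retrieve the log of the previous (crashed) container instance
oc logs -n openshift-cnv hostpath-provisioner-operator-b74bbd4ff-7x4bm --previous

# List recent warning events in the namespace, newest last
oc get events -n openshift-cnv --field-selector type=Warning --sort-by=.lastTimestamp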
Verified with the following code:
-----------------------------------------------------------------------
oc get csv -n openshift-cnv
NAME                                       DISPLAY                       VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.14.0   OpenShift Virtualization      4.14.0    kubevirt-hyperconverged-operator.v4.13.3   Succeeded
openshift-pipelines-operator-rh.v1.11.0    Red Hat OpenShift Pipelines   1.11.0                                               Succeeded

v4.14.0.rhel9-1793

oc version
Client Version: 4.14.0-ec.3
Kustomize Version: v5.0.1
Server Version: 4.14.0-ec.3
Kubernetes Version: v1.27.3+e8b13aa

Verified with the following scenario:
------------------------------------------------------------------------
oc get pods -A | grep -v Running | grep -v Completed
>>>> no pods are failing on an env that has been running for several days

Not seeing this happen in the latest builds.

Moving to verified!
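As a rough sketch of how the check above can be repeated over a longer window (the interval is illustrative and not part of the verification run):

# Re-run the pod check every 10 minutes and report anything not Running/Completed
while true; do
  date
  oc get pods -A --no-headers | grep -vE 'Running|Completed' || echo "all pods healthy"
  sleep 600
done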
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817