Description of problem:

We use exec probes when we launch VMIs to know when virt-launcher is fully started. We ran into the following issues with the exec probes:

https://bugzilla.redhat.com/show_bug.cgi?id=1817057
https://bugzilla.redhat.com/show_bug.cgi?id=1855067
https://bugzilla.redhat.com/show_bug.cgi?id=1848524
https://bugzilla.redhat.com/show_bug.cgi?id=1850168

Since kubevirt benefits overall from simply removing the exec probes, independent of seeing them fixed in OCP, I am proposing https://github.com/kubevirt/kubevirt/pull/3971, which is already merged on master. This change has three advantages: we no longer rely on exec probes (at the moment), it removes the exec probe warning events, which we can't fully eliminate with the generic readiness mechanisms, and it speeds up VMI start.

I see this as the proper solution because, as one can see in https://bugzilla.redhat.com/show_bug.cgi?id=1855067, it may even require kernel setting changes to make exec probes work nicely with pod limits.
Merged and part of kubevirt 0.30.6, which is the base for CNV 2.4.1.
As seen below, we no longer have the readinessProbe (each grep returns no output):

[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource-v2d6p -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource2-77vzg -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vm-rhel81-mb2q5 -o yaml | grep readinessProbe
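For anyone who wants to repeat the check without relying on an empty grep result, here is a minimal sketch that inspects a pod manifest that has already been loaded as JSON (for example from `oc get pod <pod-name> -o json`). The function name and the sample manifests are hypothetical and only for illustration; they are not taken from the kubevirt code base or from the pods in this report.

```python
import json


def containers_with_readiness_probe(pod):
    """Return the names of containers in a pod manifest that define a readinessProbe."""
    spec = pod.get("spec") or {}
    # Look at regular and init containers; either key may be missing or null.
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    return [c.get("name", "<unnamed>") for c in containers if "readinessProbe" in c]


# Hypothetical manifests, trimmed to the fields the check looks at.
pod_with_probe = json.loads("""
{"spec": {"containers": [
    {"name": "compute", "readinessProbe": {"exec": {"command": ["cat", "/tmp/ready"]}}},
    {"name": "volumecontainerdisk"}
]}}
""")
pod_without_probe = {
    "spec": {"containers": [{"name": "compute"}, {"name": "volumecontainerdisk"}]}
}

print(containers_with_readiness_probe(pod_with_probe))
print(containers_with_readiness_probe(pod_without_probe))
```

An empty list corresponds to the empty grep output shown above: no container in the pod defines a readinessProbe.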
Summary:
1) The VM and pods have been running successfully for 5+ days, as seen above.
2) Memory consumption for volumecontainerdisk is 4Mi, as reported by the kubectl top command.
3) The cDisk- and DV-based VMIs are accessible and stable even after running for 5+ days.
4) We no longer have the readinessProbe in the VMI pod.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 2.4.1 images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3629
Note that customer 0 plans to migrate to OCP 4.5/CNV 2.4.1, which contains the fix for this issue. The specific workload that is affected is only being tested on OCP 4.4. Not sure that we need to try to fix this in 4.4 at this point.