Bug 1867904

Summary: Exec probes on VMI pods have various issues, let's remove them
Product: Container Native Virtualization (CNV) Reporter: Roman Mohr <rmohr>
Component: Virtualization Assignee: sgott
Status: CLOSED ERRATA QA Contact: Kedar Bidarkar <kbidarka>
Severity: high Docs Contact:
Priority: high    
Version: 2.4.0 CC: cnv-qe-bugs, cpaquin, danken, fdeutsch, markmc, ncredi, sreichar
Target Milestone: ---   
Target Release: 2.4.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: virt-operator-container-v2.4.1-2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-03 20:31:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Roman Mohr 2020-08-11 07:42:24 UTC
Description of problem:


We use exec probes when we launch VMIs to know when virt-launcher is fully started.


We ran into the following issues with the exec probes:

https://bugzilla.redhat.com/show_bug.cgi?id=1817057
https://bugzilla.redhat.com/show_bug.cgi?id=1855067
https://bugzilla.redhat.com/show_bug.cgi?id=1848524
https://bugzilla.redhat.com/show_bug.cgi?id=1850168

Since it benefits kubevirt overall to just remove the exec probes, independent of seeing them fixed in OCP, I am proposing https://github.com/kubevirt/kubevirt/pull/3971, which is merged on master already.

This has several advantages: we no longer rely on exec probes (at the moment), it removes the exec probe warning events, which we can't fully eliminate with the generic readiness mechanisms, and it speeds up VMI start.

I see this as the proper solution because, as one can see in https://bugzilla.redhat.com/show_bug.cgi?id=1855067, making exec probes work nicely with pod limits may even require kernel setting changes.



Comment 2 Roman Mohr 2020-08-12 09:08:10 UTC
Merged and part of kubevirt 0.30.6 which is the base for CNV 2.4.1.

Comment 6 Kedar Bidarkar 2020-08-17 19:17:00 UTC
As seen below, the virt-launcher pods no longer have a readinessProbe (grep returns no matches):

[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource-v2d6p -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource2-77vzg -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vm-rhel81-mb2q5 -o yaml | grep readinessProbe
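
Because an empty grep result is easy to misread, the same check can be written so that it reports an explicit pass or fail. The following is only a minimal sketch, not part of the original verification: the helper name has_no_readiness_probe is hypothetical, and it assumes the pod manifest is piped in on stdin, for example from oc get pod <name> -o yaml.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): reads a pod manifest
# from stdin and succeeds only if it contains no readinessProbe entry.
has_no_readiness_probe() {
    if grep -q 'readinessProbe'; then
        return 1    # a probe is still defined in the manifest
    fi
    return 0        # no probe found
}

# Example against a live cluster, using a pod name from this comment:
#   oc get pod virt-launcher-vm-rhel81-mb2q5 -o yaml | has_no_readiness_probe \
#       && echo "OK: no readinessProbe" || echo "FAIL: readinessProbe present"
```

The function relies on the exit status of grep -q, so it prints nothing by itself and can be combined with && and || as shown in the commented example.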

Comment 10 Kedar Bidarkar 2020-08-24 12:11:44 UTC
Summary: 
1) The VMs and pods have been running successfully for 5+ days, as seen above.
2) The memory consumption for volumecontainerdisk is 4Mi, as reported by the kubectl top command.
3) The cDisk- and DV-based VMIs are accessible and stable even after running for 5+ days.
4) We no longer have the readinessProbe in the VMI pod.

Comment 14 errata-xmlrpc 2020-09-03 20:31:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.4.1 images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3629

Comment 15 Chris Paquin 2020-09-10 15:40:27 UTC
Note that customer 0 plans to migrate to OCP 4.5/CNV 2.4.1, which contains the fix for this issue. The specific workload that is affected is only being tested on OCP 4.4. I am not sure that we need to try to fix this in 4.4 at this point.