Bug 1867904

Summary:	Exec probes on VMI pods have various issues, let's remove them
Product:	Container Native Virtualization (CNV)	Reporter:	Roman Mohr <rmohr>
Component:	Virtualization	Assignee:	sgott
Status:	CLOSED ERRATA	QA Contact:	Kedar Bidarkar <kbidarka>
Severity:	high	Docs Contact:
Priority:	high
Version:	2.4.0	CC:	cnv-qe-bugs, cpaquin, danken, fdeutsch, markmc, ncredi, sreichar
Target Milestone:	---
Target Release:	2.4.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	virt-operator-container-v2.4.1-2	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-09-03 20:31:08 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Roman Mohr 2020-08-11 07:42:24 UTC

Description of problem:


We use exec probes when we launch VMIs to know when virt-launcher is fully started.


We ran into the following issues with the exec probes:

https://bugzilla.redhat.com/show_bug.cgi?id=1817057
https://bugzilla.redhat.com/show_bug.cgi?id=1855067
https://bugzilla.redhat.com/show_bug.cgi?id=1848524
https://bugzilla.redhat.com/show_bug.cgi?id=1850168

Since removing the exec probes benefits KubeVirt overall, regardless of whether these issues get fixed in OCP, I am proposing https://github.com/kubevirt/kubevirt/pull/3971, which is already merged on master.

This has three advantages: we no longer rely on exec probes (at the moment), we get rid of the exec probe warning events, which we can't fully eliminate with the generic readiness mechanisms, and VMI start gets faster.

I see this as the proper solution because, as one can see in https://bugzilla.redhat.com/show_bug.cgi?id=1855067, making exec probes work nicely with pod limits may even require kernel setting changes.
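
For anyone who wants to check whether a namespace is still producing the probe warning events mentioned above, here is a minimal sketch in Python. It assumes the events were fetched as JSON (for example with `oc get events -o json`) and that the `items` list is passed in. It relies on the kubelet reporting failed probes as Warning events with reason "Unhealthy" and a message starting with "Readiness probe". The function name is illustrative and not part of KubeVirt or oc.

```python
def readiness_probe_warnings(events):
    """Select Warning events that report a failed or errored readiness probe.

    `events` is a list of Kubernetes Event objects as plain dicts,
    e.g. the `items` list of `oc get events -o json`.
    """
    return [
        event
        for event in events
        if event.get("type") == "Warning"
        and event.get("reason") == "Unhealthy"
        and event.get("message", "").startswith("Readiness probe")
    ]
```

An empty result for the virt-launcher pods would be consistent with the probes having been removed. It is not proof on its own, since events expire after a while.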



Comment 2 Roman Mohr 2020-08-12 09:08:10 UTC

Merged and part of KubeVirt 0.30.6, which is the base for CNV 2.4.1.

Comment 6 Kedar Bidarkar 2020-08-17 19:17:00 UTC

As seen below, the virt-launcher pods no longer have a readinessProbe (each grep returns nothing):

[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource-v2d6p -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource2-77vzg -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vm-rhel81-mb2q5 -o yaml | grep readinessProbe
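
The grep above matches the string anywhere in the YAML. A slightly more structured check is sketched here in Python. It assumes the pod is fetched as JSON (for example with `oc get pod <pod-name> -o json`) and lists which containers still carry a readiness probe and of what kind. The function names are illustrative and not part of KubeVirt or oc.

```python
import json

# Handler fields a Kubernetes Probe object can carry.
PROBE_HANDLERS = ("exec", "httpGet", "tcpSocket", "grpc")


def find_readiness_probes(pod):
    """Return (container name, handler type) for every readiness probe in a Pod dict."""
    found = []
    for container in pod.get("spec", {}).get("containers", []):
        probe = container.get("readinessProbe")
        if probe is None:
            continue
        # Report which handler is set; "unknown" if none of the known ones is present.
        handler = next((h for h in PROBE_HANDLERS if h in probe), "unknown")
        found.append((container.get("name", "<unnamed>"), handler))
    return found


def find_readiness_probes_json(text):
    """Same check, starting from the JSON text of a single Pod object."""
    return find_readiness_probes(json.loads(text))
```

An empty list corresponds to the empty grep output above. An entry with handler `exec` would mean the pod still uses an exec readiness probe.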

Comment 10 Kedar Bidarkar 2020-08-24 12:11:44 UTC

Summary: 
1) The VM and pods have been running successfully for 5+ days, as seen above.
2) The memory consumption of volumecontainerdisk is 4Mi, as reported by the kubectl top command.
3) The cDisk- and DV-based VMIs are accessible and stable even after running for 5+ days.
4) We no longer have the readinessProbe in the VMI pod.

Comment 14 errata-xmlrpc 2020-09-03 20:31:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.4.1 images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3629

Comment 15 Chris Paquin 2020-09-10 15:40:27 UTC

Note that customer 0 plans to migrate to OCP 4.5/CNV 2.4.1, which contains the fix for this issue. The specific workload that is affected is only being tested on OCP 4.4. I'm not sure that we need to try to fix this in 4.4 at this point.