1867904 – Exec probes on VMI pods have various issues, let's remove them

Summary: Exec probes on VMI pods have various issues, let's remove them

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Virtualization
Sub Component:
Version:	2.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	2.4.1
Assignee:	sgott
QA Contact:	Kedar Bidarkar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-11 07:42 UTC by Roman Mohr
Modified:	2023-12-15 18:51 UTC
CC List:	7 users
Fixed In Version:	virt-operator-container-v2.4.1-2
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-09-03 20:31:08 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubevirt kubevirt pull 3946	None	closed	Remove exec readiness probes in kubevirt	2021-01-31 10:47:28 UTC
Github	kubevirt kubevirt pull 3971	None	closed	[release-0.30] Remove exec readiness probes in kubevirt	2021-01-31 10:47:30 UTC
Red Hat Bugzilla	1855067	urgent	CLOSED	nproc * 1/2 MiB of each container memory may be taken by RHEL8 kernel slabs	2023-12-15 18:25:13 UTC
Red Hat Issue Tracker	CNV-36418	None	None	None	2023-12-15 18:51:20 UTC
Red Hat Product Errata	RHBA-2020:3629	None	None	None	2020-09-03 20:31:20 UTC

Description Roman Mohr 2020-08-11 07:42:24 UTC

Description of problem:


We use exec probes when we launch VMIs to know when virt-launcher is fully started.


We ran into the following issues with the exec probes:

https://bugzilla.redhat.com/show_bug.cgi?id=1817057
https://bugzilla.redhat.com/show_bug.cgi?id=1855067
https://bugzilla.redhat.com/show_bug.cgi?id=1848524
https://bugzilla.redhat.com/show_bug.cgi?id=1850168

Since removing the exec probes benefits kubevirt overall, independently of whether they get fixed in OCP, I am proposing https://github.com/kubevirt/kubevirt/pull/3971; the change is already merged on master.

This has three advantages: we no longer rely on exec probes (at the moment), it removes the exec probe warning events, which we can't fully eliminate with the generic readiness mechanisms, and it speeds up VMI start.

I see this as the proper solution because, as one can see in https://bugzilla.redhat.com/show_bug.cgi?id=1855067, it may even require kernel setting changes to make exec probes work nicely with pod limits.


Comment 2 Roman Mohr 2020-08-12 09:08:10 UTC

Merged and part of kubevirt 0.30.6, which is the base for CNV 2.4.1.

Comment 6 Kedar Bidarkar 2020-08-17 19:17:00 UTC

As seen below, the readinessProbe is no longer present (each grep returns no output):

[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource-v2d6p -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vmi-fedora32-resource2-77vzg -o yaml | grep readinessProbe
[kbidarka@kbidarka-host osdc]$ oc get pod virt-launcher-vm-rhel81-mb2q5 -o yaml | grep readinessProbe
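
For readers who want to repeat this check programmatically, the following is a minimal sketch rather than part of the original verification. It assumes the pod manifest has been fetched as JSON (for example with `oc get pod <name> -o json`) and loaded into a Python dict. The function name `exec_readiness_probes`, the container names, and the probe command are hypothetical and chosen only for illustration.

```python
import json
from typing import List


def exec_readiness_probes(pod: dict) -> List[str]:
    """Return the names of containers whose readinessProbe uses an exec action."""
    found = []
    spec = pod.get("spec", {})
    # Check both regular and init containers of the pod spec.
    for section in ("initContainers", "containers"):
        for container in spec.get(section) or []:
            probe = container.get("readinessProbe") or {}
            if "exec" in probe:
                found.append(container.get("name", "<unnamed>"))
    return found


# Hypothetical manifests for illustration only; not taken from the bug report.
pod_with_probe = json.loads("""
{
  "spec": {
    "containers": [
      {
        "name": "compute",
        "readinessProbe": {
          "exec": {"command": ["cat", "/tmp/healthy"]},
          "periodSeconds": 1
        }
      },
      {"name": "volumecontainerdisk"}
    ]
  }
}
""")

pod_without_probe = {
    "spec": {"containers": [{"name": "compute"}, {"name": "volumecontainerdisk"}]}
}

print(exec_readiness_probes(pod_with_probe))
print(exec_readiness_probes(pod_without_probe))
```

An empty result corresponds to the empty grep output above: no container in the pod defines an exec readiness probe.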

Comment 10 Kedar Bidarkar 2020-08-24 12:11:44 UTC

Summary:
1) The VM and pods have been running successfully for 5+ days, as seen above.
2) The memory consumption for volumecontainerdisk is 4Mi, as reported by the kubectl top command.
3) The cDisk- and DV-based VMIs are accessible and stable even after running for 5+ days.
4) We no longer have the readinessProbe in the VMI pod.

Comment 14 errata-xmlrpc 2020-09-03 20:31:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.4.1 images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3629

Comment 15 Chris Paquin 2020-09-10 15:40:27 UTC

Note that customer 0 plans to migrate to OCP 4.5/CNV 2.4.1, which contains the fix for this issue. The specific workload that is affected is only being tested on OCP 4.4. I am not sure that we need to try to fix this in 4.4 at this point.
