2151248 – SSP pods moving to CrashLoopBackOff state for long duration when tlssecurityProfile is changed often

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Infrastructure
Sub Component:
Version:	4.12.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	4.14.0
Assignee:	opokorny
QA Contact:	Geetika Kapoor
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	2150333
Depends On:
Blocks:

Reported:	2022-12-06 13:50 UTC by Geetika Kapoor
Modified:	2023-11-08 14:05 UTC
CC List:	4 users
Fixed In Version:	CNV v4.14.0.rhel9-1787 / ssp v4.14.0-101
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-11-08 14:05:03 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	CNV-23186	0	None	None	None	2022-12-06 13:53:51 UTC
Red Hat Product Errata	RHSA-2023:6817	0	None	None	None	2023-11-08 14:05:16 UTC

Description Geetika Kapoor 2022-12-06 13:50:13 UTC

Description of problem:

The SSP pod stays in CrashLoopBackOff for nearly 5 minutes when tlsSecurityProfile is changed often.

1. Set the HCO tlsSecurityProfile to Old.

oc get hco kubevirt-hyperconverged -n openshift-cnv -ojsonpath={.spec.tlsSecurityProfile} 
{"old":{},"type":"Old"}

2. Set the SSP tlsSecurityProfile explicitly to Custom.

oc patch ssp -n openshift-cnv --type=json ssp-kubevirt-hyperconverged -p '[{"op": "replace", "path": "/spec/tlsSecurityProfile", "value": {"custom": {"minTLSVersion": "VersionTLS13", "ciphers": ["TLS_AES_128_GCM_SHA256", "TLS_CHACHA20_POLY1305_SHA256"]}, "type": "Custom"}}]'

3. Expected: HCO should propagate its TLS settings to SSP.

$ oc get ssp ssp-kubevirt-hyperconverged -n openshift-cnv -ojsonpath={.spec.tlsSecurityProfile}
{"old":{},"type":"Old"}

However, during this whole procedure, the SSP pod (ssp-operator-79bbc48bc5-tch2n) stays in CrashLoopBackOff for nearly 5 minutes.

oc get pods -A -w | grep -i ssp
openshift-cnv                                      ssp-operator-79bbc48bc5-tch2n                                     0/1     CrashLoopBackOff   10 (4m54s ago)   28h



Version-Release number of selected component (if applicable):
4.12

How reproducible:

Always

Steps to Reproduce:
1. As described above.

Actual results:

The SSP pod goes into a crashed state, sometimes too often and for a long time.

Expected results:

SSP pods should not crash this often, and the wait time should be shorter.

Additional info:

Comment 1 SATHEESARAN 2022-12-06 14:11:43 UTC

I have seen a similar issue and reported a bug - https://bugzilla.redhat.com/show_bug.cgi?id=2150333

I believe this should be the same issue.

Comment 4 Simone Tiraboschi 2022-12-16 14:55:45 UTC

*** Bug 2150333 has been marked as a duplicate of this bug. ***

Comment 5 Simone Tiraboschi 2022-12-16 14:59:12 UTC

Currently, SSP restarts itself on each change to tlsSecurityProfile.
This is somewhat acceptable for end users, who will probably amend the configuration only once, but it is definitely cumbersome for automated tests that try to apply several changes in a row.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
says:

After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes.
Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container.

So, to avoid paying the CrashLoopBackOff time, you should wait 10 minutes between one configuration change and the next.
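
To make the back-off schedule quoted above concrete, here is a minimal sketch that models it as a 10s base delay, doubled on each restart and capped at 300s. The function name backoff_delay and the restart numbering (starting at 1) are assumptions made for illustration; the real kubelet timing may differ in detail.

```shell
#!/bin/sh
# Model of the restart back-off quoted from the Kubernetes docs:
# 10s, 20s, 40s, ... capped at five minutes (300s).
# backoff_delay N prints the modelled delay, in seconds, before restart N
# (N starts at 1). This is an illustrative model, not kubelet code.
backoff_delay() {
    restart=$1
    delay=10
    i=1
    # Double the delay once per earlier restart, stopping at the cap.
    while [ "$i" -lt "$restart" ] && [ "$delay" -lt 300 ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    if [ "$delay" -gt 300 ]; then
        delay=300
    fi
    echo "$delay"
}

for n in 1 2 3 4 5 6 10; do
    echo "restart $n: wait $(backoff_delay "$n")s"
done
```

In this model the cap is reached at the sixth restart, so a pod showing 10 restarts, as in the description, would already be waiting the full five minutes between attempts.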

Comment 6 Geetika Kapoor 2023-10-04 23:10:10 UTC

Test Cases:

Test Case 1: Patch SSP 50 times in a row, cycling between Intermediate, Custom, and Modern (change made in HCO).

Test Result:
1. Every time, SSP acquires its ciphers and TLS version from HCO.
2. No CrashLoopBackOff occurs for the ssp-operator pods.

Test Case 2: Patch SSP 50 times in a row, cycling between Intermediate, Custom, and Modern (change made in apiserver).

Test Result:
1. Every time, SSP acquires its ciphers and TLS version from apiserver.
2. No CrashLoopBackOff occurs for the ssp-operator pods.
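
For anyone wanting to repeat the HCO variant of this check, a rough sketch of the patch loop follows. It is not the QE automation itself: the helper names (profile_value, patch_hco, run_cycle) and the RUN switch are hypothetical, the Custom profile values are copied from the description, and the Intermediate and Modern values are assumed to mirror the {"old":{},"type":"Old"} shape shown there. By default it only prints the oc commands; set RUN=1 to run them against a cluster.

```shell
#!/bin/sh
# Hypothetical sketch of the repeated-patch test; prints commands unless RUN=1.

# profile_value TYPE prints the tlsSecurityProfile JSON value for that type.
profile_value() {
    case "$1" in
        Intermediate) echo '{"intermediate": {}, "type": "Intermediate"}' ;;
        Modern)       echo '{"modern": {}, "type": "Modern"}' ;;
        Custom)       echo '{"custom": {"minTLSVersion": "VersionTLS13", "ciphers": ["TLS_AES_128_GCM_SHA256", "TLS_CHACHA20_POLY1305_SHA256"]}, "type": "Custom"}' ;;
        *)            echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# patch_hco TYPE builds a JSON patch that replaces spec.tlsSecurityProfile
# on the HCO resource, then prints or runs the matching oc command.
patch_hco() {
    value=$(profile_value "$1") || return 1
    patch="[{\"op\": \"replace\", \"path\": \"/spec/tlsSecurityProfile\", \"value\": $value}]"
    if [ "${RUN:-0}" = "1" ]; then
        oc patch hco kubevirt-hyperconverged -n openshift-cnv --type=json -p "$patch"
    else
        echo "oc patch hco kubevirt-hyperconverged -n openshift-cnv --type=json -p '$patch'"
    fi
}

# run_cycle COUNT applies COUNT patches, cycling Intermediate, Custom, Modern.
run_cycle() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        case $((i % 3)) in
            0) patch_hco Intermediate ;;
            1) patch_hco Custom ;;
            2) patch_hco Modern ;;
        esac
        i=$((i + 1))
    done
}

run_cycle 50
```

The "replace" operation, as used in the description, requires spec.tlsSecurityProfile to exist already. While the loop runs, the restart count can be watched with the same oc get pods -A -w | grep -i ssp command used in the description.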

Comment 9 errata-xmlrpc 2023-11-08 14:05:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817
