Bug 2150333
| Summary: | SSP operator goes into CrashLoopBackOff for some time after removing hco.spec.tlsSecurityProfile | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | SATHEESARAN <sasundar> |
| Component: | Installation | Assignee: | Simone Tiraboschi <stirabos> |
| Status: | CLOSED DUPLICATE | QA Contact: | Natalie Gavrielov <ngavrilo> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.12.0 | CC: | kmajcher, stirabos |
| Target Milestone: | --- | | |
| Target Release: | 4.12.1 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-12-16 13:56:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description SATHEESARAN 2022-12-02 14:27:06 UTC

A very consistent reproducer:

1. Modify hco.spec.tlsSecurityProfile to *Old* (a non-interactive way of doing this is sketched right after this list).
2. Verify that the SSP operator is running and that all the managed CRs (CNAO, SSP, KubeVirt and CDI) are updated with the security profile 'old'.
3. Remove hco.spec.tlsSecurityProfile.
4. Validate the status of the SSP operator pod.
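For step 1, instead of the interactive oc edit used below, the same change can be applied with a single command; a minimal sketch, using oc patch in merge mode and the profile shape shown in the captured output:

    # Set hco.spec.tlsSecurityProfile to the Old profile without opening an editor.
    oc patch hco kubevirt-hyperconverged -n openshift-cnv --type=merge \
      -p '{"spec":{"tlsSecurityProfile":{"type":"Old","old":{}}}}'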
Capturing the output of the above steps on a 4.12 cluster:

1. Editing hco.spec to set tlsSecurityProfile to old:

<snip>
[cnv-qe-jenkins@ ~]$ oc edit hco kubevirt-hyperconverged -n openshift-cnv
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged edited
[cnv-qe-jenkins@ ~]$ sh getsecprof.sh
API server -
HCO - {"old":{},"type":"Old"}
CNAO - {"old":{},"type":"Old"}
SSP - {"old":{},"type":"Old"}
CDI - {"old":{},"type":"Old"}
Kubevirt - {"ciphers":["TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256","TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256","TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256","TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256","TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA","TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA","TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA","TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA","TLS_RSA_WITH_AES_128_GCM_SHA256","TLS_RSA_WITH_AES_256_GCM_SHA384","TLS_RSA_WITH_AES_128_CBC_SHA256","TLS_RSA_WITH_AES_128_CBC_SHA","TLS_RSA_WITH_AES_256_CBC_SHA","TLS_RSA_WITH_3DES_EDE_CBC_SHA"],"minTLSVersion":"VersionTLS10"}
</snip>
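getsecprof.sh itself is not included in the report; a plausible sketch of what such a script could look like, assuming the CR names and field paths of a default HCO deployment (all assumed, not confirmed by the report):

    #!/bin/sh
    # Hypothetical reconstruction of getsecprof.sh: dump the TLS security
    # profile from each managed CR. CR names and field paths assume a default
    # HCO deployment. When a field is unset, oc prints nothing to stdout (and
    # an error to stderr), matching the empty "API server -" line above.
    ns=openshift-cnv
    echo "API server - $(oc get apiserver cluster -o jsonpath='{.spec.tlsSecurityProfile}')"
    echo "HCO - $(oc get hco kubevirt-hyperconverged -n $ns -o jsonpath='{.spec.tlsSecurityProfile}')"
    echo "CNAO - $(oc get networkaddonsconfig cluster -o jsonpath='{.spec.tlsSecurityProfile}')"
    echo "SSP - $(oc get ssp ssp-kubevirt-hyperconverged -n $ns -o jsonpath='{.spec.tlsSecurityProfile}')"
    echo "CDI - $(oc get cdi cdi-kubevirt-hyperconverged -o jsonpath='{.spec.config.tlsSecurityProfile}')"
    echo "Kubevirt - $(oc get kubevirt kubevirt-kubevirt-hyperconverged -n $ns -o jsonpath='{.spec.configuration.tlsConfiguration}')"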
2. Confirming that SSP is running:

<snip>
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   1/1   Running   4 (21s ago)   15h
[cnv-qe-jenkins@ ~]$
</snip>

3. Removing hco.spec.tlsSecurityProfile and then checking the SSP operator status:

<snip>
[cnv-qe-jenkins@ ~]$ oc edit hco -n openshift-cnv kubevirt-hyperconverged
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged edited
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   Completed   4 (56s ago)   15h
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   Completed   4 (58s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   Completed   4 (60s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (8s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (9s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (10s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (12s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (13s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (14s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (16s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   CrashLoopBackOff   4 (17s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   Running   5 (19s ago)   15h
[cnv-qe-jenkins@ ~]$ oc get pods -n openshift-cnv | grep ssp
ssp-operator-5d7cdcdd65-m4fjs   0/1   Running   5 (21s ago)   15h
</snip>
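A less noisy way to follow these transitions is a single watch; a minimal sketch, assuming the name=ssp-operator label that the reproduction attempt further below also uses:

    # Follow the SSP operator pod's status and restart-count transitions live,
    # instead of repeatedly running "oc get pods | grep ssp".
    oc get pods -n openshift-cnv -l name=ssp-operator -w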
Unable to reproduce with v4.12.0-745; ssp restarts only once, as expected:

[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o=json | jq '.spec.tlsSecurityProfile'
{
  "old": {},
  "type": "Old"
}
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS        AGE
ssp-operator-5d7cdcdd65-2664s   1/1     Running   2 (3m26s ago)   64m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc patch hco --type=json kubevirt-hyperconverged -n openshift-cnv -p '[{"op": "remove", "path": "/spec/tlsSecurityProfile" }]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o=json | jq '.spec.tlsSecurityProfile'
null
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS      AGE
ssp-operator-5d7cdcdd65-2664s   0/1     Running   3 (15s ago)   64m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS      AGE
ssp-operator-5d7cdcdd65-2664s   0/1     Running   3 (19s ago)   64m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS      AGE
ssp-operator-5d7cdcdd65-2664s   0/1     Running   3 (20s ago)   64m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS      AGE
ssp-operator-5d7cdcdd65-2664s   1/1     Running   3 (24s ago)   64m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS      AGE
ssp-operator-5d7cdcdd65-2664s   1/1     Running   3 (27s ago)   64m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS      AGE
ssp-operator-5d7cdcdd65-2664s   1/1     Running   3 (30s ago)   64m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS       AGE
ssp-operator-5d7cdcdd65-2664s   1/1     Running   3 (3m6s ago)   67m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$ oc get pods -n openshift-cnv -l name=ssp-operator
NAME                            READY   STATUS    RESTARTS       AGE
ssp-operator-5d7cdcdd65-2664s   1/1     Running   3 (146m ago)   3h31m
[cnv-qe-jenkins@c01-ss-412d-9ksd9-executor ~]$

Even in the example in comment 2, ssp-operator only moved from:

Completed 4 (56s ago)

to:

Running 5 (19s ago)

So, moving from 4 to 5, the pod got restarted only once. The issue is just that the pod had been running for only 56 seconds, so on restart it still got marked as CrashLoopBackOff. The more the experiment is repeated in a row, the longer the exponential back-off delay grows.

Closing as not a bug. Feel free to open a different bug on the ssp component if you think that having a pod kill itself in order to consume a configuration change is a bad idea.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy says:

"After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes. Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container."

So, in order to avoid ending up in the CrashLoopBackOff state, wait 10 minutes between one configuration change and the next.
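To make the quoted numbers concrete, a minimal shell sketch of that back-off sequence:

    # Print the kubelet's restart back-off sequence as described in the docs
    # quoted above: the delay doubles on each restart (10s, 20s, 40s, ...)
    # and is capped at 300s (five minutes).
    delay=10
    for restart in 1 2 3 4 5 6 7; do
      echo "restart #${restart}: back-off ${delay}s"
      delay=$((delay * 2))
      [ "$delay" -gt 300 ] && delay=300
    done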
*** This bug has been marked as a duplicate of bug 2151248 ***