Bug 2127947

Summary: cluster-network-addons-config tlsSecurityProfile takes a long time to update after setting APIServer
Product: Container Native Virtualization (CNV)
Reporter: Yossi Segev <ysegev>
Component: Installation
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Debarati Basu-Nag <dbasunag>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.12.0
CC: dbasunag, stirabos
Target Milestone: ---
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: hco-bundle-registry-v4.12.0-541
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-24 13:40:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yossi Segev 2022-09-19 12:40:36 UTC
Description of problem:
When tlsSecurityProfile is set in APIServer, the change should reach cluster-network-addons-operator within 60 seconds (the HCO period time), but it takes much longer (more than 10 minutes).


Version-Release number of selected component (if applicable):
OCP 4.12.0-ec.1
CNV 4.12.0
hyperconverged-cluster-operator: v4.12.0-65


How reproducible:
100%


Steps to Reproduce:
1.
Verify no tlsSecurityProfile is set in HCO:
$ oc get hco -n openshift-cnv kubevirt-hyperconverged -ojsonpath={.spec.tlsSecurityProfile} | jq
$

2.
Apply a change in APIServer to set a new tlsSecurityProfile:
a. $ oc edit apiserver cluster

b. Add (to .spec):
spec:
  ...
  tlsSecurityProfile:
    old: {}
    type: Old

c. Exit the resource edit

3.
Check tlsSecurityProfile in NetworkAddonsConfig after 60 seconds (operator cycle time):
$ oc get NetworkAddonsConfig cluster -ojsonpath={.spec.tlsSecurityProfile};echo
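
The manual check in step 3 can also be scripted. The following is a minimal sketch, not part of the original report: the function name wait_for_tls_profile_sync and the 5-second polling interval are assumptions, and it compares only the profile type (so two Custom profiles with different ciphers would look identical to it). It polls NetworkAddonsConfig until its tlsSecurityProfile type matches the one in APIServer, or gives up after the timeout (60 seconds by default, the period cited above).

```shell
# Hypothetical helper (name and defaults are assumptions, not from the report).
# Usage: wait_for_tls_profile_sync [timeout_seconds] [interval_seconds]
wait_for_tls_profile_sync() {
  local timeout="${1:-60}" interval="${2:-5}" elapsed=0
  local want got

  # Profile type currently requested cluster-wide in APIServer.
  want="$(oc get apiserver cluster -o jsonpath='{.spec.tlsSecurityProfile.type}')"

  while [ "$elapsed" -le "$timeout" ]; do
    # Profile type currently set on the CNAO configuration.
    got="$(oc get NetworkAddonsConfig cluster -o jsonpath='{.spec.tlsSecurityProfile.type}')"
    if [ "$got" = "$want" ]; then
      echo "synced after ${elapsed}s: ${got}"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "not synced after ${timeout}s: APIServer=${want} NetworkAddonsConfig=${got}" >&2
  return 1
}
```

Running wait_for_tls_profile_sync 60 right after exiting the APIServer edit would return non-zero on an affected build, since NetworkAddonsConfig keeps the previous profile.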


Actual results:
Still not updated
{"intermediate":{},"type":"Intermediate"}


Expected results:
Same setting as in the APIServer:
{"old":{},"type":"Old"}


Additional info:
Not sure it is related, but the log that appears in hco-operator immediately when applying the change in APIServer includes a message saying "No HyperConverged resource":

2022-09-19T12:22:24.916912997Z {"level":"info","ts":1663590144.916714,"logger":"controller_hyperconverged","msg":"Reconciling for openshiftconfigv1.APIServer"}
2022-09-19T12:22:24.916912997Z {"level":"info","ts":1663590144.9168296,"logger":"controller_hyperconverged","msg":"Reconciling for openshiftconfigv1.APIServer"}
2022-09-19T12:22:24.916998663Z {"level":"info","ts":1663590144.9169204,"logger":"controller_hyperconverged","msg":"Triggered by ApiServer CR, refreshing it","Request.Namespace":"openshift-cnv","Request.Name":"api-server-cr-5bdfa811-d680-43c0-9feb-d308c63eaa11"}
2022-09-19T12:22:24.917475913Z {"level":"info","ts":1663590144.9171374,"logger":"controller_hyperconverged","msg":"No HyperConverged resource","Request.Namespace":"openshift-cnv","Request.Name":"api-server-cr-5bdfa811-d680-43c0-9feb-d308c63eaa11"}

Comment 1 Yossi Segev 2022-09-20 08:14:06 UTC
Update:
I now see another issue, where the same scenario leads to HCO's underlying CRs (NetworkAddonsConfig in my case) not being updated at all.
A workaround is to change something in HCO (I toggled the value of the sriovLiveMigration feature-gate from "true" to "false"), which refreshes HCO against APIServer and thus updates its components.
This may have also been the case in the original scenario of this bug, and maybe I didn't notice that I changed something in HCO, and only as a result of that the CNAO got updated (rather than after a long timeout).
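
For reference, the workaround could be applied from the command line roughly as follows. This is a sketch under assumptions: the helper name set_hco_feature_gate is hypothetical, and the JSON path /spec/featureGates/<name> is assumed rather than taken from this report. A "replace" operation also requires that the field already exists in the HyperConverged CR.

```shell
# Hypothetical helper; the /spec/featureGates/<gate> path is an assumption.
# Usage: set_hco_feature_gate <gate-name> <true|false>
set_hco_feature_gate() {
  local gate="$1" value="$2"
  # JSON patch that replaces the current value of the named feature gate.
  oc patch hco kubevirt-hyperconverged -n openshift-cnv --type=json \
    -p "[{\"op\":\"replace\",\"path\":\"/spec/featureGates/${gate}\",\"value\":${value}}]"
}
```

Calling set_hco_feature_gate sriovLiveMigration false would mirror the toggle described in this comment; any change to the HyperConverged CR should have the same refreshing effect.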

Comment 2 Yossi Segev 2022-10-02 11:26:09 UTC
Verified with versions
OCP 4.12.0-ec.1
CNV 4.12.0 (HCO bundle v4.12.0-548)
hyperconverged-cluster-operator:v4.12.0-72

by repeating the exact scenario from the bug description: tlsSecurityProfile was set on NetworkAddonsConfig immediately after setting it in APIServer.

Comment 6 errata-xmlrpc 2023-01-24 13:40:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408