Description of problem:
I have a disconnected 4.10.3 cluster for a customer PoC. The insights operator is degraded, as expected, because it can't reach cloud.redhat.com. I followed the directions linked below for editing the pull secret to stop it from trying to connect to cloud.redhat.com, but it keeps trying and remains degraded.

https://docs.openshift.com/container-platform/4.10/post_installation_configuration/connected-to-disconnected.html#connected-to-disconnected-restore-insights_connected-to-disconnected

Version-Release number of selected component (if applicable):
4.10.3

How reproducible:
I've only tried once.

Steps to Reproduce:
1. Install a disconnected cluster with platform baremetal
2. Run the documented procedure to remove "cloud.redhat.com" from the cluster's pull secret
3. Run `oc get co` or `oc get clusterversion`

Actual results:
See that insights is still degraded.

% oc get co insights
NAME       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
insights   4.10.3    True        False         True       5d23h   Unable to report: unable to build request to connect to Insights server: Post "https://cloud.redhat.com/api/ingress/v1/upload": dial tcp 23.218.165.26:443: i/o timeout

% oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.3    True        False         5d23h   Error while reconciling 4.10.3: the cluster operator insights is degraded

Expected results:
Not degraded.

Additional info:
Here are the pull secret contents, just to show that I followed the directions (I changed the cluster FQDN to remove customer references).
Note that cloud.redhat.com is not in the list:

% oc get secret/pull-secret -n openshift-config -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys[]'
"poc-registry-quay-quay-poc.apps.some-demo-cluster.some.customer.com"
"quay.io"
"registry.connect.redhat.com"
"registry.redhat.io"

The insights-operator pod log shows this every 2 minutes:

I0504 19:16:01.859337 1 controller.go:203] Number of last upload failures 341 exceeded the threshold 5. Marking as degraded.
I0504 19:16:01.859386 1 controller.go:380] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server: Post "https://cloud.redhat.com/api/ingress/v1/upload": dial tcp 23.218.165.26:443: i/o timeout
I0504 19:16:01.859391 1 controller.go:385] The operator is marked as disabled

I also see that when I changed the pull secret, 2 new machineconfigs were created: one for masters, one for workers. Each master node has these annotations, indicating that the new machineconfig was applied:

"machineconfiguration.openshift.io/currentConfig": "rendered-master-59c21ecfa14984911e815a3d4e1eb0db",
"machineconfiguration.openshift.io/desiredConfig": "rendered-master-59c21ecfa14984911e815a3d4e1eb0db",

% oc get machineconfigs
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
00-worker                                          14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-master-container-runtime                        14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-master-kubelet                                  14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-worker-container-runtime                        14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-worker-kubelet                                  14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
50-masters-chrony-configuration                                                               2.2.0             6d
50-workers-chrony-configuration                                                               2.2.0             6d
99-assisted-installer-master-ssh                                                              3.1.0             6d
99-master-generated-registries                     14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
99-master-ssh                                                                                 3.2.0             6d
99-worker-generated-registries                     14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
99-worker-ssh                                                                                 3.2.0             6d
rendered-master-59c21ecfa14984911e815a3d4e1eb0db   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             107m
rendered-master-5dc83cfb95c57b713c47070143b4b429   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
rendered-worker-abd7e90e575ef59cee3208838d35613d   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
rendered-worker-bad7a562a38faa199500cf825b227404   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             107m
I also see this in the operator log every 5 minutes:

I0504 20:57:08.916006 1 configobserver.go:127] Refreshing configuration from cluster pull secret
I0504 20:57:08.918181 1 configobserver.go:154] Refreshing configuration from cluster secret
I0504 20:57:08.919550 1 configobserver.go:112] support secret does not exist
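
The annotation check described above (currentConfig equal to desiredConfig on each node) could be scripted roughly as follows. This is only a sketch: the function name nodes_out_of_sync and the three-column input format are made up for illustration, and the `oc` invocation in the trailing comment assumes kubectl-style JSONPath with backslash-escaped dots in the annotation key.

```shell
# Reads "<node> <currentConfig> <desiredConfig>" lines on stdin and prints
# the name of every node whose current and desired MachineConfig differ.
# Exit status is 0 when all nodes are in sync, 1 otherwise.
# (nodes_out_of_sync is a hypothetical helper, not part of any oc tooling.)
nodes_out_of_sync() {
  awk '$2 != $3 { print $1; bad = 1 } END { exit bad + 0 }'
}

# Possible use against a live cluster (assumed invocation, not run here):
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{" "}{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}{end}' \
#     | nodes_out_of_sync
```

An empty result with exit status 0 would match what the annotations above show for the masters, namely that the rendered config rollout finished and is not what keeps the operator degraded.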
Verified on 4.10.0-0.nightly-2022-05-11-183751.

Steps to reproduce:
1. Create support secret
2. Add the value/key combinations:
   interval = 1m
   endpoint = https://httpstat.us/404
3. Wait until the insights cluster operator becomes degraded.
4. Remove the cloud token from the pull secret
5. Wait until the insights cluster operator becomes disabled.

Conditions to verify:
insights cluster operator is not degraded
insights cluster operator is not upload_degraded
insights cluster operator is disabled
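
For anyone repeating the verification, steps 1-2 and the final condition check might look roughly like the sketch below. Assumptions, none of which are stated in this report: the secret is named "support" and lives in the openshift-config namespace, and the ClusterOperator condition types are spelled Degraded, UploadDegraded and Disabled. Both function names are hypothetical.

```shell
# Steps 1-2: create the support secret with a short upload interval and an
# endpoint that always answers 404.
# Assumption: secret "support" in namespace "openshift-config".
create_support_secret() {
  oc create secret generic support -n openshift-config \
    --from-literal=interval=1m \
    --from-literal=endpoint=https://httpstat.us/404
}

# Reads "<Type>=<Status>" lines on stdin and succeeds only when
# Degraded=False, UploadDegraded=False and Disabled=True are all present.
# Assumption: these are the condition type names on the insights operator.
check_insights_conditions() {
  awk -F= '
    $1 == "Degraded"       { d = $2 }
    $1 == "UploadDegraded" { u = $2 }
    $1 == "Disabled"       { x = $2 }
    END { exit !(d == "False" && u == "False" && x == "True") }
  '
}

# Possible use after step 5 (assumed invocation, not run here):
#   oc get co insights \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | check_insights_conditions && echo "conditions OK"
```

The check reads from stdin so that it can be tried against saved output as well as a live cluster.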
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.15 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2258