2081844 – (release-4.10) disconnected insights operator remains degraded after editing pull secret

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Insights Operator
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.10.z
Assignee:	Tomas Remes
QA Contact:	Joao Fula
Docs Contact:
URL:
Whiteboard:
Depends On:	2081997
Blocks:	2083467

Reported:	2022-05-04 19:36 UTC by Michael Hrivnak
Modified:	2022-05-23 13:25 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2081997 2083467
Environment:
Last Closed:	2022-05-23 13:25:12 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift insights-operator pull 619	0	None	open	[release-4.10] Bug 2081844: Fix the clusteroperator conditions values…	2022-05-06 12:15:40 UTC
Red Hat Product Errata	RHBA-2022:2258	0	None	None	None	2022-05-23 13:25:34 UTC

Description Michael Hrivnak 2022-05-04 19:36:05 UTC

Description of problem:

I have a disconnected 4.10.3 cluster for a customer PoC. The insights operator is degraded as expected, because it can't reach cloud.redhat.com. I followed the directions linked below for editing the pull secret to stop it from trying to connect to cloud.redhat.com, but it keeps trying and remains degraded.

https://docs.openshift.com/container-platform/4.10/post_installation_configuration/connected-to-disconnected.html#connected-to-disconnected-restore-insights_connected-to-disconnected


Version-Release number of selected component (if applicable):

4.10.3


How reproducible:

I've only tried once.


Steps to Reproduce:
1. Install a disconnected cluster with platform baremetal.
2. Run the documented procedure to remove "cloud.redhat.com" from the cluster's pull secret.
3. Run `oc get co` or `oc get clusterversion`.
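
Step 2 amounts to deleting one entry from the `auths` map inside the pull secret's `.dockerconfigjson` payload. The following is a minimal Python sketch of that transformation only, not the documented `oc` procedure; the function name `drop_registry_auth` and the sample data are hypothetical.

```python
import base64
import json


def drop_registry_auth(dockerconfigjson_b64: str, host: str) -> str:
    """Return a base64 .dockerconfigjson with `host` removed from `auths`."""
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    # pop() with a default keeps the call safe when the entry is already absent.
    config.get("auths", {}).pop(host, None)
    return base64.b64encode(json.dumps(config).encode()).decode()


# Made-up sample data; the auth values are placeholders, not real tokens.
sample = base64.b64encode(json.dumps({
    "auths": {
        "cloud.redhat.com": {"auth": "placeholder"},
        "quay.io": {"auth": "placeholder"},
    }
}).encode()).decode()

edited = drop_registry_auth(sample, "cloud.redhat.com")
remaining = sorted(json.loads(base64.b64decode(edited))["auths"])
print(remaining)
```

The host to remove is passed as a parameter so the sketch does not assume which key the cluster's pull secret actually contains.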


Actual results:

See that insights is still degraded.

% oc get co insights
NAME       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
insights   4.10.3    True        False         True       5d23h   Unable to report: unable to build request to connect to Insights server: Post "https://cloud.redhat.com/api/ingress/v1/upload": dial tcp 23.218.165.26:443: i/o timeout

% oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.3    True        False         5d23h   Error while reconciling 4.10.3: the cluster operator insights is degraded


Expected results:

The insights cluster operator is not degraded.


Additional info:

Here are the pull secret contents, just to show that I followed the directions (I changed the cluster FQDN to remove customer references). Note that cloud.redhat.com is not in the list:

% oc get secret/pull-secret -n openshift-config -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys[]'
"poc-registry-quay-quay-poc.apps.some-demo-cluster.some.customer.com"
"quay.io"
"registry.connect.redhat.com"
"registry.redhat.io"


The insights-operator pod log shows this every 2 minutes:

I0504 19:16:01.859337       1 controller.go:203] Number of last upload failures 341 exceeded the threshold 5. Marking as degraded.
I0504 19:16:01.859386       1 controller.go:380] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server: Post "https://cloud.redhat.com/api/ingress/v1/upload": dial tcp 23.218.165.26:443: i/o timeout
I0504 19:16:01.859391       1 controller.go:385] The operator is marked as disabled
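
The log lines above show the operator reporting itself as degraded and as disabled in the same pass. To track whether the failure counter keeps growing after the pull secret edit, the count and threshold can be pulled out of the first line. This is a rough Python sketch; the regular expression is derived only from the line quoted in this report, and the helper name `upload_failures` is hypothetical.

```python
import re

# Pattern taken from the log line quoted in this report.
FAILURES_RE = re.compile(
    r"Number of last upload failures (\d+) exceeded the threshold (\d+)"
)


def upload_failures(line: str):
    """Return (failures, threshold) if the line reports upload failures, else None."""
    match = FAILURES_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "I0504 19:16:01.859337       1 controller.go:203] Number of last upload "
    "failures 341 exceeded the threshold 5. Marking as degraded."
)
print(upload_failures(line))
```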


I also see that when I made the change to the pull secret, 2 new machineconfigs got created: one for masters, one for workers. I see that each master node has these annotations indicating that the new machineconfig did get applied:

  "machineconfiguration.openshift.io/currentConfig": "rendered-master-59c21ecfa14984911e815a3d4e1eb0db",
  "machineconfiguration.openshift.io/desiredConfig": "rendered-master-59c21ecfa14984911e815a3d4e1eb0db",

% oc get machineconfigs
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
00-worker                                          14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-master-container-runtime                        14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-master-kubelet                                  14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-worker-container-runtime                        14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
01-worker-kubelet                                  14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
50-masters-chrony-configuration                                                               2.2.0             6d
50-workers-chrony-configuration                                                               2.2.0             6d
99-assisted-installer-master-ssh                                                              3.1.0             6d
99-master-generated-registries                     14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
99-master-ssh                                                                                 3.2.0             6d
99-worker-generated-registries                     14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
99-worker-ssh                                                                                 3.2.0             6d
rendered-master-59c21ecfa14984911e815a3d4e1eb0db   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             107m
rendered-master-5dc83cfb95c57b713c47070143b4b429   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
rendered-worker-abd7e90e575ef59cee3208838d35613d   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             6d
rendered-worker-bad7a562a38faa199500cf825b227404   14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22   3.2.0             107m
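
The check described above, that each node's currentConfig annotation matches its desiredConfig annotation, can be expressed as a small Python sketch. The annotation keys are the ones quoted in this report; the helper name `config_applied` is hypothetical, and the input is assumed to be the node's annotations as a plain dictionary.

```python
CURRENT = "machineconfiguration.openshift.io/currentConfig"
DESIRED = "machineconfiguration.openshift.io/desiredConfig"


def config_applied(annotations: dict) -> bool:
    """True when the node reports a current config equal to the desired one."""
    current = annotations.get(CURRENT)
    return current is not None and current == annotations.get(DESIRED)


# Values copied from the master node annotations quoted in this report.
master = {
    CURRENT: "rendered-master-59c21ecfa14984911e815a3d4e1eb0db",
    DESIRED: "rendered-master-59c21ecfa14984911e815a3d4e1eb0db",
}
print(config_applied(master))
```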

Comment 1 Michael Hrivnak 2022-05-04 21:08:09 UTC

I also see this in the operator log every 5 minutes:


I0504 20:57:08.916006       1 configobserver.go:127] Refreshing configuration from cluster pull secret
I0504 20:57:08.918181       1 configobserver.go:154] Refreshing configuration from cluster secret
I0504 20:57:08.919550       1 configobserver.go:112] support secret does not exist

Comment 6 Joao Fula 2022-05-12 09:57:57 UTC

Verified on 4.10.0-0.nightly-2022-05-11-183751.

Steps to reproduce:
1. Create the support secret.
2. Add the following key/value pairs:
   interval = 1m
   endpoint = https://httpstat.us/404
3. Wait until the insights cluster operator becomes degraded.
4. Remove the cloud token from the pull secret.
5. Wait until the insights cluster operator becomes disabled.

Conditions to verify:
    insights cluster operator is not degraded
    insights cluster operator is not upload_degraded
    insights cluster operator is disabled
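
The three conditions above can be checked against the cluster operator's status. The following is a minimal Python sketch, assuming the input is the parsed JSON of `oc get co insights -o json` with a `status.conditions` list of `type`/`status` pairs. The condition type names "UploadDegraded" and "Disabled" are assumptions based on the wording of this comment, and both function names are hypothetical.

```python
def condition_status(cluster_operator: dict, condition_type: str):
    """Return the status string of the named condition, or None if it is absent."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == condition_type:
            return condition.get("status")
    return None


def disabled_without_degradation(cluster_operator: dict) -> bool:
    """Check the three conditions listed in this comment."""
    return (
        condition_status(cluster_operator, "Degraded") != "True"
        # "UploadDegraded" and "Disabled" are assumed condition type names.
        and condition_status(cluster_operator, "UploadDegraded") != "True"
        and condition_status(cluster_operator, "Disabled") == "True"
    )


# Made-up sample shaped like the assumed `oc get co insights -o json` output.
sample = {
    "status": {
        "conditions": [
            {"type": "Degraded", "status": "False"},
            {"type": "Disabled", "status": "True"},
        ]
    }
}
print(disabled_without_degradation(sample))
```

An absent Degraded or UploadDegraded condition is treated the same as a "False" one here, which is a design choice of this sketch rather than something the comment specifies.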

Comment 9 errata-xmlrpc 2022-05-23 13:25:12 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.15 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2258
