Bug 1990140 - Samples operator management Removed failed to contact registry.redhat.io
Summary: Samples operator management Removed failed to contact registry.redhat.io
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Samples
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Gabe Montero
QA Contact: Jitendar Singh
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-04 20:28 UTC by Dan Seals
Modified: 2021-10-18 17:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The samples operator performs a quick connection test to registry.redhat.io to help determine whether it is running in a disconnected environment. It did not set a connection timeout on that attempt.
Consequence: If the underlying environment's default socket connection timeouts were long enough, the test took a long time to complete, which delayed the samples operator finishing startup and reporting to the cluster version operator.
Fix: The samples operator now sets a reasonable connection timeout.
Result: Connection-based delays in reporting to the cluster version operator no longer occur.
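As a rough illustration of the fix described in the Doc Text, the following Python sketch shows a connection test that sets an explicit timeout instead of relying on the operating system's socket defaults. It is not the operator's code (the operator is written in Go; the linked pull request holds the actual change). The function name can_connect and the 5-second default are assumptions made for this example.

```python
import socket


def can_connect(host: str, port: int = 443, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    Hypothetical helper for illustration only; the name and the default
    timeout are assumptions, not values taken from the operator.
    """
    try:
        # Without the timeout argument, a dropped connection attempt can block
        # for as long as the OS-level TCP defaults allow.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False
```

In a disconnected environment, a call such as can_connect("registry.redhat.io") would be expected to return False after roughly the configured timeout per resolved address, rather than after the much longer system default. Name resolution itself is not bounded by this timeout.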
Clone Of:
Environment:
Last Closed: 2021-10-18 17:44:59 UTC
Target Upstream Version:


Links
System                  ID                                           Private  Priority  Status  Summary  Last Updated
Github                  openshift cluster-samples-operator pull 384  0        None      None    None     2021-08-09 20:10:35 UTC
Red Hat Product Errata  RHSA-2021:3759                               0        None      None    None     2021-10-18 17:45:01 UTC

Description Dan Seals 2021-08-04 20:28:02 UTC
Description of problem:
Upgrade to 4.6.33 stalled.

oc get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version           True       True         38m    Unable to apply 4.6.35: the cluster operator openshift-samples has not yet successfully rolled out


The cluster was installed in a disconnected environment and is currently disconnected.

The managementState is set to Removed, and was already set to Removed before the upgrade started.


Even with the managementState set to Removed, the samples operator still tried, and failed, to connect to registry.redhat.io:
cluster-samples-operator-78977d7b45-qzvqj/cluster-samples-operator/cluster-samples-operator/logs/current.log:2021-07-14T14:48:57.096410152Z time="2021-07-14T14:48:57Z" level=info msg="test connection to registry.redhat.io failed with read tcp 10.236.34.51:35616->104.88.108.78:443: read: connection timed out"




Setting the managementState to Removed does not "disable" the operator, and connections to registry.redhat.io are still attempted.

How can the operator be completely disabled?

Does "oc delete configs.samples cluster" disable the operator?

Comment 10 XiuJuan Wang 2021-09-07 04:06:26 UTC
Full samples operator logs: http://pastebin.test.redhat.com/992111
Once the operator defaults to Removed, no further connection attempts are made.

time="2021-09-07T03:13:44Z" level=info msg="test connection with timeout failed with dial tcp 104.81.144.251:443: i/o timeout"
time="2021-09-07T03:14:04Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:14:15Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
time="2021-09-07T03:14:15Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
time="2021-09-07T03:14:18Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:14:24Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:14:40Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:14:44Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:15:04Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:15:18Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:15:24Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:15:37Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
time="2021-09-07T03:15:37Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
time="2021-09-07T03:15:40Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:15:44Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:16:04Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:16:18Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:16:24Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:16:40Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:16:44Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:17:04Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:17:18Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:17:24Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:17:40Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:17:44Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:18:04Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:18:18Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:18:21Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
time="2021-09-07T03:18:21Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
time="2021-09-07T03:18:24Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:18:39Z" level=info msg="test connection with timeout failed with dial tcp 104.105.42.89:443: i/o timeout"
time="2021-09-07T03:18:39Z" level=info msg="unable to establish HTTPS connection to registry.redhat.io after 3 minutes, bootstrap to Removed"
time="2021-09-07T03:18:39Z" level=info msg="creating default Config"
time="2021-09-07T03:18:40Z" level=info msg="metrics sample config retrieval failed with: config.samples.operator.openshift.io \"cluster\" not found"
time="2021-09-07T03:18:42Z" level=info msg="Attempting stage 1 Removed management state: RemovePending == true"
time="2021-09-07T03:18:42Z" level=info msg="CRDUPDATE process mgmt update spec Removed status "
time="2021-09-07T03:18:45Z" level=info msg="management state set to removed so deleting samples"
time="2021-09-07T03:18:45Z" level=info msg="Attempting stage 2 Removed management state: Status == Removed"
time="2021-09-07T03:18:45Z" level=info msg="CRDUPDATE process mgmt update spec Removed status Removed"
time="2021-09-07T03:18:48Z" level=info msg="Attempting stage 3 Removed management state: RemovePending == false"
time="2021-09-07T03:18:48Z" level=info msg="CRDUPDATE process mgmt update spec Removed status Removed"
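
The behavior visible in the log above (repeated connection tests, then a fallback to Removed once the time limit is reached) can be sketched as follows. This is an illustration in Python, not the operator's Go implementation. The 3-minute limit comes from the log message, the 20-second interval is read off the timestamps, and the function name, parameters, and the "Managed" result for a successful test are assumptions.

```python
import time
from typing import Callable


def bootstrap_management_state(
    probe: Callable[[], bool],
    retry_interval_seconds: float = 20.0,   # spacing seen in the log timestamps
    give_up_after_seconds: float = 180.0,   # "after 3 minutes" in the log message
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Retry probe() until it succeeds or the time limit passes.

    Returns "Managed" if the registry was reachable (an assumption for this
    sketch) and "Removed" if every attempt failed before the deadline.
    """
    deadline = clock() + give_up_after_seconds
    while True:
        if probe():
            return "Managed"
        if clock() >= deadline:
            # No successful connection within the limit: bootstrap to Removed.
            return "Removed"
        sleep(retry_interval_seconds)
```

Because each probe is itself bounded by a connection timeout, the total time spent in this loop stays close to the configured limit instead of depending on the environment's socket defaults.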

Comment 13 errata-xmlrpc 2021-10-18 17:44:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

