Bug 2092003 - PR 3162 | BZ 2084450 - invalid URL schema for AWS causes tests to perma fail and break the cloud-network-config-controller
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Andreas Karis
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-31 14:10 UTC by Andreas Karis
Modified: 2022-08-10 11:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:15:18 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 3170 | 0 | None | open | Bug 2092003: Fixup of URL for AWS unit/file to compute instance provider-id | 2022-05-31 14:12:05 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:15:32 UTC

Description Andreas Karis 2022-05-31 14:10:02 UTC
The following log excerpt is from the cloud-network-config-controller pod:

2022-05-31T06:54:25.394180800Z E0531 06:54:25.394140       1 controller.go:165] error syncing 'ip-10-0-213-157.us-west-1.compute.internal': error retrieving the private IP configuration for node: ip-10-0-213-157.us-west-1.compute.internal, err: the URI is not expected: aws://us-west-1b/i-0c19616e4c5427e53, requeuing in node workqueue
2022-05-31T06:55:01.718861205Z E0531 06:55:01.718820       1 controller.go:165] error syncing 'ip-10-0-184-199.us-west-1.compute.internal': error retrieving the private IP configuration for node: ip-10-0-184-199.us-west-1.compute.internal, err: the URI is not expected: aws://us-west-1c/i-0f7b0659dce846ed8, requeuing in node workqueue
2022-05-31T06:55:06.355822553Z E0531 06:55:06.355780       1 controller.go:165] error syncing 'ip-10-0-213-157.us-west-1.compute.internal': error retrieving the private IP configuration for node: ip-10-0-213-157.us-west-1.compute.internal, err: the URI is not expected: aws://us-west-1b/i-0c19616e4c5427e53, requeuing in node workqueue
2022-05-31T06:56:23.638990397Z I0531 06:56:23.638951       1 node_controller.go:82] corev1.Node: 'ip-10-0-184-199.us-west-1.compute.internal' in work queue no longer exists
2022-05-31T06:56:23.638990397Z I0531 06:56:23.638975       1 controller.go:160] Dropping key 'ip-10-0-184-199.us-west-1.compute.internal' from the node workqueue
2022-05-31T06:56:28.276665181Z I0531 06:56:28.276626       1 node_controller.go:82] corev1.Node: 'ip-10-0-213-157.us-west-1.compute.internal' in work queue no longer exists
2022-05-31T06:56:28.276665181Z I0531 06:56:28.276648       1 controller.go:160] Dropping key 'ip-10-0-213-157.us-west-1.compute.internal' from the node workqueue
2022-05-31T07:18:36.233407135Z E0531 07:18:36.233356       1 leaderelection.go:367] Failed to update lock: Put "https://api-int.ci-op-z2mw0vdr-be673.aws-2.ci.openshift.org:6443/api/v1/namespaces/openshift-cloud-network-config-controller/configmaps/cloud-network-config-controller-lock": read tcp 10.130.0.14:42286->10.0.131.50:6443: read: connection reset by peer

The error is raised by the cloud-network-config-controller's code, which expects the providerID format used previously, with three slashes after the scheme: aws:///us-west-2a/i-008447f243eead273

//  This is what the node's providerID looks like on AWS
//      spec:
//   providerID: aws:///us-west-2a/i-008447f243eead273
//  i.e: zone/instanceID
func (a *AWS) getInstance(node *corev1.Node) (*ec2.Instance, error) {
        providerData := strings.Split(node.Spec.ProviderID, "/")
        if len(providerData) != 5 {
                return nil, UnexpectedURIError(node.Spec.ProviderID)
        }
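
To illustrate why the length check rejects the providerID seen in the logs above, here is a minimal standalone sketch. It is not part of the controller: splitCount is a hypothetical helper that only applies the same strings.Split call to the three-slash form from the code comment and to the two-slash form from the error message.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCount is a hypothetical helper (not in the controller) that mirrors
// the length check in getInstance: it splits the providerID on "/" and
// reports how many parts result.
func splitCount(providerID string) int {
	return len(strings.Split(providerID, "/"))
}

func main() {
	// Three slashes yield "aws:", "", "", "us-west-2a", "i-008447f243eead273".
	fmt.Println(splitCount("aws:///us-west-2a/i-008447f243eead273")) // prints 5

	// Two slashes yield "aws:", "", "us-west-1b", "i-0c19616e4c5427e53".
	fmt.Println(splitCount("aws://us-west-1b/i-0c19616e4c5427e53")) // prints 4
}
```

With one slash missing, the split produces four parts instead of five, so the check returns UnexpectedURIError.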

However, the providerID has now changed and contains only two slashes after the scheme:

[akaris@linux failed-egressip]$ omg get node ip-10-0-168-208.ec2.internal -o yaml | grep -i providerid
        f:providerID: {}
  providerID: aws://us-east-1d/i-036464a5bf998bef9
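
For comparison, a parser that accepts both forms could look like the sketch below. This is only an illustration under stated assumptions: parseAWSProviderID is a hypothetical function, not the controller's code, and it is not the fix tracked here. Judging by the title of the linked pull request, the fix was made on the machine-config-operator side so that the providerID is generated in the expected format.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAWSProviderID is a hypothetical, tolerant parser. It accepts both
// "aws:///zone/instanceID" and "aws://zone/instanceID" and returns the
// zone and the instance ID.
func parseAWSProviderID(providerID string) (zone, instanceID string, err error) {
	const scheme = "aws://"
	if !strings.HasPrefix(providerID, scheme) {
		return "", "", fmt.Errorf("the URI is not expected: %s", providerID)
	}
	// Drop the scheme, then at most one extra leading slash.
	rest := strings.TrimPrefix(strings.TrimPrefix(providerID, scheme), "/")
	parts := strings.Split(rest, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("the URI is not expected: %s", providerID)
	}
	return parts[0], parts[1], nil
}

func main() {
	for _, id := range []string{
		"aws:///us-west-2a/i-008447f243eead273",
		"aws://us-east-1d/i-036464a5bf998bef9",
	} {
		zone, instanceID, err := parseAWSProviderID(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(zone, instanceID)
	}
}
```

Both providerID values from this report parse to a zone and an instance ID, while anything without the aws:// scheme or with a different number of path segments is still rejected.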

Likely culprit: PR #3162 | BZ 2084450: https://bugzilla.redhat.com/show_bug.cgi?id=2084450

Comment 2 Rio Liu 2022-06-09 04:53:59 UTC
Verified on 4.11.0-0.nightly-2022-06-08-204347

» oc get node/ip-10-0-135-219.us-east-2.compute.internal -o yaml|grep -i provider
  providerID: aws:///us-east-2c/i-05d802320fa28c922

 » oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-66f7d794f7-thv52 |grep controller.go
I0609 04:08:21.428818       1 controller.go:88] Starting node controller
I0609 04:08:21.428927       1 controller.go:91] Waiting for informer caches to sync for node workqueue
I0609 04:08:21.428976       1 controller.go:88] Starting secret controller
I0609 04:08:21.428996       1 controller.go:91] Waiting for informer caches to sync for secret workqueue
I0609 04:08:21.429025       1 controller.go:88] Starting cloud-private-ip-config controller
I0609 04:08:21.429052       1 controller.go:91] Waiting for informer caches to sync for cloud-private-ip-config workqueue
I0609 04:08:21.439493       1 controller.go:182] Assigning key: ip-10-0-135-232.us-east-2.compute.internal to node workqueue
I0609 04:08:21.439549       1 controller.go:182] Assigning key: ip-10-0-154-95.us-east-2.compute.internal to node workqueue
I0609 04:08:21.439564       1 controller.go:182] Assigning key: ip-10-0-178-244.us-east-2.compute.internal to node workqueue
I0609 04:08:21.439570       1 controller.go:182] Assigning key: ip-10-0-229-69.us-east-2.compute.internal to node workqueue
I0609 04:08:21.439577       1 controller.go:182] Assigning key: ip-10-0-248-104.us-east-2.compute.internal to node workqueue
I0609 04:08:21.439583       1 controller.go:182] Assigning key: ip-10-0-135-219.us-east-2.compute.internal to node workqueue
I0609 04:08:21.529264       1 controller.go:96] Starting cloud-private-ip-config workers
I0609 04:08:21.529281       1 controller.go:96] Starting secret workers
I0609 04:08:21.529281       1 controller.go:96] Starting node workers
I0609 04:08:21.529512       1 controller.go:102] Started node workers
I0609 04:08:21.529527       1 controller.go:160] Dropping key 'ip-10-0-154-95.us-east-2.compute.internal' from the node workqueue
I0609 04:08:21.529546       1 controller.go:160] Dropping key 'ip-10-0-135-219.us-east-2.compute.internal' from the node workqueue
I0609 04:08:21.529547       1 controller.go:160] Dropping key 'ip-10-0-248-104.us-east-2.compute.internal' from the node workqueue
I0609 04:08:21.529526       1 controller.go:160] Dropping key 'ip-10-0-135-232.us-east-2.compute.internal' from the node workqueue
I0609 04:08:21.529461       1 controller.go:102] Started secret workers
I0609 04:08:21.529399       1 controller.go:102] Started cloud-private-ip-config workers
I0609 04:08:21.529538       1 controller.go:160] Dropping key 'ip-10-0-178-244.us-east-2.compute.internal' from the node workqueue
I0609 04:08:21.529594       1 controller.go:160] Dropping key 'ip-10-0-229-69.us-east-2.compute.internal' from the node workqueue

No "the URI is not expected" error was found in the controller logs.

Comment 4 errata-xmlrpc 2022-08-10 11:15:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container
Platform 4.11.0 bug fix and security update), and where to find the
updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
