Bug 1849036 - [IPI][OSP] Installer fails on proxy + externalDNS configuration. Authentication cluster
Summary: [IPI][OSP] Installer fails on proxy + externalDNS configuration. Authenticati...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Martin André
QA Contact: David Sanz
URL:
Whiteboard:
Depends On: 1851344 1866379 1868178
Blocks:
Reported: 2020-06-19 13:59 UTC by David Sanz
Modified: 2020-10-27 16:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:08:12 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:08:39 UTC

Description David Sanz 2020-06-19 13:59:20 UTC
Description of problem:

When installing with a proxy and externalDNS, the authentication cluster operator never becomes Available.

Logs from authentication-operator pod:

E0619 13:51:20.270807       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: dial tcp 10.0.111.196:443: i/o timeout

The IP address 10.0.111.196 is the API floating IP:

[morenod@morenod-laptop ~]$ openstack floating ip list --long | grep 10.0.111.196
| d7bb6c14-82e0-45d6-9e66-875b6bb8d72e | 10.0.111.196        | 192.168.0.5      | 6e88657a-e418-40e5-8631-5a43ab54e70a | 316eeb47-1498-46b4-b39e-00ddf73bd2a5 | 542c6ebd48bf40fa857fc245c7572e30 | a4684936-c0d0-491b-b623-7659ba1ea501 | None   | preserve mrnd-13-46-px                                     | []   | None     | None       |
[morenod@morenod-laptop ~]$ openstack port list | grep mrnd-13-46-px | grep 192.168.0.5
| 6e88657a-e418-40e5-8631-5a43ab54e70a | mrnd-13-46-px-l6tr2-api-port       | fa:16:3e:c7:79:f7 | ip_address='192.168.0.5', subnet_id='8f3467f5-2771-4f1c-af5f-656ea1ee1657'                      | DOWN   |
[morenod@morenod-laptop ~]$ 
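
The link between the two command outputs above can be checked mechanically. The following Python sketch splits one row from each table into cells and compares them. The helper name `split_row` is made up for this example, the cell positions are assumed from the rows as pasted (the header lines are not part of this report), and the column padding has been condensed.

```python
def split_row(row):
    """Split one '| a | b |' table row into a list of stripped cell values."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


# Rows copied from the report, with the column padding condensed.
floating_ip_row = (
    "| d7bb6c14-82e0-45d6-9e66-875b6bb8d72e | 10.0.111.196 | 192.168.0.5 "
    "| 6e88657a-e418-40e5-8631-5a43ab54e70a "
    "| 316eeb47-1498-46b4-b39e-00ddf73bd2a5 "
    "| 542c6ebd48bf40fa857fc245c7572e30 "
    "| a4684936-c0d0-491b-b623-7659ba1ea501 "
    "| None | preserve mrnd-13-46-px | [] | None | None |"
)
port_row = (
    "| 6e88657a-e418-40e5-8631-5a43ab54e70a | mrnd-13-46-px-l6tr2-api-port "
    "| fa:16:3e:c7:79:f7 "
    "| ip_address='192.168.0.5', subnet_id='8f3467f5-2771-4f1c-af5f-656ea1ee1657' "
    "| DOWN |"
)

fip = split_row(floating_ip_row)
port = split_row(port_row)

# Assumed positions: fip[1] floating IP, fip[2] fixed IP, fip[3] port ID;
# port[0] port ID, port[1] port name, port[3] fixed IPs, port[4] status.
same_port = fip[3] == port[0] and fip[2] in port[3]
print(fip[1], "->", port[1], "| same port:", same_port, "| status:", port[4])
```

If the assumed positions hold, this confirms that the floating IP from the error message is attached to the port named `mrnd-13-46-px-l6tr2-api-port`.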


In an installation without proxy or externalDNS, the same check at controller.go:129 returns an EOF rather than a timeout:

E0619 13:00:51.627750       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: EOF
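
The two failures differ only in the cause that follows "failed to GET route:". As a rough illustration, the Python sketch below classifies an authentication-operator log line by that trailing cause. The function name `classify_route_failure` and the category labels are made up for this example and are not part of the operator.

```python
import re

# Captures whatever follows "failed to GET route: " up to the end of the line.
_ROUTE_FAILURE = re.compile(r"failed to GET route: (?P<cause>.+)$")


def classify_route_failure(log_line):
    """Return 'timeout', 'eof', 'other', or 'no-route-failure' for a log line."""
    match = _ROUTE_FAILURE.search(log_line.strip())
    if match is None:
        return "no-route-failure"
    cause = match.group("cause")
    if cause.endswith("i/o timeout"):
        return "timeout"
    if cause == "EOF":
        return "eof"
    return "other"


# Log line from the proxy + externalDNS installation.
proxy_line = (
    "E0619 13:51:20.270807       1 controller.go:129] "
    "{AuthenticationOperator2 AuthenticationOperator2} failed with: "
    "error checking current version: unable to check route health: "
    "failed to GET route: dial tcp 10.0.111.196:443: i/o timeout"
)
# Log line from the installation without proxy or externalDNS.
plain_line = (
    "E0619 13:00:51.627750       1 controller.go:129] "
    "{AuthenticationOperator2 AuthenticationOperator2} failed with: "
    "error checking current version: unable to check route health: "
    "failed to GET route: EOF"
)

print(classify_route_failure(proxy_line))  # timeout
print(classify_route_failure(plain_line))  # eof
```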

The EOF case seems to be handled, as the operator continues its initialization even after the EOF error:

I0619 13:00:53.430840       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"9dbd31c7-0d3b-4e8e-9360-0b4774253913", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from True to False ("RouteHealthDegraded: failed to GET route: EOF")
I0619 13:01:01.413373       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-06-19T13:00:53Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-06-19T13:01:01Z","message":"Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://192.168.3.157:6443/.well-known/oauth-authorization-server endpoint data","reason":"_WellKnownNotReady","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-06-19T13:01:01Z","status":"False","type":"Available"},{"lastTransitionTime":"2020-06-19T12:49:58Z","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0619 13:01:01.428440       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-06-19T13:00:53Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-06-19T13:01:01Z","message":"Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://192.168.3.157:6443/.well-known/oauth-authorization-server endpoint data","reason":"_WellKnownNotReady","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-06-19T13:01:01Z","status":"False","type":"Available"},{"lastTransitionTime":"2020-06-19T12:49:58Z","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0619 13:01:01.428948       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"9dbd31c7-0d3b-4e8e-9360-0b4774253913", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded message changed from "RouteHealthDegraded: failed to GET route: EOF" to "",Progressing changed from Unknown to True ("Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://192.168.3.157:6443/.well-known/oauth-authorization-server endpoint data"),Available changed from Unknown to False ("")
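
The status diff lines above are easier to read once the embedded JSON is reduced to one status per condition type. The Python sketch below does that for the first status_controller.go line. The function name `conditions_from_diff` is made up for this example, and the sketch assumes the JSON payload is everything after the first " diff " in the line.

```python
import json


def conditions_from_diff(log_line):
    """Map each condition type in a 'clusteroperator/... diff' line to its status."""
    _, found, payload = log_line.partition(" diff ")
    if not found:
        raise ValueError("not a status diff log line")
    diff = json.loads(payload)
    return {c["type"]: c["status"] for c in diff["status"]["conditions"]}


# First status_controller.go line from the log above, split only for readability.
sample = (
    "I0619 13:01:01.413373       1 status_controller.go:172] "
    "clusteroperator/authentication diff "
    '{"status":{"conditions":['
    '{"lastTransitionTime":"2020-06-19T13:00:53Z","reason":"AsExpected",'
    '"status":"False","type":"Degraded"},'
    '{"lastTransitionTime":"2020-06-19T13:01:01Z",'
    '''"message":"Progressing: got '404 Not Found' status while trying to GET '''
    '''the OAuth well-known https://192.168.3.157:6443/.well-known/oauth-authorization-server endpoint data",'''
    '"reason":"_WellKnownNotReady","status":"True","type":"Progressing"},'
    '{"lastTransitionTime":"2020-06-19T13:01:01Z","status":"False","type":"Available"},'
    '{"lastTransitionTime":"2020-06-19T12:49:58Z","reason":"AsExpected",'
    '"status":"True","type":"Upgradeable"}]}}'
)

for condition_type, status in conditions_from_diff(sample).items():
    print(condition_type, status)
```

For this line the mapping shows Degraded as False and Progressing as True, while Available is still False.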

Version-Release number of the following components:
4.6.0-0.nightly-2020-06-19-051412

How reproducible:

Steps to Reproduce:
1. Install IPI on OSP using proxy and externalDNS
2. Check the status of the authentication cluster operator

Actual results:
The authentication and console cluster operators (console depends on authentication) never become Available, so the installation fails.

Expected results:
The authentication cluster operator handles the timeout as it does the EOF error and continues its initialization.


Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Stephen Cuppett 2020-06-19 14:25:24 UTC
Setting target release to current development version (4.6) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 2 Martin André 2020-06-25 14:09:07 UTC
The team considers this bug as valid. Considering this bug priority and our capacity, we are deferring this bug to an upcoming sprint. If there are reasons for us to reprioritise, please let us know.

Comment 3 David Sanz 2020-07-16 15:15:01 UTC
Verified on 4.6.0-0.nightly-2020-07-15-170241

Comment 6 errata-xmlrpc 2020-10-27 16:08:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

