Bug 1851782 - Authentication operator degraded when cluster is built with Multitenant plugin
Summary: Authentication operator degraded when cluster is built with Multitenant plugin
Keywords:
Status: CLOSED DUPLICATE of bug 1837575
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-28 23:12 UTC by Alan Chan
Modified: 2023-10-06 20:54 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-30 20:50:34 UTC
Target Upstream Version:
Embargoed:



Description Alan Chan 2020-06-28 23:12:14 UTC
Description of problem:
-----------------------

The cluster is built with the Multitenant plugin via a customized manifest:

$ cat manifests/cluster-network-03-config.yml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: Multitenant

It appears that after a successful build, the authentication operator goes into Degraded mode:

[alchan-redhat.com@clientvm 0 ~]$ oc get co authentication 
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.4.0     True        False         True       55m

[alchan-redhat.com@clientvm 0 ~]$ oc get co authentication -o json | jq '.status.conditions[0]'
{
  "lastTransitionTime": "2020-06-28T20:43:19Z",
  "message": "IngressStateEndpointsDegraded: Unhealthy addresses found: 10.129.0.30:Get https://10.129.0.30:6443/healthz: dial tcp 10.129.0.30:6443: connect: connection timed out,10.130.0.29:Get https://10.130.0.29:6443/healthz: dial tcp 10.130.0.29:6443: connect: connection timed out",
  "reason": "IngressStateEndpoints_UnhealthyAddresses",
  "status": "True",
  "type": "Degraded"
}

The IPs 10.129.0.30 and 10.130.0.29 belong to the oauth-openshift pods in the openshift-authentication namespace.

[alchan-redhat.com@clientvm 0 ~]$ oc get netnamespaces | grep authentication
openshift-authentication                                9296695    
openshift-authentication-operator                       7693696

Because the two namespaces have different NETIDs, the Multitenant plugin isolates them, which prevents the authentication-operator pod from connecting to the oauth-openshift pods.

The workaround appears to be joining the two projects:

$ oc adm pod-network join-projects --to=openshift-authentication openshift-authentication-operator

After that, the authentication operator is no longer degraded.
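To confirm that the workaround took effect, a check like the following could compare the NETIDs of the two namespaces. This is a minimal sketch, not part of the original report. It assumes `oc` is logged in with cluster-admin rights, that each NetNamespace object exposes its ID in the top-level `netid` field, and the usual OpenShift SDN convention that NETID 0 is global (reachable from every namespace). The function names `netids_can_talk` and `check_auth_netids` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if pods in namespaces with NETIDs
# $1 and $2 can reach each other under the Multitenant plugin, i.e. the
# NETIDs are equal or either one is the global NETID 0 (assumed convention).
netids_can_talk() {
  [ "$1" = "$2" ] || [ "$1" = "0" ] || [ "$2" = "0" ]
}

# Hypothetical wrapper: reads the NETIDs of the two authentication
# namespaces from the cluster and reports whether they can communicate.
# Returns 2 if either lookup fails, 1 if the namespaces are isolated.
check_auth_netids() {
  auth_id=$(oc get netnamespace openshift-authentication -o jsonpath='{.netid}') || return 2
  op_id=$(oc get netnamespace openshift-authentication-operator -o jsonpath='{.netid}') || return 2
  if netids_can_talk "$auth_id" "$op_id"; then
    echo "OK: NETIDs $auth_id and $op_id can communicate"
  else
    echo "ISOLATED: NETIDs $auth_id and $op_id differ; consider join-projects"
    return 1
  fi
}
```

After sourcing the functions, run `check_auth_netids`. With the 4.4.0 values shown above (9296695 and 7693696), `netids_can_talk` fails; with the 4.4.9 values (1 and 1), it succeeds.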


Version-Release number of selected component (if applicable):
-------------------------------------------------------------

- 4.4.0 has this issue.

- The latest 4.4.9 appears to be fine and does NOT have this issue. In 4.4.9, both projects appear to share NETID 1:

[alchan-redhat.com@clientvm 0 ~]$ oc get co authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.4.9     True        False         False      9m44s

[alchan-redhat.com@clientvm 0 ~]$ oc get netnamespaces | grep authentication
openshift-authentication                                1          
openshift-authentication-operator                       1 

- I have not tested any other version between 4.4.0 and 4.4.9.


Questions:
----------

- In which 4.4.z version is this fixed?

Comment 5 Maru Newby 2020-06-30 07:34:02 UTC
What action(s) are expected of the api/auth team that would justify assigning this to me? It's not at all clear to me from the comments that appear on this bz.

Comment 7 David Eads 2020-06-30 20:43:45 UTC
It was fixed in 4.4.8 with https://github.com/openshift/cluster-network-operator/pull/657 related to https://bugzilla.redhat.com/show_bug.cgi?id=1841507.

The question about what happens for upgrades if someone worked around the problem (comment 4) is best addressed by the SDN team. Reassigning.

Comment 8 Maru Newby 2020-06-30 20:50:34 UTC
This bz is a duplicate of [1]. The fix has already been merged for 4.5 [2] and backported to 4.4 [3].

For future reference, the list of namespaces to join when running in multitenant mode is maintained by the sdn team (openshift-sdn component). 

1: https://bugzilla.redhat.com/show_bug.cgi?id=1837575
2: https://github.com/openshift/cluster-network-operator/pull/650
3: https://github.com/openshift/cluster-network-operator/pull/657

*** This bug has been marked as a duplicate of bug 1837575 ***

Comment 9 Red Hat Bugzilla 2023-09-14 06:03:01 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

