1851782 – Authentication operator degraded when cluster is built with Multitenant plugin

Summary: Authentication operator degraded when cluster is built with Multitenant plugin

Keywords:
Status:	CLOSED DUPLICATE of bug 1837575
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Ben Bennett
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-06-28 23:12 UTC by Alan Chan
Modified:	2023-10-06 20:54 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-06-30 20:50:34 UTC
Target Upstream Version:
Embargoed:

Description Alan Chan 2020-06-28 23:12:14 UTC

Description of problem:
-----------------------

Cluster is built with Multitenant plugin via customized manifest:

$ cat manifests/cluster-network-03-config.yml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: Multitenant

After a successful build, the authentication operator becomes degraded:

[alchan-redhat.com@clientvm 0 ~]$ oc get co authentication 
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.4.0     True        False         True       55m

[alchan-redhat.com@clientvm 0 ~]$ oc get co authentication -o json | jq '.status.conditions[0]'
{
  "lastTransitionTime": "2020-06-28T20:43:19Z",
  "message": "IngressStateEndpointsDegraded: Unhealthy addresses found: 10.129.0.30:Get https://10.129.0.30:6443/healthz: dial tcp 10.129.0.30:6443: connect: connection timed out,10.130.0.29:Get https://10.130.0.29:6443/healthz: dial tcp 10.130.0.29:6443: connect: connection timed out",
  "reason": "IngressStateEndpoints_UnhealthyAddresses",
  "status": "True",
  "type": "Degraded"
}

The IPs 10.129.0.30 and 10.130.0.29 belong to the oauth-openshift pods in the openshift-authentication namespace.

[alchan-redhat.com@clientvm 0 ~]$ oc get netnamespaces | grep authentication
openshift-authentication                                9296695    
openshift-authentication-operator                       7693696

Because the two namespaces have different netids, the authentication-operator pod cannot connect to the oauth-openshift pods.
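
The netid comparison above can be scripted. The following is a minimal sketch, not part of the original report: the helper names `netid_of` and `same_netid` are hypothetical, and it assumes the listing has the namespace name in the first column and the netid in the second, as in the output shown. It treats only identical netids as connected and does not handle any special-case netids.

```shell
# Hypothetical helper: print the netid of namespace $1, reading
# `oc get netnamespaces`-style output on stdin (assumed columns: NAME NETID ...).
netid_of() {
  awk -v ns="$1" '$1 == ns { print $2 }'
}

# Hypothetical helper: succeed only if namespaces $1 and $2 both appear in the
# listing on stdin and have the same netid.
same_netid() {
  nn_table=$(cat)
  id_a=$(printf '%s\n' "$nn_table" | netid_of "$1")
  id_b=$(printf '%s\n' "$nn_table" | netid_of "$2")
  [ -n "$id_a" ] && [ "$id_a" = "$id_b" ]
}
```

For example, `oc get netnamespaces | same_netid openshift-authentication openshift-authentication-operator || echo "namespaces are isolated"` would print the message on the 4.4.0 cluster described here.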

A workaround is to join the two projects:

$ oc adm pod-network join-projects --to=openshift-authentication openshift-authentication-operator

After that, the authentication operator is no longer degraded.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------

- 4.4.0 has this issue.

- The latest release, 4.4.9, appears to be fine and does NOT have this issue. In 4.4.9, both projects have netid 1:

[alchan-redhat.com@clientvm 0 ~]$ oc get co authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.4.9     True        False         False      9m44s

[alchan-redhat.com@clientvm 0 ~]$ oc get netnamespaces | grep authentication
openshift-authentication                                1          
openshift-authentication-operator                       1 

- I have not tested any version between 4.4.0 and 4.4.9.


Questions:
----------

- In which 4.4.z version is this fixed?

Comment 5 Maru Newby 2020-06-30 07:34:02 UTC

What action(s) are expected of the api/auth team that suggested assignment to me? It's not at all clear to me from the comments that appear on this bz.

Comment 7 David Eads 2020-06-30 20:43:45 UTC

It was fixed in 4.4.8 with https://github.com/openshift/cluster-network-operator/pull/657 related to https://bugzilla.redhat.com/show_bug.cgi?id=1841507.

The question about what happens for upgrades if someone worked around the problem (comment 4) is best addressed by the SDN team. Reassigning.
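
As a rough aid for the 4.4.z question, the check below compares a cluster version against the 4.4.8 release named in this comment. It is only a sketch: the function name `version_at_least` is hypothetical, it assumes GNU `sort -V` is available, and it is meant for 4.4.z versions only.

```shell
# Hypothetical helper: succeed if version $1 is at or above version $2.
# Relies on GNU `sort -V` (version sort), which is an assumption about the host.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least 4.4.9 4.4.8` succeeds, while `version_at_least 4.4.0 4.4.8` fails.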

Comment 8 Maru Newby 2020-06-30 20:50:34 UTC

This bz is a duplicate of [1]. The fix is already merged for 4.5 [2] and backported to 4.4 [3].

For future reference, the list of namespaces to join when running in multitenant mode is maintained by the sdn team (openshift-sdn component). 

1: https://bugzilla.redhat.com/show_bug.cgi?id=1837575
2: https://github.com/openshift/cluster-network-operator/pull/650
3: https://github.com/openshift/cluster-network-operator/pull/657

*** This bug has been marked as a duplicate of bug 1837575 ***

Comment 9 Red Hat Bugzilla 2023-09-14 06:03:01 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
