Please provide the logs from "oc adm must-gather"
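For reference, a typical invocation looks something like the sketch below (the --dest-dir flag is optional and the paths are illustrative):

$ oc adm must-gather --dest-dir=/tmp/must-gather    # collect cluster diagnostics locally
$ tar czf must-gather.tar.gz -C /tmp must-gather    # bundle the result for attaching here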
Moving back to assigned as I know there are at least two distinct bugs here.
Hello Mo, I tried with 4.2.0-0.nightly-2019-08-20-213632 and it seems to still have this issue.

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 Unknown     Unknown       True       125m
cloud-credential                           4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
cluster-autoscaler                         4.2.0-0.nightly-2019-08-20-213632   True        False         False      124m
console                                    4.2.0-0.nightly-2019-08-20-213632   False       True          False      124m
dns                                        4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
image-registry                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      123m
ingress                                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      124m
insights                                   4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
kube-apiserver                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
kube-controller-manager                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      126m
kube-scheduler                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      126m
machine-api                                4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
machine-config                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
marketplace                                4.2.0-0.nightly-2019-08-20-213632   True        False         False      123m
monitoring                                 4.2.0-0.nightly-2019-08-20-213632   True        False         False      121m
network                                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
node-tuning                                4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
openshift-apiserver                        4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
openshift-controller-manager               4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
openshift-samples                          4.2.0-0.nightly-2019-08-20-213632   True        False         False      123m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
service-ca                                 4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
storage                                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      124m

# oc describe co authentication
Name:         authentication
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-08-21T03:22:02Z
  Generation:          1
  Resource Version:    15984
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/authentication
  UID:                 d7b4f42f-c3c2-11e9-aa43-02a5eca1f9aa
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-08-21T03:25:51Z
    Message:               RouteHealthDegraded: failed to GET route: EOF
    Reason:                RouteHealthDegradedFailedGet
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-08-21T03:22:02Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Progressing
    Last Transition Time:  2019-08-21T03:22:02Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Available
    Last Transition Time:  2019-08-21T03:22:02Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:  <nil>
  Related Objects:
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  authentications
    Group:     config.openshift.io
    Name:      cluster
    Resource:  authentications
    Group:     config.openshift.io
    Name:      cluster
    Resource:  infrastructures
    Group:     config.openshift.io
    Name:      cluster
    Resource:  oauths
    Group:
    Name:      openshift-config
    Resource:  namespaces
    Group:
    Name:      openshift-config-managed
    Resource:  namespaces
    Group:
    Name:      openshift-authentication
    Resource:  namespaces
    Group:
    Name:      openshift-authentication-operator
    Resource:  namespaces
Events:  <none>

And the masters can still be scheduled; I deployed a pod on one of them successfully.
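Since the Degraded condition comes from "failed to GET route: EOF", it may help to probe the OAuth route directly; a minimal sketch, assuming the usual 4.x defaults of route oauth-openshift in namespace openshift-authentication:

$ HOST=$(oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')
$ curl -kv "https://${HOST}/healthz"   # an EOF here too would point at the load balancer rather than the oauth pods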
Moving to the Routing component to debug why routes are not working on this cluster.
Never mind the AWS bug references; I misread the original report and missed a key point about the topology under test. The problem is that there are no instances assigned to the ELB. Looking at the cluster nodes:

$ oc get nodes
NAME                                         STATUS   ROLES           AGE   VERSION
ip-10-0-139-134.us-east-2.compute.internal   Ready    master,worker   21h   v1.14.0+17b784327
ip-10-0-157-164.us-east-2.compute.internal   Ready    master,worker   21h   v1.14.0+17b784327
ip-10-0-167-50.us-east-2.compute.internal    Ready    master,worker   21h   v1.14.0+17b784327

Notice that every worker node in the cluster is also labeled as a master. In Kubernetes, master nodes are not allowed to be load balancer targets; this is deliberate upstream behavior, not a bug. It follows that an ingress controller published by a load balancer depends on at least one non-master node on which to expose a port for the load balancer to connect to.

We should probably consider preventing ingress controllers from being scheduled on masters. If we did so, the ingress operator would report Degraded and the problem would be more visible (see the sketch below). I think we simply don't support this topology when using cloud load balancers. I'm going to close the bug and recommend we prune the test case as unsupported.
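A rough sketch of both halves of that, assuming the IngressController API on this version exposes spec.nodePlacement (worth verifying against the installed CRD before relying on it):

$ # every node carries the master role label, which the service
$ # controller treats as "exclude from load balancer backends"
$ oc get nodes -l node-role.kubernetes.io/master

$ # hypothetical mitigation: pin the default ingress controller to
$ # worker-only nodes so the degraded state surfaces immediately
$ oc patch ingresscontroller/default -n openshift-ingress-operator --type=merge \
    -p '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/worker":""}}}}}'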
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1744370 to track the scheduling and status reporting issue.
Installing the masters as schedulable nodes on the Azure platform succeeds, but no routes can be accessed (since there are no virtual machines in the Azure LB backend pools).

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-21-235427   True        False         33m     Cluster version is 4.2.0-0.nightly-2019-08-21-235427

$ oc get node
NAME                          STATUS   ROLES           AGE   VERSION
hongli-az427-hwp5f-master-0   Ready    master,worker   49m   v1.14.0+a80442411
hongli-az427-hwp5f-master-1   Ready    master,worker   49m   v1.14.0+a80442411
hongli-az427-hwp5f-master-2   Ready    master,worker   49m   v1.14.0+a80442411

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      34m
cloud-credential                           4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
cluster-autoscaler                         4.2.0-0.nightly-2019-08-21-235427   True        False         False      41m
console                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      35m
dns                                        4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
image-registry                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      38m
ingress                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      41m
insights                                   4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
kube-apiserver                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      45m
kube-controller-manager                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      45m
kube-scheduler                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      45m
machine-api                                4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
machine-config                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      42m
marketplace                                4.2.0-0.nightly-2019-08-21-235427   True        False         False      41m
monitoring                                 4.2.0-0.nightly-2019-08-21-235427   True        False         False      39m
network                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      47m
node-tuning                                4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
openshift-apiserver                        4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
openshift-controller-manager               4.2.0-0.nightly-2019-08-21-235427   True        False         False      46m
openshift-samples                          4.2.0-0.nightly-2019-08-21-235427   True        False         False      33m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-21-235427   True        False         False      47m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-21-235427   True        False         False      47m
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-21-235427   True        False         False      44m
service-ca                                 4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
storage                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      42m

$ oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
router-default-86f85b897b-62rs2   1/1     Running   0          34m   10.128.0.30   hongli-az427-hwp5f-master-1   <none>           <none>
router-default-86f85b897b-8cr9p   1/1     Running   0          34m   10.129.0.37   hongli-az427-hwp5f-master-2   <none>           <none>

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.180.223   13.89.142.235   80:31364/TCP,443:31093/TCP   42m
router-internal-default   ClusterIP      172.30.176.214   <none>          80/TCP,443/TCP,1936/TCP      42m

$ curl https://console-openshift-console.apps.hongli-az427.qe.azure.devcluster.openshift.com -k -vv
* Rebuilt URL to: https://console-openshift-console.apps.hongli-az427.qe.azure.devcluster.openshift.com/
*   Trying 13.89.142.235...
* TCP_NODELAY set
(time out)
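To confirm the LB side, something like the az CLI sketch below can list the backend pool members (resource group and LB names are placeholders, not taken from this cluster):

$ az network lb address-pool list \
    --resource-group <cluster-resource-group> \
    --lb-name <ingress-lb-name> \
    --query '[].backendIpConfigurations' -o table   # expect an empty list on this topology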
Created attachment 1606863 [details]
error message when adding vm to ingress LB

When I try to update the ingress LB and add a VM to its backend pools manually, the error message says: "This virtual machine and IP address is already added in another Public load balancer backend pool." So it seems that even though the cluster can be installed, the ingress LB is still unavailable on the Azure platform.
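For the record, the manual step that triggers that error looks roughly like this (all names are placeholders); the rejection is consistent with Basic SKU load balancers, where a VM's NIC can sit behind the backend pools of only one public LB at a time:

$ az network nic ip-config address-pool add \
    --resource-group <cluster-resource-group> \
    --nic-name <master-vm-nic> \
    --ip-config-name <ip-config-name> \
    --lb-name <ingress-lb-name> \
    --address-pool <backend-pool-name>   # fails with the error quoted above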
Per comment 10, there is a new bug tracking this issue, so closing this one.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922