Bug 1720581 - [3.11] installation fails when service-catalog enabled, on RHEL AH, multi-zone, GCP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-14 09:40 UTC by Weihua Meng
Modified: 2019-06-26 09:08 UTC (History)
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Refactoring of openshift_facts caused the MTU to be set improperly. Consequence: Hosts could not communicate on networks with a non-default MTU. Fix: openshift_facts.py was updated to properly detect and set the MTU for the host environment. Result: Hosts can now communicate properly on networks with a non-default MTU.
Clone Of:
Environment:
Last Closed: 2019-06-26 09:08:20 UTC
Target Upstream Version:


Attachments
tons of logs from the cluster (3.88 MB, application/gzip)
2019-06-14 20:08 UTC, Jesus M. Rodriguez


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:1605 0 None None None 2019-06-26 09:08:21 UTC

Description Weihua Meng 2019-06-14 09:40:11 UTC
Description of problem:
Installation fails when the service catalog is enabled, on RHEL Atomic Host, multi-zone, on GCP.

Installation succeeds with the same configuration when the service catalog is disabled.

Version-Release number of selected component (if applicable):
openshift-ansible-3.11.119-1.git.0.c9a8ebf.el7

sh-4.2# service-catalog --version
v3.11.117;Upstream:v0.1.35

How reproducible:
1 out of 1

Steps to Reproduce:
1. Install OCP with the service catalog enabled


Actual results:
Installation failed.

# oc logs pod/controller-manager-gl4fk -n kube-service-catalog
I0614 09:18:06.213055       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true]
I0614 09:18:06.213353       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true AsyncBindingOperations:true]
I0614 09:18:06.213375       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true AsyncBindingOperations:true NamespacedServiceBroker:true]
I0614 09:18:06.213409       1 hyperkube.go:192] Service Catalog version v3.11.117;Upstream:v0.1.35 (built 2019-06-09T23:47:53Z)
I0614 09:18:06.217427       1 leaderelection.go:185] attempting to acquire leader lease  kube-service-catalog/service-catalog-controller-manager...
I0614 09:18:23.660185       1 leaderelection.go:194] successfully acquired lease kube-service-catalog/service-catalog-controller-manager
I0614 09:18:23.660283       1 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-service-catalog", Name:"service-catalog-controller-manager", UID:"ec653a7c-8e52-11e9-a7b9-42010af00079", APIVersion:"v1", ResourceVersion:"60766", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' controller-manager-gl4fk-external-service-catalog-controller became leader
F0614 09:18:23.700155       1 controller_manager.go:237] error running controllers: failed to get api versions from server: failed to get supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request


logs of apiserver in service-catalog project: 
I0614 09:15:04.126886       1 logs.go:49] http: TLS handshake error from 10.2.2.1:55380: read tcp 10.2.0.5:6443->10.2.2.1:55380: read: connection timed out
I0614 09:15:04.126978       1 logs.go:49] http: TLS handshake error from 10.2.4.1:42130: read tcp 10.2.0.5:6443->10.2.4.1:42130: read: connection timed out


Expected results:
Installation succeeds.

Additional info:

Comment 2 Jesus M. Rodriguez 2019-06-14 20:08:09 UTC
Created attachment 1580836 [details]
tons of logs from the cluster

logs from the master-api and controller.
logs from the service catalog apiserver pods (r7v7t was returning 200s, while the rest were failing).
logs from the service catalog controller
oc describe of the apiserver and controller-manager pods

Comment 3 Jesus M. Rodriguez 2019-06-14 20:08:40 UTC
[root@wmengahha311117-master-etcd-zone1-1 ~]# oc get pods -n kube-service-catalog
NAME                       READY     STATUS             RESTARTS   AGE
apiserver-722dq            1/1       Running            0          16h
apiserver-qk56x            1/1       Running            0          16h
apiserver-r7v7t            1/1       Running            0          16h
controller-manager-6mkfb   0/1       CrashLoopBackOff   191        16h
controller-manager-gl4fk   0/1       CrashLoopBackOff   190        16h
controller-manager-qxsfw   0/1       CrashLoopBackOff   190        16h

Comment 4 Jesus M. Rodriguez 2019-06-14 20:09:53 UTC
Works sometimes

# oc get --raw /apis/servicecatalog.k8s.io/v1beta1/clusterservicebrokers/ --loglevel=8
I0614 20:08:57.948520   99085 loader.go:359] Config loaded from file /root/.kube/config
I0614 20:08:57.949503   99085 round_trippers.go:383] GET https://wmengahha311117-lb-1:443/apis/servicecatalog.k8s.io/v1beta1/clusterservicebrokers/
I0614 20:08:57.949534   99085 round_trippers.go:390] Request Headers:
I0614 20:08:57.949546   99085 round_trippers.go:393]     Accept: application/json, */*
I0614 20:08:57.949556   99085 round_trippers.go:393]     User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0
I0614 20:08:57.987994   99085 round_trippers.go:408] Response Status: 200 OK in 38 milliseconds
I0614 20:08:57.988028   99085 round_trippers.go:411] Response Headers:
I0614 20:08:57.988037   99085 round_trippers.go:414]     Date: Fri, 14 Jun 2019 20:08:57 GMT
I0614 20:08:57.988044   99085 round_trippers.go:414]     Content-Length: 204
I0614 20:08:57.988050   99085 round_trippers.go:414]     Cache-Control: no-store
I0614 20:08:57.988057   99085 round_trippers.go:414]     Content-Type: application/json
{"kind":"ClusterServiceBrokerList","apiVersion":"servicecatalog.k8s.io/v1beta1","metadata":{"selfLink":"/apis/servicecatalog.k8s.io/v1beta1/clusterservicebrokers/","resourceVersion":"159833"},"items":[]}


Fails a lot

# oc get --raw /apis/servicecatalog.k8s.io/v1beta1/clusterservicebrokers/ --loglevel=8
I0614 20:09:18.732062   99540 loader.go:359] Config loaded from file /root/.kube/config
I0614 20:09:18.733569   99540 round_trippers.go:383] GET https://wmengahha311117-lb-1:443/apis/servicecatalog.k8s.io/v1beta1/clusterservicebrokers/
I0614 20:09:18.733710   99540 round_trippers.go:390] Request Headers:
I0614 20:09:18.733810   99540 round_trippers.go:393]     Accept: application/json, */*
I0614 20:09:18.733880   99540 round_trippers.go:393]     User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0
I0614 20:09:18.830132   99540 round_trippers.go:408] Response Status: 503 Service Unavailable in 96 milliseconds
I0614 20:09:18.830175   99540 round_trippers.go:411] Response Headers:
I0614 20:09:18.830183   99540 round_trippers.go:414]     X-Content-Type-Options: nosniff
I0614 20:09:18.830190   99540 round_trippers.go:414]     Content-Length: 20
I0614 20:09:18.830211   99540 round_trippers.go:414]     Date: Fri, 14 Jun 2019 20:09:18 GMT
I0614 20:09:18.830218   99540 round_trippers.go:414]     Cache-Control: no-store
I0614 20:09:18.830225   99540 round_trippers.go:414]     Content-Type: text/plain; charset=utf-8
I0614 20:09:18.830292   99540 request.go:897] Response Body: service unavailable
I0614 20:09:18.830466   99540 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server is currently unable to handle the request",
  "reason": "ServiceUnavailable",
  "details": {
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "service unavailable"
      }
    ]
  },
  "code": 503
}]
F0614 20:09:18.830509   99540 helpers.go:119] Error from server (ServiceUnavailable): the server is currently unable to handle the request

Comment 5 Jesus M. Rodriguez 2019-06-14 20:10:41 UTC
# oc get --raw /apis/servicecatalog.k8s.io/v1beta1/ --loglevel=8
I0614 20:10:22.584209  100681 loader.go:359] Config loaded from file /root/.kube/config
I0614 20:10:22.585199  100681 round_trippers.go:383] GET https://wmengahha311117-lb-1:443/apis/servicecatalog.k8s.io/v1beta1/
I0614 20:10:22.585231  100681 round_trippers.go:390] Request Headers:
I0614 20:10:22.585243  100681 round_trippers.go:393]     Accept: application/json, */*
I0614 20:10:22.585254  100681 round_trippers.go:393]     User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0
I0614 20:10:22.608623  100681 round_trippers.go:408] Response Status: 200 OK in 23 milliseconds
I0614 20:10:22.608779  100681 round_trippers.go:411] Response Headers:
I0614 20:10:22.608795  100681 round_trippers.go:414]     Cache-Control: no-store
I0614 20:10:22.608807  100681 round_trippers.go:414]     Content-Type: application/json
I0614 20:10:22.608822  100681 round_trippers.go:414]     Date: Fri, 14 Jun 2019 20:10:22 GMT
I0614 20:10:22.608833  100681 round_trippers.go:414]     Content-Length: 2680
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"servicecatalog.k8s.io/v1beta1","resources":[{"name":"clusterservicebrokers","singularName":"","namespaced":false,"kind":"ClusterServiceBroker","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"clusterservicebrokers/status","singularName":"","namespaced":false,"kind":"ClusterServiceBroker","verbs":["get","patch","update"]},{"name":"clusterserviceclasses","singularName":"","namespaced":false,"kind":"ClusterServiceClass","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"clusterserviceclasses/status","singularName":"","namespaced":false,"kind":"ClusterServiceClass","verbs":["get","patch","update"]},{"name":"clusterserviceplans","singularName":"","namespaced":false,"kind":"ClusterServicePlan","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"clusterserviceplans/status","singularName":"","namespaced":false,"kind":"ClusterServicePlan","verbs":["get","patch","update"]},{"name":"servicebindings","singularName":"","namespaced":true,"kind":"ServiceBinding","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"servicebindings/status","singularName":"","namespaced":true,"kind":"ServiceBinding","verbs":["get","patch","update"]},{"name":"servicebrokers","singularName":"","namespaced":true,"kind":"ServiceBroker","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"servicebrokers/status","singularName":"","namespaced":true,"kind":"ServiceBroker","verbs":["get","patch","update"]},{"name":"serviceclasses","singularName":"","namespaced":true,"kind":"ServiceClass","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"serviceclasses/status","singularName":"","namespaced":true,"kind":"ServiceClass","verbs":["get","patch","update"]},{"name":"serviceinstances","singularName":"","namespaced":true,"kind":"ServiceInstance","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"serviceinstances/reference","singularName":"","namespaced":true,"kind":"ServiceInstance","verbs":["get","patch","update"]},{"name":"serviceinstances/status","singularName":"","namespaced":true,"kind":"ServiceInstance","verbs":["get","patch","update"]},{"name":"serviceplans","singularName":"","namespaced":true,"kind":"ServicePlan","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]},{"name":"serviceplans/status","singularName":"","namespaced":true,"kind":"ServicePlan","verbs":["get","patch","update"]}]}

Comment 11 Ricardo Carrillo Cruz 2019-06-18 09:30:49 UTC
I've been restarting the controller and API containers in the kube-service-catalog namespace to see what was going on.
After restarting the catalog controller and checking its logs, I see this:

```
I0618 09:24:36.006024  104860 loader.go:359] Config loaded from file /root/.kube/config                        
I0618 09:24:36.007040  104860 loader.go:359] Config loaded from file /root/.kube/config
I0618 09:24:36.010757  104860 loader.go:359] Config loaded from file /root/.kube/config
I0618 09:24:36.018679  104860 round_trippers.go:383] GET https://wmengahha311117-lb-1:443/apis/metrics.k8s.io/v1beta1?timeout=32s
I0618 09:24:36.018774  104860 round_trippers.go:390] Request Headers:                   
I0618 09:24:36.018786  104860 round_trippers.go:393]     Accept: application/json, */*
I0618 09:24:36.018808  104860 round_trippers.go:393]     User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0                                                                                                                                     
I0618 09:24:36.021532  104860 round_trippers.go:408] Response Status: 503 Service Unavailable in 2 milliseconds
I0618 09:24:36.021583  104860 round_trippers.go:411] Response Headers:                                                                                                                                                                                        
I0618 09:24:36.021604  104860 round_trippers.go:414]     Cache-Control: no-store                                                                                                                                                                             
I0618 09:24:36.021625  104860 round_trippers.go:414]     Content-Type: text/plain; charset=utf-8                                                                                                                                                             
I0618 09:24:36.021635  104860 round_trippers.go:414]     X-Content-Type-Options: nosniff
I0618 09:24:36.021646  104860 round_trippers.go:414]     Content-Length: 20                                                                                                                                                                                  
I0618 09:24:36.021658  104860 round_trippers.go:414]     Date: Tue, 18 Jun 2019 09:24:36 GMT                                                                                                                                                                 
I0618 09:24:36.021721  104860 request.go:897] Response Body: service unavailable      
I0618 09:24:36.021894  104860 request.go:1099] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0618 09:24:36.021929  104860 cached_discovery.go:77] skipped caching discovery info due to the server is currently unable to handle the request
I0618 09:24:36.023399  104860 shortcut.go:89] Error loading discovery information: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0618 09:24:36.026221  104860 round_trippers.go:383] GET https://wmengahha311117-lb-1:443/apis/metrics.k8s.io/v1beta1?timeout=32s
I0618 09:24:36.026252  104860 round_trippers.go:390] Request Headers:                       
I0618 09:24:36.026261  104860 round_trippers.go:393]     Accept: application/json, */*                                                                                                                                                                        
I0618 09:24:36.026269  104860 round_trippers.go:393]     User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0                                                                                                                                      
I0618 09:24:36.028314  104860 round_trippers.go:408] Response Status: 503 Service Unavailable in 2 milliseconds                                                                                                                                               
I0618 09:24:36.028343  104860 round_trippers.go:411] Response Headers:                                                                                                                                                                                        
I0618 09:24:36.028352  104860 round_trippers.go:414]     Cache-Control: no-store            
I0618 09:24:36.028360  104860 round_trippers.go:414]     Content-Type: text/plain; charset=utf-8
I0618 09:24:36.028368  104860 round_trippers.go:414]     X-Content-Type-Options: nosniff
I0618 09:24:36.028375  104860 round_trippers.go:414]     Content-Length: 20            
I0618 09:24:36.028382  104860 round_trippers.go:414]     Date: Tue, 18 Jun 2019 09:24:36 GMT                                     
I0618 09:24:36.028419  104860 request.go:897] Response Body: service unavailable
I0618 09:24:36.028536  104860 request.go:1099] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0618 09:24:36.028571  104860 cached_discovery.go:77] skipped caching discovery info due to the server is currently unable to handle the request
I0618 09:24:36.030991  104860 loader.go:359] Config loaded from file /root/.kube/config
I0618 09:24:36.032145  104860 loader.go:359] Config loaded from file /root/.kube/config
I0618 09:24:36.033230  104860 loader.go:359] Config loaded from file /root/.kube/config         
I0618 09:24:36.033537  104860 round_trippers.go:383] GET https://wmengahha311117-lb-1:443/api/v1/namespaces/kube-service-catalog/pods/controller-manager-bbgm5/log?follow=true
I0618 09:24:36.033559  104860 round_trippers.go:390] Request Headers:      
I0618 09:24:36.033567  104860 round_trippers.go:393]     Accept: application/json, */*      
I0618 09:24:36.033575  104860 round_trippers.go:393]     User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0
I0618 09:24:36.051402  104860 round_trippers.go:408] Response Status: 200 OK in 17 milliseconds                                                                                                                                                               
I0618 09:24:36.051462  104860 round_trippers.go:411] Response Headers:
I0618 09:24:36.051476  104860 round_trippers.go:414]     Cache-Control: no-store                                                                
I0618 09:24:36.051488  104860 round_trippers.go:414]     Content-Type: text/plain                                                                                                                                   
I0618 09:24:36.051499  104860 round_trippers.go:414]     Date: Tue, 18 Jun 2019 09:24:36 GMT                                     
I0618 09:24:21.710295       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true]
I0618 09:24:21.710593       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true AsyncBindingOperations:true]
I0618 09:24:21.710623       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true AsyncBindingOperations:true NamespacedServiceBroker:true]
I0618 09:24:21.710650       1 hyperkube.go:192] Service Catalog version v3.11.117;Upstream:v0.1.35 (built 2019-06-09T23:47:53Z)
I0618 09:24:21.713644       1 leaderelection.go:185] attempting to acquire leader lease  kube-service-catalog/service-catalog-controller-manager...
I0618 09:24:21.746672       1 leaderelection.go:194] successfully acquired lease kube-service-catalog/service-catalog-controller-manager
I0618 09:24:21.747902       1 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-service-catalog", Name:"service-catalog-controller-manager", UID:"ec653a7c-8e52-11e9-a7b9-42010af00079", APIVersion:"v1", ResourceVersion:"895074", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' controller-manager-bbgm5-external-service-catalog-controller became leader
F0618 09:24:21.788731       1 controller_manager.go:237] error running controllers: failed to get api versions from server: failed to get supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request, servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request
```

As you can see, calls to the kube-service-catalog API return 200s, but calls to the metrics API return 503s.

Comment 12 Casey Callendrello 2019-06-18 14:58:07 UTC
In other words, this isn't a networking issue. If you manage to get another cluster in this state, feel free to poke us.

Comment 15 Ricardo Carrillo Cruz 2019-06-19 09:51:17 UTC
We can curl some of the servicecatalog apiservers from the master, where the main apiserver runs:

```
[root@wmeng195ahha311-master-etcd-zone1-1 ~]# curl -kv https://10.2.0.5:6443/healthz                                                                                                         
* About to connect() to 10.2.0.5 port 6443 (#0)
*   Trying 10.2.0.5...
* Connected to 10.2.0.5 (10.2.0.5) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=apiserver.kube-service-catalog
*       start date: Jun 19 06:18:35 2019 GMT
*       expire date: Jun 18 06:18:36 2021 GMT
*       common name: apiserver.kube-service-catalog
*       issuer: CN=service-catalog-signer
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.2.0.5:6443
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 19 Jun 2019 09:21:50 GMT
< Content-Length: 2
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 10.2.0.5 left intact
ok[root@wmeng195ahha311-master-etcd-zone1-1 ~]# curl -kv https://10.2.2.5:6443/healthz
* About to connect() to 10.2.2.5 port 6443 (#0)
*   Trying 10.2.2.5...
* Connected to 10.2.2.5 (10.2.2.5) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
^C
[root@wmeng195ahha311-master-etcd-zone1-1 ~]# curl -kv https://10.2.4.4:6443/healthz
* About to connect() to 10.2.4.4 port 6443 (#0)
*   Trying 10.2.4.4...
* Connected to 10.2.4.4 (10.2.4.4) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
```

On the apiservers where the request does not complete, you can see the TCP connection is still established, so it appears the service catalog API servers are hung.

Moving the bug to apiserver team.

Comment 16 Ricardo Carrillo Cruz 2019-06-19 09:54:40 UTC
Pings are also fine, so this does not look like a networking issue:

```
[root@wmeng195ahha311-master-etcd-zone1-1 ~]# ping 10.2.4.4
PING 10.2.4.4 (10.2.4.4) 56(84) bytes of data.
64 bytes from 10.2.4.4: icmp_seq=1 ttl=64 time=0.785 ms
^C
--- 10.2.4.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.785/0.785/0.785/0.000 ms
[root@wmeng195ahha311-master-etcd-zone1-1 ~]# ping 10.2.0.5
PING 10.2.0.5 (10.2.0.5) 56(84) bytes of data.
64 bytes from 10.2.0.5: icmp_seq=1 ttl=64 time=0.099 ms
^C
--- 10.2.0.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.099/0.099/0.099/0.000 ms
[root@wmeng195ahha311-master-etcd-zone1-1 ~]# ping 10.2.2.5
PING 10.2.2.5 (10.2.2.5) 56(84) bytes of data.
64 bytes from 10.2.2.5: icmp_seq=1 ttl=64 time=0.765 ms
^C
--- 10.2.2.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.765/0.765/0.765/0.000 ms
```

Comment 17 Xingxing Xia 2019-06-19 10:27:55 UTC
(In reply to Ricardo Carrillo Cruz from comment #11)
> F0618 09:24:21.788731       1 controller_manager.go:237] error running controllers: failed to get api versions from server: failed to get supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request, servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request
> ```
> 
> As you can see, hitting kube-service-catalog API calls return 200s, but
> hitting the metrics API return 503.
Following this hint, I checked the metrics.k8s.io component and found its apiservice status is False:
[root@wmeng195ahha311-master-etcd-zone1-1 ~]# oc get apiservices -o=custom-columns="name:.metadata.name,status:.status.conditions[0].status"
v1beta1.metrics.k8s.io                 False

[root@wmeng195ahha311-master-etcd-zone1-1 ~]# oc logs deploy/metrics-server -n openshift-metrics-server
I0619 06:18:22.137440       1 heapster.go:71] /usr/bin/metrics-server --source=kubernetes.summary_api:?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --tls-cert-file=/certs/tls.crt --tls-private-key-file=/certs/tls.key --metric_resolution=30s --secure-port=8443
...
I0619 06:18:22.786769       1 serve.go:85] Serving securely on 0.0.0.0:8443
I0619 06:33:51.058649       1 logs.go:41] http: TLS handshake error from 10.2.4.1:52634: read tcp 10.2.12.7:8443->10.2.4.1:52634: read: connection timed out
...
I0619 10:14:22.674626       1 logs.go:41] http: TLS handshake error from 10.2.2.1:52684: read tcp 10.2.12.7:8443->10.2.2.1:52684: read: connection timed out

From my memory of bug 1625194#c25, the metrics failure above may have a side effect on the service catalog.

Comment 18 Shawn Hurley 2019-06-19 19:33:38 UTC
> From my memory of bug 1625194#c25, the metrics failure above may have a side effect on the service catalog.

I believe this is only related to deleting namespaces.

Can we retry this with the new changes here:

https://github.com/openshift/openshift-ansible/pull/11704

Comment 24 Casey Callendrello 2019-06-20 16:58:46 UTC
Found the bug. GCP nodes come up with a 1460-byte eth0 MTU, but openshift-sdn tries to set up its VXLAN with an MTU of 1450. That can't work: the VXLAN MTU needs to be the interface default minus 50, i.e. 1410.

openshift-ansible should detect this.
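The arithmetic behind this can be sketched in shell (a hypothetical illustration of the rule Casey describes, not the actual openshift_facts.py logic): the openshift-sdn VXLAN overlay adds 50 bytes of encapsulation overhead, so the overlay MTU must be the physical interface MTU minus 50.

```shell
# VXLAN encapsulation adds 50 bytes, so the overlay MTU must leave
# room for that overhead inside the physical frame.
vxlan_overhead=50

sdn_mtu() {
    echo $(( $1 - vxlan_overhead ))
}

sdn_mtu 1500   # typical 1500-byte network -> 1450
sdn_mtu 1460   # GCP eth0                  -> 1410
```

On a standard 1500-byte network the hardcoded 1450 happens to be correct, which is why this only broke on GCP's 1460-byte interfaces.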

Comment 25 Casey Callendrello 2019-06-20 17:01:43 UTC
And, it looks like someone in the community caught this a few hours ago too.

Comment 31 Weihua Meng 2019-06-21 03:09:34 UTC
Fixed.

openshift-ansible-3.11.123-1.git.0.db681ba.el7

Installation successful. The cluster is working fine.

networkConfig:
  mtu: 1410

It was 1450 before the fix.
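The relationship verified here can be checked on a node with a quick sketch (hypothetical check, assuming eth0 is the node's primary interface; the live interface MTU can be read from /sys/class/net/eth0/mtu): the configured SDN MTU plus the 50-byte VXLAN overhead must not exceed the physical MTU.

```shell
# GCP eth0 MTU; on a live node: phys_mtu=$(cat /sys/class/net/eth0/mtu)
phys_mtu=1460
# networkConfig.mtu after the fix
sdn_mtu=1410

if [ $(( sdn_mtu + 50 )) -le "$phys_mtu" ]; then
    echo "mtu ok"
else
    echo "mtu too large"
fi
```

With the pre-fix value of 1450, 1450 + 50 = 1500 > 1460, so encapsulated packets exceeded the physical MTU and were dropped, which matches the TLS handshake timeouts seen in the logs.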

Comment 33 errata-xmlrpc 2019-06-26 09:08:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605

