Bug 1491202

Summary: [Federation] Failed to create load balancer for service federation-system/apiserver on GCE
Product: OpenShift Container Platform
Reporter: Qixuan Wang <qixuan.wang>
Component: Master
Assignee: David Eads <deads>
Status: CLOSED ERRATA
QA Contact: Qixuan Wang <qixuan.wang>
Severity: high
Docs Contact:
Priority: high
Version: 3.7.0
CC: aos-bugs, chezhang, jokerman, mfojtik, mmccomas, qixuan.wang, wsun, xtian
Target Milestone: ---
Keywords: TestBlocker
Target Release: 3.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 22:10:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                            Flags
atomic-openshift-master-controllers    none
atomic-openshift-master-api            none

Description Qixuan Wang 2017-09-13 09:59:57 UTC
Description of problem:
The apiserver service stays pending and blocks the rest of the federation control plane setup. The error is "Failed to create load balancer for service federation-system/qwangfed-apiserver: GCECloud.ClusterID is not ready. Call Initialize() before using"
This problem occurs on OCP 3.7.0-0.125.0 with the ose-federation images v3.7.0-0.125.0 and v3.6.173.0.30; it does not occur on OCP 3.6.173.0.30 with the same images.


Version-Release number of selected component (if applicable):
openshift v3.7.0-0.125.0
kubernetes v1.7.0+695f48a16f
etcd 3.2.1
registry.ops.openshift.com/openshift3/ose-federation:v3.7.0-0.125.0 

How reproducible:
Always

Steps to Reproduce:
1. Initialize the federation control plane.
2. Check whether the federation control plane works.


Actual results:
1. [root@preserve-910-qe-qwang-37-federation-master-etcd-nfs-1 ~]# kubefed init qwangfed --dns-provider=google-clouddns --dns-zone-name=federation.ocpqe.com. --etcd-persistent-storage=true --image=registry.ops.openshift.com/openshift3/ose-federation:v3.7.0-0.125.0
Creating a namespace federation-system for federation system components... done
Creating federation control plane service...........................................


2. [root@preserve-910-qe-qwang-37-federation-master-etcd-nfs-1 ~]# oc get all -n federation-system
NAME                          CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
svc/qwangfed36173-apiserver   172.30.138.23   <pending>     443:32539/TCP   56m


[root@preserve-910-qe-qwang-37-federation-master-etcd-nfs-1 ~]# oc describe svc/qwangfed36173-apiserver -n federation-system
Name:			qwangfed36173-apiserver
Namespace:		federation-system
Labels:			app=federated-cluster
Annotations:		federation.alpha.kubernetes.io/federation-name=qwangfed36173
Selector:		app=federated-cluster,module=federation-apiserver
Type:			LoadBalancer
IP:			172.30.138.23
Port:			https	443/TCP
NodePort:		https	32539/TCP
Endpoints:		<none>
Session Affinity:	None
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason				Message
  ---------	--------	-----	----			-------------	--------	------				-------
  57m		2m		17	service-controller			Normal		CreatingLoadBalancer		Creating load balancer
  57m		2m		17	service-controller			Warning		CreatingLoadBalancerFailed	Error creating load balancer (will retry): Failed to create load balancer for service federation-system/qwangfed36173-apiserver: GCECloud.ClusterID is not ready. Call Initialize() before using.
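
The pending state in the `oc get all` output above can also be detected mechanically. The following is a minimal sketch, not part of the original report: the function name `pending_load_balancers` is hypothetical, and it assumes the whitespace-separated table layout shown above, with no empty cells in a row.

```python
# Sample copied from the `oc get all -n federation-system` output above.
SAMPLE = """\
NAME                          CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
svc/qwangfed36173-apiserver   172.30.138.23   <pending>     443:32539/TCP   56m
"""


def pending_load_balancers(oc_output):
    """Return the names of services whose EXTERNAL-IP column reads <pending>.

    Assumes every cell in a row is non-empty, so splitting on whitespace
    lines the fields up with the header columns.
    """
    pending = []
    header = None
    for line in oc_output.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current table
            continue
        if fields[0] == "NAME":
            header = fields  # start of a new table
            continue
        if header and "EXTERNAL-IP" in header:
            idx = header.index("EXTERNAL-IP")
            if len(fields) > idx and fields[idx] == "<pending>":
                pending.append(fields[0])
    return pending


if __name__ == "__main__":
    print(pending_load_balancers(SAMPLE))  # ['svc/qwangfed36173-apiserver']
```

Tables without an EXTERNAL-IP column (pods, deployments, replica sets) are skipped, so the whole `oc get all` output can be passed in unchanged.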


Expected results:
The load balancer can be created.


Additional info:

Comment 1 Derek Carr 2017-09-13 16:37:47 UTC
This looks like an error creating a Service type LoadBalancer on GCE (not specific to federation).

Comment 2 Michal Fojtik 2017-09-14 11:40:34 UTC
Can you please provide the master logs from when this happens? Also, is this permanently broken, or does it fix itself (there should be a retry)? To me it looks like GCE is lagging in marking the cluster ID as ready.

Comment 3 Qixuan Wang 2017-09-15 09:47:37 UTC
Created attachment 1326376 [details]
atomic-openshift-master-controllers

Comment 4 Qixuan Wang 2017-09-15 09:49:41 UTC
Created attachment 1326377 [details]
atomic-openshift-master-api

Comment 5 Qixuan Wang 2017-09-15 09:50:32 UTC
It's permanently broken. I have attached the master logs.
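
To illustrate the distinction raised in Comment 2, here is a toy model of why retrying helps only if something eventually initializes the cluster ID. It is a sketch under stated assumptions: the `ClusterID` class, `create_load_balancer`, and the `initialize_on` parameter are hypothetical, and none of this is the actual GCE cloud provider code or the change made in the referenced pull request.

```python
class ClusterID:
    """Hypothetical stand-in for a cluster ID that must be initialized before use."""

    def __init__(self):
        self._value = None

    def initialize(self, value):
        self._value = value

    def get(self):
        if self._value is None:
            raise RuntimeError("ClusterID is not ready. Call initialize() before using")
        return self._value


def create_load_balancer(cluster_id, attempts, initialize_on=None):
    """Try up to `attempts` times to read the cluster ID.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `initialize_on` simulates some other component
    initializing the ID just before that attempt (an assumption made
    purely for illustration).
    """
    for attempt in range(1, attempts + 1):
        if initialize_on == attempt:
            cluster_id.initialize("example-cluster")
        try:
            cluster_id.get()
            return attempt
        except RuntimeError:
            continue  # the caller would log the failure and retry later
    return None


if __name__ == "__main__":
    # Lagging initialization: a later retry succeeds.
    print(create_load_balancer(ClusterID(), attempts=17, initialize_on=3))  # 3
    # Initialization never happens: all 17 retries fail.
    print(create_load_balancer(ClusterID(), attempts=17))  # None
```

The second call mirrors the permanently broken behaviour reported here: the event log shows 17 retries over 57 minutes with the same error each time.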

Comment 8 Zhang Cheng 2017-09-18 03:23:10 UTC
It blocks the federation cluster setup for OpenShift 3.7 on GCP.

Comment 9 Michal Fojtik 2017-09-18 10:11:54 UTC
https://github.com/openshift/origin/pull/16089 has been merged; setting this to ON_QA.

Comment 10 Qixuan Wang 2017-09-19 10:30:53 UTC
The fix is not yet in OCP 3.7.0-0.126.4.

Comment 11 Michal Fojtik 2017-09-26 13:22:11 UTC
Moving back to modified.

Comment 13 Qixuan Wang 2017-10-13 09:59:42 UTC
Tested on openshift v3.7.0-0.143.2 (kubernetes v1.7.0+80709908fd, etcd 3.2.1, registry.ops.openshift.com/openshift3/ose-federation:v3.7.0-0.147.1); the bug has been fixed. Thanks.


[root@preserve-qw-master-etcd-nfs-1 ~]# oc get all -n federation-system
NAME                                              READY     STATUS    RESTARTS   AGE
po/qwangfed-apiserver-2147565502-7b76l            2/2       Running   0          36m
po/qwangfed-controller-manager-1909511936-s5kqx   1/1       Running   1          38m

NAME                     CLUSTER-IP     EXTERNAL-IP                     PORT(S)         AGE
svc/qwangfed-apiserver   172.30.0.140   172.29.147.174,172.29.147.174   443:32697/TCP   38m

NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/qwangfed-apiserver            1         1         1            1           38m
deploy/qwangfed-controller-manager   1         1         1            1           38m

NAME                                        DESIRED   CURRENT   READY     AGE
rs/qwangfed-apiserver-2147565502            1         1         1         36m
rs/qwangfed-apiserver-265993101             0         0         0         38m
rs/qwangfed-controller-manager-1909511936   1         1         1         38m

Comment 17 errata-xmlrpc 2017-11-28 22:10:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188