Bug 1596557 - After running the redeploy-certificates.yml playbook in OCP 3.9, the ansible service broker stops working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Vadim Rutkovsky
QA Contact: Yadan Pei
URL:
Whiteboard:
Depends On: 1592303 1596233 1667981
Blocks: 1623987
 
Reported: 2018-06-29 08:50 UTC by Yadan Pei
Modified: 2019-01-21 15:54 UTC (History)
CC: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1592303
Clones: 1623987
Environment:
Last Closed: 2018-09-22 04:53:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
ansiblelogs (90.01 KB, text/plain)
2018-09-11 02:54 UTC, Yadan Pei


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2658 0 None None None 2018-09-22 04:53:57 UTC

Comment 1 Yadan Pei 2018-06-29 08:52:30 UTC
And here are some outputs
# oc get pods --all-namespaces   // the asb pod is in CrashLoopBackOff
NAMESPACE                           NAME                             READY     STATUS             RESTARTS   AGE
kube-service-catalog                apiserver-7vz8k                  1/1       Running            1          26m
kube-service-catalog                controller-manager-hh5pj         1/1       Running            4          26m
openshift-ansible-service-broker    asb-1-p8vh4                      0/1       CrashLoopBackOff   8          25m
openshift-ansible-service-broker    asb-etcd-1-tvq7q                 1/1       Running            2          25m
openshift-template-service-broker   apiserver-7j4n8                  1/1       Running            1          25m
openshift-template-service-broker   apiserver-9chdw                  1/1       Running            1          25m

# oc logs asb-1-p8vh4 -n openshift-ansible-service-broker
Using config file mounted to /etc/ansible-service-broker/config.yaml
2018/06/29 08:50:02 Unable to get log.logfile from config
============================================================
==           Starting Ansible Service Broker...           ==
============================================================
[2018-06-29T08:50:02.152Z] [NOTICE] - Initializing clients...
[2018-06-29T08:50:02.154Z] [INFO] - == ETCD CX ==
[2018-06-29T08:50:02.154Z] [INFO] - EtcdHost: asb-etcd.openshift-ansible-service-broker.svc
[2018-06-29T08:50:02.154Z] [INFO] - EtcdPort: 2379
[2018-06-29T08:50:02.154Z] [INFO] - Endpoints: [https://asb-etcd.openshift-ansible-service-broker.svc:2379]
[2018-06-29T08:50:02.169Z] [ERROR] - client: etcd cluster is unavailable or misconfigured; error #0: x509: certificate signed by unknown authority

We may need fixes for the ansible service broker too.
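The `x509: certificate signed by unknown authority` failure above is what a client reports when the server's certificate has been re-signed by a new CA while the client still trusts the old one, which is what happens to the broker's etcd client after the cert redeploy. A self-contained openssl sketch (not taken from this cluster; all names are illustrative) that reproduces the mismatch:

```shell
# Illustrative only: old-ca stands in for the CA the broker still trusts,
# new-ca for the CA minted by redeploy-certificates.yml.
tmp=$(mktemp -d) && cd "$tmp"

# Two independent, self-signed CAs
openssl req -x509 -newkey rsa:2048 -nodes -keyout old-ca.key \
  -out old-ca.crt -subj "/CN=old-ca" -days 1
openssl req -x509 -newkey rsa:2048 -nodes -keyout new-ca.key \
  -out new-ca.crt -subj "/CN=new-ca" -days 1

# Serving cert for asb-etcd, signed by the NEW CA
openssl req -newkey rsa:2048 -nodes -keyout server.key \
  -out server.csr -subj "/CN=asb-etcd"
openssl x509 -req -in server.csr -CA new-ca.crt -CAkey new-ca.key \
  -CAcreateserial -out server.crt -days 1

# Verification against the stale CA fails (a Go client reports this as
# "certificate signed by unknown authority"); against the new CA it passes.
openssl verify -CAfile old-ca.crt server.crt > old.out 2>&1 || true
openssl verify -CAfile new-ca.crt server.crt > new.out 2>&1
cat old.out new.out
```

A playbook fix therefore has to re-issue the broker/etcd secrets from the same CA and roll the pods so both sides pick up the new material.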

Comment 2 Yadan Pei 2018-06-29 08:54:24 UTC
The above output was captured after running openshift-ansible/playbooks/redeploy-certificates.yml

Comment 3 Yadan Pei 2018-06-29 09:07:52 UTC
Besides the ansible service broker, it seems the template service broker also needs a fix

Comment 4 Vadim Rutkovsky 2018-08-14 13:39:10 UTC
Created https://github.com/openshift/openshift-ansible/pull/9585

It also seems to fix TSB

Comment 8 Yadan Pei 2018-09-11 02:47:37 UTC
After running /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml with openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch

asb-* pods are running but the apiserver-* pods were not started correctly

# oc get pods --all-namespaces
NAMESPACE                           NAME                             READY     STATUS             RESTARTS   AGE
kube-service-catalog                apiserver-4kt52                  0/1       CrashLoopBackOff   9          29m
kube-service-catalog                controller-manager-qzp64         0/1       CrashLoopBackOff   5          29m
openshift-ansible-service-broker    asb-1-ph77j                      1/1       Running            0          15m
openshift-ansible-service-broker    asb-etcd-1-4dq8t                 1/1       Running            0          15m
openshift-template-service-broker   apiserver-4rchm                  1/1       Running            2          27m
openshift-template-service-broker   apiserver-mx4pc                  1/1       Running            1          27m
openshift-web-console               webconsole-7d7cbcf74c-7w64w      1/1       Running            0          13m

# oc logs -f apiserver-4kt52 -n kube-service-catalog
I0911 02:40:14.459147       1 feature_gate.go:184] feature gates: map[OriginatingIdentity:true]
I0911 02:40:14.459291       1 hyperkube.go:188] Service Catalog version v3.9.43 (built 2018-09-08T02:18:49Z)
W0911 02:40:14.751120       1 authentication.go:229] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.30.0.1:443: connect: network is unreachable

Comment 9 Yadan Pei 2018-09-11 02:49:15 UTC
I don't see the TSB pods being re-created. Vadim, can you help confirm?

Comment 10 Yadan Pei 2018-09-11 02:54:16 UTC
Created attachment 1482254 [details]
ansiblelogs

Comment 11 Vadim Rutkovsky 2018-09-11 09:05:14 UTC
(In reply to Yadan Pei from comment #9)
> I don't see TSB pods are re-created, Vadim, can you help confirm?

These tasks have run:

>TASK [ansible_service_broker : Remove ASB pods] ********************************
>changed: [host-8-244-4.host.centralci.eng.rdu2.redhat.com] => (item=asb)
>changed: [host-8-244-4.host.centralci.eng.rdu2.redhat.com] => (item=asb-etcd)

Please attach the output of `ansible-playbook -vvv` for more information

>dial tcp 172.30.0.1:443: connect: network is unreachable

Some network problem? Is it reproducible? 
Can new APBs be provisioned?

Comment 12 Yadan Pei 2018-09-12 09:03:25 UTC
The above network error is reproducible; we will debug and open a separate bug if it turns out to be an issue.

Despite the network errors, what I can confirm is that the ASB secrets are re-created and the pods are recreated as well.

# oc get secret -n openshift-ansible-service-broker  //these secrets are re-created
NAME                         TYPE                                  DATA      AGE
asb-client                   kubernetes.io/service-account-token   4         16m
asb-tls                      kubernetes.io/tls                     2         16m
broker-etcd-auth-secret      Opaque                                2         16m
etcd-auth-secret             Opaque                                1         16m
etcd-tls                     kubernetes.io/tls                     2         16m
# oc get pods -n openshift-ansible-service-broker   // All ASB pods are running
NAME               READY     STATUS    RESTARTS   AGE
asb-1-mbhpg        1/1       Running   0          16m
asb-etcd-1-smn26   1/1       Running   0          16m
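To confirm that a re-created `kubernetes.io/tls` secret such as asb-tls actually chains to the new CA, the `tls.crt` it stores (base64-encoded PEM, as returned by `oc get secret -o jsonpath='{.data.tls\.crt}'`) can be decoded and its issuer inspected. A self-contained sketch, with a generated "cluster-ca" standing in for the real cluster CA:

```shell
# Illustrative only: in a real cluster the base64 PEM would come from the
# asb-tls secret via oc, not be generated locally like this.
tmp=$(mktemp -d) && cd "$tmp"

# A CA plus a cert signed by it, mimicking the asb-tls secret contents
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -subj "/CN=cluster-ca" -days 1
openssl req -newkey rsa:2048 -nodes -keyout tls.key -out tls.csr \
  -subj "/CN=asb-tls"
openssl x509 -req -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out tls.crt -days 1

# Secrets store the PEM base64-encoded as one long line
base64 < tls.crt | tr -d '\n' > tls.crt.b64

# Decode and read the issuer: it should name the current cluster CA
base64 -d < tls.crt.b64 | openssl x509 -noout -issuer
```

If the issuer still names the pre-redeploy CA, the secret was not regenerated and the pod consuming it will keep failing with the x509 error.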

Another point I need to confirm: I don't see the TSB secrets/pods being re-created. Do we need to recreate them as well?
# oc get pods -n openshift-template-service-broker
NAME              READY     STATUS             RESTARTS   AGE
apiserver-k5xl5   0/1       CrashLoopBackOff   9          2h
apiserver-t54c7   1/1       Running            1          2h

Comment 14 Vadim Rutkovsky 2018-09-12 09:06:01 UTC
Please attach the following info:

1) versions
2) inventory
3) apiserver container logs

Comment 15 Yadan Pei 2018-09-12 09:39:16 UTC
The network issue is not reproduced on EC2, so it's not an issue anymore.

The only remaining concern is whether we need to re-create the TSB secrets/pods

Comment 17 Yadan Pei 2018-09-12 09:43:10 UTC
openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch

Comment 18 Vadim Rutkovsky 2018-09-12 10:28:12 UTC
API server container logs are still required to find out why it is broken

Comment 20 Vadim Rutkovsky 2018-09-12 10:45:28 UTC
One of the pods failed to connect to kube API server:

"dial tcp 172.30.0.1:443: connect: network is unreachable"

The other one works fine, so the fix worked, but network issues won't let the first pod start correctly.

Comment 21 Yadan Pei 2018-09-13 01:24:36 UTC
Moving to VERIFIED per comment 12 and comment 20

Comment 23 errata-xmlrpc 2018-09-22 04:53:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2658

