Here is some output:

# oc get pods --all-namespaces    // the asb pod goes into CrashLoopBackOff
kube-service-catalog                apiserver-7vz8k            1/1   Running            1   26m
kube-service-catalog                controller-manager-hh5pj   1/1   Running            4   26m
openshift-ansible-service-broker    asb-1-p8vh4                0/1   CrashLoopBackOff   8   25m
openshift-ansible-service-broker    asb-etcd-1-tvq7q           1/1   Running            2   25m
openshift-template-service-broker   apiserver-7j4n8            1/1   Running            1   25m
openshift-template-service-broker   apiserver-9chdw            1/1   Running            1   25m

# oc logs asb-1-p8vh4 -n openshift-ansible-service-broker
Using config file mounted to /etc/ansible-service-broker/config.yaml
2018/06/29 08:50:02 Unable to get log.logfile from config
============================================================
== Starting Ansible Service Broker... ==
============================================================
[2018-06-29T08:50:02.152Z] [NOTICE] - Initializing clients...
[2018-06-29T08:50:02.154Z] [INFO] - == ETCD CX ==
[2018-06-29T08:50:02.154Z] [INFO] - EtcdHost: asb-etcd.openshift-ansible-service-broker.svc
[2018-06-29T08:50:02.154Z] [INFO] - EtcdPort: 2379
[2018-06-29T08:50:02.154Z] [INFO] - Endpoints: [https://asb-etcd.openshift-ansible-service-broker.svc:2379]
[2018-06-29T08:50:02.169Z] [ERROR] - client: etcd cluster is unavailable or misconfigured; error #0: x509: certificate signed by unknown authority

We may need fixes for the Ansible service broker too.
The output above was captured after running openshift-ansible/playbooks/redeploy-certificates.yml
Besides the Ansible service broker, it seems the template service broker also needs a fix.
Created https://github.com/openshift/openshift-ansible/pull/9585
It also seems to fix TSB.
After running /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml with openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch, the asb-* pods are running but the apiserver-* pods did not start correctly.

# oc get pods --all-namespaces
NAMESPACE                           NAME                         READY   STATUS             RESTARTS   AGE
kube-service-catalog                apiserver-4kt52              0/1     CrashLoopBackOff   9          29m
kube-service-catalog                controller-manager-qzp64     0/1     CrashLoopBackOff   5          29m
openshift-ansible-service-broker    asb-1-ph77j                  1/1     Running            0          15m
openshift-ansible-service-broker    asb-etcd-1-4dq8t             1/1     Running            0          15m
openshift-template-service-broker   apiserver-4rchm              1/1     Running            2          27m
openshift-template-service-broker   apiserver-mx4pc              1/1     Running            1          27m
openshift-web-console               webconsole-7d7cbcf74c-7w64w  1/1     Running            0          13m

# oc logs -f apiserver-4kt52 -n kube-service-catalog
I0911 02:40:14.459147 1 feature_gate.go:184] feature gates: map[OriginatingIdentity:true]
I0911 02:40:14.459291 1 hyperkube.go:188] Service Catalog version v3.9.43 (built 2018-09-08T02:18:49Z)
W0911 02:40:14.751120 1 authentication.go:229] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.30.0.1:443: connect: network is unreachable
I don't see the TSB pods being re-created. Vadim, can you help confirm?
Created attachment 1482254: ansiblelogs
(In reply to Yadan Pei from comment #9)
> I don't see the TSB pods being re-created. Vadim, can you help confirm?

These tasks have run:
> TASK [ansible_service_broker : Remove ASB pods] ********************************
> changed: [host-8-244-4.host.centralci.eng.rdu2.redhat.com] => (item=asb)
> changed: [host-8-244-4.host.centralci.eng.rdu2.redhat.com] => (item=asb-etcd)

Please attach the output of `ansible-playbook -vvv` for more information.

> dial tcp 172.30.0.1:443: connect: network is unreachable

Some network problem? Is it reproducible? Can new APBs be provisioned?
The network error above is reproducible; we will debug it and open a separate bug if it turns out to be an issue. Despite the network errors, I can confirm that some secrets for ASB were re-created and the pods were recreated as well.

# oc get secret -n openshift-ansible-service-broker    // these secrets are re-created
NAME                      TYPE                                  DATA   AGE
asb-client                kubernetes.io/service-account-token   4      16m
asb-tls                   kubernetes.io/tls                     2      16m
broker-etcd-auth-secret   Opaque                                2      16m
etcd-auth-secret          Opaque                                1      16m
etcd-tls                  kubernetes.io/tls                     2      16m

# oc get pods -n openshift-ansible-service-broker    // all ASB pods are running
NAME               READY   STATUS    RESTARTS   AGE
asb-1-mbhpg        1/1     Running   0          16m
asb-etcd-1-smn26   1/1     Running   0          16m

Another point I need to confirm: I don't see the TSB secrets/pods being re-created. Do we need to recreate them as well?

# oc get pods -n openshift-template-service-broker
NAME              READY   STATUS             RESTARTS   AGE
apiserver-k5xl5   0/1     CrashLoopBackOff   9          2h
apiserver-t54c7   1/1     Running            1          2h
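For anyone re-checking this after a certificate redeploy, here is a rough sketch of a filter that picks the unhealthy rows out of `oc get pods` output. The function name `not_ready_pods` is made up for illustration and is not part of oc or openshift-ansible. It assumes the header row is present in the input and finds the columns by header name, so it should cope with both the single-namespace and the --all-namespaces layouts.

```shell
# Hypothetical helper (not part of oc or openshift-ansible): reads
# `oc get pods` table output on stdin and prints NAME and STATUS for
# every pod that is not fully ready or whose STATUS is not Running.
# Assumes the first input line is the header row.
not_ready_pods() {
  awk '
    NR == 1 {
      # Map each header name to its column index.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      # READY looks like "0/1": ready containers / total containers.
      split($(col["READY"]), r, "/")
      if (r[1] != r[2] || $(col["STATUS"]) != "Running")
        print $(col["NAME"]), $(col["STATUS"])
    }
  '
}
```

It would be used as `oc get pods -n openshift-template-service-broker | not_ready_pods`. An empty result only means every listed pod reports Running and fully ready; it says nothing about whether the secrets behind those pods were regenerated.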
Please attach the following info:
1) versions
2) inventory
3) apiserver container logs
The network issue does not reproduce on EC2, so it is no longer an issue. The only remaining concern is whether we need to re-create the TSB secrets/pods.
openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch
The API container logs are still required to find out why it is broken.
One of the pods failed to connect to the kube API server: "dial tcp 172.30.0.1:443: connect: network is unreachable". The other one works fine, so the fix worked, but network issues prevent the first pod from starting correctly.
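Since two different failures show up in this bug (the x509 error the fix addresses, and the unrelated network error), a small triage sketch may help tell them apart when looking at pod logs. The function name `classify_failure` is hypothetical, and the patterns are just the literal error strings quoted in this thread, so treat it as a starting point and not a complete classifier.

```shell
# Hypothetical triage helper: reads pod log text on stdin and prints
# which of the two failure classes from this bug it contains.
# Anything that matches neither pattern is reported as "unknown".
classify_failure() {
  log=$(cat)
  case "$log" in
    *"x509: certificate signed by unknown authority"*) echo "certificate" ;;
    *"network is unreachable"*) echo "network" ;;
    *) echo "unknown" ;;
  esac
}
```

For example, `oc logs apiserver-4kt52 -n kube-service-catalog | classify_failure` should print "network" for the log shown earlier, which would point away from the certificate redeploy as the cause.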
Moving to VERIFIED per comment 12 and comment 20
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2658