Description of problem:
Since the Ansible Service Broker pod contains both the etcd and asb containers, a rolling update strategy will not work because the "new" deployment will never be able to start etcd (the first deployment still holds the lock). With the recreate strategy, the old deployment is scaled down first, so the new deployment can acquire the lock.

How reproducible:
Always

Steps to Reproduce:
1. Deploy an OpenShift cluster + Service Catalog + Ansible Service Broker
2. Redeploy the broker: `oc deploy asb --latest`

Actual results:
$ oc logs asb-2-z4kg8 -c asb
Using config file mounted to /etc/ansible-service-broker/config.yaml
============================================================
== Starting Ansible Service Broker... ==
============================================================
[2017-10-20T15:09:13.108Z] [NOTICE] Initializing clients...
[2017-10-20T15:09:13.108Z] [DEBUG] Trying to connect to etcd
[2017-10-20T15:09:13.108Z] [INFO] == ETCD CX ==
[2017-10-20T15:09:13.108Z] [INFO] EtcdHost: 0.0.0.0
[2017-10-20T15:09:13.108Z] [INFO] EtcdPort: 2379
[2017-10-20T15:09:13.108Z] [INFO] Endpoints: [http://0.0.0.0:2379]
[2017-10-20T15:09:14.108Z] [ERROR] client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://0.0.0.0:2379 exceeded header timeout

$ oc logs asb-2-deploy
--> Scaling up asb-2 from 0 to 1, scaling down asb-1 from 1 to 0 (keep 1 pods available, don't exceed 2 pods)
    Scaling asb-2 up to 1

$ oc logs asb-1-31h7l -c etcd
2017-10-20 14:49:07.879155 I | etcdmain: etcd Version: 3.2.9
2017-10-20 14:49:07.879220 I | etcdmain: Git SHA: f1d7dd8
2017-10-20 14:49:07.879225 I | etcdmain: Go Version: go1.8.4
2017-10-20 14:49:07.879234 I | etcdmain: Go OS/Arch: linux/amd64
2017-10-20 14:49:07.879250 I | etcdmain: setting maximum number of CPUs to 16, total number of available CPUs is 16
2017-10-20 14:49:07.879510 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-10-20 14:49:07.880992 I | embed: listening for peers on http://localhost:2380
2017-10-20 14:49:07.881067 I | embed: listening for client requests on 0.0.0.0:2379
2017-10-20 14:49:08.881731 W | etcdserver: another etcd process is using "/data/member/snap/db" and holds the file lock.
2017-10-20 14:49:08.882677 W | etcdserver: waiting for it to exit before starting...

Expected results:
The original deployment should be scaled down before the new deployment is brought up.
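
For reference, the strategy change described above comes down to one field on the DeploymentConfig. The fragment below is only a sketch: the object name `asb` and the replica count are taken from the output in this report, everything else is trimmed, and it is an assumption that the broker template sets the field in this form.

```yaml
# Sketch only: a trimmed fragment of the asb DeploymentConfig, not a complete object.
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: asb
spec:
  replicas: 1
  strategy:
    # Recreate scales the old deployment down to 0 before the new pod starts,
    # so the old etcd container exits and releases its file lock on
    # /data/member/snap/db. The default Rolling strategy starts the new pod
    # first, which is what leaves the new etcd waiting on the lock.
    type: Recreate
```

With this in place, the deployer log should show the old replication controller being scaled down before the new one is scaled up, instead of the "keep 1 pods available, don't exceed 2 pods" rolling behaviour seen above.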
https://github.com/openshift/ansible-service-broker/pull/511
Tested on OCP (openshift v3.7.0-0.190.0, kubernetes v1.7.6+a08f5eeb62, etcd 3.2.8, registry.reg-aws.openshift.com:443/openshift3/ose-ansible-service-broker:v3.7.0-0.190.0, registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.7.0-0.190.0).

I hit the issue below. Could you help check whether it is a bug or a misconfiguration on my side? Thanks.

[root@host-172-16-120-121 ~]# oc get all -n ansible-service-broker
NAME                         REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/asb        2          1         1         config
deploymentconfigs/asb-etcd   1          1         1         config

NAME              HOST/PORT                                                      PATH      SERVICES   PORT        TERMINATION   WILDCARD
routes/asb-1338   asb-1338-ansible-service-broker.apps.1103-qtr.qe.rhcloud.com             asb        port-1338   reencrypt     None

NAME                  READY     STATUS             RESTARTS   AGE
po/asb-2-trlzf        0/1       CrashLoopBackOff   5          3m
po/asb-etcd-1-m6r9r   1/1       Running            0          3m

NAME            DESIRED   CURRENT   READY     AGE
rc/asb-2        1         1         0         4m
rc/asb-etcd-1   1         1         1         4m

NAME           CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
svc/asb        172.30.156.76   <none>        1338/TCP   14m
svc/asb-etcd   172.30.206.95   <none>        2379/TCP   14m

[root@host-172-16-120-121 ~]# oc logs po/asb-2-trlzf
Using config file mounted to /etc/ansible-service-broker/config.yaml
============================================================
== Starting Ansible Service Broker... ==
============================================================
[2017-11-03T08:08:11.615Z] [NOTICE] Initializing clients...
[2017-11-03T08:08:11.615Z] [DEBUG] Trying to connect to etcd
[2017-11-03T08:08:11.615Z] [INFO] == ETCD CX ==
[2017-11-03T08:08:11.615Z] [INFO] EtcdHost: asb-etcd.ansible-service-broker.svc
[2017-11-03T08:08:11.615Z] [INFO] EtcdPort: 2379
[2017-11-03T08:08:11.615Z] [INFO] Endpoints: [http://asb-etcd.ansible-service-broker.svc:2379]
[2017-11-03T08:08:11.619Z] [ERROR] client: etcd cluster is unavailable or misconfigured; error #0: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

[root@host-172-16-120-121 ~]# oc logs po/asb-2-trlzf -c etcd
Error from server (BadRequest): container etcd is not valid for pod asb-2-trlzf
I tested on another cluster (openshift v3.7.0-0.190.0, kubernetes v1.7.6+a08f5eeb62, etcd 3.2.8, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-ansible-service-broker:v3.7.0-0.189.0.0, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-ansible-service-broker:v3.7.0-0.189.0.0), and the bug can be reproduced there.

[root@host-172-16-120-47 ~]# oc get all
NAME                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/asb   3          1         1         config

NAME              HOST/PORT                                                                PATH      SERVICES   PORT      TERMINATION   WILDCARD
routes/asb-1338   asb-1338-openshift-ansible-service-broker.apps.1103-5th.qe.rhcloud.com             asb        1338      reencrypt     None

NAME             READY     STATUS    RESTARTS   AGE
po/asb-3-gbc9b   2/2       Running   1          41s

NAME       DESIRED   CURRENT   READY     AGE
rc/asb-3   1         1         1         44s

NAME      CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
svc/asb   172.30.164.136   <none>        1338/TCP   5h

[root@host-172-16-120-47 ~]# oc rollout latest asb
deploymentconfig "asb" rolled out

[root@host-172-16-120-47 ~]# oc get all
NAME                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/asb   4          1         1         config

NAME              HOST/PORT                                                                PATH      SERVICES   PORT      TERMINATION   WILDCARD
routes/asb-1338   asb-1338-openshift-ansible-service-broker.apps.1103-5th.qe.rhcloud.com             asb        1338      reencrypt     None

NAME              READY     STATUS    RESTARTS   AGE
po/asb-3-gbc9b    2/2       Running   1          1m
po/asb-4-deploy   1/1       Running   0          30s
po/asb-4-hhwnh    1/2       Error     2          27s

NAME       DESIRED   CURRENT   READY     AGE
rc/asb-3   1         1         1         1m
rc/asb-4   1         1         0         30s

[root@host-172-16-120-47 ~]# oc logs po/asb-4-hhwnh -c asb
Using config file mounted to /etc/ansible-service-broker/config.yaml
============================================================
== Starting Ansible Service Broker... ==
============================================================
[2017-11-03T08:50:57.425Z] [NOTICE] Initializing clients...
[2017-11-03T08:50:57.425Z] [INFO] == ETCD CX ==
[2017-11-03T08:50:57.425Z] [INFO] EtcdHost: 0.0.0.0
[2017-11-03T08:50:57.425Z] [INFO] EtcdPort: 2379
[2017-11-03T08:50:57.425Z] [INFO] Endpoints: [http://0.0.0.0:2379]
[2017-11-03T08:50:58.425Z] [ERROR] client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://0.0.0.0:2379 exceeded header timeout

[root@host-172-16-120-47 ~]# oc logs po/asb-4-hhwnh -c etcd
2017-11-03 08:50:36.288349 I | etcdmain: etcd Version: 3.2.7
2017-11-03 08:50:36.301626 I | etcdmain: Git SHA: bb66589
2017-11-03 08:50:36.301631 I | etcdmain: Go Version: go1.8.3
2017-11-03 08:50:36.301634 I | etcdmain: Go OS/Arch: linux/amd64
2017-11-03 08:50:36.301637 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-11-03 08:50:36.301678 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-11-03 08:50:36.306260 I | embed: listening for peers on http://localhost:2380
2017-11-03 08:50:36.306312 I | embed: listening for client requests on 0.0.0.0:2379
2017-11-03 08:50:37.306709 W | etcdserver: another etcd process is using "/data/member/snap/db" and holds the file lock.
2017-11-03 08:50:37.306732 W | etcdserver: waiting for it to exit before starting...

[root@host-172-16-120-47 ~]# oc logs po/asb-4-deploy
--> Scaling up asb-4 from 0 to 1, scaling down asb-3 from 1 to 0 (keep 1 pods available, don't exceed 2 pods)
    Scaling asb-4 up to 1
Shortly after this change was submitted, work was done to support authentication on our internal etcd, and that work also split up our broker and etcd deployments. These changes (https://bugzilla.redhat.com/show_bug.cgi?id=1507617) remove the need for this change. Closing this bug.

*** This bug has been marked as a duplicate of bug 1507617 ***