1504957 – Ansible Service Broker should use recreate deployment strategy

Summary: Ansible Service Broker should use recreate deployment strategy

Keywords:
Status:	CLOSED DUPLICATE of bug 1507617
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Service Broker
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	3.7.0
Assignee:	David Zager
QA Contact:	Qixuan Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-10-20 19:47 UTC by David Zager
Modified:	2017-11-03 13:29 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-11-03 13:29:25 UTC
Target Upstream Version:
Embargoed:

Description David Zager 2017-10-20 19:47:59 UTC

Description of problem:

Since the Ansible Service Broker pod contains both the etcd and asb containers, a rolling update strategy will not work: the "new" deployment will never be able to start etcd because the first deployment still holds the file lock. With the recreate strategy, the old deployment is scaled down first, so the new deployment can acquire the lock.
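
The lock contention described above can be illustrated with a small sketch. It assumes the etcd data file is protected by an exclusive, flock-style advisory lock (the etcd log further down only says that another process "holds the file lock"). The helper names `try_lock`, `rolling`, and `recreate` are hypothetical and only model the order of events; they are not part of the broker or of etcd. Requires a Unix-like system, since it uses `fcntl`.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and try to take an exclusive, non-blocking lock.

    Returns the open file object on success, or None if the lock is held
    through another open file description.
    """
    handle = open(path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def rolling(path):
    """Model a rolling update: the new instance starts while the old one runs."""
    old = try_lock(path)          # old deployment holds the lock
    new = try_lock(path)          # new deployment tries to start alongside it
    started = new is not None
    if new is not None:
        new.close()
    old.close()
    return started


def recreate(path):
    """Model a recreate update: the old instance stops before the new one starts."""
    old = try_lock(path)
    old.close()                   # closing the file releases the lock
    new = try_lock(path)
    started = new is not None
    if new is not None:
        new.close()
    return started


with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "db")  # stand-in for /data/member/snap/db
    rolling_started = rolling(db)
    recreate_started = recreate(db)
    print("rolling: new instance started =", rolling_started)
    print("recreate: new instance started =", recreate_started)
```

In this model the rolling case cannot start the second instance while the first still holds the lock, whereas the recreate case can, which mirrors the behavior reported here.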

How reproducible: Always


Steps to Reproduce:
1. Deploy an OpenShift cluster with the Service Catalog and the Ansible Service Broker
2. Redeploy the broker: `oc deploy asb --latest`

Actual results:

$ oc logs asb-2-z4kg8 -c asb
Using config file mounted to /etc/ansible-service-broker/config.yaml
============================================================
==           Starting Ansible Service Broker...           ==
============================================================
[2017-10-20T15:09:13.108Z] [NOTICE] Initializing clients...
[2017-10-20T15:09:13.108Z] [DEBUG] Trying to connect to etcd
[2017-10-20T15:09:13.108Z] [INFO] == ETCD CX ==
[2017-10-20T15:09:13.108Z] [INFO] EtcdHost: 0.0.0.0
[2017-10-20T15:09:13.108Z] [INFO] EtcdPort: 2379
[2017-10-20T15:09:13.108Z] [INFO] Endpoints: [http://0.0.0.0:2379]
[2017-10-20T15:09:14.108Z] [ERROR] client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://0.0.0.0:2379 exceeded header timeout

$ oc logs asb-2-deploy
--> Scaling up asb-2 from 0 to 1, scaling down asb-1 from 1 to 0 (keep 1 pods available, don't exceed 2 pods)
    Scaling asb-2 up to 1

$ oc logs asb-1-31h7l -c etcd
2017-10-20 14:49:07.879155 I | etcdmain: etcd Version: 3.2.9
2017-10-20 14:49:07.879220 I | etcdmain: Git SHA: f1d7dd8
2017-10-20 14:49:07.879225 I | etcdmain: Go Version: go1.8.4
2017-10-20 14:49:07.879234 I | etcdmain: Go OS/Arch: linux/amd64
2017-10-20 14:49:07.879250 I | etcdmain: setting maximum number of CPUs to 16, total number of available CPUs is 16
2017-10-20 14:49:07.879510 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-10-20 14:49:07.880992 I | embed: listening for peers on http://localhost:2380
2017-10-20 14:49:07.881067 I | embed: listening for client requests on 0.0.0.0:2379
2017-10-20 14:49:08.881731 W | etcdserver: another etcd process is using "/data/member/snap/db" and holds the file lock.
2017-10-20 14:49:08.882677 W | etcdserver: waiting for it to exit before starting...

Expected results:

The original deployment should be scaled down before the new deployment is brought up.

Comment 1 David Zager 2017-10-27 14:28:30 UTC

https://github.com/openshift/ansible-service-broker/pull/511

Comment 3 Qixuan Wang 2017-11-03 08:22:28 UTC

Tested on OCP (openshift v3.7.0-0.190.0, kubernetes v1.7.6+a08f5eeb62, etcd 3.2.8, registry.reg-aws.openshift.com:443/openshift3/ose-ansible-service-broker:v3.7.0-0.190.0, registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.7.0-0.190.0)

I hit the issue below. Could you help check whether it is a bug or a misconfiguration on my side? Thanks.

[root@host-172-16-120-121 ~]# oc get all -n ansible-service-broker 
NAME                         REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/asb        2          1         1         config
deploymentconfigs/asb-etcd   1          1         1         config

NAME              HOST/PORT                                                      PATH      SERVICES   PORT        TERMINATION   WILDCARD
routes/asb-1338   asb-1338-ansible-service-broker.apps.1103-qtr.qe.rhcloud.com             asb        port-1338   reencrypt     None

NAME                  READY     STATUS             RESTARTS   AGE
po/asb-2-trlzf        0/1       CrashLoopBackOff   5          3m
po/asb-etcd-1-m6r9r   1/1       Running            0          3m

NAME            DESIRED   CURRENT   READY     AGE
rc/asb-2        1         1         0         4m
rc/asb-etcd-1   1         1         1         4m

NAME           CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
svc/asb        172.30.156.76   <none>        1338/TCP   14m
svc/asb-etcd   172.30.206.95   <none>        2379/TCP   14m

[root@host-172-16-120-121 ~]# oc logs po/asb-2-trlzf
Using config file mounted to /etc/ansible-service-broker/config.yaml
============================================================
==           Starting Ansible Service Broker...           ==
============================================================
[2017-11-03T08:08:11.615Z] [NOTICE] Initializing clients...
[2017-11-03T08:08:11.615Z] [DEBUG] Trying to connect to etcd
[2017-11-03T08:08:11.615Z] [INFO] == ETCD CX ==
[2017-11-03T08:08:11.615Z] [INFO] EtcdHost: asb-etcd.ansible-service-broker.svc
[2017-11-03T08:08:11.615Z] [INFO] EtcdPort: 2379
[2017-11-03T08:08:11.615Z] [INFO] Endpoints: [http://asb-etcd.ansible-service-broker.svc:2379]
[2017-11-03T08:08:11.619Z] [ERROR] client: etcd cluster is unavailable or misconfigured; error #0: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

[root@host-172-16-120-121 ~]# oc logs po/asb-2-trlzf -c etcd
Error from server (BadRequest): container etcd is not valid for pod asb-2-trlzf
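
One possible reading of the `malformed HTTP response` bytes in the broker log above: they parse cleanly as a TLS record header followed by an alert level, which would be consistent with a plain-HTTP client talking to an endpoint that expects TLS. This is an interpretation of the quoted bytes only, not something confirmed in this report. A minimal sketch of the decoding:

```python
import struct

# Bytes quoted in the broker's error message above.
raw = b"\x15\x03\x01\x00\x02\x02"

# A TLS record header is 5 bytes: content type, 2-byte version, 2-byte length.
content_type, major, minor, length = struct.unpack("!BBBH", raw[:5])

# Record content types and alert levels as defined by the TLS specification.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
ALERT_LEVELS = {1: "warning", 2: "fatal"}

record_type = CONTENT_TYPES.get(content_type, "unknown")
# First byte of the alert body is the level; the description byte is not
# present in the quoted error message, so it is not decoded here.
alert_level = ALERT_LEVELS.get(raw[5], "unknown")

print(record_type, (major, minor), length, alert_level)
```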

Comment 4 Qixuan Wang 2017-11-03 08:55:05 UTC

I tested on another cluster (openshift v3.7.0-0.190.0, kubernetes v1.7.6+a08f5eeb62, etcd 3.2.8, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-ansible-service-broker:v3.7.0-0.189.0.0, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-ansible-service-broker:v3.7.0-0.189.0.0), and the bug can be reproduced there.


[root@host-172-16-120-47 ~]# oc get all
NAME                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/asb   3          1         1         config

NAME              HOST/PORT                                                                PATH      SERVICES   PORT      TERMINATION   WILDCARD
routes/asb-1338   asb-1338-openshift-ansible-service-broker.apps.1103-5th.qe.rhcloud.com             asb        1338      reencrypt     None

NAME             READY     STATUS    RESTARTS   AGE
po/asb-3-gbc9b   2/2       Running   1          41s

NAME       DESIRED   CURRENT   READY     AGE
rc/asb-3   1         1         1         44s

NAME      CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
svc/asb   172.30.164.136   <none>        1338/TCP   5h

[root@host-172-16-120-47 ~]# oc rollout latest asb
deploymentconfig "asb" rolled out

[root@host-172-16-120-47 ~]# oc get all
NAME                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/asb   4          1         1         config

NAME              HOST/PORT                                                                PATH      SERVICES   PORT      TERMINATION   WILDCARD
routes/asb-1338   asb-1338-openshift-ansible-service-broker.apps.1103-5th.qe.rhcloud.com             asb        1338      reencrypt     None

NAME              READY     STATUS    RESTARTS   AGE
po/asb-3-gbc9b    2/2       Running   1          1m
po/asb-4-deploy   1/1       Running   0          30s
po/asb-4-hhwnh    1/2       Error     2          27s

NAME       DESIRED   CURRENT   READY     AGE
rc/asb-3   1         1         1         1m
rc/asb-4   1         1         0         30s

[root@host-172-16-120-47 ~]# oc logs po/asb-4-hhwnh -c asb
Using config file mounted to /etc/ansible-service-broker/config.yaml
============================================================
==           Starting Ansible Service Broker...           ==
============================================================
[2017-11-03T08:50:57.425Z] [NOTICE] Initializing clients...
[2017-11-03T08:50:57.425Z] [INFO] == ETCD CX ==
[2017-11-03T08:50:57.425Z] [INFO] EtcdHost: 0.0.0.0
[2017-11-03T08:50:57.425Z] [INFO] EtcdPort: 2379
[2017-11-03T08:50:57.425Z] [INFO] Endpoints: [http://0.0.0.0:2379]
[2017-11-03T08:50:58.425Z] [ERROR] client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://0.0.0.0:2379 exceeded header timeout

[root@host-172-16-120-47 ~]# oc logs po/asb-4-hhwnh -c etcd
2017-11-03 08:50:36.288349 I | etcdmain: etcd Version: 3.2.7
2017-11-03 08:50:36.301626 I | etcdmain: Git SHA: bb66589
2017-11-03 08:50:36.301631 I | etcdmain: Go Version: go1.8.3
2017-11-03 08:50:36.301634 I | etcdmain: Go OS/Arch: linux/amd64
2017-11-03 08:50:36.301637 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-11-03 08:50:36.301678 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-11-03 08:50:36.306260 I | embed: listening for peers on http://localhost:2380
2017-11-03 08:50:36.306312 I | embed: listening for client requests on 0.0.0.0:2379
2017-11-03 08:50:37.306709 W | etcdserver: another etcd process is using "/data/member/snap/db" and holds the file lock.
2017-11-03 08:50:37.306732 W | etcdserver: waiting for it to exit before starting...

[root@host-172-16-120-47 ~]# oc logs po/asb-4-deploy
--> Scaling up asb-4 from 0 to 1, scaling down asb-3 from 1 to 0 (keep 1 pods available, don't exceed 2 pods)
    Scaling asb-4 up to 1

Comment 5 David Zager 2017-11-03 13:29:25 UTC

Shortly after this change was submitted, work was done to support authentication on our internal etcd, and that work also split the broker and etcd into separate deployments. These changes (https://bugzilla.redhat.com/show_bug.cgi?id=1507617) remove the need for this change. Closing this bug.

*** This bug has been marked as a duplicate of bug 1507617 ***