1585648 – [upgrade] should stop creating asb-etcd-migration pod after the task 'wait for migration to complete' failed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Service Broker
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Fabian von Feilitzsch
QA Contact:	Zihan Tang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-06-04 09:45 UTC by Zihan Tang
Modified:	2018-07-30 19:17 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:	undefined
Clone Of:
Environment:
Last Closed:	2018-07-30 19:16:54 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	kubernetes/kubernetes/issues/62382	0	None	None	None	2020-07-22 17:18:26 UTC
Red Hat Product Errata	RHBA-2018:1816	0	None	None	None	2018-07-30 19:17:31 UTC

Description Zihan Tang 2018-06-04 09:45:37 UTC

Description of problem:
When upgrading OCP from 3.9 to 3.10, if the task [ansible_service_broker : wait for migration to complete] fails, asb-etcd-migration pods keep being created in the openshift-ansible-service-broker namespace, which consumes a lot of resources.

Version-Release number of selected component (if applicable):
openshift-ansible: 3.10.0-0.58.0

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.9 with the etcd pod in Error status; this causes the etcd task to fail during the upgrade to 3.10.

# oc get pod 
NAME                       READY     STATUS              RESTARTS   AGE
asb-1-deploy               0/1       Error               0          1h
asb-etcd-1-deploy          0/1       Error               0          1h

2. Upgrade to 3.10.
3. Check TASK [ansible_service_broker : Migrate from etcd to CustomResources].

Actual results:
3. The task 'Migrate from etcd to CustomResources' fails, but asb-etcd-migration pods are still being created even after the task stops retrying.

# oc logs -f asb-etcd-migration-wtfpv
time="2018-06-04T09:12:57Z" level=info msg="etcd configuration: {asb-etcd.openshift-ansible-service-broker.svc /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt /var/run/asb-etcd-auth/client.crt /var/run/asb-etcd-auth/client.key 2379}"
time="2018-06-04T09:12:57Z" level=info msg="== ETCD CX =="
time="2018-06-04T09:12:57Z" level=info msg="EtcdHost: asb-etcd.openshift-ansible-service-broker.svc"
time="2018-06-04T09:12:57Z" level=info msg="EtcdPort: 2379"
time="2018-06-04T09:12:57Z" level=info msg="Endpoints: [https://asb-etcd.openshift-ansible-service-broker.svc:2379]"
2018/06/04 09:12:57 Dao::BatchGetRaw
panic: Unable to get all specs from etcd - client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint https://asb-etcd.openshift-ansible-service-broker.svc:2379 exceeded header timeout

goroutine 1 [running]:
main.main()
	/builddir/build/BUILD/ansible-service-broker-1.2.16/cmd/migration/main.go:90 +0x3c16

[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod 
NAME                       READY     STATUS    RESTARTS   AGE
asb-1-deploy               0/1       Error     0          1h
asb-etcd-1-deploy          0/1       Error     0          1h
asb-etcd-migration-2lnqd   0/1       Error     0          3m
asb-etcd-migration-2z2cp   0/1       Error     0          51s
asb-etcd-migration-45k5g   0/1       Error     0          2m
asb-etcd-migration-4r6qb   0/1       Error     0          3m
asb-etcd-migration-4sp6q   0/1       Error     0          2m
asb-etcd-migration-585vc   0/1       Error     0          3m
asb-etcd-migration-5mgjn   0/1       Error     0          3m


[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod | wc -l 
81
[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod -n openshift-ansible-service-broker | wc -l 
84
[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod -n openshift-ansible-service-broker | wc -l 
136

After the job completed, migration pods are still being created.
[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod -n openshift-ansible-service-broker | wc -l 
305
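
The `wc -l` counts above include the header line and every pod in the namespace, not just the migration pods. A minimal sketch of a narrower count, assuming the default `oc get pod` column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name `count_migration_pods` is hypothetical and not part of any tool:

```shell
# Count asb-etcd-migration pods per STATUS from `oc get pod` output on stdin.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
count_migration_pods() {
  awk '$1 ~ /^asb-etcd-migration-/ { n[$3]++ }
       END { for (s in n) print s, n[s] }' | sort
}

# Usage against a live cluster (not run here):
#   oc get pod -n openshift-ansible-service-broker --no-headers | count_migration_pods
```

Running this repeatedly should show whether the number of pods in Error status is still growing after the Ansible task has given up.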


Expected results:
It should stop creating asb-etcd-migration pods after the task stops retrying.

Additional info:

Comment 2 Fabian von Feilitzsch 2018-06-04 16:42:49 UTC

This looks like the result of an upstream kubernetes bug related to Jobs, where the backoffLimit is no longer respected: https://github.com/kubernetes/kubernetes/issues/62382

Comment 3 Fabian von Feilitzsch 2018-06-04 17:40:38 UTC

https://github.com/openshift/openshift-ansible/pull/8625

Comment 4 openshift-github-bot 2018-06-04 19:11:09 UTC

Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/b3706ea39c192e306728358a365644d2db25419f
Bug 1585648- Set timeout for ASB migration job (workaround for kubernetes/kubernetes#62382)

https://github.com/openshift/openshift-ansible/commit/72428990fb8fa27cdda26238d49a27c7daf9ad3f
Merge pull request #8625 from fabianvf/bz1585648

Bug 1585648- Set timeout for ASB migration job
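
For readers unfamiliar with Job timeouts, here is a rough sketch of what bounding a Job by wall-clock time can look like. It assumes the timeout is expressed through the Job's `activeDeadlineSeconds` field, which may not match the actual change in the pull request. `backoffLimit` and `activeDeadlineSeconds` are standard batch/v1 Job fields; the function name `render_migration_job`, the Job name (inferred from the pod names), the container name, the image, and the numeric values are placeholders.

```shell
# Print a hypothetical Job manifest that bounds the migration by wall-clock time.
# activeDeadlineSeconds ends the Job once the deadline passes, independent of
# backoffLimit. Name, image, and values below are illustrative placeholders.
render_migration_job() {
  deadline="${1:-300}"
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: asb-etcd-migration
  namespace: openshift-ansible-service-broker
spec:
  backoffLimit: 3
  activeDeadlineSeconds: ${deadline}
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: example.invalid/asb-migration:placeholder
EOF
}

# Usage (not run here): render_migration_job 600 | oc create -f -
```

With a deadline set, the Job controller should stop creating new pods once the deadline passes, even if the retry limit is not being honored.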

Comment 6 Zihan Tang 2018-06-06 05:46:37 UTC

With the fix applied as a workaround, the migration job stops creating pods after the timeout.
# oc get pod  -n openshift-ansible-service-broker| wc -l
145

Marking as verified.
Version: openshift-ansible-3.10.0-0.60.0

Comment 8 errata-xmlrpc 2018-07-30 19:16:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816