Bug 1585648 - [upgrade] Should stop creating asb-etcd-migration pods after the task 'wait for migration to complete' fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Fabian von Feilitzsch
QA Contact: Zihan Tang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-04 09:45 UTC by Zihan Tang
Modified: 2018-07-30 19:17 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:16:54 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github https://github.com/kubernetes kubernetes issues 62382 0 None None None 2020-07-22 17:18:26 UTC
Red Hat Product Errata RHBA-2018:1816 0 None None None 2018-07-30 19:17:31 UTC

Description Zihan Tang 2018-06-04 09:45:37 UTC
Description of problem:
When OCP is upgraded from 3.9 to 3.10 and the task [ansible_service_broker : wait for migration to complete] fails, asb-etcd-migration pods keep being created in the openshift-ansible-service-broker namespace, which consumes a lot of resources.

Version-Release number of selected component (if applicable):
openshift-ansible: 3.10.0-0.58.0

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.9 with the etcd pod in Error status. This causes the etcd task to fail when upgrading to 3.10.

# oc get pod 
NAME                       READY     STATUS              RESTARTS   AGE
asb-1-deploy               0/1       Error               0          1h
asb-etcd-1-deploy          0/1       Error               0          1h

2. Upgrade to 3.10.
3. Check TASK [ansible_service_broker : Migrate from etcd to CustomResources].

Actual results:
3. The task 'Migrate from etcd to CustomResources' fails, but asb-etcd-migration pods are still generated even after the task stops retrying.
# oc logs -f asb-etcd-migration-wtfpv
time="2018-06-04T09:12:57Z" level=info msg="etcd configuration: {asb-etcd.openshift-ansible-service-broker.svc /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt /var/run/asb-etcd-auth/client.crt /var/run/asb-etcd-auth/client.key 2379}"
time="2018-06-04T09:12:57Z" level=info msg="== ETCD CX =="
time="2018-06-04T09:12:57Z" level=info msg="EtcdHost: asb-etcd.openshift-ansible-service-broker.svc"
time="2018-06-04T09:12:57Z" level=info msg="EtcdPort: 2379"
time="2018-06-04T09:12:57Z" level=info msg="Endpoints: [https://asb-etcd.openshift-ansible-service-broker.svc:2379]"
2018/06/04 09:12:57 Dao::BatchGetRaw
panic: Unable to get all specs from etcd - client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint https://asb-etcd.openshift-ansible-service-broker.svc:2379 exceeded header timeout

goroutine 1 [running]:
main.main()
	/builddir/build/BUILD/ansible-service-broker-1.2.16/cmd/migration/main.go:90 +0x3c16

[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod 
NAME                       READY     STATUS    RESTARTS   AGE
asb-1-deploy               0/1       Error     0          1h
asb-etcd-1-deploy          0/1       Error     0          1h
asb-etcd-migration-2lnqd   0/1       Error     0          3m
asb-etcd-migration-2z2cp   0/1       Error     0          51s
asb-etcd-migration-45k5g   0/1       Error     0          2m
asb-etcd-migration-4r6qb   0/1       Error     0          3m
asb-etcd-migration-4sp6q   0/1       Error     0          2m
asb-etcd-migration-585vc   0/1       Error     0          3m
asb-etcd-migration-5mgjn   0/1       Error     0          3m


[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod | wc -l 
81
[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod -n openshift-ansible-service-broker | wc -l 
84
[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod -n openshift-ansible-service-broker | wc -l 
136

After the job completed, migration pods were still being created.
[root@qe-zitang-39-3master-etcd-1 ~]# oc get pod -n openshift-ansible-service-broker | wc -l 
305
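
The `wc -l` counts above include the header line and any non-migration pods. As a rough sketch for tracking only the failed migration pods, the output of `oc get pod` could be filtered as below. This assumes the default NAME/READY/STATUS/RESTARTS/AGE column layout shown in this report; the function name `count_failed_migration_pods` is hypothetical and not part of any OpenShift tooling.

```python
def count_failed_migration_pods(oc_output, prefix="asb-etcd-migration-"):
    """Count pods whose NAME starts with `prefix` and whose STATUS is Error.

    Assumes the default `oc get pod` column order: NAME READY STATUS RESTARTS AGE.
    """
    count = 0
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if name.startswith(prefix) and status == "Error":
            count += 1
    return count


# Sample input copied from the pod listing in this report (truncated).
sample = """\
NAME                       READY     STATUS    RESTARTS   AGE
asb-1-deploy               0/1       Error     0          1h
asb-etcd-1-deploy          0/1       Error     0          1h
asb-etcd-migration-2lnqd   0/1       Error     0          3m
asb-etcd-migration-2z2cp   0/1       Error     0          51s
"""

print(count_failed_migration_pods(sample))
```

Running the function repeatedly against fresh `oc get pod -n openshift-ansible-service-broker` output would show whether the number of failed migration pods is still growing.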


Expected results:
It should stop creating asb-etcd-migration pods after the task stops retrying.

Additional info:

Comment 2 Fabian von Feilitzsch 2018-06-04 16:42:49 UTC
This looks like the result of an upstream kubernetes bug related to Jobs, where the backoffLimit is no longer respected: https://github.com/kubernetes/kubernetes/issues/62382

Comment 3 Fabian von Feilitzsch 2018-06-04 17:40:38 UTC
https://github.com/openshift/openshift-ansible/pull/8625

Comment 4 openshift-github-bot 2018-06-04 19:11:09 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/b3706ea39c192e306728358a365644d2db25419f
Bug 1585648- Set timeout for ASB migration job (workaround for kubernetes/kubernetes#62382)

https://github.com/openshift/openshift-ansible/commit/72428990fb8fa27cdda26238d49a27c7daf9ad3f
Merge pull request #8625 from fabianvf/bz1585648

Bug 1585648- Set timeout for ASB migration job
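
For readers unfamiliar with the workaround, the sketch below shows one way a Job can be given a hard timeout that does not depend on `backoffLimit`. It assumes the timeout is expressed through the Job's `spec.activeDeadlineSeconds` field; the actual change in the pull request is not reproduced in this report. The job name is inferred from the pod names above, and the deadline value, backoff limit, and image are placeholders.

```python
import json


def migration_job_manifest(deadline_seconds, backoff_limit):
    """Build a Job manifest with both a retry limit and a hard deadline.

    All values are illustrative; they are not taken from openshift-ansible.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            # Name inferred from the asb-etcd-migration-xxxxx pod names.
            "name": "asb-etcd-migration",
            "namespace": "openshift-ansible-service-broker",
        },
        "spec": {
            # Retry limit; the upstream issue reports this was not respected.
            "backoffLimit": backoff_limit,
            # Hard deadline: once exceeded, the Job is marked failed and
            # no further pods are created for it.
            "activeDeadlineSeconds": deadline_seconds,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "migration",
                            # Placeholder image, not a real reference.
                            "image": "example.invalid/asb-migration:placeholder",
                        }
                    ],
                }
            },
        },
    }


manifest = migration_job_manifest(deadline_seconds=600, backoff_limit=3)
print(json.dumps(manifest, indent=2))
```

The printed JSON is in a form that `oc create -f -` accepts. The point of the design is that the deadline bounds pod creation by elapsed time even when the retry counter is ignored.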

Comment 6 Zihan Tang 2018-06-06 05:46:37 UTC
With the fix applied as a workaround, the migration job stops creating pods after the timeout.
# oc get pod  -n openshift-ansible-service-broker| wc -l
145

So marked as verified.
version:  openshift-ansible-3.10.0-0.60.0

Comment 8 errata-xmlrpc 2018-07-30 19:16:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

