Bug 1656925 - Upgrade Fails at TASK [ansible_service_broker : Create the Broker resource in the catalog]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Catalog
Version: 3.10.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Jay Boyd
QA Contact: Jian Zhang
URL:
Whiteboard:
Duplicates: 1658018 1659198
Depends On:
Blocks: 1661569
Reported: 2018-12-06 16:39 UTC by Josh Foots
Modified: 2022-03-13 16:23 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The 3.10.72 update added health and liveness probes for the Service Catalog pods. Install was not waiting for the update rollout to finish before proceeding to update Ansible Service Broker. Because of timing, the Service Catalog pods were unavailable when the Broker attempted to register.
Consequence: Ansible Service Broker update failed with an error indicating "the server is currently unable to handle the request (post clusterservicebrokers.servicecatalog.k8s.io)".
Fix: Installation was updated to wait for the Service Catalog update rollout to finish before proceeding with installing Ansible Service Broker.
Clone Of:
Clones: 1661569
Environment:
Last Closed: 2019-01-30 15:13:18 UTC
Target Upstream Version:
Embargoed:


Attachments
oc describe output in answer to comment #17 (10.66 KB, text/plain)
2018-12-12 16:53 UTC, Jack Ottofaro
oc describe etc output in answer to comment #21 (54.99 KB, text/plain)
2018-12-13 14:24 UTC, Jack Ottofaro
test fix - wait for Service Catalog rollout to finish (2.93 KB, patch)
2018-12-14 10:20 UTC, Jay Boyd


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3755511 | 0 | None | None | None | 2018-12-14 02:51:35 UTC
Red Hat Knowledge Base (Solution) | 3979161 | 0 | Upgrade | None | OpenShift Service Catalog upgrade fails due to AlreadyExists | 2019-03-12 00:05:51 UTC
Red Hat Product Errata | RHBA-2019:0206 | 0 | None | None | None | 2019-01-30 15:13:25 UTC

Comment 17 Jay Boyd 2018-12-12 13:49:48 UTC
@Jack - I'm looking for the 'oc describe pod' output for the apiserver pods in the kube-service-catalog namespace, i.e.:

oc get pods -n kube-service-catalog

and then for each of the apiserver pods listed:

oc describe pod -n kube-service-catalog  api-server-pod-name

Given https://bugzilla.redhat.com/show_bug.cgi?id=1656925#c16 I have a feeling the events listed are going to indicate the pod was restarted or taken out of service because of a liveness or readiness probe failure.  It would help if anyone can confirm this.
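
For anyone gathering this data, the two commands above could be combined roughly as follows. This is only a sketch: the function name describe_apiserver_pods and the NS variable are made up for illustration, and it assumes the apiserver pods can be picked out by the string "apiserver" in their names.

```shell
# Namespace from the commands above; override NS if yours differs.
NS="${NS:-kube-service-catalog}"

# Hypothetical helper: run 'oc describe' for every apiserver pod.
describe_apiserver_pods() {
    # '-o name' prints one resource per line, for example pod/apiserver-x79fv.
    oc get pods -n "$NS" -o name | grep apiserver | while read -r pod; do
        echo "=== $pod ==="
        # The Events section at the end of the output lists probe failures and restarts.
        oc describe -n "$NS" "$pod" < /dev/null
    done
}
```

Running describe_apiserver_pods > apiserver-describe.txt would produce a single file suitable for attaching to the bug.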

Comment 18 Jay Boyd 2018-12-12 14:21:34 UTC
Regarding comment #17: if the oc describe output indicates the pods are being restarted because of liveness probe failures, I'd really like to get the associated log output for the `apiserver` container within the affected pods during the time interval of the failures.
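
One possible way to collect those logs is sketched below. The helper name collect_apiserver_logs and the default one-hour window are assumptions for illustration; the container name `apiserver` comes from the request above.

```shell
NS="${NS:-kube-service-catalog}"

# Hypothetical helper: dump logs of the apiserver container for one pod.
collect_apiserver_logs() {
    pod="$1"          # pod name, for example apiserver-x79fv
    since="${2:-1h}"  # look-back window; 1h is an arbitrary default
    # Logs from the currently running container instance.
    oc logs -n "$NS" "$pod" -c apiserver --timestamps --since="$since"
    # Logs from the previous instance, which only exist if the container was
    # restarted; the failure when there was no restart is ignored.
    oc logs -n "$NS" "$pod" -c apiserver --timestamps --previous || true
}
```

The --previous output is usually the more interesting part when a liveness probe failure has restarted the container, since the current instance's log starts after the restart.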

Comment 19 Jack Ottofaro 2018-12-12 16:53:40 UTC
Created attachment 1513720 [details]
oc describe output in answer to comment #17

Comment 23 Jack Ottofaro 2018-12-13 14:24:08 UTC
Created attachment 1514073 [details]
oc describe etc output in answer to comment #21

Comment 27 btai 2018-12-13 17:39:48 UTC
Stumbled on the same bug when trying to upgrade from 3.9 -> 3.10; it failed in the post control plane upgrade. For us it was also a timing issue,
because the run of this play
/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_components.yml
launches the api-server and controller manager successfully. But after that the ASB post fails with this error: "stderr": "Error from server (ServiceUnavailable):

Having verified that both api-server and controller-manager were running in the kube-service-catalog namespace, I launched the playbook /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_components.yml again, this time with the service catalog part commented out:

 tasks:
#  - import_role:
#      name: openshift_service_catalog
#      tasks_from: install.yml
#    when:
#    - openshift_enable_service_catalog | default(true) | bool

Then the playbook run succeeded.
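
A rough sketch of the "verify both pods are running" step before re-running the playbook is shown below. The helper name all_pods_running is hypothetical, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE). Note that a Running status alone does not prove the pod passes its readiness probe.

```shell
NS="${NS:-kube-service-catalog}"

# Hypothetical helper: succeed only if the namespace has at least one pod
# and every pod reports STATUS "Running".
all_pods_running() {
    # --no-headers drops the header row; column 3 is STATUS.
    oc get pods -n "$NS" --no-headers |
        awk '$3 != "Running" { bad = 1 } END { exit (bad || NR == 0) }'
}
```

Usage would be along the lines of: all_pods_running && echo "service catalog pods are up".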

Comment 30 Jay Boyd 2018-12-14 10:06:20 UTC
*** Bug 1658018 has been marked as a duplicate of this bug. ***

Comment 31 Jay Boyd 2018-12-14 10:07:14 UTC
*** Bug 1659198 has been marked as a duplicate of this bug. ***

Comment 32 Jay Boyd 2018-12-14 10:20:56 UTC
Created attachment 1514320 [details]
test fix - wait for Service Catalog rollout to finish

Test fix that waits for the rollout of Service Catalog before proceeding.  This file would replace openshift-ansible/roles/openshift_service_catalog/tasks/start.yml in both 3.10 and 3.11.

Comment 33 Jay Boyd 2018-12-14 10:23:14 UTC
I have attached a test fix that waits for the rollout of Service Catalog before proceeding.  This file would replace openshift-ansible/roles/openshift_service_catalog/tasks/start.yml in both 3.10 and 3.11.  I'd appreciate feedback from anyone that is encountering this error and is willing to retry with this in place.
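
For readers without access to the attachment, the idea of waiting for the rollout can be approximated by hand with the shell sketch below. It is not the attached start.yml. The `oc rollout status` invocation matches the command that appears in the verification logs later in this bug, while the helper name wait_for_rollout, the retry count, and the delay are assumptions.

```shell
NS="${NS:-kube-service-catalog}"
KUBECONFIG_PATH="${KUBECONFIG_PATH:-/etc/origin/master/admin.kubeconfig}"

# Hypothetical helper: block until a rollout finishes, retrying on failure.
wait_for_rollout() {
    target="$1"         # for example ds/apiserver or ds/controller-manager
    retries="${2:-5}"   # assumed retry count
    delay="${3:-10}"    # assumed pause between attempts, in seconds
    attempt=1
    while :; do
        # 'oc rollout status' itself waits and returns 0 once the rollout is complete.
        if oc rollout status --config="$KUBECONFIG_PATH" -n "$NS" "$target"; then
            return 0
        fi
        if [ "$attempt" -ge "$retries" ]; then
            break
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "rollout of $target did not finish after $retries attempts" >&2
    return 1
}
```

Calling wait_for_rollout ds/apiserver && wait_for_rollout ds/controller-manager before the broker is registered mirrors the ordering the fix is meant to enforce.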

Comment 35 Jay Boyd 2018-12-17 17:17:18 UTC
Has anyone else attempted to work through this issue with the attached fix?  We haven't been able to reproduce the original issue here and I'd like to get additional confirmation this test fix works for multiple deployments.

Comment 36 Robert Bost 2018-12-20 22:04:46 UTC
(In reply to Jay Boyd from comment #35)
> Has anyone else attempted to work through this issue with the attached fix? 
> We haven't been able to reproduce the original issue here and I'd like to
> get additional confirmation this test fix works for multiple deployments.

I had a customer seeing this issue. Using the patched start.yml that introduced the wait tasks allowed us to move past the "Create the Broker resource in the catalog" task.

Comment 37 Jay Boyd 2018-12-21 15:37:06 UTC
Thanks Robert.

I have delivered this fix to 3.10.z with https://github.com/openshift/openshift-ansible/pull/10883

and created https://bugzilla.redhat.com/show_bug.cgi?id=1661569 for tracking delivery to 3.11.z

Comment 39 Jian Zhang 2019-01-17 11:25:47 UTC
LGTM, verified. Details below:

1, original OCP cluster 3.10.45:
[root@ip-172-18-4-171 ~]# oc version
oc v3.10.45
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-4-171.ec2.internal:8443
openshift v3.10.45
kubernetes v1.10.0+b81c8f8

[root@ip-172-18-4-171 ~]# oc get pods -n kube-service-catalog 
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-x79fv            1/1       Running   0          11m
controller-manager-cptk9   1/1       Running   0          11m
[root@ip-172-18-4-171 ~]# oc get clusterservicebroker
NAME                      AGE
ansible-service-broker    10m
template-service-broker   10m

2, Upgraded it to the latest version of 3.10. The upgrade succeeded.
[root@ip-172-18-4-171 ~]# oc version
oc v3.10.101
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-4-171.ec2.internal:8443
openshift v3.10.102
kubernetes v1.10.0+b81c8f8

[root@ip-172-18-4-171 ~]# oc get pods -n kube-service-catalog
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-s2xt9            1/1       Running   0          12m
controller-manager-k8fjk   1/1       Running   0          12m

Correlating logs:

TASK [openshift_service_catalog : Wait for API Server rollout success] *********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:2
Thursday 17 January 2019  11:04:12 +0000 (0:00:00.084)       0:18:35.484 ****** 
ok: [ec2-3-90-13-179.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/apiserver"], "delta": "0:00:37.036247", "end": "2019-01-17 06:05:21.965431", "failed": false, "rc": 0, "start": "2019-01-17 06:04:44.929184", "stderr": "", "stderr_lines": [], "stdout": "Waiting for rollout to finish: 0 out of 1 new pods have been updated...\nWaiting for rollout to finish: 0 out of 1 new pods have been updated...\nWaiting for rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"apiserver\" successfully rolled out", "stdout_lines": ["Waiting for rollout to finish: 0 out of 1 new pods have been updated...", "Waiting for rollout to finish: 0 out of 1 new pods have been updated...", "Waiting for rollout to finish: 0 of 1 updated pods are available...", "daemon set \"apiserver\" successfully rolled out"]}

TASK [openshift_service_catalog : Wait for Controller Manager rollout success] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:14
Thursday 17 January 2019  11:04:49 +0000 (0:00:37.294)       0:19:12.779 ****** 
ok: [ec2-3-90-13-179.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/controller-manager"], "delta": "0:00:07.944249", "end": "2019-01-17 06:05:30.194916", "failed": false, "rc": 0, "start": "2019-01-17 06:05:22.250667", "stderr": "", "stderr_lines": [], "stdout": "Waiting for rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"controller-manager\" successfully rolled out", "stdout_lines": ["Waiting for rollout to finish: 0 of 1 updated pods are available...", "daemon set \"controller-manager\" successfully rolled out"]}

...

TASK [ansible_service_broker : Create the Broker resource in the catalog] ******
task path: /usr/share/ansible/openshift-ansible/roles/ansible_service_broker/tasks/install.yml:217
Thursday 17 January 2019  11:05:23 +0000 (0:00:00.035)       0:19:46.793 ****** 
changed: [ec2-3-90-13-179.compute-1.amazonaws.com] => {"changed": true, "failed": false, "results": {"cmd": "/usr/bin/oc get ClusterServiceBroker ansible-service-broker -o json -n default", "results": [{"apiVersion": "servicecatalog.k8s.io/v1beta1", "kind": "ClusterServiceBroker", "metadata": {"creationTimestamp": "2019-01-17T09:49:30Z", "generation": 1, "name": "ansible-service-broker", "resourceVersion": "14742", "selfLink": "/apis/servicecatalog.k8s.io/v1beta1/clusterservicebrokers/ansible-service-broker", "uid": "2f5888c4-1a3d-11e9-9bbe-0a580a800005"}, "spec": {"authInfo": {"bearer": {"secretRef": {"name": "asb-client", "namespace": "openshift-ansible-service-broker"}}}, "caBundle": "xxx", "relistBehavior": "Duration", "relistDuration": "15m0s", "relistRequests": 0, "url": "https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker"}, "status": {"conditions": [{"lastTransitionTime": "2019-01-17T09:50:00Z", "message": "Successfully fetched catalog entries from broker.", "reason": "FetchedCatalog", "status": "True", "type": "Ready"}], "lastCatalogRetrievalTime": "2019-01-17T10:57:40Z", "reconciledGeneration": 1}}], "returncode": 0}, "state": "present"}
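
The Ready condition visible in the JSON above can also be checked directly. The following is a sketch only: the helper name broker_ready is made up, and it relies on the standard JSONPath filter support of `oc get`.

```shell
# Hypothetical helper: succeed if the ClusterServiceBroker reports Ready=True.
broker_ready() {
    broker="${1:-ansible-service-broker}"
    # Select the status of the condition whose type is "Ready".
    status=$(oc get clusterservicebroker "$broker" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
    [ "$status" = "True" ]
}
```

For example, broker_ready ansible-service-broker && echo "broker registered and catalog fetched".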

Comment 41 errata-xmlrpc 2019-01-30 15:13:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0206

