Bug 1658018 - re-running playbook fails the task "Create the Broker resource in the catalog"
Summary: re-running playbook fails the task "Create the Broker resource in the catalog"
Keywords:
Status: CLOSED DUPLICATE of bug 1656925
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Catalog
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Jay Boyd
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-11 03:33 UTC by Kenjiro Nakayama
Modified: 2022-03-13 16:26 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-14 10:06:20 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Knowledge Base (Solution) 3755511, last updated 2018-12-14 02:51:24 UTC

Description Kenjiro Nakayama 2018-12-11 03:33:29 UTC
Description of problem:

- Re-running the playbook fails the task "Create the Broker resource in the catalog"

```
  Failure summary:

    1. Hosts:    xxx.example.com
       Play:     Service Catalog
       Task:     Create the Broker resource in the catalog
       Message:  {u'cmd': u'/usr/bin/oc create -f /tmp/brokerout-PsL0ai -n default', u'returncode': 1, u'results': {}, u'stderr': u'Error from server (ServiceUnavailable): error when creating "/tmp/brokerout-PsL0ai": the server is currently unable to handle the request (post clusterservicebrokers.servicecatalog.k8s.io)\n', u'stdout': u''}
```

Version-Release number of selected component (if applicable):

  # rpm -qa | grep ansible
  openshift-ansible-roles-3.10.66-1.git.0.3c3a83a.el7.noarch
  ansible-2.4.6.0-1.el7ae.noarch
  openshift-ansible-playbooks-3.10.66-1.git.0.3c3a83a.el7.noarch
  openshift-ansible-3.10.66-1.git.0.3c3a83a.el7.noarch
  openshift-ansible-docs-3.10.66-1.git.0.3c3a83a.el7.noarch

  # oc version
  oc v3.10.72
  kubernetes v1.10.0+b81c8f8
  features: Basic-Auth GSSAPI Kerberos SPNEGO

  Server https://xxx.example.com:8443
  openshift v3.10.72
  kubernetes v1.10.0+b81c8f8

How reproducible: 100% in the customer's environment, but 0% in my environment.

Steps to Reproduce:
1. # ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml -vvv | tee deploy.log
2. Edit the inventory file and remove the following variables:

----
openshift_hosted_etcd_storage_kind=dynamic 
openshift_hosted_etcd_storage_volume_name=etcd-vol 
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"] 
openshift_hosted_etcd_storage_volume_size=1Gi
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}
----

3. # ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-service-catalog/config.yml -vvv | tee service-catalog-deploy.log

Actual results:
- The following error occurred:

       Message:  {u'cmd': u'/usr/bin/oc create -f /tmp/brokerout-PsL0ai -n default', u'returncode': 1, u'results': {}, u'stderr': u'Error from server (ServiceUnavailable): error when creating "/tmp/brokerout-PsL0ai": the server is currently unable to handle the request (post clusterservicebrokers.servicecatalog.k8s.io)\n', u'stdout': u''}

Expected results:
- No error.

Additional info:
- Recreating the broker manually worked fine:

  # oc get clusterservicebroker --export -o yaml ansible-service-broker > asb-backup.yaml
  # oc delete clusterservicebroker ansible-service-broker
  # oc create -f asb-backup.yaml

Comment 4 Jay Boyd 2018-12-11 14:18:35 UTC
Re-installing the Service Catalog runs fine. Before the Ansible Service Broker install is included, the install validates the Service Catalog healthz endpoint, and all is good.

The Service Catalog Ansible playbook then includes the Ansible Service Broker install. All tasks run fine until we get to

TASK [ansible_service_broker : Create the Broker resource in the catalog]

which ends with

Error from server (ServiceUnavailable): error when creating "/tmp/brokerout-PsL0ai": the server is currently unable to handle the request (post clusterservicebrokers.servicecatalog.k8s.io)


I suspect the Catalog API server is briefly unavailable after its install completes and before ASB tries to create the broker. Running `oc get events -n kube-service-catalog -w` before kicking off the openshift-service-catalog/config.yml playbook may help confirm this. I appreciate the collection of the Service Catalog apiserver logs, but all of their entries start at about 02:28:03; judging from the Ansible logs, I estimate that is 25 or more minutes after the failure.
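If the Catalog API server really is only briefly unavailable, one possible mitigation is to retry the broker creation while the server answers ServiceUnavailable. This is a minimal sketch, assuming a POSIX-style shell that supports `local` (bash and dash do); `retry_on_unavailable` is a hypothetical helper, not part of openshift-ansible or `oc`.

```shell
#!/bin/sh
# Hypothetical helper: run a command, retrying only while its stderr reports
# ServiceUnavailable (the error seen in this bug). Other errors fail fast.
# Usage: retry_on_unavailable ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_on_unavailable() {
  local attempts="$1" delay="$2" i=1 err rc=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Capture stderr in $err while stdout still reaches the caller via fd 3.
    if { err=$("$@" 2>&1 1>&3 3>&-); } 3>&1; then
      return 0
    else
      rc=$?
    fi
    case $err in
      *ServiceUnavailable*) ;;                        # transient: retry
      *) printf '%s\n' "$err" >&2; return "$rc" ;;    # anything else: stop
    esac
    printf 'attempt %s/%s: ServiceUnavailable\n' "$i" "$attempts" >&2
    i=$((i + 1))
    # Only sleep if another attempt remains.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  # All attempts exhausted: report the last error and its exit status.
  printf '%s\n' "$err" >&2
  return "$rc"
}
```

For example, `retry_on_unavailable 10 15 oc create -f broker.yaml -n default` makes up to 10 attempts, 15 seconds apart (`broker.yaml` is a placeholder file name). Because any other error returns immediately, a genuinely malformed resource is not masked by the retries.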

To narrow down the error, we need to gather additional diagnostics while the playbook runs. If possible:
1) Before starting the playbook, run `oc get events -n kube-service-catalog -w` and save its output to a file. Ideally it keeps running throughout the reproduction; if possible, monitor it and restart the command if it terminates.

2) As soon as the failure is detected, run `oc describe pod` on each of the Service Catalog apiserver pods and capture the output.

3) Immediately after step 2, collect the logs from the Catalog API server pods.
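The three diagnostic steps above can be scripted so nothing is missed at the moment of failure. This is a minimal sketch; the function names are hypothetical, and it assumes the apiserver pods match the label selector `app=apiserver`, which should be checked with `oc get pods -n kube-service-catalog --show-labels` and overridden via the second argument if it differs.

```shell
#!/bin/sh
# Step 1 (hypothetical helper): start BEFORE the playbook. Appends the event
# stream to a file and prints the watcher's PID so it can be checked/restarted.
start_event_watch() {
  oc get events -n kube-service-catalog -w >> "$1" 2>&1 &
  echo $!
}

# Steps 2 and 3 (hypothetical helper): run immediately after the failure.
# ASSUMPTION: apiserver pods match the selector app=apiserver; pass another
# selector as the second argument if yours differ.
capture_catalog_diagnostics() {
  local outdir="$1" selector="${2:-app=apiserver}" ns=kube-service-catalog name
  mkdir -p "$outdir"
  for name in $(oc get pods -n "$ns" -l "$selector" \
      -o jsonpath='{.items[*].metadata.name}'); do
    # Step 2: describe each Service Catalog apiserver pod.
    oc describe pod "$name" -n "$ns" > "$outdir/$name.describe.txt" 2>&1
    # Step 3: collect that pod's logs.
    oc logs "$name" -n "$ns" > "$outdir/$name.log" 2>&1
  done
}
```

For example, run `start_event_watch catalog-events.log` before the `ansible-playbook ... config.yml` command, and `capture_catalog_diagnostics catalog-diag` as soon as the task fails.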

Comment 5 Jay Boyd 2018-12-11 15:32:18 UTC
Most likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1656925

Comment 9 Kenjiro Nakayama 2018-12-13 07:48:30 UTC
Yes, we confirmed from the event logs that the liveness probe was failing, so we deleted it from the Ansible template file (`oc edit daemonset` did not help, because re-running the playbook overwrote the change). Jay, could you fix this issue in the 3.10 playbook? Perhaps tweaking the liveness and readiness probes would be enough?
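Since `oc edit daemonset` changes are overwritten on the next playbook run, it is worth checking which probes the daemonset actually carries after each run. A read-only sketch; `show_catalog_apiserver_probes` is hypothetical, and the daemonset name `apiserver` in `kube-service-catalog` is an assumption (pass another name as the first argument).

```shell
#!/bin/sh
# Hypothetical read-only helper: print each container's liveness and readiness
# probe from the Service Catalog apiserver daemonset.
# ASSUMPTION: the daemonset is named "apiserver"; override with $1.
show_catalog_apiserver_probes() {
  local ds="${1:-apiserver}" tmpl
  # JSONPath template: one block per container in the pod template.
  tmpl='{range .spec.template.spec.containers[*]}{.name}{"\n"}'
  tmpl="$tmpl"'  liveness: {.livenessProbe}{"\n"}'
  tmpl="$tmpl"'  readiness: {.readinessProbe}{"\n"}{end}'
  oc get daemonset "$ds" -n kube-service-catalog -o "jsonpath=$tmpl"
}
```

Running it after each playbook run shows whether a template change to the probes survived the re-run.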

Comment 11 Mitchell Rollinson 2018-12-14 04:20:06 UTC
*** Bug 1659198 has been marked as a duplicate of this bug. ***

Comment 12 Jay Boyd 2018-12-14 10:06:20 UTC

*** This bug has been marked as a duplicate of bug 1656925 ***

