Description of problem:
Re-running the playbook fails the task "Create the Broker resource in the catalog":

````
Failure summary:

  1. Hosts:    xxx.example.com
     Play:     Service Catalog
     Task:     Create the Broker resource in the catalog
     Message:  {u'cmd': u'/usr/bin/oc create -f /tmp/brokerout-PsL0ai -n default',
               u'returncode': 1, u'results': {}, u'stderr': u'Error from server
               (ServiceUnavailable): error when creating "/tmp/brokerout-PsL0ai":
               the server is currently unable to handle the request (post
               clusterservicebrokers.servicecatalog.k8s.io)\n', u'stdout': u''}
````

Version-Release number of selected component (if applicable):

# rpm -qa | grep ansible
openshift-ansible-roles-3.10.66-1.git.0.3c3a83a.el7.noarch
ansible-2.4.6.0-1.el7ae.noarch
openshift-ansible-playbooks-3.10.66-1.git.0.3c3a83a.el7.noarch
openshift-ansible-3.10.66-1.git.0.3c3a83a.el7.noarch
openshift-ansible-docs-3.10.66-1.git.0.3c3a83a.el7.noarch

# oc version
oc v3.10.72
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://xxx.example.com:8443
openshift v3.10.72
kubernetes v1.10.0+b81c8f8

How reproducible:
100% on the customer's environment, but 0% on mine.

Steps to Reproduce:
1. # ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml -vvv | tee deploy.log
2. Edit the inventory file (remove the following variables):
----
openshift_hosted_etcd_storage_kind=dynamic
openshift_hosted_etcd_storage_volume_name=etcd-vol
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1Gi
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}
----
3. # ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-service-catalog/config.yml -vvv | tee service-catalog-deploy.log

Actual results:
The following error:

Message: {u'cmd': u'/usr/bin/oc create -f /tmp/brokerout-PsL0ai -n default', u'returncode': 1, u'results': {}, u'stderr': u'Error from server (ServiceUnavailable): error when creating "/tmp/brokerout-PsL0ai": the server is currently unable to handle the request (post clusterservicebrokers.servicecatalog.k8s.io)\n', u'stdout': u''}

Expected results:
No error.

Additional info:
Manual creation worked fine:

# oc get clusterservicebroker --export -o yaml ansible-service-broker > asb-backup.yaml
# oc delete clusterservicebroker ansible-service-broker
# oc create -f asb-backup.yaml
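Given that recreating the broker by hand succeeded, the original `oc create` most likely just raced the aggregated servicecatalog API becoming available. A minimal retry wrapper around that step could look like the sketch below; the function name, attempt count, and delay are illustrative assumptions, not what the playbook actually does:

```shell
#!/bin/sh
# Hypothetical retry wrapper for the failing "oc create" step.
# Assumption: the ServiceUnavailable error is transient and the
# clusterservicebrokers API becomes reachable within a few minutes.
retry_create_broker() {
  file=$1
  attempts=${2:-30}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if oc create -f "$file" -n default; then
      return 0   # broker created
    fi
    echo "attempt $i failed, retrying in 10s" >&2
    sleep 10
    i=$((i + 1))
  done
  return 1       # gave up after $attempts tries
}
```

Something like `retry_create_broker /tmp/brokerout-PsL0ai` in place of the single `oc create` call would have papered over the race, though fixing the probe (see below in this bug) is the real answer.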
Re-install of the Service Catalog runs fine. Prior to including the install of the Ansible Service Broker, the install validates the healthz endpoint of the Service Catalog, and all is good. The Service Catalog's Ansible playbook then includes installing the Ansible Service Broker. Tasks all run fine until we get to TASK [ansible_service_broker : Create the Broker resource in the catalog], which ends with:

"Error from server (ServiceUnavailable): error when creating \"/tmp/brokerout-PsL0ai\": the server is currently unable to handle the request (post clusterservicebrokers.servicecatalog.k8s.io)"

I suspect the Catalog API server is briefly unavailable after it runs through the install and before ASB tries to create a broker. Perhaps running `oc get events -n kube-service-catalog -w` prior to kicking off the openshift-service-catalog/config.yml playbook would help.

I appreciate the collection of the apiserver logs from the Service Catalog, but they all have log entries that start at 02:28:03 or so; from the Ansible logs I imagine this is 25+ minutes after the failure condition. To zero in on the error we need to gather additional diagnostic details during the run of the playbook. If possible:

1) Prior to kicking off the playbook, run `oc get events -n kube-service-catalog -w` and save the output to a file. Hopefully this will keep running throughout the reproduce; if possible, monitor it and restart the command if it terminates.
2) Immediately when the failure is detected, run `oc describe pod` on each of the Service Catalog apiserver pods and capture the output.
3) Immediately after #2, get the logs for the catalog API servers.
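The three collection steps above can be scripted roughly as follows. This is a sketch under assumptions: the apiserver pods are matched by an `apiserver` name substring in the kube-service-catalog namespace, and the output paths are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper for the diagnostics requested above.
NS=kube-service-catalog
OUT=${OUT:-/tmp/sc-diag}
mkdir -p "$OUT"

# 1) Stream catalog events for the whole playbook run, restarting
#    the watch automatically if it terminates.
collect_events() {
  while :; do
    oc get events -n "$NS" -w >> "$OUT/events.log" 2>&1
    sleep 2
  done
}

# 2) When the failure hits, describe every catalog apiserver pod.
describe_apiservers() {
  for pod in $(oc get pods -n "$NS" -o name | grep apiserver); do
    oc describe "$pod" -n "$NS" >> "$OUT/describe.log"
  done
}

# 3) Immediately afterwards, capture the apiserver logs.
apiserver_logs() {
  for pod in $(oc get pods -n "$NS" -o name | grep apiserver); do
    oc logs "$pod" -n "$NS" >> "$OUT/apiserver.log"
  done
}
```

Start `collect_events &` before kicking off config.yml, then call `describe_apiservers` and `apiserver_logs` the moment the task fails.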
Most likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1656925
Yes, we confirmed that the event logs revealed the liveness probe was failing, so we deleted it from the Ansible template file (`oc edit daemonset` did not help, as the change was overwritten by re-running the playbook). Jay, can we ask you to fix this issue in the 3.10 playbook? Maybe tweaking the liveness and readiness probes would be fine?
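For reference, the sort of tweak being requested: rather than deleting the liveness probe from the template outright, relax its timings so a slow-starting apiserver is not killed mid-install. The fragment below is an illustrative sketch only; the port, path, and every numeric value are assumptions, and the actual 3.10 openshift-ansible template may differ:

```yaml
# Illustrative probe settings for the service-catalog apiserver container;
# all values here are assumptions, not the shipped defaults.
livenessProbe:
  httpGet:
    path: /healthz
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 30   # allow the aggregated API time to come up
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 6       # tolerate a slow start instead of restarting the pod
readinessProbe:
  httpGet:
    path: /healthz
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```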
*** Bug 1659198 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 1656925 ***