@Jack - I'm looking for the `oc describe pod` output for the apiserver pods in the kube-service-catalog namespace, i.e. `oc get pods -n kube-service-catalog` and then, for each of the apiserver pods listed, `oc describe pod -n kube-service-catalog <api-server-pod-name>`. Given https://bugzilla.redhat.com/show_bug.cgi?id=1656925#c16 I have a feeling the events listed are going to indicate the pod was restarted or taken out of service because of a liveness or readiness probe failure. It would help if anyone can confirm this.
Following on from comment #17: if the `oc describe` output indicates the pods are being restarted because of liveness probe failures, I'd really like to get the associated log output for the `apiserver` container within the affected pods during the time interval of the failures.
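For anyone gathering this data, something like the following should collect what's being asked for (the `--since` window is illustrative; adjust it to cover the failure interval):

```shell
#!/bin/sh
# List the Service Catalog pods to find the apiserver pod names.
oc get pods -n kube-service-catalog

# Describe each apiserver pod; the Events section at the bottom will show
# liveness/readiness probe failures and restarts, if any occurred.
for pod in $(oc get pods -n kube-service-catalog -o name | grep apiserver); do
  oc describe -n kube-service-catalog "$pod"
done

# Capture the apiserver container logs around the failure window.
for pod in $(oc get pods -n kube-service-catalog -o name | grep apiserver); do
  oc logs -n kube-service-catalog "$pod" -c apiserver --since=1h
done
```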
Created attachment 1513720 [details] oc describe output in answer to comment #17
Created attachment 1514073 [details] oc describe etc output in answer to comment #21
Stumbled on the same bug when trying to upgrade from 3.9 -> 3.10; the upgrade failed in the post control plane upgrade. For us it was also a timing issue, because the run of the play /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_components.yml launches the api-server and controller-manager successfully, but after that the ASB post task fails with this error:

  stderr": "Error from server (ServiceUnavailable):

Having verified that the kube-service-catalog service has both api-server and controller-manager running, I launched the playbook /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_components.yml again, but this time commenting out the service catalog tasks:

  # - import_role:
  #     name: openshift_service_catalog
  #     tasks_from: install.yml
  #   when:
  #   - openshift_enable_service_catalog | default(true) | bool

Then the playbook run succeeded.
*** Bug 1658018 has been marked as a duplicate of this bug. ***
*** Bug 1659198 has been marked as a duplicate of this bug. ***
Created attachment 1514320 [details] test fix - wait for Service Catalog rollout to finish Test fix that waits for the rollout of Service Catalog before proceeding. This file would replace openshift-ansible/roles/openshift_service_catalog/tasks/start.yml in both 3.10 and 3.11.
I have attached a test fix that waits for the rollout of Service Catalog before proceeding. This file would replace openshift-ansible/roles/openshift_service_catalog/tasks/start.yml in both 3.10 and 3.11. I'd appreciate feedback from anyone who is encountering this error and is willing to retry with this in place.
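For readers without access to the attachment, the wait tasks are of roughly the following shape, matching the `oc rollout status` invocations visible in the upgrade logs; this is a sketch, not the delivered file, and the retry counts and delays here are illustrative assumptions:

```yaml
# Sketch of start.yml wait tasks (retries/delay values are assumptions).
- name: Wait for API Server rollout success
  command: >
    oc rollout status --config=/etc/origin/master/admin.kubeconfig
    -n kube-service-catalog ds/apiserver
  register: apiserver_rollout
  until: apiserver_rollout.rc == 0
  retries: 5
  delay: 10

- name: Wait for Controller Manager rollout success
  command: >
    oc rollout status --config=/etc/origin/master/admin.kubeconfig
    -n kube-service-catalog ds/controller-manager
  register: cm_rollout
  until: cm_rollout.rc == 0
  retries: 5
  delay: 10
```

The point of the fix is simply that later tasks (such as creating the Broker resource) no longer run until the Service Catalog daemon sets report a successful rollout.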
Has anyone else attempted to work through this issue with the attached fix? We haven't been able to reproduce the original issue here and I'd like to get additional confirmation this test fix works for multiple deployments.
(In reply to Jay Boyd from comment #35) > Has anyone else attempted to work through this issue with the attached fix? > We haven't been able to reproduce the original issue here and I'd like to > get additional confirmation this test fix works for multiple deployments. I had a customer seeing this issue. Using the patched start.yml that introduced the wait tasks allowed us to move past the "Create the Broker resource in the catalog" task.
Thanks Robert. I have delivered this fix to 3.10.z with https://github.com/openshift/openshift-ansible/pull/10883 and created https://bugzilla.redhat.com/show_bug.cgi?id=1661569 for tracking delivery to 3.11.z
LGTM, verified it. Details below:

1. Original OCP cluster 3.10.45:

  [root@ip-172-18-4-171 ~]# oc version
  oc v3.10.45
  kubernetes v1.10.0+b81c8f8
  features: Basic-Auth GSSAPI Kerberos SPNEGO

  Server https://ip-172-18-4-171.ec2.internal:8443
  openshift v3.10.45
  kubernetes v1.10.0+b81c8f8

  [root@ip-172-18-4-171 ~]# oc get pods -n kube-service-catalog
  NAME                       READY     STATUS    RESTARTS   AGE
  apiserver-x79fv            1/1       Running   0          11m
  controller-manager-cptk9   1/1       Running   0          11m

  [root@ip-172-18-4-171 ~]# oc get clusterservicebroker
  NAME                      AGE
  ansible-service-broker    10m
  template-service-broker   10m

2. Upgraded it to the latest version of 3.10. Upgrade succeeded.

  [root@ip-172-18-4-171 ~]# oc version
  oc v3.10.101
  kubernetes v1.10.0+b81c8f8
  features: Basic-Auth GSSAPI Kerberos SPNEGO

  Server https://ip-172-18-4-171.ec2.internal:8443
  openshift v3.10.102
  kubernetes v1.10.0+b81c8f8

  [root@ip-172-18-4-171 ~]# oc get pods -n kube-service-catalog
  NAME                       READY     STATUS    RESTARTS   AGE
  apiserver-s2xt9            1/1       Running   0          12m
  controller-manager-k8fjk   1/1       Running   0          12m

Correlating logs:

TASK [openshift_service_catalog : Wait for API Server rollout success] *********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:2
Thursday 17 January 2019 11:04:12 +0000 (0:00:00.084) 0:18:35.484 ******
ok: [ec2-3-90-13-179.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/apiserver"], "delta": "0:00:37.036247", "end": "2019-01-17 06:05:21.965431", "failed": false, "rc": 0, "start": "2019-01-17 06:04:44.929184", "stderr": "", "stderr_lines": [], "stdout": "Waiting for rollout to finish: 0 out of 1 new pods have been updated...\nWaiting for rollout to finish: 0 out of 1 new pods have been updated...\nWaiting for rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"apiserver\" successfully rolled out", "stdout_lines": ["Waiting for rollout to finish: 0 out of 1 new pods have been updated...", "Waiting for rollout to finish: 0 out of 1 new pods have been updated...", "Waiting for rollout to finish: 0 of 1 updated pods are available...", "daemon set \"apiserver\" successfully rolled out"]}

TASK [openshift_service_catalog : Wait for Controller Manager rollout success] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:14
Thursday 17 January 2019 11:04:49 +0000 (0:00:37.294) 0:19:12.779 ******
ok: [ec2-3-90-13-179.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/controller-manager"], "delta": "0:00:07.944249", "end": "2019-01-17 06:05:30.194916", "failed": false, "rc": 0, "start": "2019-01-17 06:05:22.250667", "stderr": "", "stderr_lines": [], "stdout": "Waiting for rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"controller-manager\" successfully rolled out", "stdout_lines": ["Waiting for rollout to finish: 0 of 1 updated pods are available...", "daemon set \"controller-manager\" successfully rolled out"]}

...

TASK [ansible_service_broker : Create the Broker resource in the catalog] ******
task path: /usr/share/ansible/openshift-ansible/roles/ansible_service_broker/tasks/install.yml:217
Thursday 17 January 2019 11:05:23 +0000 (0:00:00.035) 0:19:46.793 ******
changed: [ec2-3-90-13-179.compute-1.amazonaws.com] => {"changed": true, "failed": false, "results": {"cmd": "/usr/bin/oc get ClusterServiceBroker ansible-service-broker -o json -n default", "results": [{"apiVersion": "servicecatalog.k8s.io/v1beta1", "kind": "ClusterServiceBroker", "metadata": {"creationTimestamp": "2019-01-17T09:49:30Z", "generation": 1, "name": "ansible-service-broker", "resourceVersion": "14742", "selfLink": "/apis/servicecatalog.k8s.io/v1beta1/clusterservicebrokers/ansible-service-broker", "uid": "2f5888c4-1a3d-11e9-9bbe-0a580a800005"}, "spec": {"authInfo": {"bearer": {"secretRef": {"name": "asb-client", "namespace": "openshift-ansible-service-broker"}}}, "caBundle": "xxx", "relistBehavior": "Duration", "relistDuration": "15m0s", "relistRequests": 0, "url": "https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker"}, "status": {"conditions": [{"lastTransitionTime": "2019-01-17T09:50:00Z", "message": "Successfully fetched catalog entries from broker.", "reason": "FetchedCatalog", "status": "True", "type": "Ready"}], "lastCatalogRetrievalTime": "2019-01-17T10:57:40Z", "reconciledGeneration": 1}}], "returncode": 0}, "state": "present"}
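The wait-and-retry behavior these tasks rely on is just a bounded poll loop. A minimal Python sketch of the same pattern, with a simulated rollout check (function names and timings here are illustrative, not taken from the delivered playbook):

```python
import time

def wait_for_rollout(check, retries=5, delay=1.0):
    """Poll `check` (a callable returning True once the rollout is done)
    up to `retries` times, sleeping `delay` seconds between attempts.
    Returns the number of attempts used, or raises TimeoutError."""
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            time.sleep(delay)
    raise TimeoutError("rollout did not complete in time")

# Simulated check: the daemon set becomes ready on the third poll,
# mimicking "0 out of 1 new pods have been updated..." then success.
state = {"polls": 0}

def fake_rollout_status():
    state["polls"] += 1
    return state["polls"] >= 3

attempts = wait_for_rollout(fake_rollout_status, retries=5, delay=0.01)
print(attempts)  # -> 3
```

The key design point is the same as in the fix: callers downstream of the wait never run against a half-rolled-out daemon set, because the loop either returns success or raises.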
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0206