Created attachment 1297496 [details]
provison.png

Description of problem:
1) Enable the ASB and the catalog console, make the APB image unpullable on the node, then provision from the catalog console. The console keeps loading and never responds.
2) When the provision succeeds, there is also no response in the console.

Version-Release number of selected component (if applicable):
openshift v3.6.143
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Always

Steps to Reproduce:
1. In the console, click mediawiki-apb
2. Input the password, then click Create
3. In both the success and failure cases, there is no response in the console

Actual results:
3. No response in the console

Expected results:
3. The console should report the error

Additional info:
Screenshot will be attached.
In the case where provisioning can't finish, we expect the spinner to keep spinning because the service is not ready but has not permanently failed either. We have existing bugs tracking improved wording for when something is stuck provisioning, especially if any failures have been reported in the process.

Note that the request to create the service instance completing is NOT the same as the service being provisioned; status is reported about the state of the service, which we watch. If the broker actually reports back to the service catalog that the provision is complete (check the Ready condition on the service instance), then the console will show the provision as completed.

We know things are working as expected with the template service broker and other example brokers, so it is possible the ASB is not reporting status back to the service catalog correctly in the success scenario. If you have a server already set up with the ASB for us to look at, that should help us debug the network requests from the console side so we can make a suggestion.
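To make the distinction above concrete, here is a minimal sketch of the decision a client makes when it watches a service instance: the spinner stops only when the broker has reported Ready=True. The status shape loosely mirrors the v1alpha1 service catalog Instance resource; `is_provisioned` is a hypothetical helper for illustration, not console code.

```python
# Hypothetical sketch: deciding whether a service instance has finished
# provisioning based on the Ready condition in its status. Creation of
# the instance object is NOT enough; only Ready=True ends the spinner.

def is_provisioned(instance_status):
    """Return True only when the broker has reported Ready=True."""
    for condition in instance_status.get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    # No Ready condition yet: still provisioning, keep the spinner up.
    return False

# Example statuses (illustrative shapes, not captured from a cluster):
pending = {"conditions": [{"type": "Ready", "status": "False",
                           "reason": "Provisioning"}]}
done = {"conditions": [{"type": "Ready", "status": "True",
                        "reason": "ProvisionedSuccessfully"}]}
```

An instance whose create request succeeded but whose broker never reports back looks exactly like `pending` forever, which is why the spinner never stops in the failure scenario described here.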
I attempted to reproduce this bug and found the following:

1. When the Mediawiki-APB provisioned correctly, I did see the screenshot (mediawiki completed). I am not seeing the service catalog get hung in a provisioning state.
2. I attempted to remove the ability to pull down the image and retested. The screenshots of what I saw are (mediawiki-fail) from the create screen. In the project, you can see that the service was attempted to be provisioned and the pod attempted to be created (mediawiki-fail-project). In the last screenshot, you can see that the pod's error is that it is unable to pull the image.

There is another bug tracking this: https://bugzilla.redhat.com/show_bug.cgi?id=1470851 It states that we are not setting the correct status for the job when we cannot pull the image.
Created attachment 1297848 [details] mediawiki completed
Created attachment 1297850 [details] mediawiki-fail
Created attachment 1297851 [details] mediawiki-fail-project
Created attachment 1297852 [details] image pull error
On the failure case: the service catalog is not giving us a failure condition. It simply reports that the instance isn't ready. Showing an error for failures requires the upstream service catalog fix https://github.com/kubernetes-incubator/service-catalog/pull/1017
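To illustrate the gap the upstream PR addresses: without a distinct failure signal, a failed provision is indistinguishable from one still in progress for a client that only watches readiness. This is a hypothetical sketch of mapping Open Service Broker last_operation states to instance conditions (`conditions_for` and the `report_failures` flag are illustrative names, not service catalog code).

```python
# Hypothetical sketch: mapping OSB last_operation states to instance
# conditions. When the catalog cannot surface a Failed condition (the
# gap fixed upstream), "failed" collapses into plain "not ready", so
# the console's only honest rendering is a spinner that never stops.

def conditions_for(last_operation_state, report_failures=True):
    if last_operation_state == "succeeded":
        return {"Ready": "True"}
    if last_operation_state == "failed" and report_failures:
        # With the upstream fix, a terminal failure is distinguishable.
        return {"Ready": "False", "Failed": "True"}
    # "in progress", or "failed" on a catalog without the fix:
    # the instance just stays not-ready.
    return {"Ready": "False"}
```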
Based on Shawn's screenshots, there is no console bug here. The console is behaving appropriately based on the information we are getting from the service catalog. Moving to the Service Broker component to work out with QE why, in their environment, the service catalog or ASB may not be communicating status back correctly.
For the success case, we are seeing the console update, though it takes several minutes to complete. When launching downstream APBs, it's not uncommon to see 5-10+ minutes for the operation to complete.

For the failure case, we are aware of open issues on the Ansible Service Broker: https://bugzilla.redhat.com/show_bug.cgi?id=1470851 We plan to track the Ansible Service Broker issues in bz1470851, which is aligned to 3.7.0.
QE, please re-test the success case. I would recommend leaving the initial window up and monitoring the APB deployment in a separate terminal, watching the logs. This may take 5-10 minutes; we've seen downloading from downstream registries to be noticeably slower, hence a longer time to deploy the APBs. As to the error cases where the broker does not detect some error conditions and inform the Service Catalog, we plan to align those issues to 3.7.0.
Created attachment 1301669 [details]
Postgres provisioned successfully and completed, webui still spinning

I see a "Server Connection Interrupted" message being displayed, along with several errors in the javascript console:

:8443/oapi/v1/namespaces/openshift/templates/jenkins-pipeline-example Failed to load resource: the server responded with a status of 404 ()
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
6 vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7145' failed: Error during WebSocket handshake: Unexpected response code: 500
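The errors above show why the UI goes stale: the console receives instance updates only over those watch websockets, so if every reconnect handshake fails with a 500, the last known state (the spinner) never changes. A hypothetical sketch of a capped exponential backoff between reconnect attempts, the usual pattern for a failing watch (`backoff_delays` is an illustrative helper, not console code):

```python
# Hypothetical sketch: capped exponential backoff between websocket
# watch reconnect attempts. If the server answers every handshake with
# a 500, the client retries forever and no update ever reaches the UI.

def backoff_delays(attempts, base=1.0, cap=30.0):
    """Delays in seconds before each of the first `attempts` retries."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]
```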
In regard to comment #11 and the attachment: I recreated this issue this morning by deploying Postgres. It deployed successfully, yet I still saw the spinner and the "Server Connection Interrupted" message in the webui. From the service catalog controller-manager logs:

I0720 11:32:14.284100 1 reflector.go:276] github.com/kubernetes-incubator/service-catalog/pkg/client/informers_generated/externalversions/factory.go:61: forcing resync
I0720 11:32:14.284200 1 controller_instance.go:220] Processing Instance demo/postgresql-apb-zlhd6
I0720 11:32:14.284239 1 controller.go:272] Creating client for Broker ansible-service-broker, URL: http://asb.openshift-ansible-service-broker.svc:1338
I0720 11:32:14.284316 1 controller_instance.go:214] Not processing event for Instance demo/mediawiki-apb-l0lm8 because checksum showed there is no work to do
I0720 11:32:14.284385 1 controller_instance.go:214] Not processing event for Instance demo/postgresql-apb-z605l because checksum showed there is no work to do
I0720 11:32:14.284408 1 controller_instance.go:214] Not processing event for Instance demo/mediawiki-apb-0pzvv because checksum showed there is no work to do
I0720 11:32:14.284453 1 controller_instance.go:214] Not processing event for Instance demo/mediawiki-apb-23trx because checksum showed there is no work to do
I0720 11:32:14.288341 1 utils.go:67] { "state": "succeeded" }
I0720 11:32:14.288371 1 controller_instance.go:385] Poll for demo/postgresql-apb-zlhd6 returned "succeeded" : ""
I0720 11:32:14.288427 1 controller_instance.go:512] Found status change for Instance "demo/postgresql-apb-zlhd6" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2017-07-20 11:32:14.28842165 +0000 UTC
I0720 11:32:14.288446 1 controller_instance.go:524] Updating Ready condition for Instance demo/postgresql-apb-zlhd6 to True
I0720 11:32:14.313685 1 controller_instance.go:214] Not processing event for Instance demo/postgresql-apb-zlhd6 because checksum showed there is no work to do
I0720 11:32:15.624542 1 leaderelection.go:204] succesfully renewed lease kube-service-catalog/service-catalog-controller-manager
I0720 11:32:17.632696 1 leaderelection.go:204] succesfully renewed lease kube-service-catalog/service-catalog-controller-manager
I0720 11:32:19.640846 1 leaderelection.go:204] succesfully renewed lease kube-service-catalog/service-catalog-controller-manager
I0720 11:32:20.501746 1 reflector.go:405] github.com/kubernetes-incubator/service-catalog/pkg/client/informers_generated/externalversions/factory.go:61: Watch close - *v1alpha1.ServiceClass total 0 items received
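The log above shows the controller side working correctly: a poll returns "succeeded", Ready flips "False" -> "True", and lastTransitionTime is stamped, so the problem is that the console never receives this update over its broken watch. A hypothetical sketch of that transition logic (`update_ready_condition` is an illustrative stand-in, not the controller's real code):

```python
# Hypothetical sketch of the transition seen in the log: on a
# "succeeded" poll result, Ready flips False -> True, and
# lastTransitionTime is set only when the status actually changes.
from datetime import datetime, timezone

def update_ready_condition(instance, poll_state):
    ready = "True" if poll_state == "succeeded" else "False"
    cond = instance.setdefault("ready_condition", {"status": "Unknown"})
    if cond["status"] != ready:
        # Only a genuine status change moves lastTransitionTime.
        cond["lastTransitionTime"] = datetime.now(timezone.utc).isoformat()
    cond["status"] = ready
    return instance
```

Re-processing the same instance afterward is a no-op ("no work to do" in the log), since the status no longer changes.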
Created attachment 1301671 [details] Logs from controller-manager, apiserver, and ansible service broker
Note on the logs I attached prior: that run had several earlier provision attempts that failed, so it was not a clean run. I will try to replicate on a clean run and see if it reoccurs.
The 500 errors are currently being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1472148 If the websockets are not working, then we would never get the update that the service was successfully provisioned. I don't think we can validate this bug until the websocket 500 errors are resolved.
The websocket 500 issue is fixed in v3.6.170-1.
Verified on:

[root@host-8-175-47 dma]# openshift version
openshift v3.6.170
kubernetes v1.6.1+5115d708d7
etcd 3.2.1
Created attachment 1304592 [details] provison successfully
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188