Bug 1470622 - No response in the catalog console when mediawiki provisioning succeeds or fails
Summary: No response in the catalog console when mediawiki provisioning succeeds or fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Shawn Hurley
QA Contact: XiaochuanWang
URL:
Whiteboard:
Depends On: 1472148
Blocks:
Reported: 2017-07-13 10:25 UTC by DeShuai Ma
Modified: 2017-11-28 22:00 UTC (History)
9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-28 22:00:46 UTC
Target Upstream Version:


Attachments
provison.png (105.50 KB, image/png)
2017-07-13 10:25 UTC, DeShuai Ma
mediawiki completed (56.30 KB, image/png)
2017-07-13 20:05 UTC, Shawn Hurley
mediawiki-fail (57.15 KB, image/png)
2017-07-13 20:06 UTC, Shawn Hurley
mediawiki-fail-project (22.08 KB, image/png)
2017-07-13 20:07 UTC, Shawn Hurley
image pull error (92.87 KB, image/png)
2017-07-13 20:07 UTC, Shawn Hurley
Postgres provisioned successfully and completed, webui still spinning (374.58 KB, image/png)
2017-07-20 11:40 UTC, John Matthews
Logs from controller-manager, apiserver, and ansible service broker (111.28 KB, application/x-gzip)
2017-07-20 11:45 UTC, John Matthews
provison successfully (112.21 KB, image/png)
2017-07-26 05:53 UTC, DeShuai Ma


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description DeShuai Ma 2017-07-13 10:25:36 UTC
Created attachment 1297496 [details]
provison.png

Description of problem:
1) Enable the ASB and the catalog console, make the APB image unpullable on the node, then provision from the catalog console. The console loads forever with no response.
2) When provisioning succeeds, there is likewise no response in the console.

Version-Release number of selected component (if applicable):
openshift v3.6.143
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Always

Steps to Reproduce:
1. In the console, click mediawiki-apb
2. Input the password, then click Create
3. In both the success and failure cases, there is no response in the console

Actual results:
3. No response in console

Expected results:
3. The console should report the provisioning result (success or error)

Additional info:
Screenshots are attached.

Comment 1 Jessica Forrester 2017-07-13 11:31:56 UTC
In the case where provisioning can't finish, we expect the spinner to keep spinning, because the service is not ready but has not permanently failed either. We have existing bugs tracking improved wording for when something is stuck provisioning, especially if failures have been reported in the process. Note that completion of the request to create the service instance is NOT the same as the service being provisioned; status is reported on the state of the service, which we watch. If the broker actually reports back to the service catalog that the provision is complete (check the Ready condition on the service instance), then the console will show the provision as completed.

We know things are working as expected with the template service broker and other example brokers, so it is possible the ASB is not reporting status back correctly to the service catalog in the complete scenario. If you have a server already set up with the ASB for us to look at, that would help us debug the network requests from the console side so we can make a suggestion.
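The Ready check described above can be sketched as follows. This is a minimal illustration (not the actual console code), assuming a dict shaped like a v1alpha1 service instance's status; the key point is that the object's mere existence means the create request completed, not that provisioning finished.

```python
def is_provisioned(instance):
    """Return True only once the service catalog has reported Ready=True."""
    conditions = instance.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )

# A freshly created instance: request accepted, but not yet provisioned.
pending = {"status": {"conditions": [{"type": "Ready", "status": "False",
                                      "reason": "Provisioning"}]}}
# After the broker reports success, the controller flips Ready to True.
ready = {"status": {"conditions": [{"type": "Ready", "status": "True",
                                    "reason": "ProvisionedSuccessfully"}]}}

print(is_provisioned(pending))  # False -> console keeps spinning
print(is_provisioned(ready))    # True  -> console shows provision completed
```

Until that condition flips to True, a spinner is the expected UI state.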

Comment 2 Shawn Hurley 2017-07-13 20:03:52 UTC
I attempted to reproduce this bug and found the following.

1. When the mediawiki-apb provisioned correctly, I saw the attached screenshot (mediawiki completed). I am not seeing the service catalog get hung in a provisioning state.

2. I then removed the ability to pull down the image and retested. The screenshots from the create screen are attached (mediawiki-fail). In the project, you can see that provisioning of the service was attempted and that creation of the pod was attempted (mediawiki-fail-project). In the last screenshot (image pull error), you can see that the pod's error is that it is unable to pull the image.


Another bug has been filed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1470851
It states that we are not setting the correct status for the job when we cannot pull the image.
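The image-pull failure shown in the last screenshot is the kind of condition bz1470851 is about. Detecting it from a pod's container statuses can be sketched like this; it is an illustration using the standard Kubernetes pod status shape, not broker code, and the sample pod below is hypothetical:

```python
PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull"}

def image_pull_errors(pod):
    """Collect (container, reason, message) for containers stuck pulling."""
    errors = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in PULL_FAILURE_REASONS:
            errors.append((cs.get("name"), waiting["reason"],
                           waiting.get("message", "")))
    return errors

# Hypothetical pod stuck in image pull back-off, as in the screenshot.
pod = {"status": {"containerStatuses": [
    {"name": "apb", "state": {"waiting": {
        "reason": "ImagePullBackOff",
        "message": "Back-off pulling image \"mediawiki-apb\""}}}]}}

print(image_pull_errors(pod))
```

Surfacing a condition like this is what would let the broker report a terminal failure instead of leaving the job "in progress".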

Comment 3 Shawn Hurley 2017-07-13 20:05:08 UTC
Created attachment 1297848 [details]
mediawiki completed

Comment 4 Shawn Hurley 2017-07-13 20:06:10 UTC
Created attachment 1297850 [details]
mediawiki-fail

Comment 5 Shawn Hurley 2017-07-13 20:07:04 UTC
Created attachment 1297851 [details]
mediawiki-fail-project

Comment 6 Shawn Hurley 2017-07-13 20:07:33 UTC
Created attachment 1297852 [details]
image pull error

Comment 7 Samuel Padgett 2017-07-13 20:11:01 UTC
On the failure case: the service catalog is not giving us a failure condition; it's simply reporting that the instance isn't ready. Showing an error for failures requires an upstream service catalog fix:

https://github.com/kubernetes-incubator/service-catalog/pull/1017

Comment 8 Jessica Forrester 2017-07-14 16:36:57 UTC
Based on Shawn's screenshots there is no console bug here. The console is behaving appropriately based on the information we are getting from the service catalog. Moving to the Service Broker component to work out with QE why in their environment the service catalog or ASB may not be communicating status back correctly.

Comment 9 John Matthews 2017-07-18 20:40:15 UTC
For the success case, we are seeing the console update, though it takes several minutes to complete. When launching downstream APBs, it's not uncommon for the operation to take 5-10+ minutes.


For the failure case, we are aware of open issues on Ansible Service Broker
 https://bugzilla.redhat.com/show_bug.cgi?id=1470851

We plan to track the Ansible Service Broker issues in bz1470851, which is aligned to 3.7.0.

Comment 10 John Matthews 2017-07-19 11:07:35 UTC
QE, please re-test the success case. I recommend leaving the initial window up and monitoring the APB deployment in a separate terminal by watching the logs. This may take 5-10 minutes; we've seen downloads from downstream registries to be noticeably slower, hence a longer time to deploy the APBs.

As to the error cases with the broker not detecting some error conditions and informing Service Catalog, we plan to align those issues to 3.7.0.

Comment 11 John Matthews 2017-07-20 11:40:08 UTC
Created attachment 1301669 [details]
Postgres provisioned successfully and completed, webui still spinning

I see a "Server Connection Interrupted" message displayed, along with several errors in the JavaScript console:
:8443/oapi/v1/namespaces/openshift/templates/jenkins-pipeline-example Failed to load resource: the server responded with a status of 404 ()
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/bindings?watch=true&resourceVersion=7132' failed: Error during WebSocket handshake: Unexpected response code: 500
vendor.js:62013 WebSocket connection to 'wss://192.168.120.4.nip.io:8443/apis/servicecatalog.k8s.io/v1alpha1/namespaces/demo/instances?watch=true&resourceVersion=7145' failed: Error during WebSocket handshake: Unexpected response code: 500

Comment 12 John Matthews 2017-07-20 11:45:12 UTC
In regard to comment #11 and the attachment: I recreated this issue this morning by deploying Postgres. It deployed successfully, yet I still saw the spinner and the "Server Connection Interrupted" message in the web UI.

From service catalog controller-manager logs:
I0720 11:32:14.284100       1 reflector.go:276] github.com/kubernetes-incubator/service-catalog/pkg/client/informers_generated/externalversions/factory.go:61: forcing resync
I0720 11:32:14.284200       1 controller_instance.go:220] Processing Instance demo/postgresql-apb-zlhd6
I0720 11:32:14.284239       1 controller.go:272] Creating client for Broker ansible-service-broker, URL: http://asb.openshift-ansible-service-broker.svc:1338
I0720 11:32:14.284316       1 controller_instance.go:214] Not processing event for Instance demo/mediawiki-apb-l0lm8 because checksum showed there is no work to do
I0720 11:32:14.284385       1 controller_instance.go:214] Not processing event for Instance demo/postgresql-apb-z605l because checksum showed there is no work to do
I0720 11:32:14.284408       1 controller_instance.go:214] Not processing event for Instance demo/mediawiki-apb-0pzvv because checksum showed there is no work to do
I0720 11:32:14.284453       1 controller_instance.go:214] Not processing event for Instance demo/mediawiki-apb-23trx because checksum showed there is no work to do
I0720 11:32:14.288341       1 utils.go:67] {
  "state": "succeeded"
}
I0720 11:32:14.288371       1 controller_instance.go:385] Poll for demo/postgresql-apb-zlhd6 returned "succeeded" : ""
I0720 11:32:14.288427       1 controller_instance.go:512] Found status change for Instance "demo/postgresql-apb-zlhd6" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2017-07-20 11:32:14.28842165 +0000 UTC
I0720 11:32:14.288446       1 controller_instance.go:524] Updating Ready condition for Instance demo/postgresql-apb-zlhd6 to True
I0720 11:32:14.313685       1 controller_instance.go:214] Not processing event for Instance demo/postgresql-apb-zlhd6 because checksum showed there is no work to do
I0720 11:32:15.624542       1 leaderelection.go:204] succesfully renewed lease kube-service-catalog/service-catalog-controller-manager
I0720 11:32:17.632696       1 leaderelection.go:204] succesfully renewed lease kube-service-catalog/service-catalog-controller-manager
I0720 11:32:19.640846       1 leaderelection.go:204] succesfully renewed lease kube-service-catalog/service-catalog-controller-manager
I0720 11:32:20.501746       1 reflector.go:405] github.com/kubernetes-incubator/service-catalog/pkg/client/informers_generated/externalversions/factory.go:61: Watch close - *v1alpha1.ServiceClass total 0 items received
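The `{"state": "succeeded"}` body in the log above is the broker's last_operation polling response. The way a controller maps the Open Service Broker API poll states to instance conditions can be sketched as follows; this is a simplified illustration of the protocol, not the actual service catalog controller code:

```python
import json

def interpret_poll(body):
    """Map an OSB API last_operation response to a controller action.

    Per the Open Service Broker API, "state" is one of "in progress",
    "succeeded", or "failed"; anything else is treated as an error.
    """
    state = json.loads(body).get("state")
    if state == "succeeded":
        return "set Ready=True"        # provision complete; console stops spinning
    if state == "failed":
        return "set Failed condition"  # terminal failure; console shows an error
    if state == "in progress":
        return "requeue poll"          # keep polling; console keeps spinning
    return "error: unknown state"

print(interpret_poll('{"state": "succeeded"}'))    # set Ready=True
print(interpret_poll('{"state": "in progress"}'))  # requeue poll
```

In the log above the poll did return "succeeded" and Ready flipped to True, so the remaining spinner is a console-side watch problem, not a broker reporting problem.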

Comment 13 John Matthews 2017-07-20 11:45:44 UTC
Created attachment 1301671 [details]
Logs from controller-manager, apiserver, and ansible service broker

Comment 14 John Matthews 2017-07-20 12:07:30 UTC
A note on the logs I attached earlier: that run had several prior failed provision attempts, so it was not a clean run. I will try to replicate on a clean run and see if it recurs.

Comment 15 Jessica Forrester 2017-07-20 13:06:16 UTC
The 500 errors are currently being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1472148

If the websockets are not working, then we would never get the update that the service was successfully provisioned. I don't think we can validate this bug until the 500 errors on websockets are resolved.

Comment 16 Jordan Liggitt 2017-07-25 18:56:02 UTC
websocket 500 issue fixed in v3.6.170-1

Comment 17 DeShuai Ma 2017-07-26 05:52:51 UTC
Verified on:
[root@host-8-175-47 dma]# openshift version
openshift v3.6.170
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

Comment 18 DeShuai Ma 2017-07-26 05:53:59 UTC
Created attachment 1304592 [details]
provison successfully

Comment 22 errata-xmlrpc 2017-11-28 22:00:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

