Summary: Got "500 Internal Server Error" when watching bindings and instances of apigroup servicecatalog.k8s.io
Product: OpenShift Container Platform
Reporter: weiwei jiang <wjiang>
Component: Master
Assignee: Jordan Liggitt <jliggitt>
Status: CLOSED CURRENTRELEASE
QA Contact: weiwei jiang <wjiang>
Version: 3.6.0
CC: aos-bugs, deads, dma, eparis, ewolinet, jforrest, jliggitt, jmatthew, jokerman, mmccomas, wjiang
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
: 1473523 (view as bug list)
Environment:
Last Closed: 2017-08-16 19:38:00 UTC
Type: Bug
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1470622, 1473523, 1474520
Description weiwei jiang 2017-07-18 07:33:28 UTC
Description of problem:
After enabling the service-catalog console, going to the project overview page yields "500 Internal Server Error" in devtools for the bindings and instances watch APIs, and the page shows "Server connection interrupted". Normal API resources such as BC and IS do not have this problem.

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.6.151
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Always

Steps to Reproduce:
1. Install OpenShift with the service catalog and enable the service-catalog console
2. Create a project
3. Go to the project overview page

Actual results:
Got "500 Internal Server Error" in devtools, and the page shows "Server connection interrupted"

Expected results:
Should not get this error.

Additional info:
Comment 1 weiwei jiang 2017-07-18 07:36:34 UTC
Created attachment 1300304 [details] watch operation got 500 for bindings and instances
Comment 2 Jessica Forrester 2017-07-18 12:30:11 UTC
Reassigning since all the other websockets are working in the console and this is specific to the svc catalog websocket connections. @weiwei can you confirm this server was installed using the ansible installer? Are those websocket connections going to the same hostname as the working websockets? We will need master logs from during this time to debug and possibly also logs from the svc catalog containers. Will need this to figure out if this is an aggregator problem or a svc catalog problem.
Comment 3 Paul Morie 2017-07-18 18:14:04 UTC
Was this cluster created using `oc cluster up` or the installer?
Comment 4 weiwei jiang 2017-07-19 03:17:49 UTC
(In reply to Paul Morie from comment #3)
> Was this cluster created using `oc cluster up` or the installer?

The cluster was created by the installer. And I got some useful log output in the service-catalog apiserver pod after the page got the 500 error:

# oc logs -f apiserver-gtn7l -n kube-service-catalog | grep E0719
E0719 02:14:54.219751 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:14:57.241773 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:22:58.341435 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:22:59.346418 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:31:19.450048 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:32:15.484653 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:44:33.575782 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:47:38.592010 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:51:01.724747 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 02:55:26.687264 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0719 03:07:50.860714 1 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
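The "required revision has been compacted" errors above mean the watch asked etcd for a resourceVersion that is older than etcd's last compaction. The conventional recovery (the standard Kubernetes "list then watch" pattern) is to re-list to obtain a fresh resourceVersion and restart the watch from there. A minimal illustrative sketch of that pattern, assuming hypothetical `list_fn`/`watch_fn` callables rather than any real client library:

```python
# Sketch of the "list then watch, re-list on compaction" recovery pattern.
# Event, CompactedError, list_fn, and watch_fn are illustrative names,
# not a real Kubernetes client API.

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


class CompactedError(Exception):
    """Raised when the requested watch revision has been compacted away."""


@dataclass
class Event:
    kind: str               # "ADDED", "MODIFIED", "DELETED"
    resource_version: int   # revision carried by the event


def watch_with_relist(
    list_fn: Callable[[], Tuple[List[Event], int]],
    watch_fn: Callable[[int], Iterable[Event]],
    max_restarts: int = 3,
) -> Iterator[Event]:
    """Yield events, transparently re-listing when the revision is compacted."""
    items, rv = list_fn()           # initial snapshot and its revision
    yield from items
    restarts = 0
    while restarts <= max_restarts:
        try:
            for ev in watch_fn(rv):
                rv = ev.resource_version  # track progress for resumption
                yield ev
            return                   # watch stream ended cleanly
        except CompactedError:
            restarts += 1
            items, rv = list_fn()    # fresh snapshot with a current revision
            yield from items
```

These errors are noisy but recoverable; they are not themselves the 500 seen by the console, which comment 14 later traces to a TLS failure in the aggregator.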
Comment 5 DeShuai Ma 2017-07-19 07:23:15 UTC
After deprovisioning, the "Provisioned Services" still appear on the Overview page until the page is manually refreshed, even though the instance has already been removed.
Comment 6 Jessica Forrester 2017-07-19 12:12:21 UTC
If the websocket watches are failing then the issue in comment 5 is expected.
Comment 7 ewolinet 2017-07-19 21:30:13 UTC
I'm able to recreate this locally on the console. However, if I ssh into the node and run:

$ oc policy can-i watch bindings --as=eric -n testproject
yes
$ oc policy can-i watch instances --as=eric -n testproject
yes

where testproject is a newly created project and eric is a user who is an admin for testproject.
Comment 8 ewolinet 2017-07-20 13:24:16 UTC
When I updated a failed 500 request in devtools to add an "Authorization: Bearer" header with a valid token, the request came back as a 200.
Comment 9 ewolinet 2017-07-20 13:26:23 UTC
Is this something the installer can configure to happen within the console? If so, what needs to be added where?
Comment 10 Jessica Forrester 2017-07-20 13:41:22 UTC
Websockets from the browser cannot use an Authorization bearer header. This should not be needed: the token is being passed via the Sec-WebSocket-Protocol header, and that is working fine against all other endpoints. If this is now failing against aggregated APIs then we have a problem, but we do not see this issue in the `oc cluster up` environment.
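For context on the mechanism comment 10 describes: since the browser WebSocket API cannot set custom headers, the Kubernetes apiserver accepts a bearer token smuggled inside the Sec-WebSocket-Protocol header as a `base64url.bearer.authorization.k8s.io.`-prefixed subprotocol, with the token base64url-encoded and padding stripped. A hedged sketch of that encoding (verify the exact subprotocol name against your apiserver version):

```python
# Encode a bearer token as a Kubernetes WebSocket subprotocol value, the way
# the OpenShift console passes credentials on watch connections. The prefix
# is the subprotocol understood by the apiserver's websocket authenticator.

import base64

BEARER_PROTOCOL_PREFIX = "base64url.bearer.authorization.k8s.io."


def websocket_bearer_subprotocol(token: str) -> str:
    """Return the Sec-WebSocket-Protocol value carrying the bearer token."""
    encoded = base64.urlsafe_b64encode(token.encode("utf-8")).decode("ascii")
    return BEARER_PROTOCOL_PREFIX + encoded.rstrip("=")  # padding must be stripped
```

This is why comment 8's experiment (adding an Authorization header by hand) succeeded while the browser's real WebSocket requests failed: the header-based and subprotocol-based paths are handled differently, and the failure turned out to be in the aggregator's backend dial, not in authentication.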
Comment 11 Eric Paris 2017-07-20 14:17:59 UTC
What needs to be changed in the installer? Tell Scott EXACTLY which flag is set differently, which file needs to contain what, etc. I'm not seeing the root cause here. Unless I'm mistaken, I believe Eric needs to stand up a cluster with `oc cluster up` and a cluster with the installer, find the difference between them, and explain exactly what needs to be changed.
Comment 12 Jordan Liggitt 2017-07-20 16:14:36 UTC
ansible installs with a caBundle on the service catalog APIService; cluster up installs with insecureSkipTLSVerify: true. No other differences leapt out at me.
Comment 13 Jordan Liggitt 2017-07-20 19:28:58 UTC
Changing the APIService config to "insecureSkipTLSVerify: true" resolved the 500. Looks like the upgrade path with TLS verification is not handled correctly.
Comment 14 Jordan Liggitt 2017-07-20 20:08:13 UTC
The server is returning this error: error dialing backend: x509: cannot validate certificate for 172.30.1.2 because it doesn't contain any IP SANs
Comment 15 Jordan Liggitt 2017-07-21 05:15:57 UTC
To recreate, ensure the APIService configured for the service catalog contains a caBundle, not insecureSkipTLSVerify: true.

kube issue: https://github.com/kubernetes/kubernetes/issues/49354
kube fix: https://github.com/kubernetes/kubernetes/pull/49353
origin 3.6 fix: https://github.com/openshift/origin/pull/15388
origin 3.7 fix: https://github.com/openshift/origin/pull/15390
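To make the difference from comments 12-13 concrete, the two installer behaviors correspond to two variants of the APIService object. This is an illustrative fragment, not the exact manifest either installer writes; the object name, service name, namespace, and API version are assumptions based on the 3.6-era service catalog:

```yaml
# Illustrative APIService for the aggregated service-catalog API.
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1alpha1.servicecatalog.k8s.io
spec:
  group: servicecatalog.k8s.io
  version: v1alpha1
  service:
    name: apiserver
    namespace: kube-service-catalog
  # ansible installs pin the CA used to verify the backend's serving cert,
  # which exercises (and exposed) the broken TLS-verifying dial path:
  caBundle: <base64-encoded CA bundle>
  # `oc cluster up` instead skips verification, which masked the bug:
  # insecureSkipTLSVerify: true
```

With caBundle set, the aggregator dials the backing service by its cluster IP and must validate the serving certificate against that IP, which is what produced the "doesn't contain any IP SANs" error in comment 14.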
Comment 16 Samuel Padgett 2017-07-24 20:41:04 UTC
*** Bug 1474520 has been marked as a duplicate of this bug. ***
Comment 17 Jordan Liggitt 2017-07-25 01:48:39 UTC
Comment 18 Jordan Liggitt 2017-07-25 18:55:30 UTC
fixed in v3.6.170-1
Comment 19 weiwei jiang 2017-07-26 02:59:25 UTC
Checked with:

# openshift version
openshift v3.6.170
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

and the issue can no longer be reproduced.
Comment 21 Jordan Liggitt 2017-08-14 22:38:04 UTC
This was fixed in 3.6.0