Bug 1435328 - Timeout while oc get images for 40K builds
Summary: Timeout while oc get images for 40K builds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assignee: Michal Fojtik
QA Contact: Vikas Laad
URL:
Whiteboard: aos-scalability-35
Depends On:
Blocks:
 
Reported: 2017-03-23 14:39 UTC by Vikas Laad
Modified: 2018-04-05 09:29 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-05 09:28:25 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0636 0 None None None 2018-04-05 09:29:03 UTC

Description Vikas Laad 2017-03-23 14:39:00 UTC
Description of problem:
I am doing etcd performance analysis for thousands of builds. After creating 40K cakephp quickstart builds, "oc get images" times out. Trying to get images project by project also does not work.

root@ip-172-31-6-118: ~ # oc get images --all-namespaces -n proj0
Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR

root@ip-172-31-6-118: ~ # oc get images  -n proj0
Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR

root@ip-172-31-6-118: ~ # oc project proj0
Now using project "proj0" on server "https://ip-172-31-6-118.us-west-2.compute.internal:8443".
root@ip-172-31-6-118: ~ # oc get images
Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR

Env details
1 master m4.xlarge
1 etcd m4.2xlarge
1 infra m4.2xlarge
4 nodes m4.xlarge

Version-Release number of selected component (if applicable):
openshift v3.5.0.55
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:


Steps to Reproduce:
1. create 200 projects and cakephp builds (a rough sketch of these steps follows after this list)
2. start 50 concurrent builds at a time
3. reach 40K builds and try oc get images
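A rough sketch of the setup above (the cakephp-example template/BuildConfig name and the batching loop are assumptions based on the description, not the exact scripts that were used):

# step 1: create 200 projects, each with a cakephp quickstart app
for i in $(seq 0 199); do
  oc new-project proj${i}
  oc new-app cakephp-example -n proj${i}
done

# steps 2-3: repeatedly start builds in batches of 50 until ~40K builds exist
for i in $(seq 0 49); do
  oc start-build cakephp-example -n proj${i} &
done
wait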

Actual results:
See the error

Expected results:
Should list images

Additional info:

Comment 2 Fabiano Franz 2017-03-23 20:26:01 UTC
Can you please provide the output for the same 'get' commands running with the '--loglevel=9' flag?

Comment 3 Ben Parees 2017-03-23 20:43:10 UTC
oc get images is going to be more in the realm of the platform management team. Adding Michal.

Comment 4 Vikas Laad 2017-03-24 13:09:57 UTC
root@ip-172-31-6-118: ~ # oc get images --all-namespaces --loglevel=9 
I0324 09:04:37.299190  112392 loader.go:354] Config loaded from file /root/.kube/config
I0324 09:04:37.301151  112392 cached_discovery.go:112] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/servergroups.json
I0324 09:04:37.301269  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/apps/v1beta1/serverresources.json
I0324 09:04:37.301339  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/authentication.k8s.io/v1beta1/serverresources.json
I0324 09:04:37.301405  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/autoscaling/v1/serverresources.json
I0324 09:04:37.301466  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/batch/v1/serverresources.json
I0324 09:04:37.301544  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/batch/v2alpha1/serverresources.json
I0324 09:04:37.301610  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/certificates.k8s.io/v1alpha1/serverresources.json
I0324 09:04:37.301743  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/extensions/v1beta1/serverresources.json
I0324 09:04:37.301797  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/policy/v1beta1/serverresources.json
I0324 09:04:37.301855  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/storage.k8s.io/v1beta1/serverresources.json
I0324 09:04:37.302286  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/v1/serverresources.json
I0324 09:04:37.302722  112392 cached_discovery.go:112] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/servergroups.json
I0324 09:04:37.302787  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/apps/v1beta1/serverresources.json
I0324 09:04:37.302839  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/authentication.k8s.io/v1beta1/serverresources.json
I0324 09:04:37.304855  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/autoscaling/v1/serverresources.json
I0324 09:04:37.304923  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/batch/v1/serverresources.json
I0324 09:04:37.305013  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/batch/v2alpha1/serverresources.json
I0324 09:04:37.305093  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/certificates.k8s.io/v1alpha1/serverresources.json
I0324 09:04:37.305232  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/extensions/v1beta1/serverresources.json
I0324 09:04:37.305310  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/policy/v1beta1/serverresources.json
I0324 09:04:37.305365  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/storage.k8s.io/v1beta1/serverresources.json
I0324 09:04:37.305828  112392 cached_discovery.go:70] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/v1/serverresources.json
I0324 09:04:37.308522  112392 cached_discovery.go:112] returning cached discovery info from /root/.kube/ip_172_31_6_118.us_west_2.compute.internal_8443/servergroups.json
I0324 09:04:37.308714  112392 round_trippers.go:299] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: oc/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4" https://ip-172-31-6-118.us-west-2.compute.internal:8443/oapi/v1/images
I0324 09:05:37.342117  112392 round_trippers.go:318] GET https://ip-172-31-6-118.us-west-2.compute.internal:8443/oapi/v1/images  in 60033 milliseconds
I0324 09:05:37.342161  112392 round_trippers.go:324] Response Headers:
I0324 09:05:37.342260  112392 helpers.go:221] Connection error: Get https://ip-172-31-6-118.us-west-2.compute.internal:8443/oapi/v1/images: stream error: stream ID 1; INTERNAL_ERROR
F0324 09:05:37.342283  112392 helpers.go:116] Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR

Comment 7 Fabiano Franz 2017-04-04 18:19:45 UTC
Vikas, in the process of doing this etcd performance analysis, did you try to reach the API directly with curl (instead of going through oc)? Something like what's suggested in the last few lines of the logs when running with --loglevel=9, for example:

curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: oc/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4" https://ip-172-31-6-118.us-west-2.compute.internal:8443/oapi/v1/images

I'm trying to figure out whether the error happens on the client side or already on the server side when going through the API directly.
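(Note that the curl line printed at --loglevel=9 omits credentials; to run it by hand you would presumably also need a bearer token, e.g. -H "Authorization: Bearer $(oc whoami -t)". That header is my assumption, not part of the logged command.)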

Comment 8 Fabiano Franz 2017-04-04 18:20:13 UTC
Michal, any idea based on the log messages?

Comment 9 Vikas Laad 2017-04-04 18:27:05 UTC
Fabiano,
I tried using etcdctl2 directly on the etcd node at that time, and it was not working; I had to provide 30s for the --total-timeout parameter.
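For illustration, roughly like this (the /openshift.io/images key path is an assumption about where OpenShift keeps image data in etcd, not copied from my session):

etcdctl2 --total-timeout=30s ls /openshift.io/images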

Comment 16 Michal Fojtik 2018-01-29 11:12:13 UTC
OK, the --request-timeout option for the 'oc' client landed in 3.7 (and higher versions). Moving this to ON_QA to try getting a large number of images using this option (it should not hit the request timeout).
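For example (the timeout value below is just an illustration):

oc get images --request-timeout=5m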

Comment 17 Vikas Laad 2018-01-31 13:57:06 UTC
Verified in the following release: created 100K images and 40K more objects; both oc get images and oc get all work, without any additional parameters.

openshift v3.7.27
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

Comment 21 errata-xmlrpc 2018-04-05 09:28:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636

