Bug 1382730 - Changing requestTimeoutSeconds has no effect over 3600 seconds
Summary: Changing requestTimeoutSeconds has no effect over 3600 seconds
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Seth Jennings
QA Contact: Xiaoli Tian
URL:
Whiteboard:
Duplicates: 1436437
Depends On:
Blocks:
 
Reported: 2016-10-07 14:15 UTC by Brendan Mchugh
Modified: 2019-12-16 07:02 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-12 13:54:36 UTC
Target Upstream Version:
Embargoed:



Description Brendan Mchugh 2016-10-07 14:15:29 UTC
Description of problem:
When trying to increase the timeout for "oc exec" (for example, to watch a log or run top in a pod), setting requestTimeoutSeconds to a value higher than 3600 has no effect, and the connection is closed after 1 hour.


Version-Release number of selected component (if applicable):
openshift v3.2.1.13-1-gc2a90e1
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5


How reproducible:
Always

Steps to Reproduce:
1. set requestTimeoutSeconds: 7200 in /etc/origin/master/master-config.yaml
2. systemctl restart atomic-openshift-master-api.service
3. oc project someproject
4. oc exec -p app2-4-84fv3 -i -t bash
5. run top within the pod

Actual results:
After 1 hour the session stops.

Expected results:
The session should continue for 2 hours.

Additional info:
Setting the value to something smaller than 3600 works.
For example 120 will close the session after 2 minutes.

Comment 2 DeShuai Ma 2016-10-17 08:38:01 UTC
In your case, you can use 'streaming-connection-idle-timeout' on the node.
Configure '/etc/origin/node/node-config.yaml' on the nodes like:
-------------------
kubeletArguments:
 streaming-connection-idle-timeout:
 - "120m"

Comment 3 Vladislav Walek 2016-10-18 06:51:28 UTC
Hello DeShuai Ma,

thanks for the workaround; unfortunately, the customer reports it is not working for him:

Hi,
After setting this parameter on the node (and restarting the node service), the "oc exec" command no longer stops within 1 hour, but it stops responding: our batch produces echo output, but the "oc exec" output is frozen ....
regards

Could you please check ?

Thank you
Vladislav

Comment 4 Derek Carr 2016-10-19 22:02:52 UTC
Seth, can you investigate?  The default value for streaming-connection-idle-timeout if not modified by the customer should be 4 hrs.

Comment 5 Seth Jennings 2016-10-20 02:46:45 UTC
There are two connections here: 1) the connection from the node to the master, and 2) the connection between the master and the client (oc exec).

streaming-connection-idle-timeout is the timeout the node enforces on the master and requestTimeoutSeconds is the timeout the master enforces on the client.

When I set streaming-connection-idle-timeout absurdly low: 

kubeletArguments:
 streaming-connection-idle-timeout:
 - "10s"

Observe that the node cuts off the master proxy and the master proxy cuts off the client; note that the process started by the exec is still running:

E1019 20:48:56.615021   29666 proxy.go:172] Error proxying data from client to backend: write tcp 10.42.10.23:33012->10.42.10.23:10250: write: broken pipe
E1019 20:49:04.615748   29666 exec.go:143] Exec session 9e55601375551148a175dd400b1e9210524f2748ce7a72517451c1d8d8985245 in container 6d2a5f594c00db7d0af53f06f2d0407af57e89db987ec830f03a8dc7e0c2da14 terminated but process still running!

When I set requestTimeoutSeconds absurdly low:

assetConfig:
  ...
  servingInfo:
    ...
    requestTimeoutSeconds: 10

E1019 21:12:02.540812    3920 exec.go:143] Exec session b022eaffa6f82b6c1b092869810a097bb810b46ab4ae8626b1432beb697b9a6c in container 8a62ac3bab378d112fc15da03d9e0b69faa310a41b879f7de20af48be0bdfa61 terminated but process still running!

Observe that the master proxy cuts off the client; note that the process started by the exec is still running.

In both of the cases above, the "oc exec" command terminates.

However, there are two places in the master config where requestTimeoutSeconds is used for a timeout, and they mean different things.

servingInfo:
  ...
  requestTimeoutSeconds: 3600

vs

assetConfig:
  ...
  servingInfo:
    ...
    requestTimeoutSeconds: 0

servingInfo->requestTimeoutSeconds is the timeout the master enforces on the node and assetConfig->servingInfo->requestTimeoutSeconds is the timeout the master enforces on the client.

If the servingInfo->requestTimeoutSeconds timeout is hit, the master proxy error is observed but the "oc exec" is _not_ terminated (this seems to be the situation the customer is hitting after the workaround).

I believe the fix is to set requestTimeoutSeconds in _both_ the servingInfo and assetConfig->servingInfo.  If you are going to set requestTimeoutSeconds > 4 hrs, streaming-connection-idle-timeout will need to be extended as well.
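To illustrate the proposed fix, here is a sketch of the relevant master-config.yaml sections with both timeouts set to 2 hours (7200 seconds, the value from the original report); other fields are elided, and the inline comments reflect the roles described above:

```
servingInfo:
  ...
  requestTimeoutSeconds: 7200   # timeout the master enforces on the node
assetConfig:
  servingInfo:
    ...
    requestTimeoutSeconds: 7200 # timeout the master enforces on the client
```

After editing, the master API service must be restarted for the change to take effect.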

Comment 7 Derek Carr 2016-10-26 17:33:38 UTC
I am moving this to QA to verify they agree with Seth's assessment.

Comment 9 Zhang Cheng 2016-10-31 06:55:23 UTC
The connection is still lost after 1 hour when using "requestTimeoutSeconds: 5400" in both servingInfo and assetConfig->servingInfo.

Version-Release:
openshift v3.2.2.2
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

Steps to Reproduce:
1. set "requestTimeoutSeconds: 5400" in both servingInfo and assetConfig->servingInfo in master-config.yaml, and leave streaming-connection-idle-timeout unset in node-config.yaml (4 hours by default).
2. systemctl restart atomic-openshift-master
3. oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/pod-pull-by-tag.yaml
4. oc exec pod-pull-by-tag -i -t bash
5. run top within the pod

Actual results:
Lost connection after 1 hour.

Expected results:
The session should continue for 1.5 hours.

Comment 10 Zhang Cheng 2016-10-31 08:12:07 UTC
Same test result in OCP 3.4: the connection is lost after 1 hour when using "requestTimeoutSeconds: 5400" in both servingInfo and assetConfig->servingInfo.

Version-Release:
openshift v3.4.0.17+b8a03bc
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 11 Andy Goldstein 2016-11-01 14:56:46 UTC
This is additionally governed by the hard-coded ReadTimeout and WriteTimeout values of 60 minutes here: https://github.com/openshift/origin/blob/74552466e64a1321191ad5862485dbbee751369e/vendor/k8s.io/kubernetes/pkg/kubelet/server/server.go#L123-L124. It is not currently possible to change these.

For long-running batch jobs, we recommend using a Job instead of 'oc exec' if at all possible.
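That suggestion can be sketched as a minimal Job manifest; the name, image, and command below are all hypothetical, and the apiVersion matches the batch API of that era:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: long-batch                                # hypothetical name
spec:
  template:
    spec:
      containers:
      - name: batch
        image: registry.example.com/batch:latest  # hypothetical image
        command: ["/bin/sh", "-c", "run-batch.sh"]  # hypothetical script
      restartPolicy: Never
```

Because the Job runs entirely server-side, it is not subject to the exec streaming timeouts, and its output can be retrieved afterwards with 'oc logs'.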

Comment 14 Seth Jennings 2017-01-27 17:37:42 UTC
opened Trello card to track this RFE:
https://trello.com/c/ZuIllYzU/682-allow-setting-oc-exec-timeout

Comment 15 Seth Jennings 2017-03-29 13:59:43 UTC
*** Bug 1436437 has been marked as a duplicate of this bug. ***

Comment 17 Eric Rich 2018-03-12 13:54:36 UTC
This bug has been identified as a dated bug (created more than 3 months ago).
This bug has been triaged (it has a Trello card linked to it), or reviewed by Engineering/PM and put into the product backlog;
however, this bug has not been slated for a currently planned release (3.9, 3.10, or 3.11), which cover our releases for the rest of the calendar year.

As a result of this bug's age, its state on the current roadmap, and its PM Score (being below 70), this bug is being Closed - Deferred,
as it is currently not part of the product's immediate priorities.

Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.

Comment 18 Brendan Mchugh 2018-03-19 11:57:43 UTC
Clearing needinfo, case was closed long ago.

