I noticed this while debugging a pod on 3.10. Using `oc rsh` I was tailing a file and was disconnected after almost exactly 2 minutes. When I reconnected, my previous shell was still running, so it looks like Docker still thought the exec session was alive (which is itself a problem, because a dead stream should result in the session being terminated and cleaned up on the node).
1. From your laptop, run `time oc rsh RUNNING_POD`
2. Run `top` (so the client is receiving continuous writes from the server)
After 2 minutes the session is disconnected, even though `top` is sending continuous traffic.
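The steps above could be scripted to measure how long the stream actually survives. This is only a minimal sketch, assuming Python 3 on the client machine; `measure_stream_lifetime` is a hypothetical helper written for illustration, not part of `oc`.

```python
import subprocess
import time


def measure_stream_lifetime(cmd):
    """Run cmd, drain its output, and report how long the stream lasted.

    Returns (elapsed_seconds, bytes_read, exit_code).
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr in so disconnect messages count too
    )
    bytes_read = 0
    while True:
        # read1 returns as soon as some data is available; b"" signals EOF,
        # i.e. the command exited or the stream was closed underneath it.
        chunk = proc.stdout.read1(4096)
        if not chunk:
            break
        bytes_read += len(chunk)
    exit_code = proc.wait()
    return time.monotonic() - start, bytes_read, exit_code
```

For example, `measure_stream_lifetime(["oc", "rsh", "RUNNING_POD", "top", "-b"])` would return roughly 120 seconds on an affected cluster. `RUNNING_POD` is a placeholder, and the `-b` (batch mode) flag assumes the container's `top` supports it, so that no TTY is required.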
This was on GCP 3.10 from a recent master post-rebase. The GCP master load balancer has a 2 minute timeout, but that is a timeout for a request to a backend, not an idle timeout on a one-way connection (and watches aren't being dropped after 2 minutes either).
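To make the distinction between the two kinds of timeout concrete, here is a toy model. It is a sketch under stated assumptions, not the behavior of any actual load balancer: `cutoff_time` and its parameters are hypothetical, and the 3 second write interval is just an assumed refresh rate for `top`.

```python
def cutoff_time(write_times, timeout, mode, duration):
    """Return when (seconds after connect) a proxy would cut the stream.

    write_times: timestamps at which the server writes data.
    timeout:     the proxy timeout in seconds.
    mode:        "idle"    -> the timer resets on every write;
                 "request" -> the timer runs from connect, ignoring traffic.
    duration:    how long the stream is observed.
    Returns None if the stream survives for the whole duration.
    """
    if mode == "request":
        return timeout if timeout < duration else None
    if mode == "idle":
        last = 0.0
        for t in sorted(write_times):
            if t > duration:
                break
            if t - last > timeout:
                return last + timeout  # gap between writes exceeded the timeout
            last = t
        if duration - last > timeout:
            return last + timeout  # silence after the final write
        return None
    raise ValueError(f"unknown mode: {mode!r}")


# Assumed traffic pattern: one write every 3 seconds for 10 minutes.
writes = list(range(0, 600, 3))
idle_cut = cutoff_time(writes, 120, "idle", 600)
request_cut = cutoff_time(writes, 120, "request", 600)
```

In this model a stream with continuous writes is never cut by an idle timer, while a per-request timer cuts it at exactly the timeout, which would be consistent with a disconnect at 2 minutes despite steady `top` output.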
Does not occur against 3.9 AWS clusters like us-east-1; sessions there stayed open indefinitely.
The docker client didn't change in the rebase. I don't recall the apiserver or kubelet handling of exec changing upstream either, but I can check.
I still see logs failing on the one cluster. Can you verify that logs behaves the same way?
Were you accessing from within the instance or outside? When you were outside, what network were you on?
I can't reproduce this locally via cluster up, so I assume this has to be a provider-specific problem where the GCP load balancer somehow breaks the connection after 2 minutes.
Moving to the networking team for further investigation. I don't think this is a 3.10 blocker.
Closing unless somebody can provide a reproducer. No issues with the `top` scenario in a 3.10 GCP cluster running for 100 minutes, nor with an `oc logs` tail for 30 minutes in the same cluster (accessed from the public internet).