Bug 1905194 - Detecting broken connections to the Kube API takes up to 15 minutes
Summary: Detecting broken connections to the Kube API takes up to 15 minutes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 1905195
 
Reported: 2020-12-07 17:19 UTC by Lukasz Szaszkiewicz
Modified: 2021-02-24 15:40 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1905195 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:40:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift oauth-apiserver pull 25 0 None closed pin to github.com/openshift/kubernetes-client-go 2021-02-18 07:26:03 UTC
Github openshift oauth-server pull 60 0 None closed pin to github.com/openshift/kubernetes-client-go 2021-02-18 07:26:04 UTC
Github openshift openshift-apiserver pull 155 0 None closed pin to github.com/openshift/kubernetes-client-go 2021-02-18 07:26:04 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:40:49 UTC

Description Lukasz Szaszkiewicz 2020-12-07 17:19:49 UTC
All API servers (openshift-apiserver, oauth-server, oauth-apiserver) rely on the TCP stack to detect broken network connections to KAS.
This can take up to 15 minutes. During that time our platform might be unavailable. 


There are already reported cases in which aggregated APIs (e.g. `openshift-apiserver`) were unable to establish a new connection to the Kube API for 15 minutes:

 - after "ungraceful termination" https://bugzilla.redhat.com/show_bug.cgi?id=1881878
 - after a network error https://bugzilla.redhat.com/show_bug.cgi?id=1879232#c39


Detecting a broken connection should be quicker; ideally it should take seconds, not minutes.
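
For context (not stated in this report, but consistent with the 15-minute figure above): without any application-level health checking, a connection with unacknowledged data is only declared dead once the kernel exhausts its TCP retransmissions, and the Linux default of net.ipv4.tcp_retries2=15 yields a timeout of roughly 15 minutes. A minimal sketch of how to observe this on a master node:

# Number of retransmissions of unacknowledged data before the kernel gives up
# on an established connection; the default of 15 corresponds to roughly 924 seconds.
sysctl net.ipv4.tcp_retries2

# Watch the retransmission timer ("on (...)") of a stuck connection;
# 172.30.0.1:443 is the in-cluster kube-apiserver service address.
netstat --tcp --numeric --timer | grep "172.30.0.1:443"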

Comment 1 Lukasz Szaszkiewicz 2020-12-07 17:26:07 UTC
@Xia I can share some ideas on how to verify the bug if you are interested.

Comment 3 Xingxing Xia 2020-12-10 11:07:42 UTC
I was occupied by other queued and incoming work, so I haven't worked on this bug yet. I will try to look at it tomorrow.

Comment 4 Xingxing Xia 2020-12-11 13:18:03 UTC
Lukasz Szaszkiewicz, I would like to hear your ideas :)
My current guess (I have not yet had time to try it today):
In some terminals, ssh to the masters from the bastion and check `crictl logs -f the_OAS_container`.
In other terminals, also ssh to the masters from the bastion and, on all masters, take kube-apiserver down with `mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp`.

Comment 5 Lukasz Szaszkiewicz 2020-12-11 13:37:55 UTC
(In reply to Xingxing Xia from comment #4)
> Lukasz Szaszkiewicz, I would like to hear your ideas :)
> My current guess (I have not yet had time to try it today):
> In some terminals, ssh to the masters from the bastion and check `crictl logs -f the_OAS_container`.
> In other terminals, also ssh to the masters from the bastion and, on all masters, take kube-apiserver down with `mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp`.

Hi, sure. So I was able to reproduce the issue by blocking (iptables) communication between two components.
For example, to break communication between OAS and KAS you could:

1. Get SSH access to a master node
2. Get a pid of OAS (crictl ps, crictl inspect | grep "pid")
3. Get into OAS's network ns (nsenter -t $PID -n /bin/bash)
4. Block traffic from OAS to KAS (iptables -I FORWARD -p tcp --sport $PORT -s $IP -j DROP)
5. Without the fix, OAS won't be able to open a new connection to KAS for ~15 minutes. You may observe slowness of the platform and gradual degradation.

Don't forget that this BZ targets multiple servers (all susceptible): oauth-server, oauth-apiserver, kube-apiserver, and openshift-apiserver. All share the same code.
It would be nice if you could check oauth-server, oauth-apiserver, and openshift-apiserver (a consolidated sketch of the steps above follows).
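
A minimal consolidated sketch of steps 1-4 above (not from the original comment; run as root on a master node, with <container_id>, <pid>, $PORT and $IP as placeholders for the values found on the node, i.e. the connection's ephemeral source port and the OAS pod IP):

# 2. Find the openshift-apiserver container and its PID
crictl ps --name=openshift-apiserver$
crictl inspect <container_id> | grep pid

# 3. Enter the pod's network namespace
nsenter -t <pid> -n /bin/bash

# 4. Drop traffic from OAS to KAS
iptables -I FORWARD -p tcp --sport $PORT -s $IP -j DROP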

Comment 8 Xingxing Xia 2020-12-25 09:57:09 UTC
Hi Lukasz, to ensure the 4.7 verification is correct, I first tried to reproduce this in a 4.6 env (4.6.0-0.nightly-2020-12-21-163117), taking OAS as an example:
On one master (master-1):
[root@wduan-1225a-dmt4h-master-1 ~]# crictl ps --name=openshift-apiserver$
CONTAINER           IMAGE                                                              CREATED             STATE               NAME                  ATTEMPT             POD ID
0bbc03b5398a5       8288a707532596804713c53b91ba049beb1dc516e105f7d5170f76e6d9462903   6 hours ago         Running             openshift-apiserver   0                   1f8266e30ac65
[root@wduan-1225a-dmt4h-master-1 ~]# crictl inspect 0bbc03b5398a5 | grep pid
    "pid": 38684,
[root@wduan-1225a-dmt4h-master-1 ~]# nsenter -t 38684 -n /bin/bash

# I don't know what source port and IP you intended to mean for "--sport $PORT -s $IP", so I first tried OAS pod's port (and IP):
[root@wduan-1225a-dmt4h-master-1 ~]# iptables -I FORWARD -p tcp --sport 8443 -j DROP
However, I didn't see any abnormality, failure or slowness in the env when repeatedly checking COs, oc login and oc get project. (I don't know the concrete test for "Without the fix OAS won't be able to open a new connection to KAS for ~15 minutes"; I just tried these OAS-resource-related commands.)

Then I tried KAS pod's port (and IP) together:
[root@wduan-1225a-dmt4h-master-1 ~]# iptables -I FORWARD -p tcp --sport 6443 -j DROP
Still didn't see any abnormality, failure or slowness.

Then I tried the above on all 3 masters in the env; again I still didn't see any abnormality, failure or slowness.

Could you clarify more? Did I get any steps wrong? Thanks.
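
One possible sanity check (not part of the original thread): inside the pod's network namespace, traffic that the API server itself sends and receives traverses the OUTPUT and INPUT chains rather than FORWARD, so a rule inserted into FORWARD may simply never match. The packet/byte counters show whether a rule is being hit at all:

# All-zero counters on the inserted DROP rule mean it never matched any packets.
iptables -vnL FORWARD
iptables -vnL INPUT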

Comment 9 Xingxing Xia 2021-01-08 10:53:36 UTC
Today I consulted network QE in #forum-qe about nsenter and alternatives, e.g. NetworkPolicy (which proves not applicable because kube-apiserver uses the host network) and iptables on the masters. I still have not succeeded in reproducing this in 4.6. Hmm, I will switch to just running the regression test of case OCP-19941, "when 1 master is down the cluster should continue serving well without being unavailable for more than 1.5 min", to verify this bug.

Comment 10 Lukasz Szaszkiewicz 2021-01-11 09:07:34 UTC
Hi,

# I don't know what source port and IP you intended to mean for "--sport $PORT -s $IP", so I first tried OAS pod's port (and IP):
# [root@wduan-1225a-dmt4h-master-1 ~]# iptables -I FORWARD -p tcp --sport 8443 -j DROP

Yeah, I meant OAS's port and IP. A source port of 8443 seems incorrect; I would expect it to be "more random". For example:

iptables -I FORWARD -p tcp --sport 58696 (you need to find the actual port) -s 10.128.0.63  -j DROP
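
A minimal sketch of how the actual ephemeral source port could be looked up from inside the pod's network namespace (an assumption based on the later comments, where the kube-apiserver service address is 172.30.0.1:443):

# The local address column shows <pod-IP>:<ephemeral-port>, which gives the
# values to plug into -s and --sport.
netstat -tn | grep "172.30.0.1:443"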

Comment 11 Xingxing Xia 2021-01-12 11:33:57 UTC
Tried to reproduce in 4.6 (4.6.0-0.nightly-2021-01-10-033123) again:
$ oc get po -o wide -n openshift-apiserver
apiserver-857f8c5f76-v69hz   2/2     Running   0          6h10m   10.128.0.11   ip-10-0-208-199.us-east-2.compute.internal ...
apiserver-857f8c5f76-v8s6m   2/2     Running   0          6h3m    10.130.0.6    ip-10-0-182-12.us-east-2.compute.internal  ...
apiserver-857f8c5f76-x6mdg   2/2     Running   0          6h7m    10.129.0.7    ip-10-0-140-246.us-east-2.compute.internal ...
# on master ip-10-0-208-199
[root@ip-10-0-208-199 ~]# crictl ps --name=openshift-apiserver$
2aaceadfed2ca       653a3c8eb75d54e75d8f375e4fc5e105ce9bd9127d1e6bb7a3e9c659e4ee0d08   7 hours ago         Running             openshift-apiserver   0                   c64c4133469f1
[root@ip-10-0-208-199 ~]# crictl inspect 2aaceadfed2ca | grep pid
    "pid": 7343,
[root@ip-10-0-208-199 ~]# nsenter -t 7343 -n /bin/bash
[root@ip-10-0-208-199 ~]# netstat -n # many connections and random ports, not quite sure which port for --sport? see http://file.rdu.redhat.com/~xxia/nsenter_then_netstat-n.txt
[root@ip-10-0-208-199 ~]# netstat -n | grep "172.30.0.1:443"
tcp        0      0 10.128.0.11:43906       172.30.0.1:443          ESTABLISHED
tcp        0      0 10.128.0.11:41206       172.30.0.1:443          ESTABLISHED
# I chose these ports
[root@ip-10-0-208-199 ~]# iptables -I FORWARD -p tcp --sport 43906 -s 10.128.0.11 -j DROP
[root@ip-10-0-208-199 ~]# iptables -I FORWARD -p tcp --sport 41206 -s 10.128.0.11 -j DROP
** Even though I repeated the above on all 3 masters, I still didn't see any abnormality/slowness/failure **
** Even when I later tried it without the port, `iptables -I FORWARD -p tcp -s <IP> -j DROP`, on all 3 masters, I still didn't reproduce any abnormality/slowness/failure **

Sorry for bothering you a few times, and thanks for your kind replies. Can I switch to just running the regression test of case OCP-19941 to verify this bug, as comment 9 mentions?

Comment 12 Lukasz Szaszkiewicz 2021-01-12 15:48:25 UTC
Sure, on the other hand, I reproduced it a few times. We could set up a debugging session and I could show you how I did that.

Comment 13 Xingxing Xia 2021-01-13 12:04:16 UTC
First, test openshift-apiserver in an old 4.6 (4.6.0-0.nightly-2021-01-12-112514) env which does not yet have the fix:
Enter the master hosting one of the openshift-apiserver pods:
[root@ip-10-0-198-250 ~]# crictl ps --name=openshift-apiserver$
ab3931ce6a1e6       653a3c8e...   2 hours ago  Running   openshift-apiserver   0   2c63d0d6c137c
[root@ip-10-0-198-250 ~]# crictl inspect ab3931ce6a1e6 | grep pid
    "pid": 283501,
[root@ip-10-0-198-250 ~]# nsenter -t 283501 -n /bin/bash
[root@ip-10-0-198-250 ~]# export PS1='[\u@\h \D{%F %T %Z} \W]\$ '
[root@ip-10-0-198-250 2021-01-13 10:44:04 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0      0 10.130.0.8:48646        172.30.0.1:443          ESTABLISHED 283548/cluster-kube  keepalive (15.28/0/0)
tcp        0      0 10.130.0.8:35432        172.30.0.1:443          ESTABLISHED 283501/openshift-ap  keepalive (24.16/0/0)
[root@ip-10-0-198-250 2021-01-13 10:44:35 UTC ~]# iptables -I INPUT -m state --state ESTABLISHED,RELATED -p tcp --dport 35432 -s 172.30.0.1  -j DROP

# note the connection changes to "on" at once
[root@ip-10-0-198-250 2021-01-13 10:47:47 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0      0 10.130.0.8:48646        172.30.0.1:443          ESTABLISHED 283548/cluster-kube  keepalive (19.61/0/0)
tcp        0  16875 10.130.0.8:35432        172.30.0.1:443          ESTABLISHED 283501/openshift-ap  on (9.24/7/0)

# repeatedly check; "on" lasts about 15 mins! Only after that long does the new connection "57780" show up:
[root@ip-10-0-198-250 2021-01-13 11:02:32 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0      0 10.130.0.8:57794        172.30.0.1:443          TIME_WAIT   -                    timewait (16.99/0/0)
tcp        0      0 10.130.0.8:48646        172.30.0.1:443          ESTABLISHED 283548/cluster-kube  keepalive (27.13/0/0)
tcp        0      0 10.130.0.8:57780        172.30.0.1:443          ESTABLISHED 283501/openshift-ap  keepalive (15.92/0/0)

# In another terminal, run oc new-project repeatedly; it will fail like:
$ oc new-project xxia-proj
Error from server (Timeout): Timeout: request did not complete within requested timeout 34s
$ oc new-project xxia-proj
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get projectrequests.project.openshift.io)
# Only once the new connection above shows up can it succeed
$ oc new-project xxia-proj
Now using project "xxia-proj" on server "https://...

Repeating the above in a 4.7.0-0.nightly-2021-01-13-005809 env, the new connection shows up immediately (within about 30s), and oc new-project succeeds when tried then.
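
A minimal cleanup sketch, assuming the rule was inserted exactly as above: deleting the DROP rule restores the blocked path once the check is finished.

# -D removes the first rule that matches this exact specification.
iptables -D INPUT -m state --state ESTABLISHED,RELATED -p tcp --dport 35432 -s 172.30.0.1 -j DROP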

Comment 14 Xingxing Xia 2021-01-13 12:04:41 UTC
Then test oauth-openshift in the old 4.6 (4.6.0-0.nightly-2021-01-12-112514) env which does not yet have the fix:
[root@ip-10-0-198-250 2021-01-13 11:15:52 UTC ~]# crictl ps --name=oauth-openshift$
a05f6f5390e52       898f0701e52...   49 seconds ago      Running     oauth-openshift     0    2f6d83ad74238
[root@ip-10-0-198-250 2021-01-13 11:18:25 UTC ~]# crictl inspect a05f6f5390e52 | grep pid
    "pid": 422691,
[root@ip-10-0-198-250 2021-01-13 11:18:52 UTC ~]# nsenter -t 422691 -n /bin/bash
[root@ip-10-0-198-250 2021-01-13 11:19:15 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0      0 10.130.0.9:49114        172.30.0.1:443          ESTABLISHED 422691/oauth-server  keepalive (29.19/0/0)
[root@ip-10-0-198-250 2021-01-13 11:19:26 UTC ~]# iptables -I INPUT -m state --state ESTABLISHED,RELATED -p tcp --dport 49114 -s 172.30.0.1  -j DROP
# repeately check, "on" status lasts long, new connection does not show up:
[root@ip-10-0-198-250 2021-01-13 11:30:57 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0  24399 10.130.0.9:49114        172.30.0.1:443          ESTABLISHED 422691/oauth-server  on (37.28/13/0)

# In another terminal, repeatedly run login; it will fail:
[xxia@pres 2021-01-13 19:20:00 CST my]$ oc login -u testuser-41 -p ...
Unable to connect to the server: EOF
[xxia@pres 2021-01-13 19:21:27 CST my]$ oc login -u testuser-41 -p ...
Unable to connect to the server: EOF

Only after several more minutes does a new connection show up and login work again.

# Repeat the test in a 4.7 env; the new connection shows up immediately and login immediately comes back to work:
[root@xxia13dr-01130201-master-0 2021-01-13 11:25:30 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0      0 10.128.0.93:54818       172.30.0.1:443          ESTABLISHED 365024/oauth-server  keepalive (2.16/0/0)
[root@xxia13dr-01130201-master-0 2021-01-13 11:25:35 UTC ~]# iptables -I INPUT -m state --state ESTABLISHED,RELATED -p tcp --dport 54818 -s 172.30.0.1  -j DROP
[root@xxia13dr-01130201-master-0 2021-01-13 11:25:59 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0      0 10.128.0.93:59298       172.30.0.1:443          ESTABLISHED 365024/oauth-server  keepalive (0.50/0/0)

[xxia@pres 2021-01-13 19:25:10 CST my]$ oc login -u testuser-41 -p ...
Login successful.

Comment 15 Xingxing Xia 2021-01-13 12:05:06 UTC
Third, verify oauth-apiserver in 4.7:
[root@xxia13dr-01130201-master-0 2021-01-13 11:31:22 UTC ~]# iptables -I INPUT -m state --state ESTABLISHED,RELATED -p tcp --dport 59888 -s 172.30.0.1  -j DROP
[root@xxia13dr-01130201-master-0 2021-01-13 11:31:23 UTC ~]# netstat --tcp --numeric --program --timer --wide | grep "172.30.0.1"
tcp        0   3652 10.128.0.57:59888       172.30.0.1:443          ESTABLISHED 60933/oauth-apiserv  on (0.05/4/0)
...

# In another terminal, run login; it will fail at the beginning:
[xxia@pres 2021-01-13 19:31:25 CST my]$ oc login -u testuser-41 -p ...
Error from server (InternalError): Internal error occurred: unexpected response: 400
...
[xxia@pres 2021-01-13 19:32:39 CST my]$ oc login -u testuser-41 -p ...
error: server_error The authorization server encountered an unexpected condition that prevented it from fulfilling the request.
[xxia@pres 2021-01-13 19:32:41 CST my]$ oc login -u testuser-41 -p ...
Error from server (InternalError): Internal error occurred: unexpected response: 500
# But after some time, not nearly as long as 15 mins, login can succeed:
[xxia@pres 2021-01-13 19:32:49 CST my]$ oc login -u testuser-41 -p ...
Login successful.
But 19:32:49 CST (= 11:32:49 UTC) - 11:31:23 UTC = about 1.5 mins, a bit longer than for openshift-apiserver and oauth-openshift above.

Comment 16 Xingxing Xia 2021-01-13 12:07:18 UTC
In summary, the previous 3 comments verify openshift-apiserver, oauth-openshift, and oauth-apiserver in 4.7: they detect broken connections to the kube-apiserver immediately rather than after as long as 15 minutes.

Comment 18 errata-xmlrpc 2021-02-24 15:40:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

