Bug 1826769 - HTTP keepalive response loss in HTTP persistent connection to apiserver for hostNetwork pods with OVN
Summary: HTTP keepalive response loss in HTTP persistent connection to apiserver for hostNetwork pods with OVN
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: mcambria@redhat.com
QA Contact: Anurag saxena
URL:
Whiteboard: SDN-CI-IMPACT
Duplicates: 1834914
Depends On:
Blocks:
 
Reported: 2020-04-22 14:03 UTC by Peng Liu
Modified: 2020-08-13 17:52 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-14 16:41:15 UTC
Target Upstream Version:
Embargoed:



Description Peng Liu 2020-04-22 14:03:56 UTC
Description of problem:

HTTP keepalive response loss in HTTP persistent connection for hostNetwork pods with OVN

Version-Release number of selected component (if applicable):

Client Version: 4.5.0-0.ci-2020-04-13-003218
Server Version: 4.5.0-0.ci-2020-04-13-003218
Kubernetes Version: v1.18.0-rc.1

How reproducible:

Create a cluster with OVN as cluster network

Steps to Reproduce:
1. Create a pod with hostNetwork
```
---
apiVersion: v1
kind: Pod
metadata:
  name: testpod
  namespace: default
spec:
  hostNetwork: true
  containers:
  - name: testpod 
    image: openshift/origin-cli
    command: [ "sleep", "infinity"]

```

2. Attach to the pod, and start an HTTP persistent connection to the kube-apiserver. e.g. oc get pod --watch testpod

3. Start 'toolbox' on the worker where the testpod is scheduled, then install tcpdump and iproute

4. In the toolbox, find the TCP connection with 'ss -tanp | grep oc', then trace it with tcpdump, e.g. 'tcpdump -n -i ovn-k8s-mp0 tcp port 49942'

Actual results:
By default, every 30s the client sends an HTTP keepalive message to the apiserver, and the apiserver should reply. In the following tcpdump output, many of the keepalive responses are missing.

tcpdump: listening on ovn-k8s-mp0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:05:21.384878 IP (tos 0x0, ttl 64, id 6470, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x640c), seq 944094870, ack 836716150, win 616, options [nop,nop,TS val 4176286007 ecr 3403763358], length 0
13:05:21.385276 IP (tos 0x0, ttl 61, id 909, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0x13c6 (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3403793563 ecr 4175621368], length 0
13:05:51.592980 IP (tos 0x0, ttl 64, id 6471, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x7811), seq 0, ack 1, win 616, options [nop,nop,TS val 4176316212 ecr 3403793563], length 0
13:06:21.800846 IP (tos 0x0, ttl 64, id 6472, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x0214), seq 0, ack 1, win 616, options [nop,nop,TS val 4176346417 ecr 3403793563], length 0
13:06:52.008845 IP (tos 0x0, ttl 64, id 6473, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x8c16), seq 0, ack 1, win 616, options [nop,nop,TS val 4176376622 ecr 3403793563], length 0
13:06:52.009210 IP (tos 0x0, ttl 61, id 913, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0xb1cc (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3403884179 ecr 4175621368], length 0
13:07:22.216846 IP (tos 0x0, ttl 64, id 6474, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0xb41e), seq 0, ack 1, win 616, options [nop,nop,TS val 4176406828 ecr 3403884179], length 0
13:07:52.424860 IP (tos 0x0, ttl 64, id 6475, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x3e21), seq 0, ack 1, win 616, options [nop,nop,TS val 4176437033 ecr 3403884179], length 0
13:08:22.632846 IP (tos 0x0, ttl 64, id 6476, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0xc823), seq 0, ack 1, win 616, options [nop,nop,TS val 4176467238 ecr 3403884179], length 0
13:08:52.840867 IP (tos 0x0, ttl 64, id 6477, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x5226), seq 0, ack 1, win 616, options [nop,nop,TS val 4176497443 ecr 3403884179], length 0
13:08:52.841280 IP (tos 0x0, ttl 61, id 917, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0xd9d5 (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404005000 ecr 4175621368], length 0
13:09:17.805530 IP (tos 0x0, ttl 61, id 918, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0x7854 (correct), seq 0, ack 1, win 249, options [nop,nop,TS val 3404029962 ecr 4175621368], length 0
13:09:17.805566 IP (tos 0x0, ttl 64, id 6478, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x18ab), seq 1, ack 1, win 616, options [nop,nop,TS val 4176522406 ecr 3404005000], length 0
13:09:23.048872 IP (tos 0x0, ttl 64, id 6479, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x0432), seq 0, ack 1, win 616, options [nop,nop,TS val 4176527648 ecr 3404005000], length 0
13:09:53.256842 IP (tos 0x0, ttl 64, id 6480, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x8e33), seq 0, ack 1, win 616, options [nop,nop,TS val 4176557854 ecr 3404005000], length 0
13:09:53.257227 IP (tos 0x0, ttl 61, id 920, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0x2e1f (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404065410 ecr 4176522406], length 0
13:10:23.464854 IP (tos 0x0, ttl 64, id 6481, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x2c3b), seq 0, ack 1, win 616, options [nop,nop,TS val 4176588059 ecr 3404065410], length 0
13:10:23.465236 IP (tos 0x0, ttl 61, id 921, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0xb821 (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404095615 ecr 4176522406], length 0
13:10:53.672847 IP (tos 0x0, ttl 64, id 6482, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x4040), seq 0, ack 1, win 616, options [nop,nop,TS val 4176618264 ecr 3404095615], length 0
13:11:23.880828 IP (tos 0x0, ttl 64, id 6483, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0xca42), seq 0, ack 1, win 616, options [nop,nop,TS val 4176648469 ecr 3404095615], length 0
13:11:23.881428 IP (tos 0x0, ttl 61, id 923, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0xcc25 (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404156026 ecr 4176522406], length 0
13:11:54.088862 IP (tos 0x0, ttl 64, id 6484, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x6849), seq 0, ack 1, win 616, options [nop,nop,TS val 4176678674 ecr 3404156026], length 0
13:11:54.089284 IP (tos 0x0, ttl 61, id 924, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0x5628 (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404186231 ecr 4176522406], length 0
13:12:24.296849 IP (tos 0x0, ttl 64, id 6485, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x7c4d), seq 0, ack 1, win 616, options [nop,nop,TS val 4176708880 ecr 3404186231], length 0
13:12:24.297442 IP (tos 0x0, ttl 61, id 926, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0xe029 (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404216437 ecr 4176522406], length 0
13:12:54.504771 IP (tos 0x0, ttl 64, id 6486, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x9051), seq 0, ack 1, win 616, options [nop,nop,TS val 4176739085 ecr 3404216437], length 0
13:12:54.505158 IP (tos 0x0, ttl 61, id 927, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0x6a2c (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404246642 ecr 4176522406], length 0
13:13:24.712867 IP (tos 0x0, ttl 64, id 6487, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0xa456), seq 0, ack 1, win 616, options [nop,nop,TS val 4176769290 ecr 3404246642], length 0
13:13:24.713520 IP (tos 0x0, ttl 61, id 928, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.0.1.https > 10.128.2.2.49942: Flags [.], cksum 0xf42e (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3404276847 ecr 4176522406], length 0
13:13:54.920839 IP (tos 0x0, ttl 64, id 6488, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0xb85b), seq 0, ack 1, win 616, options [nop,nop,TS val 4176799495 ecr 3404276847], length 0
13:14:25.128857 IP (tos 0x0, ttl 64, id 6489, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.49942 > 172.30.0.1.https: Flags [.], cksum 0xb8c7 (incorrect -> 0x425e), seq 0, ack 1, win 616, options [nop,nop,TS val 4176829700 ecr 3404276847], length 0


Expected results:
All HTTP keepalive response messages are received by the pod with hostNetwork.

Additional info:

Comment 1 Peng Liu 2020-04-22 14:10:06 UTC
Initially, I noticed this issue in the sriov-network-operator config daemon pod, which is a hostNetwork pod that uses a Kubernetes client-go informer to watch for custom resource updates. Sometimes the custom resource updates were not received by the pod as expected.
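
For context, here is a minimal client-go sketch of the kind of long-lived watch involved (a Pods informer stands in for the actual SR-IOV custom resource; names and the 30s resync period are illustrative only):

```
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster config: talks to the kubernetes.default service
	// (172.30.0.1:443 in this cluster) using the pod's service account.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A shared informer keeps a long-lived watch (an HTTP persistent
	// connection) open to the apiserver; if that connection silently
	// drops traffic, updates stop arriving until the watch times out.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			fmt.Println("update received")
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop // block forever; a real daemon would tie this to shutdown
}
```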

Comment 5 mcambria@redhat.com 2020-05-13 18:58:41 UTC
I believe I reproduced the setup on AWS.  

I started testpod using your supplied yaml

For Step 2 I did:

1/ oc rsh testpod
2/ put a copy of the kubeconfig from my laptop into /tmp/kubeadmin and exported it, so I could
3/ use this node to create a persistent tcp connection using "oc get pod --watch testpod" as suggested

$ oc rsh testpod
sh-4.2# vi /tmp/kubeadmin
sh-4.2# export KUBECONFIG=/tmp/kubeadmin
sh-4.2# oc get pod --watch testpod
NAME      READY   STATUS    RESTARTS   AGE
testpod   1/1     Running   0          2m22s

I ran tcpdump on the same node where the testpod is running via "oc debug node/xxx" where "xxx" is the node shown via "oc get pod testpod -o wide"


sh-4.2# ss -tanp | grep oc
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port            
ESTAB      0      0      10.0.159.253:57212              10.0.173.240:6443                users:(("oc",pid=638495,fd=3)

sh-4.2# tcpdump -n -i ens3 -n tcp port 57212
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
18:26:12.515074 IP 10.0.159.253.57212 > 10.0.173.240.6443: Flags [.], ack 2228341658, win 1171, options [nop,nop,TS val 1179527924 ecr 2953284654], length 0
18:26:12.515634 IP 10.0.173.240.6443 > 10.0.159.253.57212: Flags [.], ack 1, win 297, options [nop,nop,TS val 2953314863 ecr 1179510755], length 0
18:26:42.723088 IP 10.0.159.253.57212 > 10.0.173.240.6443: Flags [.], ack 1, win 1171, options [nop,nop,TS val 1179558132 ecr 2953314863], length 0
18:26:42.723634 IP 10.0.173.240.6443 > 10.0.159.253.57212: Flags [.], ack 1, win 297, options [nop,nop,TS val 2953345071 ecr 1179510755], length 0
18:27:12.931106 IP 10.0.159.253.57212 > 10.0.173.240.6443: Flags [.], ack 1, win 1171, options [nop,nop,TS val 1179588340 ecr 2953345071], length 0
18:27:12.931651 IP 10.0.173.240.6443 > 10.0.159.253.57212: Flags [.], ack 1, win 297, options [nop,nop,TS val 2953375279 ecr 1179510755], length 0
18:27:43.139103 IP 10.0.159.253.57212 > 10.0.173.240.6443: Flags [.], ack 1, win 1171, options [nop,nop,TS val 1179618549 ecr 2953375279], length 0
18:27:43.139650 IP 10.0.173.240.6443 > 10.0.159.253.57212: Flags [.], ack 1, win 297, options [nop,nop,TS val 2953405488 ecr 1179510755], length 0



Observations:

tcpdump didn't capture anything on the ovn-k8s-mp0 interface.  This makes sense given that the testpod is on the host network: it should not be using the OVN netdev (ovn-k8s-mp0) but rather a host/VM interface that is used to connect to external hosts/VMs.

I did find bidirectional tcp traffic on the tcp sessions identified by "ss -tanp | grep oc" on a host interface, ens3.  There are differences from what you show however.

a/ In both your description and in what I see via tcpdump on multiple AWS clusters, we are NOT using HTTP keepalive.  HTTP keepalive works at Layer 7 and IIRC sends actual text in an HTTP header, which causes the TCP sequence number to increment to account for that text.  It looks like TCP keepalive (Layer 4) is being used; see the sketch after this list.  (BTW, doesn't k8s use a websocket for e.g. notifications?  Regardless, the TCP sequence number doesn't change, so no bytes are sent.)

b/ I do not see any "keep alive packet loss"

c/ All my tcpdump captures show source and destination IP on 10.0.x.x/20.  Your capture shows e.g. "10.128.2.2 > 172.30.0.1"   On AWS the host networks are on 10.0.x.y/20 subnets.  What does your SR-IOV setup use for addresses on the host interfaces?

d/ What I see is traffic to/from the apiserver on port 6443; your capture shows "https".  Typically https is port 443 (IIRC tcpdump displays whatever it finds in /etc/services).
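
To make the Layer 4 vs. Layer 7 distinction in a/ concrete, here is a minimal Go sketch (illustrative only; the service address is taken from the captures above) of how an application enables the kind of TCP keepalive seen in these traces:

```
package main

import (
	"net"
	"time"
)

func main() {
	// TCP keepalive is a Layer 4 socket feature: the kernel emits zero-length
	// probe segments, so the sequence number never advances and tcpdump shows
	// "length 0", as in both captures above.
	conn, err := net.Dial("tcp", "172.30.0.1:443")
	if err != nil {
		panic(err)
	}
	tcp := conn.(*net.TCPConn)
	tcp.SetKeepAlive(true)                   // enable SO_KEEPALIVE on the socket
	tcp.SetKeepAlivePeriod(30 * time.Second) // probe every 30s while idle

	// HTTP keep-alive, by contrast, is Layer 7 connection reuse; an
	// application-level probe would carry bytes and advance the sequence number.
	time.Sleep(5 * time.Minute) // hold the connection open so probes can be captured
	conn.Close()
}
```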

So those are the differences I see.  To proceed we should figure out why things are different.

Perhaps you're capturing on the wrong device?

Is the apiserver really using 172.30.0.1 "https"?

Does the output of 'ss -tanp | grep oc' in Step 4 really show a tcp connection to "https" == tcp/443?  Or should we see traffic to/from tcp/6443?

Comment 6 Peng Liu 2020-05-14 07:06:58 UTC
It seems you didn't use the kubeconfig that is attached to the pod automatically. https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod
The /tmp/kubeadmin file you used may not have the kube-apiserver service URL, which resolves to the IP '172.30.0.1'. To access the kube-apiserver from within a pod with the 'oc' command, you don't need to specify the 'KUBECONFIG' environment variable; it should just work. But you may need to bind the service account to a proper role so that it can access the pod API.
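
For reference, a minimal Go sketch of what "it should just work" means when using the automatically mounted credentials (illustrative only; standard client-go behaviour is assumed):

```
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// InClusterConfig uses the KUBERNETES_SERVICE_HOST/PORT env vars (which
	// point at the kubernetes.default service, 172.30.0.1:443 here) plus the
	// service account token and CA bundle mounted under
	// /var/run/secrets/kubernetes.io/serviceaccount/, so no KUBECONFIG is needed.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Requires the pod's service account to be bound to a role that can read
	// pods in the namespace.
	pod, err := client.CoreV1().Pods("default").Get(context.TODO(), "testpod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println(pod.Status.Phase)
}
```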

Comment 7 mcambria@redhat.com 2020-05-14 20:52:26 UTC
Using the kube-apiserver service URL, I don't see HTTP keepalives being used.  I see TCP keepalive packets, and don't see any acks being lost.

Using tcpdump with "-i any" just in case asymmetric routing is used:

sh-4.2# ss -tanp | grep oc                   
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
ESTAB      0      0      10.128.2.2:43456              172.30.0.1:443                 users:(("oc",pid=614771,fd=3))
sh-4.2# 
sh-4.2# tcpdump -n -i any  -n  tcp port 43456
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:42:20.617190 IP 10.128.2.2.43456 > 172.30.0.1.443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747343644 ecr 1983361085], length 0
20:42:20.617301 IP 169.254.33.2.43456 > 10.0.131.163.6443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747343644 ecr 1983361085], length 0
20:42:20.617319 IP 10.0.128.162.43456 > 10.0.131.163.6443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747343644 ecr 1983361085], length 0
20:42:20.617425 IP 10.0.131.163.6443 > 10.0.128.162.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983391294 ecr 1747313149], length 0
20:42:20.617435 IP 10.0.131.163.6443 > 169.254.33.2.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983391294 ecr 1747313149], length 0
20:42:20.617485 IP 172.30.0.1.443 > 10.128.2.2.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983391294 ecr 1747313149], length 0
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
sh-4.2# 

Each interface individually:


sh-4.2# tcpdump -n -i ovn-k8s-mp0 -n  tcp port 43456
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ovn-k8s-mp0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:45:21.865185 IP 10.128.2.2.43456 > 172.30.0.1.443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747524890 ecr 1983542340], length 0
20:45:21.865539 IP 172.30.0.1.443 > 10.128.2.2.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983572549 ecr 1747493371], length 0
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
sh-4.2# tcpdump -n -i ovn-k8s-gw0 -n  tcp port 43456
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ovn-k8s-gw0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:45:52.073263 IP 169.254.33.2.43456 > 10.0.131.163.6443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747555097 ecr 1983572549], length 0
20:45:52.073387 IP 10.0.131.163.6443 > 169.254.33.2.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983602758 ecr 1747493371], length 0
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
sh-4.2# tcpdump -n -i ens3 -n  tcp port 43456
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
20:46:22.281304 IP 10.0.128.162.43456 > 10.0.131.163.6443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747585305 ecr 1983602758], length 0
20:46:22.281389 IP 10.0.131.163.6443 > 10.0.128.162.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983632967 ecr 1747493371], length 0
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
sh-4.2# 


sh-4.2# oc version 
Client Version: v4.2.0-alpha.0-571-gf415627
Server Version: 4.5.0-0.ci-2020-05-11-130006
Kubernetes Version: v1.18.0-rc.1
sh-4.2# 

If keepalives are being lost, the connection would terminate eventually.  Does the connection come down? 

Does tcpdump with "-i any" show any other traffic with the same ephemeral tcp port used by the client as shown via 'ss -tanp | grep oc'?

Comment 8 Peng Liu 2020-05-15 07:35:29 UTC
> Using the kube-apiservice service URL I don't see HTTP keepalives being used.  I see TCP Keepalive packets, and don't see any acks being lost.

I cannot say those messages are 100% HTTP keepalive, as they are HTTPS and cannot be decoded by tcpdump, but it is the best guess. Here is my reasoning. In https://github.com/golang/go/blob/0e617d3d5c7e89b1ad1b0285fc77314b8d056211/src/net/http/transport.go#L46, we can see that the golang HTTP client has a default 30s keepalive timer. And it can't be TCP keepalive, because the default net.ipv4.tcp_keepalive_time is 7200s and the default net.ipv4.tcp_keepalive_intvl is 75s, which means TCP keepalive messages would only be sent every 75s after the connection has been idle for 7200s.
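
For reference, a minimal sketch of the client-side settings behind that 30s figure (assuming the http.DefaultTransport defaults at the linked revision; illustrative only):

```
package main

import (
	"net"
	"net/http"
	"time"
)

func main() {
	// http.DefaultTransport (the code linked above) dials with a net.Dialer
	// whose KeepAlive field, the keep-alive probe period applied to the
	// resulting socket, is 30 * time.Second.
	transport := &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second, // source of the 30s interval seen in the captures
		}).DialContext,
	}
	client := &http.Client{Transport: transport}
	_ = client // client-go builds its apiserver connections on a transport configured like this
}
```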

> If keepalives are being lost, the connection would terminate eventually.  Does the connection come down? 
Yes, it will go down eventually. However, it can take up to 10 minutes to time out, which is unacceptable.

> Does tcpdump with "-i any" show any other traffic with the same ephemeral tcp port used by the client as show via 'ss -tanp | grep oc'?

I think you may need to run tcpdump for a longer time. Here I collected a trace on all the interfaces for 5 minutes: http://pastebin.test.redhat.com/865367. We can see the requests to the apiserver didn't get a response at '07:07:38.264792', '07:08:38.679891', '07:09:08.887865', '07:09:39.095868' ...

Comment 9 mcambria@redhat.com 2020-05-15 16:49:16 UTC

The TCP packets do not contain any data, e.g. the sequence number doesn't increment, so nothing is being encrypted.  The IP length of 52 = 20-byte IP header + 20-byte TCP header + 12 bytes of TCP options (e.g. the timestamp option).

> I think you may need to tcpdump for a longer time.

I let some of my captures run for an hour.

> Here I collected the trace on all the interfaces for 5 mins
> the requests to apiserver didn't get a response at '07:07:38.264792'

I'm assuming 192.168.111.22 is the address of the apiserver.

07:07:38.264792 IP (tos 0x0, ttl 64, id 56639, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.2.56210 > 172.30.0.1.443: Flags [.], cksum 0xb8c7 (incorrect -> 0x6bdc), seq 3806694337, ack 3471633134, win 343, options [nop,nop,TS val 1729971343 ecr 3792878936], length 0
07:07:38.264960 IP (tos 0x0, ttl 62, id 56639, offset 0, flags [DF], proto TCP (6), length 52)
    169.254.33.2.56210 > 192.168.111.22.6443: Flags [.], cksum 0x124e (correct), seq 0, ack 1, win 343, options [nop,nop,TS val 1729971343 ecr 3792878936], length 0
07:07:38.264974 IP (tos 0x0, ttl 61, id 56639, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.111.24.56210 > 192.168.111.22.6443: Flags [.], cksum 0xad8d (correct), seq 0, ack 1, win 343, options [nop,nop,TS val 1729971343 ecr 3792878936], length 0
07:07:38.265105 IP (tos 0x0, ttl 64, id 29121, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.111.22.6443 > 192.168.111.24.56210: Flags [.], cksum 0x4a5f (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3792939347 ecr 1729608740], length 0
07:07:38.265119 IP (tos 0x0, ttl 63, id 29121, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.111.22.6443 > 169.254.33.2.56210: Flags [.], cksum 0xaf1f (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3792939347 ecr 1729608740], length 0
07:07:38.265127 IP (tos 0x0, ttl 63, id 29121, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.111.22.6443 > 169.254.33.2.56210: Flags [.], cksum 0xaf1f (correct), seq 1, ack 1, win 249, options [nop,nop,TS val 3792939347 ecr 1729608740], length 0

The packet sent at '07:07:38.264792' has id 56639.  It is ack'ed; the ack has id 29121.  But the ack is sent to the wrong place and seems to get lost on the last leg.  It looks like the apiserver(?) address/port 192.168.111.22:6443 does not get SNAT'ed to 172.30.0.1:443 (the kube-apiserver service), and the destination address should be DNAT'ed back to that of the client, 10.128.2.2.


From my prior comment, this is what I see when things work.  The last packet is where things are different.

sh-4.2# tcpdump -n -i any  -n  tcp port 43456
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:42:20.617190 IP 10.128.2.2.43456 > 172.30.0.1.443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747343644 ecr 1983361085], length 0
20:42:20.617301 IP 169.254.33.2.43456 > 10.0.131.163.6443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747343644 ecr 1983361085], length 0
20:42:20.617319 IP 10.0.128.162.43456 > 10.0.131.163.6443: Flags [.], ack 1254449195, win 905, options [nop,nop,TS val 1747343644 ecr 1983361085], length 0
20:42:20.617425 IP 10.0.131.163.6443 > 10.0.128.162.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983391294 ecr 1747313149], length 0
20:42:20.617435 IP 10.0.131.163.6443 > 169.254.33.2.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983391294 ecr 1747313149], length 0
20:42:20.617485 IP 172.30.0.1.443 > 10.128.2.2.43456: Flags [.], ack 1, win 277, options [nop,nop,TS val 1983391294 ecr 1747313149], length 0
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
sh-4.2#


Why is your apiserver(?) address using 192.168.111.22?

What else is different between setups?  I'm using 4.5 AWS CI builds.

Comment 11 zhaozhanqi 2020-05-22 10:38:10 UTC
I tried this on version 4.5.0-0.nightly-2020-05-21-024030 and did not reproduce this issue.

#ss -tanp | grep sriov
ESTAB     0      0                    10.128.2.2:51000               172.30.0.1:443          users:(("sriov-network-c",pid=116084,fd=3))                        
# tcpdump -n -i ovn-k8s-mp0 tcp port 51000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ovn-k8s-mp0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:29:05.719245 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 495794858:495794889, ack 1182403833, win 500, options [nop,nop,TS val 3080113930 ecr 3097246815], length 31
10:29:05.719293 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 31, win 1068, options [nop,nop,TS val 3097251775 ecr 3080113930], length 0
10:29:05.719955 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 1:40, ack 31, win 1068, options [nop,nop,TS val 3097251775 ecr 3080113930], length 39
10:29:05.721689 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 31:88, ack 40, win 500, options [nop,nop,TS val 3080113933 ecr 3097251775], length 57
10:29:05.762127 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 88, win 1068, options [nop,nop,TS val 3097251817 ecr 3080113933], length 0
10:29:06.885940 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 40:79, ack 88, win 1068, options [nop,nop,TS val 3097252941 ecr 3080113933], length 39
10:29:06.893506 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 88:177, ack 79, win 500, options [nop,nop,TS val 3080115104 ecr 3097252941], length 89
10:29:06.893561 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 177, win 1068, options [nop,nop,TS val 3097252949 ecr 3080115104], length 0
10:29:06.893657 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [.], seq 177:2873, ack 79, win 500, options [nop,nop,TS val 3080115104 ecr 3097252941], length 2696
10:29:06.893672 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 2873, win 1110, options [nop,nop,TS val 3097252949 ecr 3080115104], length 0
10:29:06.894073 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 2873:3987, ack 79, win 500, options [nop,nop,TS val 3080115105 ecr 3097252941], length 1114
10:29:06.894096 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 3987, win 1131, options [nop,nop,TS val 3097252949 ecr 3080115105], length 0
10:29:06.895237 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 79:120, ack 3987, win 1131, options [nop,nop,TS val 3097252950 ecr 3080115105], length 41
10:29:06.895463 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 120:3930, ack 3987, win 1131, options [nop,nop,TS val 3097252951 ecr 3080115105], length 3810
10:29:06.895835 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [.], ack 3930, win 559, options [nop,nop,TS val 3080115107 ecr 3097252950], length 0
10:29:06.896507 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 3987:4022, ack 3930, win 559, options [nop,nop,TS val 3080115108 ecr 3097252950], length 35
10:29:06.896651 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 4022:4083, ack 3930, win 559, options [nop,nop,TS val 3080115108 ecr 3097252950], length 61
10:29:06.896694 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 4083, win 1131, options [nop,nop,TS val 3097252952 ecr 3080115108], length 0
10:29:06.907793 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 4083:4149, ack 3930, win 559, options [nop,nop,TS val 3080115119 ecr 3097252952], length 66
10:29:06.907825 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [.], seq 4149:6845, ack 3930, win 559, options [nop,nop,TS val 3080115119 ecr 3097252952], length 2696
10:29:06.907844 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 6845, win 1174, options [nop,nop,TS val 3097252963 ecr 3080115119], length 0
10:29:06.908216 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 6845:7966, ack 3930, win 559, options [nop,nop,TS val 3080115119 ecr 3097252952], length 1121
10:29:06.949081 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 7966, win 1195, options [nop,nop,TS val 3097253004 ecr 3080115119], length 0
10:29:10.671504 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 3930:3969, ack 7966, win 1195, options [nop,nop,TS val 3097256726 ecr 3080115119], length 39
10:29:10.679804 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 7966:8055, ack 3969, win 559, options [nop,nop,TS val 3080118890 ecr 3097256726], length 89
10:29:10.679863 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 8055, win 1195, options [nop,nop,TS val 3097256735 ecr 3080118890], length 0
10:29:10.679960 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 8055:11865, ack 3969, win 559, options [nop,nop,TS val 3080118890 ecr 3097256726], length 3810
10:29:10.679974 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 11865, win 1254, options [nop,nop,TS val 3097256735 ecr 3080118890], length 0
10:29:10.722109 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 11865:11896, ack 3969, win 559, options [nop,nop,TS val 3080118933 ecr 3097256735], length 31
10:29:10.722136 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 11896, win 1254, options [nop,nop,TS val 3097256777 ecr 3080118933], length 0
10:29:10.722801 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 3969:4008, ack 11896, win 1254, options [nop,nop,TS val 3097256778 ecr 3080118933], length 39
10:29:10.725565 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 11896:11930, ack 4008, win 559, options [nop,nop,TS val 3080118936 ecr 3097256778], length 34
10:29:10.766155 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 11930, win 1254, options [nop,nop,TS val 3097256821 ecr 3080118936], length 0
10:29:15.725545 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 11930:11961, ack 4008, win 559, options [nop,nop,TS val 3080123936 ecr 3097256821], length 31
10:29:15.725589 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 11961, win 1254, options [nop,nop,TS val 3097261780 ecr 3080123936], length 0
10:29:15.726137 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 4008:4047, ack 11961, win 1254, options [nop,nop,TS val 3097261781 ecr 3080123936], length 39
10:29:15.727862 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 11961:12018, ack 4047, win 559, options [nop,nop,TS val 3080123938 ecr 3097261781], length 57
10:29:15.768151 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 12018, win 1254, options [nop,nop,TS val 3097261823 ecr 3080123938], length 0
10:29:20.728005 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 12018:12049, ack 4047, win 559, options [nop,nop,TS val 3080128938 ecr 3097261823], length 31
10:29:20.728066 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 12049, win 1254, options [nop,nop,TS val 3097266782 ecr 3080128938], length 0
10:29:20.728680 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 4047:4086, ack 12049, win 1254, options [nop,nop,TS val 3097266783 ecr 3080128938], length 39
10:29:20.730796 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 12049:12106, ack 4086, win 559, options [nop,nop,TS val 3080128941 ecr 3097266783], length 57
10:29:20.771131 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 12106, win 1254, options [nop,nop,TS val 3097266825 ecr 3080128941], length 0
10:29:25.671693 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [P.], seq 4086:4125, ack 12106, win 1254, options [nop,nop,TS val 3097271725 ecr 3080128941], length 39
10:29:25.678932 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 12106:12195, ack 4125, win 559, options [nop,nop,TS val 3080133888 ecr 3097271725], length 89
10:29:25.678989 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 12195, win 1254, options [nop,nop,TS val 3097271733 ecr 3080133888], length 0
10:29:25.679076 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 12195:16005, ack 4125, win 559, options [nop,nop,TS val 3080133888 ecr 3097271725], length 3810
10:29:25.679091 IP 10.128.2.2.51000 > 172.30.0.1.https: Flags [.], ack 16005, win 1314, options [nop,nop,TS val 3097271733 ecr 3080133888], length 0
10:29:25.731067 IP 172.30.0.1.https > 10.128.2.2.51000: Flags [P.], seq 16005:16036, ack 4125, win 559, options [nop,nop,TS val 3080133941 ecr 3097271733], length 31

Comment 12 Federico Paolinelli 2020-05-29 10:12:07 UTC
Seeing this happening while reproducing https://bugzilla.redhat.com/show_bug.cgi?id=1840104

Happening on 4.4 only though.
Reopening and changing the affected version.

oc version
Client Version: 4.4.0-0.nightly-2020-05-27-032519
Server Version: 4.4.0-0.nightly-2020-05-27-032519
Kubernetes Version: v1.17.1+f63db30


tcpdump -n -i ovn-k8s-mp0 tcp port 34518
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ovn-k8s-mp0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:03:12.523308 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1415280526, win 1362, options [nop,nop,TS val 447287914 ecr 298413810], length 0
10:03:42.731231 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447318119 ecr 298413810], length 0
10:03:44.422370 IP 172.30.0.1.https > 10.135.0.2.34518: Flags [.], ack 1, win 313, options [nop,nop,TS val 298506116 ecr 447136727], length 0
10:03:44.422406 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447319811 ecr 298413810], length 0
10:04:12.939287 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447348325 ecr 298413810], length 0
10:04:43.147293 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447378531 ecr 298413810], length 0
10:05:13.355332 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447408736 ecr 298413810], length 0
10:05:43.563338 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447438942 ecr 298413810], length 0
10:06:13.771338 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447469147 ecr 298413810], length 0
10:06:13.771673 IP 172.30.0.1.https > 10.135.0.2.34518: Flags [.], ack 1, win 313, options [nop,nop,TS val 298655454 ecr 447319811], length 0
10:06:43.979291 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447499353 ecr 298655454], length 0
10:06:44.647051 IP 172.30.0.1.https > 10.135.0.2.34518: Flags [.], ack 1, win 313, options [nop,nop,TS val 298686325 ecr 447319811], length 0
10:06:44.647071 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447500021 ecr 298655454], length 0
10:07:14.187234 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447529558 ecr 298655454], length 0
10:07:44.395328 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447559764 ecr 298655454], length 0
10:08:14.603316 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447589970 ecr 298655454], length 0
10:08:44.811300 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447620175 ecr 298655454], length 0
10:09:15.019324 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447650381 ecr 298655454], length 0
10:09:45.227324 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447680586 ecr 298655454], length 0
10:10:15.435225 IP 10.135.0.2.34518 > 172.30.0.1.https: Flags [.], ack 1, win 1362, options [nop,nop,TS val 447710792 ecr 298655454], length 0

Comment 19 mcambria@redhat.com 2020-07-09 14:30:00 UTC
*** Bug 1834914 has been marked as a duplicate of this bug. ***

Comment 20 mcambria@redhat.com 2020-07-14 16:40:55 UTC
This can no longer be reproduced on v4.5.

OVN has the same tag in v4.4 and v4.5.  OVN rules are the same in both versions.

Kernels are different:

v4.4: Red Hat Enterprise Linux CoreOS 44.81.202005062110-0
v4.5: Red Hat Enterprise Linux CoreOS 46.82.202007110640-0 (Ootpa), kernel 4.18.0-211.el8.x86_64

Comment 21 Sai Sindhur Malleni 2020-08-13 17:52:00 UTC
I can also confirm that we are no longer seeing https://bugzilla.redhat.com/show_bug.cgi?id=1834914, which has been marked as a duplicate of this bug, in our testing.

