Bug 1825219 - openshift-apiserver becomes False after the env runs for some time because communication from one master to pods on another master fails with "Unable to connect to the server" [NEEDINFO]
Summary: openshift-apiserver becomes False after env runs some time due to communicati...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: mcambria@redhat.com
QA Contact: zhaozhanqi
URL:
Whiteboard: SDN-CI-IMPACT
Duplicates: 1840112 1861359 1890341 1899349 1921797 1940706 (view as bug list)
Depends On: 1849736
Blocks: 1967994 1988483
 
Reported: 2020-04-17 12:20 UTC by Xingxing Xia
Modified: 2022-06-02 20:03 UTC (History)
CC List: 66 users

Fixed In Version:
Doc Contact: ngerasim@redhat.com
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1836052 1967994 1988483 (view as bug list)
Environment:
Last Closed: 2021-12-20 20:17:28 UTC
Target Upstream Version:
anusaxen: needinfo-
anusaxen: needinfo-
mcambria: needinfo? (pladd)


Attachments
corrected second packets (1.13 KB, application/vnd.tcpdump.pcap)
2020-08-21 15:11 UTC, Aaron Conole
no flags Details
daemonset to clear route cache entries (1.70 KB, text/plain)
2020-09-28 15:14 UTC, mcambria@redhat.com
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 1059 0 None closed Bug 1825219: drop icmp frag needed received from other nodes in the cluster 2021-06-21 10:13:02 UTC
Github openshift cluster-network-operator pull 1107 0 None closed Bug 1825219: Fix nil checks in bootstrapSDN 2021-06-21 10:13:04 UTC
Red Hat Knowledge Base (Solution) 5252831 0 None None None 2020-09-23 16:35:28 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:33:11 UTC

Comment 4 Stefan Schimanski 2020-04-17 15:20:40 UTC
I debugged the cluster and found out that

- the openshift-apiserver-operator connects to openshift-apiserver through kube-apiserver. I exec'ed into the operator pods and tried:

  curl -k -i -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Host: kubernetes.default.svc.cluster.local" https://a.b.c.d:6443/apis/user.openshift.io/v1/users

  The requests for a.b.c.d=10.0.0.4 (that's master-0) fail, roughly 30% of the calls. The other instances are fine.

  We see this:

HTTP/1.1 503 Service Unavailable
Audit-Id: 36f7f163-8d7e-4e14-82ae-b6383d16661b
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Fri, 17 Apr 2020 15:04:51 GMT
Content-Length: 64

Error trying to reach service: 'net/http: TLS handshake timeout'

- The message "Error trying to reach service" is from apimachinery/pkg/util/proxy/transport.go and used by the aggregator, i.e. the aggregator cannot reach openshift-apiserver. We randomly select an openshift-apiserver endpoint IP. So probably one is not reachable. 
- The openshift-apiserver pods don't show any trace of error. So probably requests never reach their target.

- from inside master-0 kube-apiserver logs I see

E0417 14:59:11.519427       1 controller.go:114] loading OpenAPI spec for "v1.user.openshift.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error trying to reach service: 'net/http: TLS handshake timeout', Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]

  So OpenAPI download fails as well.

- I logged into this pod:

kubectl exec -n openshift-kube-apiserver -it kube-apiserver-xxia-autbug-w6cn2-master-0 /bin/bash

  and checked the openshift-apiserver endpoints manually:

kubectl get endpoints -n openshift-apiserver
NAME   ENDPOINTS                                            AGE
api    10.128.0.10:8443,10.129.0.8:8443,10.130.0.10:8443   35h

  and surprisingly none of them is reachable at all:

curl -i -k https://10.130.0.10:8443

  just blocks and eventually (after a minute?):

HTTP/1.1 503 Service Unavailable
Server: squid/4.9
Mime-Version: 1.0
Date: Fri, 17 Apr 2020 15:19:20 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 3588
X-Squid-Error: ERR_CONNECT_FAIL 110
Vary: Accept-Language
Content-Language: en

curl: (56) Received HTTP code 503 from proxy after CONNECT

  Which proxy? There shouldn't be a proxy in-between.

- I double checked the same from the openshift-apiserver-operator, just to verify that the curl is supposed to work:

kubectl exec -n openshift-apiserver-operator -it openshift-apiserver-operator-68858b89cb-5kqcp /bin/bash
curl -i -k https://10.130.0.10:8443
HTTP/1.1 403 Forbidden
Audit-Id: fa4c7d47-6cd7-4267-8aaa-2724b6af652f
Cache-Control: no-store
Content-Type: application/json
X-Content-Type-Options: nosniff
Date: Fri, 17 Apr 2020 15:16:04 GMT
Content-Length: 233

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403

  This is what is expected.

Comment 5 Stefan Schimanski 2020-04-17 15:21:38 UTC
Moving this to the SDN team as there is clearly something wrong with networking.

Comment 6 Ben Bennett 2020-04-17 18:08:41 UTC
Moved to 4.5 to develop the fix, and then we can consider the backport from there.  The problem seems to be between the squid proxy and the target.

Dane, can you take a quick look and see if anything strange catches your eye? Thanks.

Comment 7 Daneyon Hansen 2020-04-23 22:24:00 UTC
As Stefan mentions, traffic between endpoints on the pod, service or machine networks should not be proxied. These networks are automatically added to NO_PROXY from the install-config ConfigMap when the cluster-wide egress proxy feature is enabled. Verify these networks are present in proxy.status.noProxy. For example:

$ oc get cm/cluster-config-v1 -n kube-system -o yaml
apiVersion: v1
data:
  install-config: |
    <SNIP>
    networking:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      machineNetwork:
      - cidr: 10.0.0.0/16
      networkType: OpenShiftSDN
      serviceNetwork:
      - 172.30.0.0/16
<SNIP>

If the pod, service and machine networks differ from your install-config, then you must update the configmap and force a reconciliation of the proxy object or update proxy.spec.noProxy with the appropriate network addresses.

$ oc get proxy/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
<SNIP>
status:
  noProxy: <THE_NETWORKS_FROM_ABOVE>,<OTHER_SYSTEM_GENERATED_NOPROXIES>,<USER_PROVIDED_NOPROXIES>
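
As a quick sanity check (a minimal sketch; a jsonpath query filtered with grep), one can confirm that all three networks landed in proxy.status.noProxy:

$ oc get proxy/cluster -o jsonpath='{.status.noProxy}' | tr ',' '\n' | grep -E '10\.128\.0\.0/14|172\.30\.0\.0/16|10\.0\.0\.0/16'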

Configuring cluster-wide egress proxy for Azure is covered in detail at https://docs.openshift.com/container-platform/4.3/installing/installing_azure/installing-azure-private.html#installation-configure-proxy_installing-azure-private

Comment 8 Xingxing Xia 2020-05-09 06:25:02 UTC
Currently I don't have on hand an env matching the comment 0 matrix "4_4/ipi-on-azure/versioned-installer-ovn-customer_vpc-http_proxy".
But I have an env of matrix 4_5/upi-on-aws/versioned-installer-http_proxy-ovn-ci. Against this env, I checked as in comment 7: the proxy.spec.noProxy includes the networks from the install-config "networking" part. Then I checked as in comment 4:
$ oc rsh -n openshift-kube-apiserver -it kube-apiserver-ip-10-0-52-18.us-east-2.compute.internal
[root@ip-10-0-52-18 /]# env | grep -i proxy
NO_PROXY=.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.wzheng-share.qe.devcluster.openshift.com,etcd-0.wzheng-share.qe.devcluster.openshift.com,etcd-1.wzheng-share.qe.devcluster.openshift.com,etcd-2.wzheng-share.qe.devcluster.openshift.com,localhost,test.no-proxy.com
HTTPS_PROXY=http://<proxy user>:<proxy pass>@ec2-3-***.amazonaws.com:3128
HTTP_PROXY=http://<proxy user>:<proxy pass>@ec2-3-***.amazonaws.com:3128
 
# though 10.128.0.0/14 is included in NO_PROXY, why does the curl below still go through the proxy server?
[root@ip-10-0-52-18 /]# curl -v -k https://10.128.0.10:8443
* About to connect() to proxy ec2-3-*** port 3128 (#0)
*   Trying 10.0.11.102...
* Connected to ec2-3-*** (10.0.11.102) port 3128 (#0)
* Establish HTTP proxy tunnel to 10.128.0.10:8443
* Proxy auth using Basic with user '<proxy user>'
> CONNECT 10.128.0.10:8443 HTTP/1.1
> Host: 10.128.0.10:8443
...
< HTTP/1.1 503 Service Unavailable
< Server: squid/4.9
< Mime-Version: 1.0
< Date: Sat, 09 May 2020 03:40:41 GMT
...
* Received HTTP code 503 from proxy after CONNECT

This is expected, because curl does not support CIDR notation in NO_PROXY; see https://curl.haxx.se/docs/manual.html: "A comma-separated list of host names that shouldn't go through any proxy is set in ... NO_PROXY". Then I appended the above IP with ` export NO_PROXY="$NO_PROXY,10.128.0.10" `, after which ` curl -i -k https://10.128.0.10:8443 ` worked without the 503 issue.
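
For illustration, a minimal sketch of the behavior (values from this env; curl matches NO_PROXY entries literally, not as CIDRs):

# CIDR entries in NO_PROXY are ignored by curl:
export NO_PROXY="10.128.0.0/14"
curl -i -k https://10.128.0.10:8443   # still tunnels through HTTPS_PROXY; squid returns 503

# an exact host/IP entry is honored:
export NO_PROXY="$NO_PROXY,10.128.0.10"
curl -i -k https://10.128.0.10:8443   # connects directly, bypassing the proxy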
Though, this env does not have the "openshift-apiserver False" issue of this bug.

Comment 11 Daneyon Hansen 2020-05-15 16:54:23 UTC
Now that it's clear the calls to apiserver are not being proxied, I'm reassigning to the SDN team.

Comment 16 Xingxing Xia 2020-06-03 02:48:14 UTC
This morning, I checked the env again; `oc get co openshift-apiserver` is fine now:
openshift-apiserver   4.5.0-0.nightly-2020-06-01-165039   True        False         False      82m
> the master-0 KAS no route to the master-1 OAS pod
> the master-1 KAS no route to the master-0 OAS pod 
But the "no route to host" issue from comment 13 still exists, and the OAS-O logs show "no route to host", with Available switching between False and True:
oc logs -n openshift-apiserver-operator openshift-apiserver-operator-7b598687b-9pfjm | grep "clusteroperator/openshift-apiserver changed: Available changed from"
I0603 00:53:36.975214       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"9e404cb5-5bac-47a3-a382-af659698de50", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available changed from True to False ("APIServicesAvailable: apiservices.apiregistration.k8s.io/v1.apps.openshift.io: not available: failing or missing response from https://10.129.0.47:8443/apis/apps.openshift.io/v1: Get https://10.129.0.47:8443/apis/apps.openshift.io/v1: dial tcp 10.129.0.47:8443: connect: no route to host")
I0603 00:53:39.166852       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"9e404cb5-5bac-47a3-a382-af659698de50", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available changed from False to True ("")
I0603 00:53:39.174551       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"9e404cb5-5bac-47a3-a382-af659698de50", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available changed from False to True ("")
I0603 01:17:06.974338       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"9e404cb5-5bac-47a3-a382-af659698de50", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available changed from True to False ("APIServicesAvailable: apiservices.apiregistration.k8s.io/v1.apps.openshift.io: not available: failing or missing response from https://10.129.0.47:8443/apis/apps.openshift.io/v1: Get https://10.129.0.47:8443/apis/apps.openshift.io/v1: dial tcp 10.129.0.47:8443: connect: no route to host")
I0603 01:17:09.220269       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"9e404cb5-5bac-47a3-a382-af659698de50", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available changed from False to True ("")

Comment 17 Xingxing Xia 2020-06-03 03:15:30 UTC
Checked more and found that openshift-apiserver-operator is on master-0 (comment 16's 10.129.0.47 is the IP of the openshift-apiserver pod on master-1):
[xxia@pres 2020-06-03 11:08:57 CST my]$ oc get po -n openshift-apiserver-operator -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP           NODE                          NOMINATED NODE   READINESS GATES
openshift-apiserver-operator-7b598687b-9pfjm   1/1     Running   2          26h   10.130.0.2   hongli-pl039-hmf46-master-0   <none>           <none>

Comment 19 Xingxing Xia 2020-06-05 08:51:05 UTC
Note: as comment 13 shows, that env does not have http(s)_proxy but still hit the issue.

Today I hit it again, this time in a upi-on-azure http_proxy env. Debugging found more clues about the network:
Communication from a pod on master-1 to any pod on master-2 fails with "Unable to connect to the server". Other communication, e.g. from a pod on master-1 to any pod on master-0, does not fail with it:
# check master-2 pods
$ oc get po -A -o wide | grep master-2 | grep -v "10\.0\.0" | grep -v Completed
openshift-apiserver                                     apiserver-86b47c6dcf-r6nvf                                        1/1     Running            0          20h     10.129.0.9     qe-jiazha-up3-06040541-master-2           <none>           <none>
openshift-controller-manager                            controller-manager-bxhnp                                          1/1     Running            0          119m    10.129.0.22    qe-jiazha-up3-06040541-master-2           <none>           <none>
...
openshift-multus                                        multus-admission-controller-gdzfj                                 2/2     Running            0          20h     10.129.0.7     qe-jiazha-up3-06040541-master-2           <none>           <none>

# check master-0 pods
[xxia@pres 2020-06-05 16:18:14 CST my]$ oc get po -A -o wide | grep master-0 | grep -v "10\.0\.0" | grep -v Completed
openshift-apiserver                                     apiserver-86b47c6dcf-tfspw                                        1/1     Running             0          21h     10.130.0.17    qe-jiazha-up3-06040541-master-0           <none>           <none>
...
openshift-controller-manager                            controller-manager-jw8p6                                          1/1     Running             0          150m    10.130.0.34    qe-jiazha-up3-06040541-master-0           <none>           <none>
...
openshift-multus                                        multus-admission-controller-2k8hj                                 2/2     Running             0          21h     10.130.0.9     qe-jiazha-up3-06040541-master-0           <none>           <none>

# ssh to master-1, communication with any above pod on master-2 fails with "Unable to connect to the server"
[core@qe-jiazha-up3-06040541-master-1 ~]$ oc get --insecure-skip-tls-verify --raw "/" --server https://10.129.0.9:8443/
Unable to connect to the server: net/http: TLS handshake timeout
[core@qe-jiazha-up3-06040541-master-1 ~]$ oc get --insecure-skip-tls-verify --raw "/" --server https://10.129.0.22:8443/
Unable to connect to the server: net/http: TLS handshake timeout
[core@qe-jiazha-up3-06040541-master-1 ~]$ oc get --insecure-skip-tls-verify --raw "/" --server https://10.129.0.7:8443/
Unable to connect to the server: net/http: TLS handshake timeout

# However, master-1 communication with any above pod on master-0 does not fail with "Unable to connect to the server"
[core@qe-jiazha-up3-06040541-master-1 ~]$ oc get --insecure-skip-tls-verify --raw "/" --server https://10.130.0.17:8443/
Error from server (Forbidden): forbidden: User "system:anonymous" cannot get path "/"
[core@qe-jiazha-up3-06040541-master-1 ~]$ oc get --insecure-skip-tls-verify --raw "/" --server https://10.130.0.34:8443/
Error from server (Forbidden): forbidden: User "system:anonymous" cannot get path "/"
[core@qe-jiazha-up3-06040541-master-1 ~]$ oc get --insecure-skip-tls-verify --raw "/" --server https://10.130.0.9:8443/
error: You must be logged in to the server (the server has asked for the client to provide credentials)

Per the above clue, I checked the logs of the above pods on master-2; all show many:
I0605 08:17:56.860067       1 log.go:172] http: TLS handshake error from 10.128.0.1:47786: EOF
I0605 08:18:44.713085       1 log.go:172] http: TLS handshake error from 10.128.0.1:48376: EOF
Pods on master-0 and master-1 don't have such logs. 10.128.0.1 seems related to the above "cidr: 10.128.0.0/14".

Comment 21 zhaozhanqi 2020-06-09 05:38:16 UTC
I debugged more on the cluster that reproduced this issue, below:
https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/96986/artifact/workdir/install-dir/auth/kubeconfig/*view*/

Found that all https services CANNOT be accessed from a master0 hostnetwork pod to a master1 container pod, but http works well. And all https services can be accessed from a master0 hostnetwork pod to a master1 container pod.

See:

###there is one test pod I created on master-1########
oc get pod hello-pod2 -o wide
NAME         READY   STATUS    RESTARTS   AGE    IP            NODE                              NOMINATED NODE   READINESS GATES
hello-pod2   1/1     Running   0          6m4s   10.130.0.28   qe-yapei68sh2-06080632-master-1   <none>           <none>
 
##### try to access above test pod with https with port 8443 on master-0 hostnetwork pod#####
 
$oc exec multus-7c2m9 -n openshift-multus -- curl --connect-timeout 5 https://10.130.0.28:8443 -k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
curl: (28) Operation timed out after 5001 milliseconds with 0 out of 0 bytes received
command terminated with exit code 28
 
######try to access above test pod with http with port 8080 on master-0 hostnetwork pod#####
$oc exec multus-7c2m9 -n openshift-multus -- curl --connect-timeout 5 http://10.130.0.28:8080 -k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    28  100    28    0     0  10241      0 
Hello-OpenShift-1 http-8080    0

#### try to access the https 8443 from another hostnetwork pod which is on master-2 ####

$oc exec multus-rf4qw -n openshift-multus -- curl https://10.130.0.28:8443 -k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    29  100    29    0     0    829      0 --:--:-- --:--:-- --:--:--   852
Hello-OpenShift-1 https-8443

Comment 22 zhaozhanqi 2020-06-09 06:45:36 UTC
> 
> Found only all https services CANNOT be accessed from master0 hostnetwork
> pod to master1 container pod. but http works well. and all https services
> can be accessed from master0 hostnetwork pod to master1 container pod
> 

Sorry, typo here: all https services can be accessed from a master2 hostnetwork pod to a master1 container pod.

Comment 40 Xingxing Xia 2020-06-18 02:17:41 UTC
Though hit frequently in the last two weeks, comment 39 didn't reproduce it; removing the keyword unless it is hit again.

Comment 41 Ben Bennett 2020-06-18 13:17:28 UTC
Since we have been unable to reproduce this for the past three days, the severity has been lowered.  We will continue to investigate this and we can consider a backport once the real issue is understood.

Comment 45 Xingxing Xia 2020-06-30 02:13:38 UTC
(In reply to Anurag saxena from comment #44)
> Apparently on above debug cluster ,kubeapiserver on master2 is continuously complaining timeouts
Yeah, it shows timeouts because, as the above comments found, communication from one master (here master-2, as you pointed out) to pods on another master fails. Note this affects all pods on that other master, including the openshift-apiserver pod there; openshift-apiserver pods host the v1.xxx.openshift.io resources, so querying the openshift-apiserver endpoint on that other master returns 503.

Comment 46 Anurag saxena 2020-07-14 22:04:52 UTC
This is a testblocker for now, especially blocking the OVN hybrid Windows cluster on Azure. These clusters seem to have the apiserver degrade within 12 hours, blocking further testing.

Comment 49 Anurag saxena 2020-07-23 19:37:55 UTC
@mcambria, `ip route show cache` on one of the 6 nodes (that node is one of the masters) says:
# oc debug node/reliab453ovn2-kv2xm-master-0 -- chroot /host ip route show cache
Starting pod/reliab453ovn2-kv2xm-master-0-debug ...
To use host binaries, run `chroot /host`
10.0.0.8 dev eth0 
    cache expires 317sec mtu 1400 

Removing debug pod ...

Kubeconfig : https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/103439/artifact/workdir/install-dir/auth/kubeconfig
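
For reference, a minimal sketch to repeat this check across all nodes (assumes `oc debug` access to each node):

for node in $(oc get nodes -o name); do
  echo "== ${node}"
  # look for cached routes carrying a clamped mtu
  oc debug "${node}" -- chroot /host ip route show cache
done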

Comment 55 Simon Reber 2020-08-03 13:11:55 UTC
*** Bug 1861359 has been marked as a duplicate of this bug. ***

Comment 69 Aaron Conole 2020-08-21 15:11:22 UTC
Created attachment 1712185 [details]
corrected second packets

Comment 84 mcambria@redhat.com 2020-09-17 13:57:28 UTC
*** Bug 1840112 has been marked as a duplicate of this bug. ***

Comment 89 Ben Bennett 2020-09-23 16:35:28 UTC
Knowledge Base article https://access.redhat.com/solutions/5252831 describes the workaround, but it doesn't describe how to apply it to the nodes. It would be best to use a daemonset to make sure it runs on all nodes. You can just have the daemonset run a bash loop forever as they are doing at https://github.com/Azure/ARO-RP/blob/master/pkg/routefix/routefix.go#L31. The daemonset they use is at https://github.com/Azure/ARO-RP/blob/master/pkg/routefix/routefix.go#L147, but it is embedded in a Go program.

Comment 90 mcambria@redhat.com 2020-09-28 15:13:11 UTC


Here is how to get the image to use.  First get the name of the network-operator pod:

$ oc get pods  --namespace openshift-network-operator -o wide
NAME                               READY   STATUS    RESTARTS   AGE    IP         NODE                         NOMINATED NODE   READINESS GATES
network-operator-8c7746884-2mm7p   1/1     Running   0          3d1h   10.0.0.6   qe-anurag54-hmprt-master-0   <none>           <none>
$

Describe this pod looking for Image:


$ oc describe pod --namespace openshift-network-operator network-operator-8c7746884-2mm7p  | grep Image
    Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c11ebce7a9c619e0585c10b3a4cbc6f81c3c82670677587fa3e18525e1dc276
    Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c11ebce7a9c619e0585c10b3a4cbc6f81c3c82670677587fa3e18525e1dc276
$
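
Equivalently, a jsonpath query pulls the image directly (a sketch, using the pod name above):

$ oc get pod network-operator-8c7746884-2mm7p --namespace openshift-network-operator -o jsonpath='{.spec.containers[0].image}'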


Use this image in the daemonset (also attached):


kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: cachefix
  namespace: openshift-network-operator
  annotations:
    kubernetes.io/description: |
      This daemonset will flush route cache entries created with mtu of 1450.  See https://bugzilla.redhat.com/show_bug.cgi?id=1825219
    release.openshift.io/version: "{{.ReleaseVersion}}"
spec:
  selector:
    matchLabels:
      app: cachefix
  template:
    metadata:
      labels:
        app: cachefix
        component: network
        type: infra
        openshift.io/component: network
        kubernetes.io/os: "linux"
    spec:
      hostNetwork: true
      priorityClassName: "system-cluster-critical"
      containers:
      #
      - name: cachefix
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c11ebce7a9c619e0585c10b3a4cbc6f81c3c82670677587fa3e18525e1dc276
        command:
        - /bin/bash
        - -c
        - |
          set -xe
          echo "I$(date "+%m%d %H:%M:%S.%N") - cachefix - start cachefix ${K8S_NODE}"
          for ((;;))
            do
              if ip route show cache | grep -q 'mtu 14'; then
                 ip route show cache
                 ip route flush cache
              fi
              sleep 60
            done
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash", "-c", "echo cachefix done"]
        securityContext:
          privileged: true
        env:
        - name: K8S_NODE
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      nodeSelector:
        beta.kubernetes.io/os: "linux"
      tolerations:
        - operator: "Exists"
          effect: "NoExecute"
        - operator: "Exists"
          effect: "NoSchedule"

Comment 91 mcambria@redhat.com 2020-09-28 15:14:37 UTC
Created attachment 1717284 [details]
daemonset to clear route cache entries

Comment 92 Ben Bennett 2020-09-28 16:28:31 UTC
The workaround Mike provided should get QE unblocked while we wait for Azure to work out what is really wrong, and for the kernel change that works around the issue.

Comment 105 Ben Bennett 2020-10-08 13:01:17 UTC
*** Bug 1886141 has been marked as a duplicate of this bug. ***

Comment 115 Ben Bennett 2020-11-06 17:54:09 UTC
*** Bug 1890341 has been marked as a duplicate of this bug. ***

Comment 116 Ben Bennett 2020-11-23 15:41:43 UTC
*** Bug 1899349 has been marked as a duplicate of this bug. ***

Comment 120 Ben Bennett 2020-12-03 14:34:11 UTC
*** Bug 1899349 has been marked as a duplicate of this bug. ***

Comment 154 To Hung Sze 2021-03-15 15:40:28 UTC
Please ignore the comment above. Wrong ticket. Sorry.

Comment 162 Vladislav Walek 2021-03-24 17:36:29 UTC
*** Bug 1940706 has been marked as a duplicate of this bug. ***

Comment 176 Ben Bennett 2021-05-10 17:59:19 UTC
*** Bug 1921797 has been marked as a duplicate of this bug. ***

Comment 178 Ben Bennett 2021-05-25 12:36:49 UTC
Pulling this back until we get https://github.com/openshift/cluster-network-operator/pull/1107 merged too.

Comment 180 zhaozhanqi 2021-06-08 04:20:34 UTC
Hi, Michael

I have one cluster with version 4.8.0-0.nightly-2021-06-03-221810 on Azure that has been running for 2 days, and this issue did not happen.

But I cannot confirm the fixed PR is working well.

Do you have a better way to verify that the fixed PR fixes this issue?

For reference, here is the relevant excerpt of the sdn daemonset on this cluster, showing the drop-icmp container added by the fix:

        volumeMounts:
        - mountPath: /etc/pki/tls/metrics-certs
          name: sdn-metrics-certs
          readOnly: true
      - command:
        - /bin/bash
        - -c
        - |
          set -xe

          touch /var/run/add_iptables.sh
          chmod 0755 /var/run/add_iptables.sh
          cat <<'EOF' > /var/run/add_iptables.sh
          #!/bin/sh
          if [ -z "$3" ]
          then
               echo "Called with host address missing, ignore"
               exit 0
          fi
          echo "Adding ICMP drop rule for '$3' "
          if iptables -C CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION
          then
               echo "iptables already set for $3"
          else
               iptables -A CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION
          fi
          EOF

          echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}"
          iptables -X CHECK_ICMP_SOURCE || true
          iptables -N CHECK_ICMP_SOURCE || true
          iptables -F CHECK_ICMP_SOURCE
          iptables -D INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE || true
          iptables -I INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE
          iptables -N ICMP_ACTION || true
          iptables -F ICMP_ACTION
          iptables -A ICMP_ACTION -j LOG
          iptables -A ICMP_ACTION -j DROP
          oc observe pods -n openshift-sdn -l app=sdn -a '{ .status.hostIP }' -- /var/run/add_iptables.sh
        env:
        - name: K8S_NODE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dd70a5200b6de5bc872b2424701a81031bc212453b6d8b4d11e04995054ca952
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -c
              - echo drop-icmp done
        name: drop-icmp
        resources:
          requests:
            cpu: 5m
            memory: 20Mi
        securityContext:
          privileged: true

Comment 181 mcambria@redhat.com 2021-06-08 12:49:27 UTC
(In reply to zhaozhanqi from comment #180)

> Do you have a better way to provide the fixed PR can fix this issue?

No.  The issue takes 2 to 14 days to even show up.

The best I can suggest is to check the iptables counters to see if any of the `ICMP_ACTION -j DROP` rules are non-zero.
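
A minimal sketch of such a check (assumes the drop-icmp container shown in comment 180 is present in each sdn pod):

for pod in $(oc get pods -n openshift-sdn -l app=sdn -o name); do
  echo "== ${pod}"
  # [packets:bytes] counters precede each rule in `iptables-save -c` output
  oc exec -n openshift-sdn "${pod}" -c drop-icmp -- iptables-save -c | grep ICMP_ACTION
done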

Comment 184 zhaozhanqi 2021-06-15 01:30:39 UTC
I see that master-2 captured the packets; the drop rule counters are non-zero:


sdn-knq8n                                                        

:ICMP_ACTION - [0:0]
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.7/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.8/32 -p icmp -j ICMP_ACTION
[11:6336] -A CHECK_ICMP_SOURCE -s 10.0.0.6/32 -p icmp -j ICMP_ACTION                 
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.4/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.6/32 -p icmp -j ICMP_ACTION
[11:6336] -A ICMP_ACTION -j LOG
[11:6336] -A ICMP_ACTION -j DROP 

And I checked that the operators are working well:

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-2021-06-08-161629   True        False         False      17h
baremetal                                  4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
cloud-credential                           4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
cluster-autoscaler                         4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
config-operator                            4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
console                                    4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
csi-snapshot-controller                    4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
dns                                        4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
etcd                                       4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
image-registry                             4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
ingress                                    4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
insights                                   4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
kube-apiserver                             4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
kube-controller-manager                    4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
kube-scheduler                             4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
kube-storage-version-migrator              4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
machine-api                                4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
machine-approver                           4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
machine-config                             4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
marketplace                                4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
monitoring                                 4.8.0-0.nightly-2021-06-08-161629   True        False         False      3h42m
network                                    4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
node-tuning                                4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
openshift-apiserver                        4.8.0-0.nightly-2021-06-08-161629   True        False         False      29h
openshift-controller-manager               4.8.0-0.nightly-2021-06-08-161629   True        False         False      4d23h
openshift-samples                          4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
operator-lifecycle-manager                 4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
service-ca                                 4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h
storage                                    4.8.0-0.nightly-2021-06-08-161629   True        False         False      5d23h


Moving this bug to 'verified'.

Comment 188 errata-xmlrpc 2021-07-27 22:32:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 192 W. Trevor King 2021-12-20 20:17:28 UTC
This bug's linked pull requests shipped in 4.8.2. It is important to track that product change. If you see similar issues in 4.8.2 or later releases, please open a new bug, which may link its own product-changing PRs and ship in some subsequent release.

