Bug 1967994 - openshift-apiserver becomes False after env runs some time due to communication between one master to pods on another master fails with "Unable to connect to the server"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: mcambria@redhat.com
QA Contact: zhaozhanqi
URL:
Whiteboard: SDN-CI-IMPACT
Depends On: 1825219 1988483
Blocks: 1851549
Reported: 2021-06-04 16:34 UTC by Ben Bennett
Modified: 2023-09-15 01:09 UTC (History)
CC List: 65 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1825219
Environment:
Last Closed: 2021-06-29 04:19:45 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 1119 | 0 | None | open | Bug 1967994: Backport daemonset to drop icmp frag needed packets received from other nodes in the cluster to Rel 4.7 | 2021-06-11 13:53:28 UTC
Red Hat Knowledge Base (Solution) | 5252831 | 0 | None | None | None | 2021-07-14 08:14:01 UTC
Red Hat Product Errata | RHBA-2021:2502 | 0 | None | None | None | 2021-06-29 04:20:08 UTC

Internal Links: 1979312

Comment 4 zhaozhanqi 2021-06-18 04:05:37 UTC
Verified this bug on 4.7.0-0.nightly-2021-06-17-224843

Although we have not yet captured any packets in the iptables counters (this may take 2-14 days), this approach has already been proven on the 4.8 version: https://bugzilla.redhat.com/show_bug.cgi?id=1825219#c184

$ oc get node -o wide
NAME                                              STATUS   ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
zzhao47azure2-qjnlt-master-0                      Ready    master   55m   v1.20.0+2817867   10.0.0.8      <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-master-1                      Ready    master   55m   v1.20.0+2817867   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-master-2                      Ready    master   55m   v1.20.0+2817867   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-worker-northcentralus-7fcq9   Ready    worker   47m   v1.20.0+2817867   10.0.32.5     <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-worker-northcentralus-87zpx   Ready    worker   46m   v1.20.0+2817867   10.0.32.6     <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-worker-northcentralus-jhhm4   Ready    worker   46m   v1.20.0+2817867   10.0.32.4     <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8

$ for f in $(oc -n openshift-sdn get pod -l app=sdn -o jsonpath={.items[*].metadata.name})  ; do echo -e "\n${f}\n" ; oc  -n openshift-sdn exec "${f}" -c sdn  -- iptables-save -c | grep ICMP_ACTION; done

sdn-882cw

:ICMP_ACTION - [0:0]
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.8/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.6/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.4/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.6/32 -p icmp -j ICMP_ACTION
[0:0] -A ICMP_ACTION -j LOG
[0:0] -A ICMP_ACTION -j DROP

sdn-d68lz

Unable to connect to the server: net/http: TLS handshake timeout

sdn-n8hr6

:ICMP_ACTION - [0:0]
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.8/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.6/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.6/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.4/32 -p icmp -j ICMP_ACTION
[0:0] -A ICMP_ACTION -j LOG
[0:0] -A ICMP_ACTION -j DROP

So I am moving this bug to verified.
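
For anyone re-checking the counters later, the per-pod grep above can be reduced to a single number per pod. The following is only a sketch: the helper name icmp_action_hits is hypothetical and not part of any OpenShift tooling, and it assumes the rule layout shown in the output above (a "[packets:bytes]" prefix followed by a rule that jumps to ICMP_ACTION). It reads `iptables-save -c` output on stdin and sums the counters of the rules that jump to ICMP_ACTION.

```shell
# Hypothetical helper (not part of oc or openshift-sdn): sum the packet and
# byte counters of every rule that jumps to the ICMP_ACTION chain.
# Input:  `iptables-save -c` output on stdin.
# Output: one line, "<packets> <bytes>".
icmp_action_hits() {
  awk '
    # Counted rule lines look like:
    #   [pkts:bytes] -A CHECK_ICMP_SOURCE -s <ip>/32 -p icmp -j ICMP_ACTION
    # Rules inside ICMP_ACTION itself (-j LOG, -j DROP) are skipped so the
    # same packets are not counted twice.
    /^\[[0-9]+:[0-9]+\] -A .* -j ICMP_ACTION/ {
      # Strip the surrounding brackets from the first field, then split
      # "pkts:bytes" into its two numbers.
      split(substr($1, 2, length($1) - 2), c, ":")
      pkts += c[1]
      bytes += c[2]
    }
    END { printf "%d %d\n", pkts, bytes }
  '
}
```

It could be dropped into the loop above in place of the grep, for example `oc -n openshift-sdn exec "${f}" -c sdn -- iptables-save -c | icmp_action_hits`. A pod for which oc fails (as sdn-d68lz did above) produces no rule lines and would therefore report zero, so the exit status of oc should still be checked separately.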

Comment 7 errata-xmlrpc 2021-06-29 04:19:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.18 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2502

Comment 10 Red Hat Bugzilla 2023-09-15 01:09:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

