Bug 1967994

Summary: openshift-apiserver becomes False after env runs some time due to communication between one master to pods on another master fails with "Unable to connect to the server"
Product: OpenShift Container Platform Reporter: Ben Bennett <bbennett>
Component: Networking Assignee: mcambria <mcambria>
Networking sub component: openshift-sdn QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: abraj, aconole, aconstan, agabriel, alazic, alchan, algonzal, anbhat, anusaxen, aos-bugs, aygarg, bbennett, benjamin.hunt, bjarolim, christopher.obrien, dcbw, dgautam, dhansen, dornelas, dyocum, emarquez, erich, esimard, ffranz, fgiloux, fshaikh, hgomes, jbenc, jdolling, jhou, jocolema, jokerman, josalisb, kewang, lasilva, mcambria, mfojtik, mharri, mheslin, mifiedle, mnunes, namato, naoto30, oarribas, openshift-bugs-escalate, palonsor, pamoedom, pkhaire, qguo, rcarrier, rgregory, ribarry, rsandu, sbhavsar, scuppett, sople, sreber, sttts, swasthan, tsze, vwalek, wabouham, wking, xxia, zzhao
Version: 4.4 Keywords: FastFix
Target Milestone: --- Flags: mifiedle: needinfo? (bbennett)
Target Release: 4.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: SDN-CI-IMPACT
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1825219 Environment:
Last Closed: 2021-06-29 04:19:45 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1825219, 1988483    
Bug Blocks: 1851549    

Comment 4 zhaozhanqi 2021-06-18 04:05:37 UTC
Verified this bug on 4.7.0-0.nightly-2021-06-17-224843

Even though we have not yet captured any packets with the iptables counters (this may take 2-14 days), this approach has already been proven on 4.8: https://bugzilla.redhat.com/show_bug.cgi?id=1825219#c184

$ oc get node -o wide
NAME                                              STATUS   ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
zzhao47azure2-qjnlt-master-0                      Ready    master   55m   v1.20.0+2817867   10.0.0.8      <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-master-1                      Ready    master   55m   v1.20.0+2817867   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-master-2                      Ready    master   55m   v1.20.0+2817867   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-worker-northcentralus-7fcq9   Ready    worker   47m   v1.20.0+2817867   10.0.32.5     <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-worker-northcentralus-87zpx   Ready    worker   46m   v1.20.0+2817867   10.0.32.6     <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8
zzhao47azure2-qjnlt-worker-northcentralus-jhhm4   Ready    worker   46m   v1.20.0+2817867   10.0.32.4     <none>        Red Hat Enterprise Linux CoreOS 47.83.202106161242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-4.rhaos4.7.gitbaade70.el8

$ for f in $(oc -n openshift-sdn get pod -l app=sdn -o jsonpath={.items[*].metadata.name})  ; do echo -e "\n${f}\n" ; oc  -n openshift-sdn exec "${f}" -c sdn  -- iptables-save -c | grep ICMP_ACTION; done

sdn-882cw

:ICMP_ACTION - [0:0]
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.8/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.6/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.4/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.6/32 -p icmp -j ICMP_ACTION
[0:0] -A ICMP_ACTION -j LOG
[0:0] -A ICMP_ACTION -j DROP

sdn-d68lz

Unable to connect to the server: net/http: TLS handshake timeout

sdn-n8hr6

:ICMP_ACTION - [0:0]
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.8/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.5/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.0.6/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.6/32 -p icmp -j ICMP_ACTION
[0:0] -A CHECK_ICMP_SOURCE -s 10.0.32.4/32 -p icmp -j ICMP_ACTION
[0:0] -A ICMP_ACTION -j LOG
[0:0] -A ICMP_ACTION -j DROP

So moving this bug to verified.
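
Since the counters may take days to become non-zero, the check above could be repeated with a small filter that prints only ICMP_ACTION-related rules whose packet counter is greater than zero. This is a minimal sketch, not part of the fix: the function name nonzero_icmp_action is hypothetical, and it assumes the "[packets:bytes]" prefix format that iptables-save -c prints in the output above.

```shell
# Hypothetical helper: reads `iptables-save -c` output on stdin and prints
# every rule line mentioning ICMP_ACTION whose packet counter is non-zero.
# Exit status is grep-like: 0 if at least one such rule was found, 1 otherwise.
nonzero_icmp_action() {
  awk '
    # Rule lines start with "[packets:bytes]"; chain declarations such as
    # ":ICMP_ACTION - [0:0]" start with ":" and are skipped.
    /ICMP_ACTION/ && /^\[[0-9]+:[0-9]+\]/ {
      split(substr($1, 2, length($1) - 2), c, ":")   # c[1]=packets, c[2]=bytes
      if (c[1] + 0 > 0) { print; hits++ }
    }
    END { exit (hits > 0 ? 0 : 1) }
  '
}
```

It would be used in place of the grep in the loop above, for example: oc -n openshift-sdn exec "${f}" -c sdn -- iptables-save -c | nonzero_icmp_action. With the all-zero counters shown in this comment it would print nothing.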

Comment 7 errata-xmlrpc 2021-06-29 04:19:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.18 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2502