Bug 2037447
| Summary: | Ingress Operator is not closing TCP connections. | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Akash Semil <asemil> |
| Component: | Networking | Assignee: | Andrew McDermott <amcdermo> |
| Networking sub component: | router | QA Contact: | Shudi Li <shudili> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | amcdermo, aos-bugs, bmehra, bpickard, hongli, mmasters, pwaghmod |
| Version: | 4.7 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | x86_64 | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
The Ingress Operator performs health checks against the ingress canary route. Once a health check completes, the Ingress Operator does not close the TCP connection to the load balancer (LB) because keepalives are enabled on the connection. The next health check then establishes a new connection to the LB instead of reusing the existing one.
Consequence:
The number of connections builds up on the LB and, over time, exhausts the connections available on the LB.
Fix:
Disable keepalives when connecting to the canary route.
Result:
A new connection is made and closed each time the canary probe is run. With keepalives disabled there is no longer an accumulation of ESTABLISHED connections.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-10 10:41:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2063283 | ||
|
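As a rough way to observe the accumulation described in the Doc Text, the ESTABLISHED connections toward the LB address can be counted from the operator pod's network namespace. The sketch below is not part of the original report: the helper name `count_established_to` is hypothetical, and it assumes the column layout printed by `ss -tn` (state in column 1, peer address:port in column 5).

```shell
# Hypothetical helper: reads `ss -tn` output on stdin and prints how many
# rows are in state ESTAB with a peer address equal to the given IP.
count_established_to() {
  lb_ip="$1"
  awk -v ip="$lb_ip" '
    # Column 1 is the socket state, column 5 is "peer_address:port".
    $1 == "ESTAB" && index($5, ip ":") == 1 { n++ }
    END { print n + 0 }
  '
}

# Example usage (placeholders, to be filled in for the cluster at hand):
#   nsenter --net=<pod netns path> -- ss -tn | count_established_to <LB IP>
```

Running this a few times across canary probe intervals should show whether the count keeps growing (the behaviour described under Cause) or stays flat (the behaviour described under Result).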
Description
Akash Semil
2022-01-05 16:43:41 UTC
Verified with 4.11.0-0.nightly-2022-02-18-121223: TCP keep-alive packets are no longer seen.
1.
% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-02-18-121223 True False 26m Cluster version is 4.11.0-0.nightly-2022-02-18-121223
%
2.
% oc -n openshift-ingress-operator get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-operator-6b97f96dd-sq2fw 2/2 Running 2 (39m ago) 50m 10.130.0.22 shudi-411-gcpc3001-m54dd-master-0.c.openshift-qe.internal <none> <none>
%
3.
% oc -n openshift-ingress-canary get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
canary canary-openshift-ingress-canary.apps.shudi-411-gcpc3001.qe.gcp.devcluster.openshift.com ingress-canary 8080 edge/Redirect None
% dig canary-openshift-ingress-canary.apps.shudi-411-gcpc3001.qe.gcp.devcluster.openshift.com
; <<>> DiG 9.10.6 <<>> canary-openshift-ingress-canary.apps.shudi-411-gcpc3001.qe.gcp.devcluster.openshift.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38247
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
;; QUESTION SECTION:
;canary-openshift-ingress-canary.apps.shudi-411-gcpc3001.qe.gcp.devcluster.openshift.com. IN A
;; ANSWER SECTION:
canary-openshift-ingress-canary.apps.shudi-411-gcpc3001.qe.gcp.devcluster.openshift.com. 30 IN A 34.136.11.179
;; Query time: 79 msec
;; SERVER: 10.72.17.5#53(10.72.17.5)
;; WHEN: Wed Feb 23 14:51:08 CST 2022
;; MSG SIZE rcvd: 132
%
4.
% oc debug node/shudi-411-gcpc3001-m54dd-master-0.c.openshift-qe.internal
Starting pod/shudi-411-gcpc3001-m54dd-master-0copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.4
If you don't see a command prompt, try pressing enter.
sh-4.4# NAME=ingress-operator-6b97f96dd-sq2fw
sh-4.4# NAMESPACE=openshift-ingress-operator
sh-4.4# pod_id=$(chroot /host crictl pods --namespace ${NAMESPACE} --name ${NAME} -q)
sh-4.4# ns_path="/host/$(chroot /host bash -c "crictl inspectp $pod_id | jq '.info.runtimeSpec.linux.namespaces[]|select(.type==\"network\").path' -r")"
sh-4.4# nsenter_parameters="--net=${ns_path}"
sh-4.4# nsenter $nsenter_parameters -- tcpdump -i any host 34.136.11.179 -s 0 -w 411cap1.pcap
5. Copy the capture file to a local machine and inspect it; it contains no TCP keepalive packets.
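Step 5 could also be checked from the command line rather than by eye. A minimal sketch, assuming `tshark` (the Wireshark CLI, which the original verification does not mention) is installed on the local machine; `count_keepalives` is a hypothetical helper name, and `tcp.analysis.keep_alive` is the Wireshark display filter field for segments it classifies as TCP keep-alives.

```shell
# Hypothetical helper: prints the number of packets in a capture file
# that Wireshark's TCP analysis flags as keep-alive segments.
count_keepalives() {
  # $1: path to the capture file copied from the node
  tshark -r "$1" -Y 'tcp.analysis.keep_alive' | wc -l | tr -d ' '
}

# Example usage with the capture written in step 4:
#   count_keepalives 411cap1.pcap
```

A count of zero would agree with the observation in step 5.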
I copied the doc text from bug 2063283.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069