Bug 2053609
Summary: | LoadBalancer SCTP service leaves stale conntrack entry that causes issues if service is recreated | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Pablo Alonso Rodriguez <palonsor> | |
Component: | Networking | Assignee: | Surya Seetharaman <surya> | |
Networking sub component: | ovn-kubernetes | QA Contact: | Weibin Liang <weliang> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | achernet, adrian.fernandez-pello.fernandez, anbhat, anusaxen, aojeagar, augol, dcbw, ddelcian, ealcaniz, ffernand, fpaoline, jgato, kkarampo, openshift-bugs-escalate, surya, trozet, weliang, yjoseph | |
Version: | 4.8 | |||
Target Milestone: | --- | |||
Target Release: | 4.11.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2060021 (view as bug list) | Environment: | ||
Last Closed: | 2022-08-10 10:49:30 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 2060021 | |||
Bug Blocks: | 2080069 |
Description
Pablo Alonso Rodriguez
2022-02-11 15:44:48 UTC
upstream fix posted: https://github.com/ovn-org/ovn-kubernetes/pull/2829

The fix posted was tested on our QE clusters and we saw that as soon as the stale clusterIP entry goes away, the new conntrack entry gets created and traffic starts passing again.

Details of the fix:

- What this fix does and why is it needed

Currently when we delete a clusterIP service we don't cleanup the conntrack entries on the node. This PR adds the logic to do this to target cases where we reuse the same LB or NP service VIPs against different clusterIPs when services get recreated.

- How to verify it

oc get svc
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)            AGE
sctpservice   LoadBalancer   172.30.151.15   172.31.249.39   30102:31458/SCTP   4s

oc get pods -owide
NAME         READY   STATUS    RESTARTS   AGE     IP            NODE                            NOMINATED NODE   READINESS GATES
sctpclient   1/1     Running   0          4h17m   10.131.0.26   asood-2231-rxpwk-worker-qc5p5   <none>           <none>
sctpserver   1/1     Running   0          4h17m   10.131.0.27   asood-2231-rxpwk-worker-qc5p5   <none>           <none>

Once connection from client to server is established:

conntrack -E -p sctp
[NEW] sctp 132 3 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 [UNREPLIED] src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541
[NEW] sctp 132 3 src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 [UNREPLIED] src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 zone=64001
[NEW] sctp 132 3 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 [UNREPLIED] src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 zone=12
[NEW] sctp 132 3 src=169.254.169.2 dst=10.131.0.27 sport=47541 dport=30102 [UNREPLIED] src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541
[NEW] sctp 132 3 src=100.64.0.5 dst=10.131.0.27 sport=47541 dport=30102 [UNREPLIED] src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 zone=27
[UPDATE] sctp 132 3 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541
[UPDATE] sctp 132 3 COOKIE_ECHOED src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541
[UPDATE] sctp 132 432000 ESTABLISHED src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 [ASSURED]

During an active session:

conntrack -L -p sctp
sctp 132 431994 ESTABLISHED src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=12 use=1
sctp 132 431994 ESTABLISHED src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=1
sctp 132 431994 ESTABLISHED src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
sctp 132 431994 ESTABLISHED src=169.254.169.2 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
sctp 132 431994 ESTABLISHED src=100.64.0.5 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=27 use=1

When we delete the LB svc and recreate it with same VIP but now a new clusterIP:

oc get svc
NAME          TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)            AGE
sctpservice   LoadBalancer   172.30.206.3   172.31.249.39   30102:31692/SCTP   2m47s

[DESTROY] sctp 132 src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 [ASSURED] zone=64001
[DESTROY] sctp 132 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 [ASSURED]
[DESTROY] sctp 132 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 [ASSURED] zone=12
[DESTROY] sctp 132 src=169.254.169.2 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 [ASSURED]
[DESTROY] sctp 132 src=100.64.0.5 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 [ASSURED] zone=27
[NEW] sctp 132 30 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 [UNREPLIED] src=172.30.206.3 dst=172.31.249.96 sport=30102 dport=47541
[NEW] sctp 132 30 src=172.31.249.96 dst=172.30.206.3 sport=47541 dport=30102 [UNREPLIED] src=172.30.206.3 dst=169.254.169.2 sport=30102 dport=47541 zone=64001
[NEW] sctp 132 30 src=169.254.169.2 dst=172.30.206.3 sport=47541 dport=30102 [UNREPLIED] src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 zone=12
[NEW] sctp 132 30 src=169.254.169.2 dst=10.131.0.27 sport=47541 dport=30102 [UNREPLIED] src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541
[NEW] sctp 132 30 src=100.64.0.5 dst=10.131.0.27 sport=47541 dport=30102 [UNREPLIED] src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 zone=27
[UPDATE] sctp 132 210 NONE src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.206.3 dst=172.31.249.96 sport=30102 dport=47541

As we can see ^ we clean up the entries properly and new ones are auto-populated.
sh-4.4# conntrack -L -p sctp
sctp 132 207 NONE src=169.254.169.2 dst=172.30.206.3 sport=47541 dport=30102 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=12 use=2
sctp 132 207 NONE src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.206.3 dst=172.31.249.96 sport=30102 dport=47541 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
sctp 132 207 NONE src=169.254.169.2 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
sctp 132 207 NONE src=172.31.249.96 dst=172.30.206.3 sport=47541 dport=30102 src=172.30.206.3 dst=169.254.169.2 sport=30102 dport=47541 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=1
sctp 132 207 NONE src=100.64.0.5 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=27 use=1

- Description for the changelog

cleanup conntrack entries for clusterIP

Debug logs capturing conntrack deletes (done for testing purposes):

I0223 20:48:13.136499 431210 gateway_shared_intf.go:575] SURYA 172.30.151.15, 30102, SCTP
I0223 20:48:13.136519 431210 helper_linux.go:154] SURYA 172.30.151.15, 30102, SCTP
I0223 20:48:13.136531 431210 net_linux.go:411] SURYA 172.30.151.15, 30102, SCTP, &{map[4:0xc000abca20] map[6:30102] 132}
I0223 20:48:13.136545 431210 conntrack_linux.go:93] SURYA &{map[4:0xc000abca20] map[6:30102] 132}
I0223 20:48:13.141386 431210 conntrack_linux.go:102] SURYA 132 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.141397 431210 conntrack_linux.go:454] SURYA 132 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.141405 431210 conntrack_linux.go:472] SURYA match=false
I0223 20:48:13.142452 431210 conntrack_linux.go:102] SURYA 132 src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.142465 431210 conntrack_linux.go:454] SURYA 132 src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.142473 431210 conntrack_linux.go:469] SURYA 132 src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.142482 431210 conntrack_linux.go:472] SURYA match=true
I0223 20:48:13.142487 431210 conntrack_linux.go:104] SURYA 132 src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.146873 431210 conntrack_linux.go:102] SURYA 132 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.146880 431210 conntrack_linux.go:454] SURYA 132 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.146886 431210 conntrack_linux.go:469] SURYA 132 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.146891 431210 conntrack_linux.go:472] SURYA match=true
I0223 20:48:13.146894 431210 conntrack_linux.go:104] SURYA 132 src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 packets=0 bytes=0 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.207980 431210 conntrack_linux.go:102] SURYA 132 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.207986 431210 conntrack_linux.go:454] SURYA 132 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.207991 431210 conntrack_linux.go:469] SURYA 132 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.207996 431210 conntrack_linux.go:472] SURYA match=true
I0223 20:48:13.207998 431210 conntrack_linux.go:104] SURYA 132 src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 packets=0 bytes=0 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 packets=0 bytes=0 mark=0
I0223 20:48:13.224066 431210 conntrack_linux.go:112] SURYA deleted 3 entries

Way forward: https://github.com/kubernetes/kubernetes/issues/108523#issuecomment-1074044415
https://github.com/ovn-org/ovn-kubernetes/pull/2829 will be the fix, we will backport it to 4.8.z but note that if upstream takes a different approach we will discard our current temp fix and go with whatever upstream kube proxy does.

PR merged downstream, moving to MODIFIED.

Testing passed in 4.11.0-0.nightly-2022-04-26-181148 by following the verifying steps from https://github.com/ovn-org/ovn-kubernetes/pull/2829

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
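
The manual check in the "How to verify it" steps above (no conntrack entry should still reference the old clusterIP once the service is recreated) could be scripted roughly as follows. This is only a sketch under stated assumptions: it parses saved `conntrack -L -p sctp` text rather than talking to netlink, the helper names parseEntry and staleEntries are hypothetical, and the "any src/dst address equals the clusterIP, plus original dport" rule is an illustration, not the matching logic used in the ovn-kubernetes PR. The sample lines are the active-session listing from this bug with the trailing mark/secctx/use fields trimmed.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a simplified view of one line of `conntrack -L` output.
type entry struct {
	proto string   // first field of the line, e.g. "sctp"
	addrs []string // every src=/dst= address (original and reply tuple)
	dport string   // dport of the original (first) tuple
}

// parseEntry (hypothetical helper) pulls protocol, addresses and the
// original-direction dport out of a single listing line.
func parseEntry(line string) (entry, bool) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return entry{}, false
	}
	e := entry{proto: fields[0]}
	for _, f := range fields[1:] {
		kv := strings.SplitN(f, "=", 2)
		if len(kv) != 2 {
			continue // state names and flags such as [ASSURED]
		}
		switch kv[0] {
		case "src", "dst":
			e.addrs = append(e.addrs, kv[1])
		case "dport":
			if e.dport == "" { // keep only the original tuple's dport
				e.dport = kv[1]
			}
		}
	}
	return e, len(e.addrs) > 0
}

// staleEntries (hypothetical helper) returns the lines whose protocol and
// original dport match and that mention clusterIP in either tuple.
func staleEntries(listing, proto, clusterIP, port string) []string {
	var stale []string
	for _, line := range strings.Split(listing, "\n") {
		e, ok := parseEntry(line)
		if !ok || e.proto != proto || e.dport != port {
			continue
		}
		for _, a := range e.addrs {
			if a == clusterIP {
				stale = append(stale, line)
				break
			}
		}
	}
	return stale
}

// sampleListing is abridged from the active-session listing in this bug.
const sampleListing = `sctp 132 431994 ESTABLISHED src=169.254.169.2 dst=172.30.151.15 sport=47541 dport=30102 src=10.131.0.27 dst=169.254.169.2 sport=30102 dport=47541 [ASSURED] zone=12
sctp 132 431994 ESTABLISHED src=172.31.249.96 dst=172.30.151.15 sport=47541 dport=30102 src=172.30.151.15 dst=169.254.169.2 sport=30102 dport=47541 [ASSURED] zone=64001
sctp 132 431994 ESTABLISHED src=172.31.249.96 dst=172.31.249.39 sport=47541 dport=30102 src=172.30.151.15 dst=172.31.249.96 sport=30102 dport=47541 [ASSURED]
sctp 132 431994 ESTABLISHED src=169.254.169.2 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 [ASSURED]
sctp 132 431994 ESTABLISHED src=100.64.0.5 dst=10.131.0.27 sport=47541 dport=30102 src=10.131.0.27 dst=100.64.0.5 sport=30102 dport=47541 [ASSURED] zone=27`

func main() {
	old := staleEntries(sampleListing, "sctp", "172.30.151.15", "30102")
	fmt.Printf("entries referencing old clusterIP: %d\n", len(old)) // prints 3
}
```

Run against a listing captured after the service is recreated, the same call with the old clusterIP should return nothing if the cleanup worked. On a real node the input would come from `conntrack -L -p sctp` in the relevant network namespace.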