Description of problem:
When a pod is restarted it gets a new pod IP. If there is a long-lived stream of UDP packets to the nodePort created for this pod, the old conntrack table entry pointing to the old IP of the pod is never cleaned up.

Version-Release number of selected component (if applicable):
v3.11.69

How reproducible:
Always

Steps to Reproduce:
1. Create a service balancing UDP with nodePort and externalIPs
2. Delete the pod getting traffic
3. conntrack -L | grep podIP

Actual results:
udp 17 28 src=<ip outside ocp> dst=<service externalIP> sport=55212 dport=<serviceport> [UNREPLIED] src=<podIP/endpoint> dst=<tun0 interface of the node with the externalIP> sport=<serviceport> dport=55212 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=7

Expected results:
The conntrack entry is cleared.

Additional info:
Related bug https://bugzilla.redhat.com/show_bug.cgi?id=1659204

I haven't tested it, but this is the function:

    func ClearEntriesForNAT(execer exec.Interface, origin, dest string, protocol v1.Protocol) error {
        parameters := parametersWithFamily(utilnet.IsIPv6String(origin), "-D", "--orig-dst", origin, "--dst-nat", dest,
            "-p", protoStr(protocol))
        err := Exec(execer, parameters...)
        if err != nil && !strings.Contains(err.Error(), NoConnectionToDelete) {
            // TODO: Better handling for deletion failure. When failure occur, stale udp connection may not get flushed.
            // These stale udp connection will keep black hole traffic. Making this a best effort operation for now, since it
            // is expensive to baby sit all udp connections to kubernetes services.
            return fmt.Errorf("error deleting conntrack entries for UDP peer {%s, %s}, error: %v", origin, dest, err)
        }
        return nil
    }

I understand we could fix it by simply replacing:

    parameters := parametersWithFamily(utilnet.IsIPv6String(origin), "-D", "--orig-dst", origin, "--dst-nat", dest,

with:

    parameters := parametersWithFamily(utilnet.IsIPv6String(origin), "-D", "--orig-dst", origin, "--reply-src", dest,
Verified in v3.11.100; code and testing passed. Setup: 1 SVC with 3 endpoints, with NodePort and externalIP configured. Tested scaling the endpoints from 3 -> 2 -> 3 and from 1 -> 0 -> 1. In each case the old conntrack table entry was deleted and a new conntrack table entry pointing to the new IP of the pod was created.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758