Bug 1679260

Summary: Conntrack rule for UDP traffic is not removed when using NodePort and externalIPs
Product: OpenShift Container Platform Reporter: Juan Luis de Sousa-Valadas <jdesousa>
Component: Networking Assignee: Casey Callendrello <cdc>
Networking sub component: openshift-sdn QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: high CC: aos-bugs, bbennett, bmeng, cdc, gsapienz, jfindysz, jtanenba, mnaldini, openshift-bugs-escalate, rhowe, weliang
Version: 3.11.0   
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-06-04 10:44:14 UTC Type: Bug

Description Juan Luis de Sousa-Valadas 2019-02-20 18:17:32 UTC
Description of problem:

The pod is restarted and gets a new pod IP. A long-lived stream of UDP packets is being sent to the nodePort created for this pod. The old conntrack table entry, which points to the pod's old IP, is never cleaned up.


Version-Release number of selected component (if applicable):
v3.11.69

How reproducible:
Always

Steps to Reproduce:
1. Create a service that load-balances UDP traffic, with nodePort and externalIPs set
2. Delete the pod that is receiving the traffic
3. Run: conntrack -L | grep <old podIP>

Actual results:
udp      17 28 src=<ip outside ocp> dst=<service externalIP> sport=55212 dport=<serviceport> [UNREPLIED] src=<podIP/endpoint> dst=<tun0 interface of the node with the externalIP> sport=<serviceport> dport=55212 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=7
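
To make the entry above easier to read: the second src= field belongs to the reply tuple, and that is where the old pod IP shows up. Below is a minimal Go sketch that picks that field out of a `conntrack -L` line. The helper names replySrc and pointsAt and all addresses are made up for illustration; none of them come from openshift-sdn or kube-proxy.

```go
package main

import (
	"fmt"
	"strings"
)

// replySrc returns the source address of the reply tuple, i.e. the second
// "src=" field on a line printed by `conntrack -L`.
// Hypothetical helper, written only for this illustration.
func replySrc(line string) (string, bool) {
	seen := 0
	for _, field := range strings.Fields(line) {
		if strings.HasPrefix(field, "src=") {
			seen++
			if seen == 2 {
				return strings.TrimPrefix(field, "src="), true
			}
		}
	}
	return "", false
}

// pointsAt reports whether the reply tuple of the entry originates from podIP.
func pointsAt(line, podIP string) bool {
	src, ok := replySrc(line)
	return ok && src == podIP
}

func main() {
	// Made-up addresses, laid out like the entry in "Actual results".
	line := "udp 17 28 src=192.0.2.50 dst=198.51.100.7 sport=55212 dport=5353 " +
		"[UNREPLIED] src=10.128.0.15 dst=10.128.0.1 sport=5353 dport=55212 mark=0 use=7"
	fmt.Println(pointsAt(line, "10.128.0.15"))
}
```

If pointsAt still reports a match for the deleted pod's IP after the pod is gone, the entry is stale.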

Expected results:

The stale conntrack entry is cleared.

Additional info:
Related bug https://bugzilla.redhat.com/show_bug.cgi?id=1659204

I haven't tested it, but this is the relevant function:
func ClearEntriesForNAT(execer exec.Interface, origin, dest string, protocol v1.Protocol) error {
        parameters := parametersWithFamily(utilnet.IsIPv6String(origin), "-D", "--orig-dst", origin, "--dst-nat", dest,
                "-p", protoStr(protocol))
        err := Exec(execer, parameters...)
        if err != nil && !strings.Contains(err.Error(), NoConnectionToDelete) {
                // TODO: Better handling for deletion failure. When failure occur, stale udp connection may not get flushed.
                // These stale udp connection will keep black hole traffic. Making this a best effort operation for now, since it
                // is expensive to baby sit all udp connections to kubernetes services.
                return fmt.Errorf("error deleting conntrack entries for UDP peer {%s, %s}, error: %v", origin, dest, err)
        }
        return nil
}

I understand we could fix it by simply replacing:

        parameters := parametersWithFamily(utilnet.IsIPv6String(origin), "-D", "--orig-dst", origin, "--dst-nat", dest,

with:
        parameters := parametersWithFamily(utilnet.IsIPv6String(origin), "-D", "--orig-dst", origin, "--reply-src", dest,
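
For reference, here is a self-contained sketch of the argument list the proposed change would hand to conntrack. It is untested against a real cluster and is not the upstream code: buildClearArgs is a hypothetical stand-in for the parametersWithFamily call, and it assumes the only intended difference is filtering on --reply-src instead of --dst-nat.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// buildClearArgs is a hypothetical helper (not the upstream function). It
// returns conntrack arguments that delete entries whose original destination
// is origin and whose reply source is dest.
func buildClearArgs(origin, dest, proto string) []string {
	args := []string{
		"-D",
		"--orig-dst", origin,
		"--reply-src", dest, // proposed replacement for "--dst-nat"
		"-p", strings.ToLower(proto),
	}
	// Select the IPv6 table when origin is an IPv6 literal.
	if ip := net.ParseIP(origin); ip != nil && ip.To4() == nil {
		args = append(args, "-f", "ipv6")
	}
	return args
}

func main() {
	// Made-up service IP and old pod IP, for illustration only.
	args := buildClearArgs("198.51.100.7", "10.128.0.15", "UDP")
	fmt.Println("conntrack", strings.Join(args, " "))
}
```

The sketch only builds and prints the command line; it does not run conntrack.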

Comment 13 Weibin Liang 2019-04-04 17:56:07 UTC
Verified in v3.11.100; testing passed.


Test setup: one service with 3 endpoints, configured with a NodePort and an externalIP.

Tested with the endpoint count going from 3 -> 2 -> 3 and from 1 -> 0 -> 1.
The old conntrack table entry was deleted and a new conntrack table entry
pointing to the new IP of the pod was created.

Comment 16 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758