Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2060688

Summary: Intermittent Packet Drop from Service-To-Service Communication
Product: OpenShift Container Platform Reporter: Michael Washer <mwasher>
Component: Networking Assignee: Martin Kennelly <mkennell>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DEFERRED Docs Contact:
Severity: urgent    
Priority: medium CC: achernet, aconole, anbhat, cgoncalves, ealcaniz, fpaoline, mcambria, mkennell, nusiddiq, openshift-bugs-escalate, rkhan, rravaiol, surya
Version: 4.8   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2062431 (view as bug list) Environment:
Last Closed: 2022-04-12 15:53:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2066885    
Bug Blocks: 2062431    

Description Michael Washer 2022-03-04 01:04:34 UTC
Description of problem:
When sending UDP packets between Services on the same Nodes, packet loss can be seen.

What we know about the problem:
1) When two Pods communicate using the following flows, the source port of the final packet is changed:
App1 -> App2Service -> App2Pod
App2 -> App1Service -> App1Pod

2) This is due to a collision of rules in the connection tracking functionality.
3) This source port change is expected and is NOT the focus of this case.
4) When 2 Pods are communicating using the above flows, packets will intermittently not arrive at their destination.
5) Collecting PCAPs on both ends of the communication (on both ends of the veth pairs), we can see a packet leave the pod, for example:

Pod1(eth0 | PCAP) -> (veth1 | PCAP) -> OVS -> (veth2 | PCAP) -> Pod2(eth0 | PCAP)

We see the packet in the veth1 PCAP but not in the veth2 PCAP.
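
To make the veth1 vs. veth2 comparison repeatable, a minimal sketch along the following lines may help. It is not part of the reproduction repo. It assumes each PCAP has already been exported to a text file with one key per packet per line, where the key survives NAT (for example a counter carried in the payload, since the source port is remapped). The function name `missing_packets` and the file names are hypothetical.

```shell
# Print packet keys present in the first export but absent from the second.
# Assumption: each input file holds one NAT-invariant key per line
# (e.g. a payload counter); creating those files from the PCAPs is not shown.
missing_packets() {
    sent_sorted=$(mktemp)
    recv_sorted=$(mktemp)
    # comm requires sorted input; -u also drops duplicate keys
    sort -u "$1" > "$sent_sorted"
    sort -u "$2" > "$recv_sorted"
    # -23 suppresses lines unique to file 2 and lines common to both,
    # leaving only keys seen on the sending side
    comm -23 "$sent_sorted" "$recv_sorted"
    rm -f "$sent_sorted" "$recv_sorted"
}
```

Called as `missing_packets veth1_keys.txt veth2_keys.txt`, any key it prints corresponds to a packet that entered OVS from veth1 but was never delivered to veth2.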


Version-Release number of selected component (if applicable):
OpenShift 4.8.24 with OVN Kubernetes

How reproducible:
Reproduce the communication flows above; after a couple of minutes, packets will start to drop.

Steps to Reproduce:
A set of scripts has been produced to reproduce this packet flow:
```
# Clone repo
git clone git:MichaelWasher/sport-remapping-recreation.git
cd sport-remapping-recreation

# Setup env
./setup.sh

# Run Collection
./collect.sh

# In a different terminal: run tests and wait (can take minutes)
./test.sh
```

If you want a progress update (only possible once the test is actually started):
```
oc rsh requester sh -c "tail -f ./requester_logs | grep -e counter -e 'Context Deadline Exceeded' -e 'Time since last packet'"
```

Actual results:
Intermittently, packets are missing from some of the PCAPs.

Expected results:
All packets are accounted for.

Additional info: