Bug 2108217 - SYN packets could generate high load
Summary: SYN packets could generate high load
Keywords:
Status: CLOSED DUPLICATE of bug 2100544
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Martin Sivák
QA Contact: Gowrishankar Rajaiyan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-18 15:41 UTC by Chen
Modified: 2022-08-09 16:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-03 06:21:06 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 2081852 | 0 | urgent | CLOSED | [Webscale] High OVS cpu usage causing performance issues | 2023-09-18 04:36:34 UTC
Red Hat Bugzilla | 2088335 | 0 | unspecified | CLOSED | 4.6 - SYN flood on pod with macvlan interface severely impacts worker when PAO is configured | 2022-07-27 11:56:45 UTC

Description Chen 2022-07-18 15:41:00 UTC
Description of problem:

SYN packets sent at a high rate to an SR-IOV Pod NIC can generate very high load on the node.

Version-Release number of selected component (if applicable):

4.9.25
4.10.10

How reproducible:

100%

Steps to Reproduce:
1. Create an SNO with SR-IOV and PAO
2. Create CPU partitioning PerformanceProfile
3. Run the synflood binary from a VM to an SR-IOV Pod NIC

Actual results:

The load on the SNO keeps rising for as long as the flood runs, which causes almost all liveness and readiness probes to time out.

ksoftirqd/0 occupies close to 100% of a CPU:

     11 root     -12   0       0      0      0 R  99.7   0.0   2:41.03 ksoftirqd/0

When the synflood binary is run:

$ uptime
 15:34:50 up 23 min,  2 users,  load average: 15.40, 5.97, 4.37

After the synflood binary stops:

$ uptime
 15:39:13 up 27 min,  2 users,  load average: 0.61, 2.71, 3.39
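
To check that the load comes from network receive processing, and to see which CPUs absorb it, the NET_RX row of /proc/softirqs can be sampled before and during the flood. The following is a minimal Python sketch, not something that was run for this bug; the function names parse_net_rx and net_rx_delta are made up for illustration.

```python
def parse_net_rx(softirqs_text):
    """Return {cpu_name: count} for the NET_RX row of /proc/softirqs text."""
    lines = softirqs_text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == "NET_RX":
            counts = [int(field) for field in rest.split()]
            return dict(zip(cpus, counts))
    raise ValueError("NET_RX row not found")


def net_rx_delta(before, after):
    """Per-CPU increase in NET_RX softirqs between two samples."""
    return {cpu: after[cpu] - before[cpu] for cpu in after if cpu in before}
```

Reading /proc/softirqs twice a few seconds apart and passing both parsed samples to net_rx_delta shows where the receive softirqs land. A delta concentrated on CPU0 would be consistent with the ksoftirqd/0 line above.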

Expected results:


Additional info:

Nokia is testing whether OCP can survive SYN flooding. synflood is a binary that continuously sends SYN packets to a target.

Comment 2 Martin Sivák 2022-07-18 16:16:29 UTC
This could be related to the RPS mask being configured for all devices. We decided that an RPS mask is not needed for physical devices in https://github.com/openshift/cluster-node-tuning-operator/pull/377 and https://github.com/openshift/cluster-node-tuning-operator/pull/371

The symptoms seem to match what you are observing.

I suggest you try without RPS and see if the situation improves.
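
Before testing without RPS, it helps to know which CPUs the current mask selects. The kernel exposes the mask per receive queue in /sys/class/net/<dev>/queues/rx-<n>/rps_cpus as comma-separated hexadecimal groups. The following is a rough Python sketch for decoding it; the helper names and the sysfs_net parameter are assumptions made for this example.

```python
import glob
import os


def parse_cpu_mask(mask):
    """Decode a sysfs CPU bitmap such as '00000000,0000000f' into CPU ids."""
    # Groups are most-significant first, so removing the commas
    # yields a single hexadecimal number.
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def read_rps_cpus(dev, sysfs_net="/sys/class/net"):
    """Return {queue_name: [cpu ids]} for every RX queue of one device."""
    result = {}
    pattern = os.path.join(sysfs_net, dev, "queues", "rx-*", "rps_cpus")
    for path in sorted(glob.glob(pattern)):
        queue = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            result[queue] = parse_cpu_mask(f.read())
    return result
```

An empty list for a queue means the mask is zero, i.e. RPS is off for that queue.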

Comment 3 Chen 2022-07-19 01:33:34 UTC
Hi Martin,

Thank you so much for your quick reply!

As a temporary solution (disabling RPS on physical devices), could we apply https://bugzilla.redhat.com/show_bug.cgi?id=2093267#c2 ?

Best Regards,
Chen

Comment 4 Martin Sivák 2022-07-19 06:52:33 UTC
It was the first rough testing version; it should still work, however.

Comment 6 Martin Sivák 2022-07-19 08:07:53 UTC
Disabling RPS completely (including veth devices) might affect latency of guaranteed pods. Disabling RPS for only physical devices should be ok even for RAN deployments.
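
The distinction between physical and virtual devices can be sketched in Python as follows. This is an illustration only, not the workaround from bug 2093267 and not what the operator does. It assumes that a network device is "physical" when its sysfs directory has a "device" entry (veth devices have none), and it clears the mask by writing 0 to rps_cpus. The function names and the dry_run flag are made up for this example.

```python
import glob
import os


def is_physical(dev, sysfs_net="/sys/class/net"):
    """Heuristic: hardware-backed devices expose a 'device' entry in sysfs."""
    return os.path.exists(os.path.join(sysfs_net, dev, "device"))


def disable_rps_on_physical(sysfs_net="/sys/class/net", dry_run=True):
    """Zero rps_cpus on all RX queues of physical devices; skip virtual ones.

    Returns the list of rps_cpus paths that were (or would be) written.
    """
    touched = []
    for dev in sorted(os.listdir(sysfs_net)):
        if not is_physical(dev, sysfs_net):
            continue  # leave veth and other virtual devices untouched
        pattern = os.path.join(sysfs_net, dev, "queues", "rx-*", "rps_cpus")
        for path in sorted(glob.glob(pattern)):
            if not dry_run:
                with open(path, "w") as f:
                    f.write("0\n")  # a zero mask turns RPS off for the queue
            touched.append(path)
    return touched
```

With dry_run=True the function only reports which queues it would change. The heuristic also matches SR-IOV virtual functions, a write to sysfs does not survive a reboot, and anything else on the node that manages these masks may set them again.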

Comment 7 Chen 2022-07-19 08:33:50 UTC
Thank you so much Martin!

Do we have a better temporary solution than https://bugzilla.redhat.com/show_bug.cgi?id=2093267#c2 ?

Best Regards,
Chen

Comment 8 Martin Sivák 2022-07-19 08:39:31 UTC
Nope, please test with https://bugzilla.redhat.com/show_bug.cgi?id=2093267#c2

Comment 9 Chen 2022-08-03 06:21:06 UTC

*** This bug has been marked as a duplicate of bug 2100544 ***

