Bug 2108557
| Summary: | [4.9] [Webscale] High OVS cpu usage causing performance issues | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Performance Addon Operator | Assignee: | Yanir Quinn <yquinn> |
| Status: | CLOSED ERRATA | QA Contact: | Niranjan Mallapadi Raghavender <mniranja> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.9 | CC: | ajuarez, amorenoz, anbhat, atheurer, bbennett, bhershbe, browsell, cback, cfields, cgaynor, ctrautma, dacarpen, ealcaniz, eglottma, fbaudin, fdeutsch, ffernand, fleitner, gdiotte, grajaiya, i.maximets, jerward, jhsiao, mkennell, openshift-bugs-escalate, ralongi, rcernin, rkhan, shajmakh, surya, vjaypurk, vnema, yquinn |
| Target Milestone: | --- | Flags: | cback: needinfo- |
| Target Release: | 4.9.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-09-12 13:36:57 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2100544 | ||
| Bug Blocks: | 2108556 | ||

Comment 2
Yanir Quinn
2022-08-15 16:26:34 UTC
Verification:
Versions:
OCP: 4.9.48
PAO: 4.9.11-2
Steps:
- PerformanceProfile (PP):
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  cpu:
    isolated: "0-2"
    reserved: "3"
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/workercnf: ""
- Guaranteed QoS (GU) pod (the node name is set explicitly for simplicity):
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  nodeName: worker-1
  containers:
  - name: test
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3d7f41e3c7e242e67035f94abb8d0faf35bee1c45449ba9c2712a211670914b
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c"]
    args: [ "while true; do sleep 100000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "200M"
      limits:
        cpu: 2
        memory: "200M"
  nodeSelector:
    node-role.kubernetes.io/workercnf: ""
  runtimeClassName: performance-manual
- Check that the RPS mask is updated only for veth devices:
[root@registry ~]# oc rsh test
sh-4.4# find /sys/devices/virtual/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008
sh-4.4# find /sys/devices/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008
Both searches return the same two devices (lo and eth0). Each has rps_cpus set to 0000,00000000,00000008, which is the mask for CPU 3, the reserved CPU.
- Check that the new annotation works properly:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  annotations:
    performance.openshift.io/enable-physical-dev-rps: "true"
  name: manual
spec:
  ...
Before setting the annotation, check the following on the node:
sh-4.4# chroot /host
sh-4.4# systemctl list-units -all | grep update-rps@
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
Then apply the updated profile and check that the physical devices are now listed:
systemctl list-units -all | grep update-rps@
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
sh-4.4#
Note that after the profile was updated, the enoX (physical) devices were listed as expected. The number of update-rps@ units grew from 6 to 14.
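
The rps_cpus check compares a sysfs hexadecimal CPU mask against the profile's cpuset lists. In 0000,00000000,00000008 only bit 3 is set, which selects CPU 3, the CPU listed as reserved: "3" in the PerformanceProfile. The helpers below are a minimal POSIX sh sketch that decodes such a mask and compares it with a cpuset list.

The function names mask_to_cpus, expand_cpulist and rps_matches_reserved are hypothetical and are not part of PAO or OpenShift. The sketch makes two assumptions:
- masks are comma-separated hex words, as printed by cat above;
- cpuset lists are written in ascending order.

```sh
#!/bin/sh
# Sketch only: hypothetical helpers (not part of PAO/OpenShift) for checking
# that an RPS mask read from sysfs selects exactly the profile's reserved CPUs.
# Each function body runs in a subshell "( ... )", so its variables stay local.

# mask_to_cpus: decode a sysfs CPU mask such as "0000,00000000,00000008"
# (comma-separated hex words, most significant first) into a list like "3".
mask_to_cpus() (
  hex=$(printf '%s' "$1" | tr -d ',')
  i=0
  out=
  while [ -n "$hex" ]; do
    last=${hex#"${hex%?}"}   # least significant remaining hex digit
    hex=${hex%?}
    nibble=$(( 0x$last ))
    bit=0
    while [ "$bit" -lt 4 ]; do
      # hex digit number i (counting from the right) covers CPUs 4*i .. 4*i+3
      if [ $(( (nibble >> bit) & 1 )) -eq 1 ]; then
        out="$out${out:+ }$(( 4 * i + bit ))"
      fi
      bit=$(( bit + 1 ))
    done
    i=$(( i + 1 ))
  done
  printf '%s\n' "$out"
)

# expand_cpulist: expand a cpuset list such as "0-2" or "0-1,3" into
# "0 1 2" or "0 1 3". Assumes ascending order, as in the profile above.
expand_cpulist() (
  IFS=,
  set -f                     # no globbing while splitting on commas
  out=
  for part in $1; do
    case $part in
      *-*) lo=${part%-*}; hi=${part#*-} ;;
      *)   lo=$part; hi=$part ;;
    esac
    cpu=$lo
    while [ "$cpu" -le "$hi" ]; do
      out="$out${out:+ }$cpu"
      cpu=$(( cpu + 1 ))
    done
  done
  printf '%s\n' "$out"
)

# rps_matches_reserved MASK CPULIST: succeed if MASK selects exactly CPULIST.
rps_matches_reserved() {
  [ "$(mask_to_cpus "$1")" = "$(expand_cpulist "$2")" ]
}

# Values from the verification above: reserved: "3" and the lo/eth0 mask.
mask_to_cpus "0000,00000000,00000008"   # prints: 3
expand_cpulist "0-2"                    # prints: 0 1 2
if rps_matches_reserved "0000,00000000,00000008" "3"; then
  echo "rps_cpus selects only the reserved CPUs"
fi
```

Inside the pod or on the node, the mask can be passed straight from sysfs, for example `mask_to_cpus "$(cat /sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus)"`. Comparing decoded CPU numbers rather than raw strings avoids depending on how many leading zero words the kernel prints, which can differ between hosts.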

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.9.48 low-latency extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6408