Bug 2214885
| Summary: | [16.2.2] high ovs-vswitchd CPU usage on controller (most spent in native_queued_spin_lock_slowpath) | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Robin Cernin <rcernin> |
| Component: | openvswitch | Assignee: | Timothy Redaelli <tredaelli> |
| openvswitch sub component: | other | QA Contact: | qding |
| Status: | NEW --- | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | apevec, casantos, chrisbro, chrisw, ctrautma, hakhande, jappleii, ldenny, qding, ralonsoh, scohen, tredaelli |
| Version: | RHEL 8.0 | Flags: | ldenny: needinfo? (tredaelli); rcernin: needinfo- |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Robin Cernin
2023-06-14 03:26:05 UTC
One thing we discovered during troubleshooting is that this environment does not have OVN DVR enabled, so much of the traffic coming from the compute nodes has to be routed through the controller nodes. That may explain the difference in ovs-vswitchd CPU usage between the controller and compute nodes, but we are not sure. This upstream bug [1] looks like a close match, but we don't see any `blocked 1000 ms waiting for revalidator127 to quiesce` messages.

[1] https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1827264

Hello:

In the Neutron team we initially thought that this issue was related to the router HA VRRP traffic, but this environment is using OVN, so that is not the problem. Investigating a bit, I found that this problem could be related to an outdated glibc library. According to the upstream bugs [1][2][3], this issue was fixed in [4], target milestone 2.29. The version installed in an OSP16.2 deployment, using RHEL8.4, is glibc-2.28-151.el8.x86_64.

Regards.

[1] https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1827264
[2] https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1839592
[3] https://github.com/openvswitch/ovs-issues/issues/175
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=23861

Hi Rodolfo,

Is the glibc version something we can get updated in RHEL? I'm glad you agree this seems to be the issue, but I'm not sure how we can prove it. Are we able to compile a test version of OVS with the updated version? There is a reproducer program linked here [1] that I will test in my RHOSP16.2 lab. If we can get a test version of OVS that I could install or compile in the lab, we could see whether it at least fixes the reproducer program.

Not sure who to set needinfo on, sorry, so just being cautious.

Thanks!

Hi Lewis:

Sorry, I was expecting you to know the answer to this question. I guess that if there is a bug in a kernel library, we can update it. In any case, if you can compile and test this new glibc version, proving that it fixes the issue in OVS, we can call the kernel folks to push this fix.

Regards.

Okay cool. We will test compiling OVS with glibc 2.29 and report back. If we can prove it resolves the issue, we will have a good argument for the kernel folks to update.

Cheers!

After looking into this some more, I've come to the conclusion that it's not really possible to test with glibc 2.29. Firstly, it's my understanding that OVS consumes libc as a dynamic library, so compiling OVS won't be necessary, and updating libc on the host is not as straightforward or safe as I assumed [1]. I can't reproduce the issue in my lab, and I can't recommend that the customer attempt to update libc in their production environment.

Are we able to get some other ideas from the OVS team? One solution would be deploying RHOSP17, which shouldn't have this issue as we've updated to RHEL9 and libc 2.34.

[1] https://access.redhat.com/discussions/3244811#comment-2024011
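
For anyone checking other nodes against the version mentioned in this thread, here is a minimal sketch of the comparison. It assumes that 2.29 is the first upstream glibc release with the fix (taken from the target milestone quoted above) and that the package string comes from something like `rpm -q glibc`. The helper names are hypothetical. Note that RHEL packages often carry backported patches, so this only compares the upstream base version and cannot prove that a fix is absent.

```python
import re
from typing import Tuple

# Assumption: 2.29 is the upstream release carrying the fix, per the
# target milestone quoted from the sourceware bug in this thread.
FIXED_IN = (2, 29)


def glibc_version_from_nvr(nvr: str) -> Tuple[int, int]:
    """Extract (major, minor) from a name like 'glibc-2.28-151.el8.x86_64'."""
    match = re.match(r"glibc-(\d+)\.(\d+)", nvr)
    if match is None:
        raise ValueError(f"not a glibc package name: {nvr!r}")
    return int(match.group(1)), int(match.group(2))


def predates_fix(nvr: str, fixed_in: Tuple[int, int] = FIXED_IN) -> bool:
    """Return True if the upstream base version is older than fixed_in.

    Versions are compared as integer tuples, so 2.9 sorts before 2.29,
    which a string or float comparison would get wrong.
    """
    return glibc_version_from_nvr(nvr) < fixed_in


if __name__ == "__main__":
    # Package string reported in this bug for the OSP16.2 / RHEL8.4 nodes.
    print(predates_fix("glibc-2.28-151.el8.x86_64"))
```

A result of `True` only says the base version is older than 2.29. Whether a given RHEL build already includes a backport would still need to be confirmed from the package changelog.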