Bug 1836305

Summary: [telco] ovsdb-server for NBDB pegged at 100% when running these commands
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Dumitru Ceara <dceara>
Component: openvswitch2.13
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: Zhiqiang Fang <zfang>
Severity: urgent
Docs Contact:
Priority: urgent
Version: RHEL 8.0
CC: ctrautma, dcbw, dceara, jhsiao, jishi, kfida, qding, ralongi
Target Milestone: ---
Target Release: ---
Hardware: All
OS: All
Whiteboard: Telco
Fixed In Version: openvswitch2.13-2.13.0-28.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1828639
Environment:
Last Closed: 2020-07-15 12:58:16 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1828639
Bug Blocks: 1828637, 1837257

Comment 1 OvS team 2020-05-15 21:03:35 UTC
* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-28
- raft: Disable RAFT jsonrpc inactivity probe. (#1836308)
  [3d9b529afb098531190d57d6f35d1622bb4093cd]

* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-27
- raft: Fix leak of the incomplete command. (#1836307)
  [5c38ccd52fb3925e82eda20f1897ec02abb390d9]

* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-26
- raft: Fix the problem of stuck in candidate role forever. (#1836305)
  [9c76350e271546eedfeb18720975e35b4e36e1f1]

* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-25
- raft: Fix next_index in install_snapshot reply handling. (#1836305)
  [cc3d02699203e2fe9d9fd384d09e268ba614828d]

* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-24
- raft: Avoid busy loop during leader election. (#1836305)
  [053b78c8d60ffb4d212fd7894f91be52027f291f]

* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-23
- raft: Fix raft_is_connected() when there is no leader yet. (#1836305)
  [e732012d7be335650398ff03c2431c64b2c4aaba]

* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-22
- ovsdb-server: Don't disconnect clients after raft install_snapshot. (#1836305)
  [8ff30dfee6cb075e36ed38b77695ff03321ce12b]

* Fri May 15 2020 Dumitru Ceara <dceara> - 2.13.0-21
- raft-rpc: Fix message format. (#1836305)
  [914d885061c9f7e7e6e5f921065301e08837e122]
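
To tell whether an installed build already carries the fixes listed above, the package string can be compared with the "Fixed In Version" value (openvswitch2.13-2.13.0-28.el8fdp). The following is a minimal Python sketch. The helper name has_fix and the regular expression are illustrative assumptions, and the comparison is deliberately simplified: it only handles openvswitch2.13 el8fdp package strings with a purely numeric release and does not reimplement RPM's full version-comparison rules.

```python
import re

# Fixed build taken from the "Fixed In Version" field of this bug:
# openvswitch2.13-2.13.0-28.el8fdp
FIXED_VERSION = (2, 13, 0)
FIXED_RELEASE = 28

# Assumed package string layout, e.g. openvswitch2.13-2.13.0-38.el8fdp.x86_64
# (the architecture suffix is optional).
NVRA_RE = re.compile(
    r"^openvswitch2\.13-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)"
    r"\.el8fdp(?:\.\w+)?$"
)


def has_fix(nvra: str) -> bool:
    """Return True if the package string is at or past the fixed build.

    Hypothetical helper: a simplified numeric comparison, not RPM's
    own version-comparison algorithm.
    """
    match = NVRA_RE.match(nvra.strip())
    if match is None:
        raise ValueError(f"unrecognized package string: {nvra!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    release = int(match.group("release"))
    # Compare the version first, then the release number.
    return (version, release) >= (FIXED_VERSION, FIXED_RELEASE)


if __name__ == "__main__":
    for pkg in (
        "openvswitch2.13-2.13.0-18.el8fdp.x86_64",
        "openvswitch2.13-2.13.0-38.el8fdp.x86_64",
    ):
        print(pkg, "fixed" if has_fix(pkg) else "not fixed")
```

The input strings are expected to come from a command such as rpm -qa; anything that does not match the assumed layout raises ValueError rather than being guessed at.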

Comment 4 Jianlin Shi 2020-06-15 08:39:28 UTC
Thanks, Dumitru, for the reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=1828639#c8.
Reproduced on openvswitch2.13-2.13.0-18:
[root@dell-per740-12 ~]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.11-2.11.1-47.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.11-central-2.11.1-47.el8fdp.x86_64
ovn2.11-host-2.11.1-47.el8fdp.x86_64
openvswitch2.13-2.13.0-18.el8fdp.x86_64

[root@dell-per740-12 ~]# grep "wakeup due to" /var/log/openvswitch/ovsdb-server-nb.log
2020-06-15T08:35:38.923Z|00084|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00085|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00086|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00087|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00088|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00089|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00090|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00091|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00092|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:38.923Z|00093|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:44.923Z|00101|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:50.923Z|00103|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)
2020-06-15T08:35:56.923Z|00111|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164 (99% CPU usage)

Verified on openvswitch2.13-2.13.0-38:

[root@dell-per740-12 ~]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.11-2.11.1-47.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.11-central-2.11.1-47.el8fdp.x86_64
openvswitch2.13-2.13.0-38.el8fdp.x86_64
ovn2.11-host-2.11.1-47.el8fdp.x86_64

[root@dell-per740-12 ~]# grep "wakeup due to" /var/log/openvswitch/ovsdb-server-nb.log

<==== no "wakeup due to" messages logged
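
The grep check above can also be scripted so that a test run fails automatically when the busy loop reappears. The following is a minimal Python sketch that parses poll_loop lines in the format shown in this comment and counts high-CPU wakeups per source location. The function name busy_wakeups, the regular expression, and the 90% threshold are assumptions chosen for illustration and are not part of OVS.

```python
import re
from collections import Counter

# Line layout assumed from the log excerpts in this comment:
# <timestamp>|<seq>|poll_loop|INFO|wakeup due to <reason> at <file:line> (<N>% CPU usage)
POLL_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|poll_loop\|INFO\|wakeup due to "
    r"(?P<reason>.+?) at (?P<where>\S+) \((?P<cpu>\d+)% CPU usage\)\s*$"
)


def busy_wakeups(lines, min_cpu=90):
    """Count poll_loop wakeups per source location at or above min_cpu.

    Hypothetical helper; min_cpu=90 is an arbitrary threshold picked
    for this sketch.
    """
    counts = Counter()
    for line in lines:
        match = POLL_RE.match(line)
        if match and int(match.group("cpu")) >= min_cpu:
            counts[match.group("where")] += 1
    return counts


if __name__ == "__main__":
    # Log path taken from the commands in this comment.
    path = "/var/log/openvswitch/ovsdb-server-nb.log"
    with open(path, encoding="utf-8", errors="replace") as log:
        counts = busy_wakeups(log)
    for where, count in counts.most_common():
        print(f"{count:6d} high-CPU wakeups at {where}")
    # A non-zero exit status marks the run as failed.
    raise SystemExit(1 if counts else 0)
```

With the -18 log above, every matching line points at ../ovsdb/trigger.c:164. With the -38 log, which has no matching lines, the counter stays empty and the script exits with status 0.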

Comment 6 errata-xmlrpc 2020-07-15 12:58:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2948