Bug 1834473
| Summary: | ovnkube: set NB/SB database inactivity probes to 60 seconds | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Williams <dcbw> |
| Component: | Networking | Assignee: | Dan Williams <dcbw> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | unspecified | CC: | aconstan, anusaxen |
| Version: | 4.3.z | | |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1834474 | Environment: | |
| Last Closed: | 2020-07-13 17:37:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1834474 | | |
Description
Dan Williams
2020-05-11 19:06:24 UTC
Documentation seems to suggest that inactivity_probe is in milliseconds:
https://github.com/ovn-org/ovn/blob/master/ovn-ic-nb.xml#L243

```
<column name="inactivity_probe">
  Maximum number of milliseconds of idle time on connection to the client
  before sending an inactivity probe message. If Open vSwitch does not
  communicate with the client for the specified number of seconds, it will
  send a probe. If a response is not received for the same additional
  amount of time, Open vSwitch assumes the connection has been broken and
  attempts to reconnect. Default is implementation-specific. A value of 0
  disables inactivity probes.
</column>
```

Is this a documentation issue? I don't see any lock changes in the logs, so I'm not sure this is doing anything even when I change it to 60000 milliseconds.

I see the tests use "--inactivity-probe="; is that also required?
https://github.com/ovn-org/ovn/blob/master/tests/ovn-nbctl.at#L1720

```
AT_CHECK([ovn-nbctl --inactivity-probe=30000 set-connection ptcp:6641:127.0.0.1 punix:$OVS_RUNDIR/ovnnb_db.sock])
```

https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ic-nbctl.8.xml#L101

I tried with

```
while ! ovn-nbctl --inactivity-probe=60000 --no-leader-only -t 5 set-connection pssl:9641 -- set connection . inactivity_probe=60000; do
```

but didn't see the lock change in northd; I just see the initial lock acquired:

```
2020-05-14T14:48:23Z|00050|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
```
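Since the column is documented in milliseconds, a desired 60-second probe interval has to be written as 60000, both in the `--inactivity-probe=` option and in the `inactivity_probe` column itself. A minimal sketch that makes the seconds-to-milliseconds conversion explicit; `set_connection_cmd` is a hypothetical helper, not part of any OVN tooling, which merely assembles the same command line tried above:

```python
def set_connection_cmd(target: str, seconds: int) -> list:
    """Build the ovn-nbctl invocation tried above, converting the
    desired probe interval from seconds to the milliseconds value
    the inactivity_probe column expects."""
    ms = seconds * 1000
    return [
        "ovn-nbctl", "--inactivity-probe=%d" % ms, "--no-leader-only",
        "-t", "5", "set-connection", target,
        "--", "set", "connection", ".", "inactivity_probe=%d" % ms,
    ]

# A 60-second probe interval becomes 60000 on both ends of the command.
cmd = set_connection_cmd("pssl:9641", 60)
```

Passing a raw 60 here would mean a 60 ms probe interval, which is the kind of unit confusion the quoted documentation invites.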
Verified on 4.5.0-0.nightly-2020-05-21-072118:

```
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:29:54Z|73741|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:30:14Z|73840|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:30:19Z|73876|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:30:24Z|73892|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
```

`kill -STOP` of ovnkube-master-b8m7x-northd here; ~120 seconds later c2sc7 takes the lock:

```
northd-ovnkube-master-c2sc7-northd:2020-05-21T17:32:34Z|66663|jsonrpc|DBG|ssl:10.0.0.4:9642: received notification, method="locked", params=["ovn_northd"]
northd-ovnkube-master-c2sc7-northd:2020-05-21T17:32:34Z|66664|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
northd-ovnkube-master-c2sc7-northd:2020-05-21T17:32:34Z|66675|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
northd-ovnkube-master-c2sc7-northd:2020-05-21T17:38:54Z|68572|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
```

`kill -CONT` of ovnkube-master-b8m7x-northd here, and it goes to standby:

```
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:12Z|74080|jsonrpc|DBG|ssl:10.0.0.3:9642: send request, method="lock", params=["ovn_northd"], id=5529
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:12Z|74087|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
```
```
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:12Z|74102|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
northd-ovnkube-master-c2sc7-northd:2020-05-21T17:39:14Z|68667|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:14Z|74194|jsonrpc|DBG|ssl:10.0.0.6:9642: send request, method="lock", params=["ovn_northd"], id=5534
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:14Z|74196|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:14Z|74205|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:18Z|74306|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="lock", params=["ovn_northd"], id=5539
northd-ovnkube-master-b8m7x-northd:2020-05-21T17:39:18Z|74313|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
northd-ovnkube-master-c2sc7-northd:2020-05-21T17:39:20Z|68702|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
northd-ovnkube-master-c2sc7-northd:2020-05-21T17:40:54Z|69167|jsonrpc|DBG|ssl:10.0.0.4:9642: send request, method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
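The takeover gap in the logs (last assert from the frozen instance at 17:30:24, standby acquiring the lock at 17:32:34) is consistent with the probe semantics quoted in the description: with a 60000 ms interval, a probe is sent only after 60 s of silence, and the connection is declared broken only after a further 60 s without a reply, so detection takes roughly two probe intervals. A rough check using only the timestamps above:

```python
from datetime import datetime

# Timestamps taken from the verification log: the last assert sent by
# the SIGSTOP'ed northd, and the moment the standby acquired the lock.
last_assert = datetime.fromisoformat("2020-05-21T17:30:24+00:00")
takeover = datetime.fromisoformat("2020-05-21T17:32:34+00:00")

probe_interval_s = 60000 / 1000  # inactivity_probe is in milliseconds

# One idle interval before the probe is sent, plus one more interval
# waiting for the reply that never comes.
expected = 2 * probe_interval_s
observed = (takeover - last_assert).total_seconds()

print(observed, expected)  # 130.0 120.0
```

The extra ~10 s over the nominal 120 s is plausibly just the time since the last message when the process was stopped plus reconnect overhead.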