Unlike other daemons, ovn-nbctl by default sets "reconnect:warn" logging level. This means that no usual reconnection events logged. Missing logs makes it hard to debug reconnection related issues, i.e. raft leader changes and transaction retries. To fix that, ovn-nbctl daemon should be started with '-vreconnect:file:info'.
(In reply to Ilya Maximets from comment #0) > Unlike other daemons, ovn-nbctl by default sets "reconnect:warn" logging > level. > This means that no usual reconnection events logged. Missing logs makes it > hard to debug reconnection related issues, i.e. raft leader changes and > transaction retries. > > To fix that, ovn-nbctl daemon should be started with '-vreconnect:file:info'. PR posted and linked above.
Ilya, is this what you'd expect to see: 2021-03-29T14:12:31.752Z|00017|reconnect|INFO|ssl:10.0.0.5:9641: waiting 4 seconds before reconnect 2021-03-29T14:12:35.756Z|00018|reconnect|INFO|ssl:10.0.0.7:9641: connecting... 2021-03-29T14:12:35.762Z|00019|reconnect|INFO|ssl:10.0.0.7:9641: connected 2021-03-29T14:12:35.768Z|00020|ovsdb_idl|INFO|ssl:10.0.0.7:9641: clustered database server is disconnected from cluster; trying another server 2021-03-29T14:12:35.768Z|00021|reconnect|INFO|ssl:10.0.0.7:9641: connection attempt timed out 2021-03-29T14:12:35.768Z|00022|reconnect|INFO|ssl:10.0.0.7:9641: continuing to reconnect in the background but suppressing further logging 2021-03-29T14:12:43.781Z|00023|reconnect|INFO|ssl:10.0.0.6:9641: connected 2021-03-29T14:12:58.501Z|00001|vlog|INFO|opened log file /run/ovn/ovn-nbctl.log 2021-03-29T14:12:58.502Z|00002|reconnect|INFO|ssl:10.0.0.6:9641: connecting... 2021-03-29T14:12:58.506Z|00003|reconnect|INFO|ssl:10.0.0.6:9641: connected
(In reply to Dan Williams from comment #2) > Ilya, is this what you'd expect to see: > > 2021-03-29T14:12:31.752Z|00017|reconnect|INFO|ssl:10.0.0.5:9641: waiting 4 > seconds before reconnect > 2021-03-29T14:12:35.756Z|00018|reconnect|INFO|ssl:10.0.0.7:9641: > connecting... > 2021-03-29T14:12:35.762Z|00019|reconnect|INFO|ssl:10.0.0.7:9641: connected > 2021-03-29T14:12:35.768Z|00020|ovsdb_idl|INFO|ssl:10.0.0.7:9641: clustered > database server is disconnected from cluster; trying another server > 2021-03-29T14:12:35.768Z|00021|reconnect|INFO|ssl:10.0.0.7:9641: connection > attempt timed out > 2021-03-29T14:12:35.768Z|00022|reconnect|INFO|ssl:10.0.0.7:9641: continuing > to reconnect in the background but suppressing further logging > 2021-03-29T14:12:43.781Z|00023|reconnect|INFO|ssl:10.0.0.6:9641: connected > 2021-03-29T14:12:58.501Z|00001|vlog|INFO|opened log file > /run/ovn/ovn-nbctl.log > 2021-03-29T14:12:58.502Z|00002|reconnect|INFO|ssl:10.0.0.6:9641: > connecting... > 2021-03-29T14:12:58.506Z|00003|reconnect|INFO|ssl:10.0.0.6:9641: connected Yes. It logs actual 'connected' states and 'backoff' intervals that should be enough to compare with logs from other services.
Verified on 4.8.0-0.nightly-2021-04-09-222447 8:2021-04-13T02:04:32.574979811Z ++ ovn-nbctl --pidfile=/var/run/ovn/ovn-nbctl.pid --detach -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db ssl:172.31.248.47:9641,ssl:172.31.248.66:9641,ssl:172.31.248.67:9641 --log-file=/run/ovn/ovn-nbctl.log -vreconnect:file:info 29:2021-04-13T02:04:18.227946947Z 2021-04-13T02:04:18Z|00024|reconnect|INFO|ssl:172.31.248.47:9642: connection attempt failed (Connection refused) 30:2021-04-13T02:04:18.228017656Z 2021-04-13T02:04:18Z|00025|reconnect|INFO|ssl:172.31.248.47:9642: waiting 4 seconds before reconnect 31:2021-04-13T02:04:22.231229002Z 2021-04-13T02:04:22Z|00026|reconnect|INFO|ssl:172.31.248.66:9641: connecting... 32:2021-04-13T02:04:22.231324985Z 2021-04-13T02:04:22Z|00027|reconnect|INFO|ssl:172.31.248.66:9642: connecting... 33:2021-04-13T02:04:22.231803237Z 2021-04-13T02:04:22Z|00028|reconnect|INFO|ssl:172.31.248.66:9642: connection attempt failed (Connection refused) 34:2021-04-13T02:04:22.231803237Z 2021-04-13T02:04:22Z|00029|reconnect|INFO|ssl:172.31.248.66:9642: continuing to reconnect in the background but suppressing further logging 35:2021-04-13T02:04:22.235475930Z 2021-04-13T02:04:22Z|00030|reconnect|INFO|ssl:172.31.248.66:9641: connected 36:2021-04-13T02:04:22.237612078Z 2021-04-13T02:04:22Z|00031|ovsdb_idl|INFO|ssl:172.31.248.66:9641: clustered database server is disconnected from cluster; trying another server 37:2021-04-13T02:04:22.237652441Z 2021-04-13T02:04:22Z|00032|reconnect|INFO|ssl:172.31.248.66:9641: connection attempt timed out 38:2021-04-13T02:04:22.237676690Z 2021-04-13T02:04:22Z|00033|reconnect|INFO|ssl:172.31.248.66:9641: continuing to reconnect in the background but suppressing further logging 39:2021-04-13T02:04:30.242999407Z 2021-04-13T02:04:30Z|00034|reconnect|INFO|ssl:172.31.248.67:9641: connected 40:2021-04-13T02:04:30.244964985Z 2021-04-13T02:04:30Z|00035|ovsdb_idl|INFO|ssl:172.31.248.67:9641: clustered database server is disconnected from cluster; trying another server 41:2021-04-13T02:04:38.247968966Z 2021-04-13T02:04:38Z|00036|reconnect|INFO|ssl:172.31.248.47:9642: connected 42:2021-04-13T02:04:38.249320774Z 2021-04-13T02:04:38Z|00037|reconnect|INFO|ssl:172.31.248.47:9641: connected 44:2021-04-13T02:04:38.250341564Z 2021-04-13T02:04:38Z|00039|ovsdb_idl|INFO|ssl:172.31.248.47:9642: clustered database server is disconnected from cluster; trying another server 46:2021-04-13T02:04:46.261486129Z 2021-04-13T02:04:46Z|00041|reconnect|INFO|ssl:172.31.248.66:9641: connected 47:2021-04-13T02:04:46.263476362Z 2021-04-13T02:04:46Z|00042|reconnect|INFO|ssl:172.31.248.66:9642: connected 51:2021-04-13T02:04:46.265192771Z 2021-04-13T02:04:46Z|00046|ovsdb_idl|INFO|ssl:172.31.248.66:9642: clustered database server is disconnected from cluster; trying another server 52:2021-04-13T02:04:54.274893865Z 2021-04-13T02:04:54Z|00047|reconnect|INFO|ssl:172.31.248.67:9641: connected 53:2021-04-13T02:04:54.276203252Z 2021-04-13T02:04:54Z|00048|reconnect|INFO|ssl:172.31.248.67:9642: connected
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438