Bug 1944160
Summary: | CNO: nbctl daemon should log reconnection info | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ilya Maximets <i.maximets> |
Component: | Networking | Assignee: | Dan Williams <dcbw> |
Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | dcbw, rbrattai, trozet |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:56:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Ilya Maximets
2021-03-29 12:40:35 UTC
(In reply to Ilya Maximets from comment #0) > Unlike other daemons, ovn-nbctl by default sets "reconnect:warn" logging > level. > This means that no usual reconnection events logged. Missing logs makes it > hard to debug reconnection related issues, i.e. raft leader changes and > transaction retries. > > To fix that, ovn-nbctl daemon should be started with '-vreconnect:file:info'. PR posted and linked above. Ilya, is this what you'd expect to see: 2021-03-29T14:12:31.752Z|00017|reconnect|INFO|ssl:10.0.0.5:9641: waiting 4 seconds before reconnect 2021-03-29T14:12:35.756Z|00018|reconnect|INFO|ssl:10.0.0.7:9641: connecting... 2021-03-29T14:12:35.762Z|00019|reconnect|INFO|ssl:10.0.0.7:9641: connected 2021-03-29T14:12:35.768Z|00020|ovsdb_idl|INFO|ssl:10.0.0.7:9641: clustered database server is disconnected from cluster; trying another server 2021-03-29T14:12:35.768Z|00021|reconnect|INFO|ssl:10.0.0.7:9641: connection attempt timed out 2021-03-29T14:12:35.768Z|00022|reconnect|INFO|ssl:10.0.0.7:9641: continuing to reconnect in the background but suppressing further logging 2021-03-29T14:12:43.781Z|00023|reconnect|INFO|ssl:10.0.0.6:9641: connected 2021-03-29T14:12:58.501Z|00001|vlog|INFO|opened log file /run/ovn/ovn-nbctl.log 2021-03-29T14:12:58.502Z|00002|reconnect|INFO|ssl:10.0.0.6:9641: connecting... 2021-03-29T14:12:58.506Z|00003|reconnect|INFO|ssl:10.0.0.6:9641: connected (In reply to Dan Williams from comment #2) > Ilya, is this what you'd expect to see: > > 2021-03-29T14:12:31.752Z|00017|reconnect|INFO|ssl:10.0.0.5:9641: waiting 4 > seconds before reconnect > 2021-03-29T14:12:35.756Z|00018|reconnect|INFO|ssl:10.0.0.7:9641: > connecting... > 2021-03-29T14:12:35.762Z|00019|reconnect|INFO|ssl:10.0.0.7:9641: connected > 2021-03-29T14:12:35.768Z|00020|ovsdb_idl|INFO|ssl:10.0.0.7:9641: clustered > database server is disconnected from cluster; trying another server > 2021-03-29T14:12:35.768Z|00021|reconnect|INFO|ssl:10.0.0.7:9641: connection > attempt timed out > 2021-03-29T14:12:35.768Z|00022|reconnect|INFO|ssl:10.0.0.7:9641: continuing > to reconnect in the background but suppressing further logging > 2021-03-29T14:12:43.781Z|00023|reconnect|INFO|ssl:10.0.0.6:9641: connected > 2021-03-29T14:12:58.501Z|00001|vlog|INFO|opened log file > /run/ovn/ovn-nbctl.log > 2021-03-29T14:12:58.502Z|00002|reconnect|INFO|ssl:10.0.0.6:9641: > connecting... > 2021-03-29T14:12:58.506Z|00003|reconnect|INFO|ssl:10.0.0.6:9641: connected Yes. It logs actual 'connected' states and 'backoff' intervals that should be enough to compare with logs from other services. Verified on 4.8.0-0.nightly-2021-04-09-222447 8:2021-04-13T02:04:32.574979811Z ++ ovn-nbctl --pidfile=/var/run/ovn/ovn-nbctl.pid --detach -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db ssl:172.31.248.47:9641,ssl:172.31.248.66:9641,ssl:172.31.248.67:9641 --log-file=/run/ovn/ovn-nbctl.log -vreconnect:file:info 29:2021-04-13T02:04:18.227946947Z 2021-04-13T02:04:18Z|00024|reconnect|INFO|ssl:172.31.248.47:9642: connection attempt failed (Connection refused) 30:2021-04-13T02:04:18.228017656Z 2021-04-13T02:04:18Z|00025|reconnect|INFO|ssl:172.31.248.47:9642: waiting 4 seconds before reconnect 31:2021-04-13T02:04:22.231229002Z 2021-04-13T02:04:22Z|00026|reconnect|INFO|ssl:172.31.248.66:9641: connecting... 32:2021-04-13T02:04:22.231324985Z 2021-04-13T02:04:22Z|00027|reconnect|INFO|ssl:172.31.248.66:9642: connecting... 33:2021-04-13T02:04:22.231803237Z 2021-04-13T02:04:22Z|00028|reconnect|INFO|ssl:172.31.248.66:9642: connection attempt failed (Connection refused) 34:2021-04-13T02:04:22.231803237Z 2021-04-13T02:04:22Z|00029|reconnect|INFO|ssl:172.31.248.66:9642: continuing to reconnect in the background but suppressing further logging 35:2021-04-13T02:04:22.235475930Z 2021-04-13T02:04:22Z|00030|reconnect|INFO|ssl:172.31.248.66:9641: connected 36:2021-04-13T02:04:22.237612078Z 2021-04-13T02:04:22Z|00031|ovsdb_idl|INFO|ssl:172.31.248.66:9641: clustered database server is disconnected from cluster; trying another server 37:2021-04-13T02:04:22.237652441Z 2021-04-13T02:04:22Z|00032|reconnect|INFO|ssl:172.31.248.66:9641: connection attempt timed out 38:2021-04-13T02:04:22.237676690Z 2021-04-13T02:04:22Z|00033|reconnect|INFO|ssl:172.31.248.66:9641: continuing to reconnect in the background but suppressing further logging 39:2021-04-13T02:04:30.242999407Z 2021-04-13T02:04:30Z|00034|reconnect|INFO|ssl:172.31.248.67:9641: connected 40:2021-04-13T02:04:30.244964985Z 2021-04-13T02:04:30Z|00035|ovsdb_idl|INFO|ssl:172.31.248.67:9641: clustered database server is disconnected from cluster; trying another server 41:2021-04-13T02:04:38.247968966Z 2021-04-13T02:04:38Z|00036|reconnect|INFO|ssl:172.31.248.47:9642: connected 42:2021-04-13T02:04:38.249320774Z 2021-04-13T02:04:38Z|00037|reconnect|INFO|ssl:172.31.248.47:9641: connected 44:2021-04-13T02:04:38.250341564Z 2021-04-13T02:04:38Z|00039|ovsdb_idl|INFO|ssl:172.31.248.47:9642: clustered database server is disconnected from cluster; trying another server 46:2021-04-13T02:04:46.261486129Z 2021-04-13T02:04:46Z|00041|reconnect|INFO|ssl:172.31.248.66:9641: connected 47:2021-04-13T02:04:46.263476362Z 2021-04-13T02:04:46Z|00042|reconnect|INFO|ssl:172.31.248.66:9642: connected 51:2021-04-13T02:04:46.265192771Z 2021-04-13T02:04:46Z|00046|ovsdb_idl|INFO|ssl:172.31.248.66:9642: clustered database server is disconnected from cluster; trying another server 52:2021-04-13T02:04:54.274893865Z 2021-04-13T02:04:54Z|00047|reconnect|INFO|ssl:172.31.248.67:9641: connected 53:2021-04-13T02:04:54.276203252Z 2021-04-13T02:04:54Z|00048|reconnect|INFO|ssl:172.31.248.67:9642: connected Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438 |