Bug 1872750 - OVN-kubernetes databases not gracefully shut down
Summary: OVN-kubernetes databases not gracefully shut down
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Tim Rozet
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-26 14:40 UTC by Casey Callendrello
Modified: 2020-09-02 23:50 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 764 | None | closed | Bug 1872750: ovn-k: add prestop actions for nbdb and sbdb | 2020-09-08 08:03:33 UTC

Description Casey Callendrello 2020-08-26 14:40:46 UTC
Description of problem:

We see more nbdb and sbdb corruption than we'd like. One possible cause of this is that we're not shutting down the databases gracefully.

So, we should send "ovs-appctl exit" to the databases before they are stopped, so that they finish writing cleanly and then shut down.
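
For illustration only, here is a minimal Python sketch of sending the exit request to a database's control socket. It assumes `ovs-appctl` is on the PATH and that the caller knows the socket path (the nbdb path used in the comment is the one that shows up in the verification logs on this bug). The helper names `build_exit_command` and `request_graceful_exit` are hypothetical, and this is not the change that was merged for this bug, which added prestop actions to the nbdb and sbdb containers.

```python
import subprocess


def build_exit_command(ctl_socket):
    """Return the argv that asks the server behind ctl_socket to exit.

    `ovs-appctl -t <target> exit` sends the unixctl "exit" command to the
    daemon listening on the given control socket.
    """
    return ["ovs-appctl", "-t", ctl_socket, "exit"]


def request_graceful_exit(ctl_socket, timeout=10.0, run=subprocess.run):
    """Send the exit request and report whether ovs-appctl succeeded.

    `run` is injectable so the function can be exercised without a real
    database; by default it is subprocess.run.
    """
    cmd = build_exit_command(ctl_socket)
    result = run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0


# Example call (hypothetical; needs a running ovsdb-server):
#   request_graceful_exit("/var/run/ovn/ovnnb_db.ctl")
```

The point of going through the control socket rather than only signalling the process is that the server gets a chance to handle the request itself before the container is torn down.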

Comment 5 Ross Brattain 2020-09-02 23:50:01 UTC
Verified on 4.6.0-0.nightly-2020-09-01-070508

Enabled ovn-master debug logging:

kind: ConfigMap
apiVersion: v1
metadata:
  name: env-overrides
  namespace: openshift-ovn-kubernetes
  annotations:
data:
# to set the node processes on a single node to verbose
# replace this with the node's name (from oc get nodes)
# To adjust master log levels, use _master
  _master: |
    OVN_KUBE_LOG_LEVEL=5
    OVN_LOG_LEVEL=dbg

Tailed the logs and observed the JSON-RPC exit request being received:


2020-09-02T23:44:09Z|39523|poll_loop|DBG|wakeup due to [POLLIN] on fd 14 (/var/run/ovn/ovnnb_db.ctl<->) at ../lib/stream-fd.c:274 (0% CPU usage)
2020-09-02T23:44:09Z|39524|jsonrpc|DBG|unix#63: received request, method="exit", params=[], id=0
2020-09-02T23:44:09Z|39525|unixctl|DBG|received request exit[], id=0
2020-09-02T23:44:09Z|39526|unixctl|DBG|replying with success, id=0: ""
2020-09-02T23:44:09Z|39527|jsonrpc|DBG|unix#63: send reply, result="", id=0
2020-09-02T23:44:09Z|39528|raft|DBG|raft_is_connected: true
2020-09-02T23:44:09Z|39529|poll_loop|DBG|wakeup due to [POLLIN][POLLHUP] on fd 37 (/var/run/ovn/ovnnb_db.ctl<->) at ../lib/stream-fd.c:157 (0% CPU usage)
2020-09-02T23:44:09Z|39530|jsonrpc|DBG|ssl:10.0.141.71:40606: send notification, method="monitor_canceled", params=[["monid","OVN_Northbound"]]
2020-09-02T23:44:09Z|39531|stream_ssl|DBG|server15-->ssl:10.0.141.71:40606 type 256 (5 bytes)
2020-09-02T23:44:09Z|39532|jsonrpc|DBG|ssl:10.0.174.36:53708: send notification, method="monitor_canceled", params=[["monid","OVN_Northbound"]]
2020-09-02T23:44:09Z|39533|stream_ssl|DBG|server17-->ssl:10.0.174.36:53708 type 256 (5 bytes)
2020-09-02T23:44:09Z|39534|jsonrpc|DBG|ssl:10.0.141.71:40658: send notification, method="monitor_canceled", params=[["monid","OVN_Northbound"]]
2020-09-02T23:44:09Z|39535|stream_ssl|DBG|server19-->ssl:10.0.141.71:40658 type 256 (5 bytes)
2020-09-02T23:44:09Z|39536|jsonrpc|DBG|ssl:10.0.174.36:53884: send notification, method="monitor_canceled", params=[["monid","OVN_Northbound"]]
2020-09-02T23:44:09Z|39537|stream_ssl|DBG|server21-->ssl:10.0.174.36:53884 type 256 (5 bytes)
2020-09-02T23:44:09Z|39538|jsonrpc|DBG|ssl:10.0.218.78:57956: send notification, method="monitor_canceled", params=[["monid","OVN_Northbound"]]
2020-09-02T23:44:09Z|39539|stream_ssl|DBG|server23-->ssl:10.0.218.78:57956 type 256 (5 bytes)
2020-09-02T23:44:09Z|39540|jsonrpc|DBG|ssl:10.0.218.78:58024: send notification, method="monitor_canceled", params=[["monid","OVN_Northbound"]]
2020-09-02T23:44:09Z|39541|stream_ssl|DBG|server26-->ssl:10.0.218.78:58024 type 256 (5 bytes)
2020-09-02T23:44:09Z|39542|reconnect|DBG|ssl:10.0.141.71:50828: entering RECONNECT
2020-09-02T23:44:09Z|39543|reconnect|DBG|ssl:10.0.174.36:36214: entering RECONNECT
2020-09-02T23:44:09Z|39545|raft|DBG|raft.c:1146 -->ecf7 become_leader "this server is shutting down": term=4
2020-09-02T23:44:09Z|39546|jsonrpc|DBG|ssl:10.0.174.36:9643: send notification, method="become_leader", params=[{"from":"aab696f6-bce5-4335-9b9b-864632eb69b5","comment":"this server is shutting down","term":4,"cluster":"7d4b6b18-4568-4277-822b-de3daed3ff84","to":"ecf76167-b626-4d57-a816-9580d62ee1c4"}]
2020-09-02T23:44:09Z|39548|stream_ssl|DBG|client14-->ssl:10.0.174.36:9643 type 256 (5 bytes)
2020-09-02T23:44:09Z|39549|stream_ssl|DBG|client14-->ssl:10.0.174.36:9643 alert: warning, close_notify (2 bytes)
2020-09-02T23:44:09Z|39550|stream_ssl|DBG|client29-->ssl:10.0.218.78:9643 type 256 (5 bytes)
2020-09-02T23:44:09Z|39551|stream_ssl|DBG|client29-->ssl:10.0.218.78:9643 alert: warning, close_notify (2 bytes)
2020-09-02T23:44:09Z|39552|stream_ssl|DBG|server11-->ssl:10.0.174.36:37066 type 256 (5 bytes)
2020-09-02T23:44:09Z|39553|stream_ssl|DBG|server11-->ssl:10.0.174.36:37066 alert: warning, close_notify (2 bytes)
2020-09-02T23:44:09Z|39554|stream_ssl|DBG|server25-->ssl:10.0.218.78:58056 type 256 (5 bytes)
2020-09-02T23:44:09Z|39555|stream_ssl|DBG|server25-->ssl:10.0.218.78:58056 alert: warning, close_notify (2 bytes)
2020-09-02T23:44:09Z|00249|poll_loop(log_fsync0)|DBG|wakeup due to [POLLIN] on fd 24 (FIFO pipe:[8464814]) at ../ovsdb/log.c:907
2020-09-02T23:44:09Z|00001|poll_loop(urcu2)|DBG|wakeup due to [POLLIN] on fd 8 (FIFO pipe:[8463177]) at ../lib/fatal-signal.c:274
2020-09-02T23:44:09Z|00479|fatal_signal(urcu2)|WARN|terminating with signal 15 (Terminated)
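
A check like the one above can be scripted. The following is a minimal Python sketch, assuming the pipe-delimited `timestamp|sequence|module|LEVEL|message` layout seen in these log lines. The function name `summarize_shutdown` and the three flags it reports are hypothetical; the match strings are taken from the messages in this log and may differ in other versions.

```python
import re

# Layout assumed from the log above: timestamp|sequence|module|LEVEL|message
LOG_LINE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)$"
)


def summarize_shutdown(lines):
    """Scan database log lines and flag the shutdown events of interest."""
    summary = {
        "exit_requested": False,          # unixctl saw the "exit" command
        "leadership_transferred": False,  # raft handed off leadership on shutdown
        "terminated_by_signal": False,    # fatal_signal logged a terminating signal
    }
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed layout
        module, msg = match.group("module"), match.group("msg")
        if module == "unixctl" and msg.startswith("received request exit"):
            summary["exit_requested"] = True
        elif module == "raft" and "become_leader" in msg and "shutting down" in msg:
            summary["leadership_transferred"] = True
        elif module.startswith("fatal_signal") and "terminating with signal" in msg:
            summary["terminated_by_signal"] = True
    return summary
```

Passing an open log file (or the lines of a tailed log collected into a list) to `summarize_shutdown` returns the three flags, so a verification run can assert that the exit request was seen rather than relying on reading the log by eye.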