Bug 1944239
Summary: | [OVN] ovn-ctl should wait for database processes being stopped | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Ilya Maximets <i.maximets> |
Component: | OVN | Assignee: | OVN Team <ovnteam> |
Status: | CLOSED ERRATA | QA Contact: | Ehsan Elahi <eelahi> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | FDP 21.C | CC: | ctrautma, dcbw, mmichels, ralongi |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2021-06-21 14:44:39 UTC | Type: | Bug |
Description
Ilya Maximets
2021-03-29 15:08:56 UTC
Related OCP issue: https://bugzilla.redhat.com/show_bug.cgi?id=1944264

Tested on ovn2.13-20.12.0-97.el8fdp

[root@dell-per740-30 ovn]# pgrep -f OVN_Northbound
52570
[root@dell-per740-30 ovn]# ovn-ctl stop_nb_ovsdb && kill -15 52570
[root@dell-per740-30 ovn]# cat ovsdb-server-nb.log
2021-06-16T19:00:13.290Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-16T19:00:13.320Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-16T19:00:23.332Z|00003|memory|INFO|6048 kB peak resident set size after 10.0 seconds
2021-06-16T19:00:23.332Z|00004|memory|INFO|cells:43 monitors:2 sessions:1
2021-06-16T19:06:29.110Z|00002|daemon_unix(monitor)|INFO|pid 39527 died, exit status 0, exiting
=======> kill command executed without error

Tested on ovn2.13-20.12.0-135.el8fdp.

[root@dell-per740-33 ~]# pgrep -f OVN_Northbound
52676
[root@dell-per740-33 ~]# ovn-ctl stop_nb_ovsdb && kill -15 52676
Exiting ovnnb_db (52676) [ OK ]
-bash: kill: (52676) - No such process
[root@dell-per740-33 ~]# cd /var/log/ovn
[root@dell-per740-33 ovn]# cat ovsdb-server-nb.log
2021-06-17T19:09:51.829Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-17T19:09:51.839Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-17T19:10:01.851Z|00005|memory|INFO|6224 kB peak resident set size after 10.0 seconds
2021-06-17T19:10:01.851Z|00006|memory|INFO|cells:34 monitors:0
2021-06-17T19:10:31.035Z|00002|daemon_unix(monitor)|INFO|pid 52676 died, exit status 0, exiting
==========> kill ended up in error

Running strace on the OVN_Northbound process showed similar results in both versions.

After stop_nb_ovsdb, in both versions, the ports remained open and I can ping on these ports to netns. However, ovn-nbctl does not show any valid databases. Is this what is expected?

(In reply to Ehsan Elahi from comment #7)
> strace the OVN_Northbound process in both the versions showed similar
> results.
OVS processes signals and the "exit" command in the main loop. I see that OVS reported exit code 0 in both cases, but 'kill' succeeded in the first case. It's probably because in the first case the signal was delivered to the ovs-vswitchd while it was already too far into the processing of the "exit" command, so it didn't get a chance to process SIGTERM. Though it seems to be fine to just check that kill fails, you may want to use a non-maskable signal to get a clearer result, e.g. SIGKILL or SIGSEGV. This way OVS will not be able to trap it, so the exit code will reflect the signal regardless of the current state of ovs-vswitchd.

> After stop_nb_ovsdb, in both the versions, the ports remained opened and I
> can ping on these ports to netns. However ovn-nbctl does not show any valid
> databases. Is this what is expected?

Yes, this is fine. A dead Northbound database should not affect the dataplane, so ports and traffic should still work.

Tried different types of signals.

Reproduced on:

# rpm -qa | grep ovn
ovn2.13-20.12.0-97.el8fdp.x86_64
ovn2.13-host-20.12.0-97.el8fdp.x86_64
ovn2.13-central-20.12.0-97.el8fdp.x86_64
# export PATH=$PATH:/usr/share/ovn/scripts
# pgrep -f OVN_Northbound
39579
# ovn-ctl stop_nb_ovsdb && kill -9 39579
# cat ovsdb-server-nb.log
2021-06-21T11:02:51.928Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T11:02:51.949Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T11:03:01.962Z|00003|memory|INFO|6052 kB peak resident set size after 10.0 seconds
2021-06-21T11:03:01.962Z|00004|memory|INFO|cells:43 monitors:2 sessions:1
2021-06-21T11:06:03.922Z|00002|daemon_unix(monitor)|INFO|pid 39579 died, killed (Killed), exiting
<=========== db server killed through the signal and the signal details mentioned in the log

# ovn-ctl start_nb_ovsdb
/etc/ovn/ovnnb_db.db does not exist ... (warning).
Creating empty database /etc/ovn/ovnnb_db.db [ OK ]
Starting ovsdb-nb [ OK ]
# pgrep -f OVN_Northbound
40173
# ovn-ctl stop_nb_ovsdb && kill -11 40173
# cat ovsdb-server-nb.log
2021-06-21T11:02:51.928Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T11:02:51.949Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T11:03:01.962Z|00003|memory|INFO|6052 kB peak resident set size after 10.0 seconds
2021-06-21T11:03:01.962Z|00004|memory|INFO|cells:43 monitors:2 sessions:1
2021-06-21T11:06:03.922Z|00002|daemon_unix(monitor)|INFO|pid 39579 died, killed (Killed), exiting
2021-06-21T11:12:45.186Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T11:12:45.196Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T11:12:55.208Z|00003|memory|INFO|6024 kB peak resident set size after 10.0 seconds
2021-06-21T11:12:55.208Z|00004|memory|INFO|cells:34 monitors:0
2021-06-21T11:13:25.470Z|00002|backtrace(monitor)|WARN|Backtrace using libunwind not supported.
2021-06-21T11:13:25.470Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid 40173 died, killed (Segmentation fault), core dumped, restarting
2021-06-21T11:13:25.474Z|00004|ovsdb_server(ovsdb-server)|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T11:13:25.474Z|00005|memory(ovsdb-server)|INFO|4736 kB peak resident set size after 40.3 seconds
2021-06-21T11:13:25.474Z|00006|memory(ovsdb-server)|INFO|cells:14 monitors:0
<============ db server killed through the signal and the signal name can be seen in the log above

Verified on:

# rpm -qa | grep ovn
ovn2.13-20.12.0-135.el8fdp.x86_64
ovn2.13-host-20.12.0-135.el8fdp.x86_64
ovn2.13-central-20.12.0-135.el8fdp.x86_64

# rpm -qa | grep ovn
ovn2.13-central-20.12.0-135.el7fdp.x86_64
ovn2.13-20.12.0-135.el7fdp.x86_64
ovn2.13-host-20.12.0-135.el7fdp.x86_64

# rpm -qa | grep ovn
ovn-2021-21.03.0-40.el8fdp.x86_64
ovn-2021-host-21.03.0-40.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64

## Below results are from verification on 135.el8fdp. Similar results on the other two releases.

# pgrep -f OVN_Northbound
97671
# ovn-ctl stop_nb_ovsdb && kill -9 97671
Exiting ovnnb_db (97671) [ OK ]
-bash: kill: (97671) - No such process
# cat ovsdb-server-nb.log
2021-06-21T10:37:41.529Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T10:37:41.536Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T10:37:41.546Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2021-06-21T10:37:41.546Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2021-06-21T10:37:51.547Z|00005|memory|INFO|6224 kB peak resident set size after 10.0 seconds
2021-06-21T10:37:51.547Z|00006|memory|INFO|cells:34 monitors:0
2021-06-21T10:53:05.863Z|00002|daemon_unix(monitor)|INFO|pid 97671 died, exit status 0, exiting

# ovn-ctl start_nb_ovsdb
Starting ovsdb-nb [ OK ]
# pgrep -f OVN_Northbound
97856
# ovn-ctl stop_nb_ovsdb && kill -11 97856
Exiting ovnnb_db (97856) [ OK ]
-bash: kill: (97856) - No such process
# cat ovsdb-server-nb.log
2021-06-21T10:37:41.529Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T10:37:41.536Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T10:37:41.546Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2021-06-21T10:37:41.546Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2021-06-21T10:37:51.547Z|00005|memory|INFO|6224 kB peak resident set size after 10.0 seconds
2021-06-21T10:37:51.547Z|00006|memory|INFO|cells:34 monitors:0
2021-06-21T10:53:05.863Z|00002|daemon_unix(monitor)|INFO|pid 97671 died, exit status 0, exiting
2021-06-21T10:54:20.602Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T10:54:20.611Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T10:54:20.622Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2021-06-21T10:54:20.622Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2021-06-21T10:54:30.621Z|00005|memory|INFO|6024 kB peak resident set size after 10.0 seconds
2021-06-21T10:54:30.621Z|00006|memory|INFO|cells:34 monitors:0
2021-06-21T10:55:24.073Z|00002|daemon_unix(monitor)|INFO|pid 97856 died, exit status 0, exiting

# ovn-ctl start_nb_ovsdb
Starting ovsdb-nb [ OK ]
# pgrep -f OVN_Northbound
98371
# ovn-ctl stop_nb_ovsdb && kill -15 98371
Exiting ovnnb_db (98371) [ OK ]
-bash: kill: (98371) - No such process
# cat ovsdb-server-nb.log
2021-06-21T10:37:41.529Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T10:37:41.536Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T10:37:41.546Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2021-06-21T10:37:41.546Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2021-06-21T10:37:51.547Z|00005|memory|INFO|6224 kB peak resident set size after 10.0 seconds
2021-06-21T10:37:51.547Z|00006|memory|INFO|cells:34 monitors:0
2021-06-21T10:53:05.863Z|00002|daemon_unix(monitor)|INFO|pid 97671 died, exit status 0, exiting
2021-06-21T10:54:20.602Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T10:54:20.611Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T10:54:20.622Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2021-06-21T10:54:20.622Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2021-06-21T10:54:30.621Z|00005|memory|INFO|6024 kB peak resident set size after 10.0 seconds
2021-06-21T10:54:30.621Z|00006|memory|INFO|cells:34 monitors:0
2021-06-21T10:55:24.073Z|00002|daemon_unix(monitor)|INFO|pid 97856 died, exit status 0, exiting
2021-06-21T11:50:49.421Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2021-06-21T11:50:49.432Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.4
2021-06-21T11:50:49.442Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2021-06-21T11:50:49.442Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2021-06-21T11:50:59.443Z|00005|memory|INFO|5920 kB peak resident set size after 10.0 seconds
2021-06-21T11:50:59.443Z|00006|memory|INFO|cells:34 monitors:0
2021-06-21T11:51:26.157Z|00002|daemon_unix(monitor)|INFO|pid 98371 died, exit status 0, exiting
<============= For every type of kill signal, the db stopped normally as expected with exit status 0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2507
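
The suggestion in this thread to test with SIGKILL relies on the exit status of a process reflecting the signal that terminated it. The following is only a minimal sketch of that behaviour, assuming bash on Linux; it uses a background `sleep` as a hypothetical stand-in for the database server and does not call ovn-ctl or any OVN tooling.

```shell
#!/bin/bash
# Hypothetical stand-in for ovsdb-server: any long-running child works here.
sleep 30 &
pid=$!

# SIGKILL cannot be caught, so the child gets no chance to exit cleanly.
kill -KILL "$pid"

# wait reaps the child. For a death by signal the shell reports
# 128 + signal number (137 for SIGKILL on Linux, where SIGKILL is 9).
status=0
wait "$pid" || status=$?
echo "wait status: $status"

sig_name=none
if [ "$status" -gt 128 ]; then
    # kill -l maps such an exit status back to the signal name.
    sig_name=$(kill -l "$status")
    echo "terminated by SIG$sig_name"
fi

# The process is gone now, so a second kill fails with "No such process",
# which is the outcome seen in the verified runs after stop_nb_ovsdb.
if kill -KILL "$pid" 2>/dev/null; then
    second_kill=succeeded
else
    second_kill=failed
fi
echo "second kill: $second_kill"
```

A clean exit would instead leave `status` at 0, which corresponds to the "exit status 0" lines in the monitor log, while a signal death corresponds to the "killed (Killed)" lines.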