Description of problem:

Referring to the OVN architectural model below, the OVN Southbound DB, which runs on the controller, is responsible for managing the transport nodes via ovn-controller, which runs on every compute node.

                                  CMS
                                   |
                                   |
                       +-----------|-----------+ \
                       |           |           |  \
                       |     OVN/CMS Plugin    |   \
                       |           |           |    \
                       |           |           |     \
                       |   OVN Northbound DB   |      \
                       |           |           |       --->> Overcloud Controller
                       |           |           |      /
                       |       ovn-northd      |     /
                       |           |           |    /
                       +-----------|-----------+   /
                                   |               /
                                   |              /
                         +-------------------+   /
                         | OVN Southbound DB |  /
                         +-------------------+
                                   |
                                   |
                +------------------+------------------+
                |                                      |
                |         ------>> Overcloud Compute
  HV 1          |                                      |    HV n
+---------------|---------------+  .  +---------------|---------------+
|               |               |  .  |               |               |
|        ovn-controller         |  .  |        ovn-controller         |
|         |          |          |  .  |         |          |          |
|         |          |          |  .  |         |          |          |
|  ovs-vswitchd   ovsdb-server  |  .  |  ovs-vswitchd   ovsdb-server  |
|                               |     |                               |
+-------------------------------+     +-------------------------------+

Ideally, for any downtime of ovn-controller on a compute node, there should be a mechanism in the OVN control plane that captures the health status of the southbound components (ovn-controller, ovs-vswitchd, ovsdb-server). In RHOSP 14 testing, when we restart the ovn-controller container on a compute node, no event is captured in the OVN logs on the controller node. We are not sure whether this is a flaw in the OVN architectural model or whether we are hitting a known bug/RFE. Please guide us toward a better understanding of this monitoring mechanism in OVN.

Version-Release number of selected component (if applicable):
Red Hat OpenStack 14

Steps to Reproduce:
1. Monitor all the OVN logs on the overcloud controller node:

# tailf /var/log/containers/openvswitch/ovn-northd.log /var/log/containers/openvswitch/ovsdb-server-nb.log /var/log/containers/openvswitch/ovsdb-server-sb.log

2. On a compute node, restart the docker services and OVS systemd services below. We noticed that the restart was logged only in ovn-controller.log on the compute node itself:

@compute-0 ~]# systemctl restart ovsdb-server.service ovs-vswitchd.service openvswitch.service | sleep 10 |docker restart ovn_controller | sleep 10 | docker restart ovn_metadata_agent

@compute-0 ~]# tailf /var/log/containers/openvswitch/ovn-controller.log
2018-12-28T06:59:35.900Z|00040|fatal_signal|WARN|terminating with signal 15 (Terminated)
2018-12-28T06:59:36.263Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
2018-12-28T06:59:36.264Z|00002|reconnect|INFO|unix:/run/openvswitch/db.sock: connecting...
2018-12-28T06:59:36.264Z|00003|reconnect|INFO|unix:/run/openvswitch/db.sock: connected
2018-12-28T06:59:36.267Z|00004|reconnect|INFO|tcp:172.17.1.17:6642: connecting...
2018-12-28T06:59:37.269Z|00005|reconnect|INFO|tcp:172.17.1.17:6642: connection attempt timed out
2018-12-28T06:59:38.270Z|00006|reconnect|INFO|tcp:172.17.1.17:6642: connecting...
2018-12-28T06:59:38.271Z|00007|reconnect|INFO|tcp:172.17.1.17:6642: connected
2018-12-28T06:59:38.277Z|00008|jsonrpc|WARN|unix:/run/openvswitch/db.sock: send error: Broken pipe
2018-12-28T06:59:38.277Z|00009|reconnect|WARN|unix:/run/openvswitch/db.sock: connection dropped (Broken pipe)
2018-12-28T06:59:38.282Z|00010|dpif_netlink|INFO|The kernel module does not support meters.
2018-12-28T06:59:38.284Z|00011|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2018-12-28T06:59:38.284Z|00012|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2018-12-28T06:59:38.284Z|00013|pinctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2018-12-28T06:59:38.284Z|00014|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2018-12-28T06:59:38.287Z|00015|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2018-12-28T06:59:38.288Z|00016|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2018-12-28T06:59:39.277Z|00017|reconnect|INFO|unix:/run/openvswitch/db.sock: connecting...
2018-12-28T06:59:39.277Z|00018|reconnect|INFO|unix:/run/openvswitch/db.sock: connected

Expected results:
The concern is: in a larger-scale environment, how effectively can the OVN Northbound/Southbound DB or ovn-northd (on the controller) monitor the health of the ovn-controller instances running on the compute nodes?
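As an illustration of the kind of control-plane-side check being asked about, the southbound Chassis table can be inspected on the controller; every running ovn-controller registers a row there. This is only a sketch: it assumes ovn-sbctl is available on the overcloud controller, it reuses the SB DB address 172.17.1.17:6642 seen in the log above, and the controller prompt is illustrative.

@controller-0 ~]# ovn-sbctl --db=tcp:172.17.1.17:6642 show
@controller-0 ~]# ovn-sbctl --db=tcp:172.17.1.17:6642 list Chassis

A compute node whose ovn-controller has stopped and unregistered itself would be missing from this output.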
From the OSP networking-ovn perspective, ovn-controller is just another agent. If it is down for 60 seconds (the default value in the configuration settings), it will show as down when you list the agents from the API.

From the pure OVN perspective, you can tell from the logs that it reconnected to the SB database. Also, since you restarted ovn-controller in a controlled way, you should see its Chassis entry disappear from the SB database. You can confirm this with 'ovn-sbctl list Chassis'.

Please feel free to reopen the bug if you feel that this information is not enough.
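For reference, the agent-level check described above could look like the following. This is a minimal sketch, assuming the OpenStack client with overcloud admin credentials sourced; the prompt is illustrative.

(overcloud) $ openstack network agent list

An ovn-controller that has been down for longer than the configured agent-down timeout is reported as not alive for its compute host. The pure OVN check is 'ovn-sbctl list Chassis' run against the southbound DB on the controller node, as noted above.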