Bug 1563574 - [ovs2.9] SSL connections drops are constantly logged in ovsdb-server-nb.log
Summary: [ovs2.9] SSL connections drops are constantly logged in ovsdb-server-nb.log
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch
Version: FDP 19.D
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: ---
: ---
Assignee: Timothy Redaelli
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-04 08:26 UTC by Mor
Modified: 2020-01-21 06:34 UTC (History)
19 users (show)

Fixed In Version: openvswitch-2.9.0-124.el7fdn
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1780745 1780746 (view as bug list)
Environment:
Last Closed: 2020-01-21 06:33:52 UTC
Target Upstream Version:


Attachments (Terms of Use)
ovn logs (28.34 KB, application/octet-stream)
2018-04-04 08:29 UTC, Mor
no flags Details
packet capture (57.75 KB, application/octet-stream)
2018-04-04 08:29 UTC, Mor
no flags Details
pair of pcap and logs (56.35 KB, application/x-gzip)
2018-04-04 15:35 UTC, Dominik Holler
no flags Details
Try to close the idl when the Connection is stopped (862 bytes, patch)
2018-08-15 18:22 UTC, Terry Wilson
mduarted: review-
Details | Diff
logs and patched connections.py (138.07 KB, application/x-gzip)
2018-08-19 10:44 UTC, Mor
no flags Details
Call stop() on the ovsdbapp Connection (678 bytes, patch)
2018-08-31 16:27 UTC, Terry Wilson
no flags Details | Diff
patch verification process and logs (38.49 KB, text/plain)
2019-01-11 14:29 UTC, Miguel Duarte Barroso
no flags Details
reproducer (852 bytes, text/plain)
2019-06-06 21:41 UTC, Terry Wilson
no flags Details
python-ovs fix (1.30 KB, patch)
2019-06-06 21:41 UTC, Terry Wilson
no flags Details | Diff
terminal log of testing attachment 1578131 (4.79 KB, text/plain)
2019-07-12 09:58 UTC, Dominik Holler
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3775111 0 None None None 2018-12-25 22:41:44 UTC
Red Hat Product Errata RHSA-2020:0165 0 None None None 2020-01-21 06:34:09 UTC

Description Mor 2018-04-04 08:26:16 UTC
Description of problem:
ovsdb-server-nb.log is logging SSL connection drops, ovirt-provider-ovn seems to function correctly. Seems like a ovirt-provider-ovn component (python openvswitch or ovsdbapp) trying to connect wrongly to OVSDB NouthBound DB.

Version-Release number of selected component (if applicable):
python-openvswitch-2.9.0-1.el7fdp.noarch
openvswitch-ovn-common-2.9.0-1.el7fdp.x86_64
openvswitch-2.9.0-1.el7fdp.x86_64
openvswitch-ovn-central-2.9.0-1.el7fdp.x86_64
RHV 4.2.2.6-0.1.el7

How reproducible:
100%

Steps to Reproduce:
1. Keep ovirt-provider-ovn running (OVSDB NB client).

Actual results:
errors in the ovsdb-server-db.log:
2018-04-04T07:57:07.783Z|00070|reconnect|WARN|ssl:127.0.0.1:54216: connection dropped (Protocol error)
2018-04-04T08:02:08.281Z|00071|stream_ssl|WARN|SSL_read: unexpected SSL connection close
2018-04-04T08:02:08.281Z|00072|jsonrpc|WARN|ssl:127.0.0.1:54264: receive error: Protocol error

Expected results:
No errors should appear.

Additional info:

Comment 2 Mor 2018-04-04 08:29:24 UTC
Created attachment 1417128 [details]
ovn logs

Comment 3 Mor 2018-04-04 08:29:57 UTC
Created attachment 1417129 [details]
packet capture

Comment 4 Mark Michelson 2018-04-04 14:57:00 UTC
One observation from the logs is that the error happens every 5 minutes. Trying to identify operations that run every 5 minutes would be a good first step towards identifying the problem.

I'm confused about the pcap file. It only covers about a quarter of a second, and none of the port numbers in the pcap match the port numbers in the log file.

Comment 5 Dominik Holler 2018-04-04 15:35:45 UTC
Created attachment 1417332 [details]
pair of pcap and logs

Comment 6 Dominik Holler 2018-04-04 15:36:52 UTC
(In reply to Mark Michelson from comment #4)
> One observation from the logs is that the error happens every 5 minutes.
> Trying to identify operations that run every 5 minutes would be a good first
> step towards identifying the problem.
> 

This is a listing of networks triggerd by ovirt engine.

> I'm confused about the pcap file. It only covers about a quarter of a
> second, and none of the port numbers in the pcap match the port numbers in
> the log file.

Please find a matching pair in attachment 1417332 [details].

Comment 7 lorenzo bianconi 2018-04-13 17:13:34 UTC
Looking at traffic trace I guess the issue could be due to the sequence of operations performed at the connection tear-down. In particular, I think the client does not run 'shutdown' syscall but just closes the socket as soon as it has no more data to transmit. When the client tcp layer receives new data from the ovn-nb db sends back a RST segment to tear-down the connection.
What client are you using? ovsdbapp?

Comment 8 Dominik Holler 2018-04-13 17:58:36 UTC
(In reply to lorenzo bianconi from comment #7)
> What client are you using? ovsdbapp?

Yes, we are using ovsdbapp like in
https://github.com/oVirt/ovirt-provider-ovn/blob/master/provider/ovndb/ovn_north.py#L98

Comment 11 Terry Wilson 2018-08-15 18:22:20 UTC
Created attachment 1476242 [details]
Try to close the idl when the Connection is stopped

Can you test out this patch to see if it fixes the issue for you? I don't really know how RHEV is using ovsdbapp, and the 5-minute interval is suspicious and I wouldn't expect that to come from ovsdbapp itself.

If whatever is handling the ovsdbapp Connection is exiting w/o calling Connection.stop(), I'm not sure if this patch will help, so you might also ensure that Connection.stop() is called in addition to applying the patch.

Comment 12 Mor 2018-08-19 09:24:12 UTC
Sorry Terry - I currently not able to verify your patch because I don't have it installed in my environment, and I also moved to a different team along the way.

To the QE that takes it: the verification should be easily, install RHV 4.2 with ovirt-provider-ovn (engine-setup asks for it), keep it running for a while, watch the ovsdb-server-nb.log for errors as per the bug description.

Dominik, do you think this error is triggered by ovirt-provider-ovn? or is it enough to give it idle time to run and watch the log for errors?

Comment 13 Mor 2018-08-19 10:43:24 UTC
(In reply to Terry Wilson from comment #11)
> Created attachment 1476242 [details]
> Try to close the idl when the Connection is stopped
> 
> Can you test out this patch to see if it fixes the issue for you? I don't
> really know how RHEV is using ovsdbapp, and the 5-minute interval is
> suspicious and I wouldn't expect that to come from ovsdbapp itself.
> 
> If whatever is handling the ovsdbapp Connection is exiting w/o calling
> Connection.stop(), I'm not sure if this patch will help, so you might also
> ensure that Connection.stop() is called in addition to applying the patch.

OK so network team gave me environment to check this. The issue is still reproducible on RHV 4.2.6.3_SNAPSHOT-94. I applied the patch on connection.py, but I still see the error in the logs. See attached archive.

Versions:
ovirt-provider-ovn-1.2.14-1.el7ev.noarch
openvswitch-2.9.0-55.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-5.el7fdp.noarch
openvswitch-ovn-central-2.9.0-55.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-55.el7fdp.x86_64
python-openvswitch-2.9.0-55.el7fdp.noarch
python-ovsdbapp-0.6.0-1.el7ev.noarch

Comment 14 Mor 2018-08-19 10:44:17 UTC
Created attachment 1476873 [details]
logs and patched connections.py

Comment 15 Dominik Holler 2018-08-20 08:04:25 UTC
ovirt-provider-ovn is using python2-ovsdbapp-0.6.0-1, which shuts down using 'close()' instead of 'stop()'. Let's see if we can update ovirt-provider-ovn to use a newer version vsdbapp. Is there a minimum version required for the new connection shut down behavior?

Comment 16 Terry Wilson 2018-08-31 16:17:59 UTC
Looking at the code here: https://github.com/oVirt/ovirt-provider-ovn/blob/2cfaeefb01d5ad112e98a09abfca032465affd8c/provider/neutron/neutron_api.py#L1164 it looks like it specifically calls close() on the 'ovsidl' returned by python-ovs. instead of calling stop() on the connection ovsdbapp returned OvnNbApiIdlImpl which the patch would affect.

Changing the self.ovsidl.close() in __exit__ referenced at the link above to self.idl.ovsdb_connection.stop() I think would exercise the patch.

Comment 17 Terry Wilson 2018-08-31 16:27:57 UTC
Created attachment 1480123 [details]
Call stop() on the ovsdbapp Connection

This patch to the ovn provider would be required to actually use the previous patch.

Comment 18 Roman Hodain 2018-09-04 14:12:40 UTC
(In reply to Terry Wilson from comment #17)
> Created attachment 1480123 [details]
> Call stop() on the ovsdbapp Connection
> 
> This patch to the ovn provider would be required to actually use the
> previous patch.

There is actually a couple of other places where this could happen. For example in ovsdbapp/backend/ovs_idl/idlutils.py in get_schema_helper. We use an rpc connection to get the schema. We run this code:

from ovs import stream
from ovs import jsonrpc
import ovs

ovs.stream.Stream.ssl_set_private_key_file('/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass')
ovs.stream.Stream.ssl_set_certificate_file('/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer')
ovs.stream.Stream.ssl_set_ca_cert_file('/etc/pki/ovirt-engine/ca.pem')

err, strm = stream.Stream.open_block(stream.Stream.open('ssl:127.0.0.1:6641'))
rpc = jsonrpc.Connection(strm)

req = jsonrpc.Message.create_request('get_schema', ['OVN_Northbound']) 
err, resp = rpc.transact_block(req)
rpc.close()

Comment 19 Siddhant Rao 2018-10-29 18:51:31 UTC
Hello,

I have a customer who is facing this same issue.

The customer is running on 4.2.6 and i believe they were not seeing these errors on 4.2.5

Could you let us know on why is this happening, also any workaround or timeline for this issue to get resolved?.

Thanks,
Siddhant Rao

Comment 20 Klaas Demter 2018-11-21 16:00:38 UTC
Any progress on this issue? Seeing those errors on an empty installation of RHV 4.2.6

Also I noticed when I edit the OVS Provider via the RHV Manager and click on test I always have to acknowledge the self signed ssl cert, I thought it should store the cert or maybe have a configurable CA. 
Side node: testing the connection also throws those error messages.

Comment 21 Klaas Demter 2018-11-22 07:11:18 UTC
and I can get rid of it by disabling "Automatic Synchronization". I do not use the default network provider so I guess it is safe to disable it.

Comment 22 Klaas Demter 2018-11-22 09:21:11 UTC
Support said as long as I do not use the ovirt ovn provider I can safely disable "Automatic Synchronization" for it and therefore get rid of the error every 5 minutes.

Comment 25 Miguel Duarte Barroso 2019-01-11 14:21:58 UTC
Comment on attachment 1476242 [details]
Try to close the idl when the Connection is stopped

tested this patch - including the ovirt-provider-ovn change, that enables this one to be triggered - and the problem persists.

Comment 26 Miguel Duarte Barroso 2019-01-11 14:29:34 UTC
Created attachment 1520038 [details]
patch verification process and logs

the process listed in this patch can be followed to reproduce the issue, and work in it, if desired.

Comment 27 Miguel Duarte Barroso 2019-01-11 14:43:46 UTC
The solution proposed in patches [0] and [1] didn't work.

Terry, is there any extra info you require ?

[0] - https://bugzilla.redhat.com/attachment.cgi?id=1480123
[1] - https://bugzilla.redhat.com/attachment.cgi?id=1476242

Comment 28 Dominik Holler 2019-01-15 13:37:45 UTC
clear needinfo on me, because Miguel replied.

Comment 33 Terry Wilson 2019-06-06 21:41:10 UTC
Created attachment 1578130 [details]
reproducer

Comment 34 Terry Wilson 2019-06-06 21:41:57 UTC
Created attachment 1578131 [details]
python-ovs fix

Comment 35 Terry Wilson 2019-06-06 21:44:55 UTC
This issue isn't in ovsdapp, but instead is in python-ovs--see the attached reproducer script. ovs.stream.SSLStream has no special close() handling an so just closes the socket. tcpdumps showed "close notify" from the server, but not from the client. A patch adding a close() method for SSLStream that calls shutdown and ignores any errors is attached. It fixed the issue for me. Can someone experiencing the issue verify that the patch works for them?

Comment 36 Dominik Holler 2019-07-12 09:58:10 UTC
Created attachment 1589825 [details]
terminal log of testing attachment 1578131 [details]

Comment 37 Dominik Holler 2019-07-12 10:00:06 UTC
Comment on attachment 1589825 [details]
terminal log of testing attachment 1578131 [details]

Thanks, Terry, your fix seems to work!

Comment 40 Jianlin Shi 2019-12-18 06:54:35 UTC
reproduced on python-openvswitch-2.9.0-123.el7fdp.x86_64:

[root@hp-dl380pg8-12 bz1780745]# bash -x setup.sh                                                     
+ systemctl start ovn-northd
+ ovs-pki init --force
Creating controllerca...
Creating switchca...
+ pushd /etc/openvswitch                                                                              
/etc/openvswitch ~/bz1780745                                                                          
+ ovs-pki req+sign northdb controller
northdb-req.pem Wed Dec 18 01:50:03 EST 2019                                                          
        fingerprint a0c1cb6e217047f7f5c30f415123add2099ad0b5
+ ovs-pki req+sign nbctl controller
nbctl-req.pem   Wed Dec 18 01:50:03 EST 2019                                                          
        fingerprint 3dd43257df24557e7a4e4c2c6da4d92e2b39e27d                                          
+ popd
~/bz1780745
+ chown -R openvswitch /etc/openvswitch
+ chown -R openvswitch /var/lib/openvswitch                                                           
+ ovn-nbctl set-ssl /etc/openvswitch/northdb-privkey.pem /etc/openvswitch/northdb-cert.pem /var/lib/openvswitch/pki/controllerca/cacert.pem
+ ovn-nbctl set-connection pssl:6641                                                                  
+ python ovsdb_ssl_test.py
(0, None, "[u'hello world']")
+ tail -n 10 /var/log/messages
Dec 18 01:50:02 hp-dl380pg8-12 ovsdb-server: ovs|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server-sb.log
Dec 18 01:50:02 hp-dl380pg8-12 ovsdb-server: ovs|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.9.0
Dec 18 01:50:02 hp-dl380pg8-12 ovn-sbctl: ovs|00001|sbctl|INFO|Called as ovn-sbctl init               
Dec 18 01:50:02 hp-dl380pg8-12 ovn-ctl: Starting ovn-northd [  OK  ]
Dec 18 01:50:02 hp-dl380pg8-12 systemd: Started OVN northd management daemon.                         
Dec 18 01:50:03 hp-dl380pg8-12 ovn-nbctl: ovs|00001|nbctl|INFO|Called as ovn-nbctl set-ssl /etc/openvswitch/northdb-privkey.pem /etc/openvswitch/northdb-cert.pem /var/lib/openvswitch/pki/controllerca/cacert.pem
Dec 18 01:50:03 hp-dl380pg8-12 ovn-nbctl: ovs|00001|nbctl|INFO|Called as ovn-nbctl set-connection pssl:6641
Dec 18 01:50:03 hp-dl380pg8-12 ovsdb-server: ovs|00003|stream_ssl|WARN|SSL_read: unexpected SSL connection close
Dec 18 01:50:03 hp-dl380pg8-12 ovsdb-server: ovs|00004|jsonrpc|WARN|ssl:127.0.0.1:43324: receive error: Protocol error
Dec 18 01:50:03 hp-dl380pg8-12 ovsdb-server: ovs|00005|reconnect|WARN|ssl:127.0.0.1:43324: connection dropped (Protocol error)

<==== error message

[root@hp-dl380pg8-12 bz1780745]# rpm -qa | grep openvswitch
kernel-kernel-networking-openvswitch-ovn-regression-bz1771854_replicate_old_schema-1.0-2.noarch
openvswitch-ovn-host-2.9.0-123.el7fdp.x86_64                                                          
kernel-kernel-networking-openvswitch-ovn-common-1.0-6.noarch
openvswitch-2.9.0-123.el7fdp.x86_64                                                                   
openvswitch-ovn-central-2.9.0-123.el7fdp.x86_64                                                       
python-openvswitch-2.9.0-123.el7fdp.x86_64                                                            
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch-ovn-common-2.9.0-123.el7fdp.x86_64

Verified on python-openvswitch-2.9.0-124.el7fdp.x86_64:

[root@hp-dl380pg8-12 bz1780745]# bash -x setup.sh 
+ systemctl start ovn-northd
+ ovs-pki init --force
Creating controllerca...
Creating switchca...
+ pushd /etc/openvswitch
/etc/openvswitch ~/bz1780745
+ ovs-pki req+sign northdb controller
northdb-req.pem Wed Dec 18 01:53:46 EST 2019
        fingerprint 299b89c654e1ffd36f10459e04bb63fe5189d84f
+ ovs-pki req+sign nbctl controller
nbctl-req.pem   Wed Dec 18 01:53:47 EST 2019
        fingerprint 55cf8f65fd158748bd57626e6f11cc64620d338e
+ popd
~/bz1780745
+ chown -R openvswitch /etc/openvswitch
+ chown -R openvswitch /var/lib/openvswitch
+ ovn-nbctl set-ssl /etc/openvswitch/northdb-privkey.pem /etc/openvswitch/northdb-cert.pem /var/lib/openvswitch/pki/controllerca/cacert.pem
+ ovn-nbctl set-connection pssl:6641
+ python ovsdb_ssl_test.py
(0, None, "[u'hello world']")
+ tail -n 10 /var/log/messages
Dec 18 01:53:46 hp-dl380pg8-12 ovn-nbctl: ovs|00001|nbctl|INFO|Called as ovn-nbctl init
Dec 18 01:53:46 hp-dl380pg8-12 ovn-ctl: /var/lib/openvswitch/ovnsb_db.db does not exist ... (warning).
Dec 18 01:53:46 hp-dl380pg8-12 ovn-ctl: Creating empty database /var/lib/openvswitch/ovnsb_db.db [  OK  ]
Dec 18 01:53:46 hp-dl380pg8-12 ovsdb-server: ovs|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server-sb.log
Dec 18 01:53:46 hp-dl380pg8-12 ovsdb-server: ovs|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.9.0
Dec 18 01:53:46 hp-dl380pg8-12 ovn-sbctl: ovs|00001|sbctl|INFO|Called as ovn-sbctl init
Dec 18 01:53:46 hp-dl380pg8-12 ovn-ctl: Starting ovn-northd [  OK  ]
Dec 18 01:53:46 hp-dl380pg8-12 systemd: Started OVN northd management daemon.
Dec 18 01:53:47 hp-dl380pg8-12 ovn-nbctl: ovs|00001|nbctl|INFO|Called as ovn-nbctl set-ssl /etc/openvswitch/northdb-privkey.pem /etc/openvswitch/northdb-cert.pem /var/lib/openvswitch/pki/controllerca/cacert.pem
Dec 18 01:53:47 hp-dl380pg8-12 ovn-nbctl: ovs|00001|nbctl|INFO|Called as ovn-nbctl set-connection pssl:6641

<===== no error message

[root@hp-dl380pg8-12 bz1780745]# rpm -qa | grep openvswitch
kernel-kernel-networking-openvswitch-ovn-regression-bz1771854_replicate_old_schema-1.0-2.noarch
openvswitch-ovn-central-2.9.0-124.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-6.noarch
openvswitch-ovn-common-2.9.0-124.el7fdp.x86_64
openvswitch-ovn-host-2.9.0-124.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch-2.9.0-124.el7fdp.x86_64
python-openvswitch-2.9.0-124.el7fdp.x86_64

Comment 42 errata-xmlrpc 2020-01-21 06:33:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:0165


Note You need to log in before you can comment on or make changes to this bug.