Bug 1542512

Summary: [Deployment][TLS] ODL requires restart for OVSDB to allow new SSL connections
Product: Red Hat OpenStack
Component: opendaylight
Reporter: Tim Rozet <trozet>
Assignee: Tim Rozet <trozet>
QA Contact: Itzik Brown <itbrown>
CC: aadam, jhershbe, knylande, mkolesni, nyechiel, shague, trozet
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Version: 13.0 (Queens)
Target Milestone: beta
Target Release: 13.0 (Queens)
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Whiteboard: odl_deployment, odl_tls
Fixed In Version: opendaylight-8.0.0-5.el7ost
Last Closed: 2018-06-27 13:43:30 UTC
Type: Bug

Description Tim Rozet 2018-02-06 13:36:07 UTC
Description of problem:
This environment uses SSL for OVSDB. OpenDaylight is already up and configured to use SSL for OVSDB in this scenario. Each OVS node is configured with a private key, a cert, and a CA cert. A REST call is then issued to add the OVS cert to the ODL trust store. After that, the OVS manager is set with the ssl prefix to connect to ODL. The result is that I can see connections being initiated from OVS and ODL receiving traffic, but ODL never accepts the connection unless it is restarted. After a restart, everything works as expected. We should not need to restart ODL to connect a new switch.



Version-Release number of selected component (if applicable):
Oxygen

How reproducible:
Every time

Comment 5 Sam Hague 2018-03-15 11:36:04 UTC
Tim, when the certs are pushed to ODL, are the ovsdb/ovs nodes already connected to ODL? If so, would it be possible to disconnect them first, push the certs, and then connect?

Comment 6 Tim Rozet 2018-03-15 17:16:59 UTC
Yeah we actually already do that:
https://github.com/openstack/puppet-neutron/blob/master/manifests/plugins/ovs/opendaylight.pp#L193

We never set manager until the cert is added.

Comment 7 Tim Rozet 2018-03-22 19:39:41 UTC
Enabling debug reveals some more information:
2018-03-20 11:07:09,693 | DEBUG | assiveConnServ-5 | OvsdbConnectionService           | 399 - org.opendaylight.ovsdb.library - 1.6.0.SNAPSHOT | handshake not done yet NEED_UNWRAP


Looking at the OVSDB code, the NEED_UNWRAP case simply issues a retry in the loop and waits:
https://github.com/opendaylight/ovsdb/blob/e6b469e18d5f72402ccb817ce1fb1469dd2a9d6c/library/impl/src/main/java/org/opendaylight/ovsdb/lib/impl/OvsdbConnectionService.java#L439

According to what I can see in the SSL Engine docs:
NEED_UNWRAP
The SSLEngine needs to receive data from the remote side before handshaking can continue.

It also indicates that in this case data may need to be unwrapped, which means:
Attempts to decode SSL/TLS network data into a plaintext application data buffer.

The logs do not show handshake level of detail, but I can see data coming from OVS so I believe there is some issue with the handshake process. However, looking at the same OVSDB code...the peer certificate is not even checked until after the handshake process:
 https://github.com/opendaylight/ovsdb/blob/e6b469e18d5f72402ccb817ce1fb1469dd2a9d6c/library/impl/src/main/java/org/opendaylight/ovsdb/lib/impl/OvsdbConnectionService.java#L426

Since we are not making it to the end of the handshake, I doubt this issue has to do with certificates; more likely it is a bug in the handshake process.
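
For reference, the NEED_UNWRAP state quoted above can be observed with nothing but the JDK. The following is a minimal sketch, not the OVSDB code: the class and method names are made up for illustration, and it uses the JVM default SSLContext rather than anything ODL configures. It only shows that a server-side engine that has started a handshake and has not yet received bytes from the peer reports NEED_UNWRAP, while a client-side engine reports NEED_WRAP because it must send first.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;
import javax.net.ssl.SSLException;

/** Illustrative only; HandshakeStatusDemo is a hypothetical name, not ODL code. */
public class HandshakeStatusDemo {

    /**
     * Creates an engine from the JVM default context, starts the handshake,
     * and returns the status reported before any network data is exchanged.
     */
    static HandshakeStatus statusAfterBegin(boolean clientMode) {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(clientMode);
            engine.beginHandshake();
            return engine.getHandshakeStatus();
        } catch (NoSuchAlgorithmException | SSLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Server side waits for the peer's first handshake message.
        System.out.println("server: " + statusAfterBegin(false));
        // Client side has to produce the first handshake message itself.
        System.out.println("client: " + statusAfterBegin(true));
    }
}
```

A loop that only polls this status, as described above, keeps seeing NEED_UNWRAP for as long as the engine has not consumed and accepted the peer's handshake data, so the log line alone does not say why the handshake is stalled.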

Comment 8 Tim Rozet 2018-03-22 21:34:47 UTC
Interestingly, when the bug is present, OpenFlow is still able to connect and certificate check works fine.  Looking at the OF code, the implementation for checking handshake completion is done differently by using ssl.handshakeFuture(), and listening for it to complete:

https://github.com/opendaylight/openflowplugin/blob/9da2ccf7745c4ece86f84d355c9dfdd74b22832d/openflowjava/openflow-protocol-impl/src/main/java/org/opendaylight/openflowjava/protocol/impl/core/TcpChannelInitializer.java#L99

I'm thinking that if we modify the OVSDB logic to use this future instead of running a loop that polls the current handshake status, it may fix the issue.

Also going to compare the configuration given to SSL Engine creation between OVSDB and OF.

Comment 9 Tim Rozet 2018-03-23 19:31:55 UTC
Turns out using handshakeFuture() does work, but it does not fix the problem.  OFP uses its own keystore classes/methods to manage the certificates, while OVSDB relies on the AAA certificate manager.  The bug is in how OVSDB uses the AAA cert manager, which is why OFP worked and OVSDB did not.  The AAA cert manager only updates the data from the keystore when getServerContext is called to get an SSL Context:
https://github.com/opendaylight/aaa/blob/master/aaa-cert/src/main/java/org/opendaylight/aaa/cert/impl/CertificateManagerService.java#L147

The root cause was that OVSDB only calls this on initial OVSDB manager startup, instead of on a per-connection basis.
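
The stale-context pattern described above can be modeled in a few lines. This is a toy sketch under stated assumptions, not the OVSDB or AAA code: TrustStore, Acceptor, cachedAtStartup, and perConnection are hypothetical names, and a set of strings stands in for the SSL context built from the keystore. It shows why a cert added after startup is invisible to a listener that fetched its context once, and visible to one that fetches a context per connection.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Toy model of the stale-context bug; all names here are hypothetical. */
public class StaleContextDemo {

    /** Stand-in for a trust store that can be updated at runtime. */
    static final class TrustStore {
        private final Set<String> certs = new HashSet<>();

        void add(String cert) {
            certs.add(cert);
        }

        /** Immutable snapshot, playing the role of a context built from the store. */
        Set<String> snapshot() {
            return Collections.unmodifiableSet(new HashSet<>(certs));
        }
    }

    /** Accepts a peer only if its cert is in the context the supplier returns. */
    static final class Acceptor {
        private final Supplier<Set<String>> context;

        Acceptor(Supplier<Set<String>> context) {
            this.context = context;
        }

        boolean accept(String peerCert) {
            return context.get().contains(peerCert);
        }
    }

    /** Buggy pattern: the context is built once at startup and reused forever. */
    static Acceptor cachedAtStartup(TrustStore store) {
        Set<String> once = store.snapshot();
        return new Acceptor(() -> once);
    }

    /** Fixed pattern: a fresh context is requested for every connection. */
    static Acceptor perConnection(TrustStore store) {
        return new Acceptor(store::snapshot);
    }

    public static void main(String[] args) {
        TrustStore store = new TrustStore();
        Acceptor cached = cachedAtStartup(store);
        Acceptor fresh = perConnection(store);

        store.add("ovs-node-1"); // cert pushed after the listener started

        System.out.println("cached: " + cached.accept("ovs-node-1")); // false
        System.out.println("fresh:  " + fresh.accept("ovs-node-1"));  // true
    }
}
```

In this model, restarting the cached acceptor corresponds to restarting ODL: the snapshot is rebuilt and the new cert becomes visible, which matches the behavior reported in the description.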

Comment 14 Itzik Brown 2018-04-26 09:10:49 UTC
Checking with:
opendaylight-8.0.0-5.el7ost

Is it enough to see using ovs-vsctl that all the OVS nodes are connected?

Comment 15 Tim Rozet 2018-04-26 14:11:32 UTC
Yep.  This one is fixed.  Post deployment all the nodes are connected and ODL did not need restart.

Comment 18 errata-xmlrpc 2018-06-27 13:43:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086