Description of problem:
This environment uses SSL for OVSDB. OpenDaylight is already up and configured to use SSL for OVSDB in this scenario. Each OVS node is configured with a private key, a certificate, and a CA certificate. A REST call is then issued to add the OVS certificate to the ODL trust store. After that, the OVS manager is set with the ssl: prefix to connect to ODL. The result is that I can see connections being initiated from OVS and ODL receiving traffic, but ODL never accepts the connection unless it is rebooted. After a reboot, everything works as expected. We should not need to reboot to connect a new switch.

Version-Release number of selected component (if applicable):
Oxygen

How reproducible:
Every time
Tim, when the certs are pushed to ODL, are the OVSDB/OVS nodes already connected to ODL? If so, would it be possible to disconnect them first, push the certs, and then connect?
Yeah, we actually already do that: https://github.com/openstack/puppet-neutron/blob/master/manifests/plugins/ovs/opendaylight.pp#L193
We never set the manager until the cert is added.
Enabling debug reveals some more information:

2018-03-20 11:07:09,693 | DEBUG | assiveConnServ-5 | OvsdbConnectionService | 399 - org.opendaylight.ovsdb.library - 1.6.0.SNAPSHOT | handshake not done yet NEED_UNWRAP

Looking at the OVSDB code, the NEED_UNWRAP case simply issues a retry in the loop and waits: https://github.com/opendaylight/ovsdb/blob/e6b469e18d5f72402ccb817ce1fb1469dd2a9d6c/library/impl/src/main/java/org/opendaylight/ovsdb/lib/impl/OvsdbConnectionService.java#L439

According to the SSLEngine docs, NEED_UNWRAP means: "The SSLEngine needs to receive data from the remote side before handshaking can continue." It also indicates that data may need to be unwrapped, which the docs describe as: "Attempts to decode SSL/TLS network data into a plaintext application data buffer."

The logs do not show handshake-level detail, but I can see data coming from OVS, so I believe there is some issue with the handshake process. However, looking at the same OVSDB code, the peer certificate is not even checked until after the handshake completes: https://github.com/opendaylight/ovsdb/blob/e6b469e18d5f72402ccb817ce1fb1469dd2a9d6c/library/impl/src/main/java/org/opendaylight/ovsdb/lib/impl/OvsdbConnectionService.java#L426

Since we are not making it to the end of the handshake, I doubt this issue has to do with certificates; it is more likely a bug in the handshake process.
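To make the NEED_UNWRAP state in the comment above concrete, here is a minimal JDK-only sketch. It is not the OVSDB code: the class name HandshakeStatusSketch and the helper newServerEngine are made up for illustration, and the engine uses the JDK's default key and trust material. It only shows that a server-side SSLEngine that has started a handshake, but has not yet been fed any bytes from the peer, reports a status that means "waiting for data from the remote side".

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

// Hypothetical illustration class; not part of OVSDB or OpenDaylight.
class HandshakeStatusSketch {

    /** Creates a server-mode engine with JDK default material and starts the handshake. */
    static SSLEngine newServerEngine() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // Assumption: JDK defaults are enough for this illustration.
            // A real server would pass its own key and trust managers here.
            context.init(null, null, null);
            SSLEngine engine = context.createSSLEngine();
            engine.setUseClientMode(false); // act as the server side, like ODL
            engine.beginHandshake();
            return engine;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("could not create SSLEngine", e);
        }
    }

    public static void main(String[] args) {
        SSLEngine engine = newServerEngine();
        // No peer data has been unwrapped yet, so the engine is still waiting for it.
        HandshakeStatus status = engine.getHandshakeStatus();
        System.out.println("handshake status before any peer data: " + status);
    }
}
```

A loop that only polls this status and retries cannot tell why the handshake is not progressing, which matches the limited detail in the debug log.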
Interestingly, when the bug is present, OpenFlow is still able to connect and its certificate check works fine. Looking at the OF code, handshake completion is checked differently there, by calling ssl.handshakeFuture() and listening for it to complete: https://github.com/opendaylight/openflowplugin/blob/9da2ccf7745c4ece86f84d355c9dfdd74b22832d/openflowjava/openflow-protocol-impl/src/main/java/org/opendaylight/openflowjava/protocol/impl/core/TcpChannelInitializer.java#L99
I'm thinking that modifying the OVSDB logic to use this future, instead of running a loop that polls the current handshake status, may fix the issue. I am also going to compare the configuration passed to SSLEngine creation between OVSDB and OF.
Turns out using handshakeFuture() does work, but it does not fix the problem. OFP uses its own keystore classes/methods to manage the certificates, while OVSDB relies on the AAA certificate manager. The bug is in how OVSDB uses the AAA cert manager, which is why OFP worked and OVSDB did not. The AAA cert manager only refreshes its data from the keystore when getServerContext is called to get an SSL context: https://github.com/opendaylight/aaa/blob/master/aaa-cert/src/main/java/org/opendaylight/aaa/cert/impl/CertificateManagerService.java#L147
The root cause was that OVSDB only called this on initial OVSDB manager startup, instead of on a per-connection basis.
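The root cause above can be sketched with a small stand-alone model. This is not the AAA or OVSDB code: TrustStore, buildContext and accept are hypothetical names, and a plain list of strings stands in for the keystore and the SSL context. The only point is the difference between a context captured once at startup and one requested for each connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model; names do not correspond to real AAA or OVSDB classes.
class PerConnectionContextSketch {

    /** Stand-in for the keystore that the REST call adds certificates to. */
    static final class TrustStore {
        private final List<String> certs = new ArrayList<>();

        void add(String cert) {
            certs.add(cert);
        }

        /** Stand-in for building an SSL context: captures the certificates present right now. */
        List<String> buildContext() {
            return new ArrayList<>(certs);
        }
    }

    /** Accepts a peer only if its certificate is in the context used for this connection. */
    static boolean accept(List<String> context, String peerCert) {
        return context.contains(peerCert);
    }

    public static void main(String[] args) {
        TrustStore store = new TrustStore();

        // Pattern described as the root cause: context built once at startup.
        List<String> startupContext = store.buildContext();
        // Pattern described as the fix: ask for a fresh context on every connection.
        Supplier<List<String>> perConnection = store::buildContext;

        // Certificate pushed after startup, as in the reproduction steps.
        store.add("ovs-node-1-cert");

        System.out.println("startup context accepts:  "
                + accept(startupContext, "ovs-node-1-cert"));
        System.out.println("per-connection accepts:   "
                + accept(perConnection.get(), "ovs-node-1-cert"));
    }
}
```

In this model, a restart corresponds to rebuilding startupContext, which is consistent with the observation that connections only succeeded after ODL was rebooted.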
Checking with: opendaylight-8.0.0-5.el7ost
Is it enough to verify with ovs-vsctl that all the OVS nodes are connected?
Yep. This one is fixed. Post-deployment, all the nodes are connected and ODL did not need a restart.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086