This issue has been seen in an oVirt / RHV deployment and reported in bug #1570384. I've been told that openvswitch v2.10 should already contain a fix for this; please rebase on the new version. I would suggest an async release for this.
@Sandro, you say this is fixed in 2.10 (OVS master, I guess). Do you know which commit fixes this? The closest thing I've found is this series, which is not yet merged: https://patchwork.ozlabs.org/project/openvswitch/list/?series=40928
Hi Sandro, Lorenzo Bianconi gave me instructions on how to reproduce this outside of oVirt. What I can say right now is that the suspected patches (the ones I referenced earlier) do NOT appear to fix this issue. I am looking into it and will let you know when I have a fix.
Thanks Mark, we'll stay tuned.
I believe I've found the cause of the problem. It comes from a commit introduced in December. The SSL table in the OVN northbound database is constrained to hold at most one row. The December commit added a check so that an attempt to insert a row into the table while a row is already present fails the verification step. This was intended to prevent race conditions where multiple clients attempt to insert at the same time. However, this check causes 'ovn-nbctl set-ssl' to fail. The set-ssl operation creates a transaction that deletes the current row in the SSL table and then inserts a new one, but the check does not take into account that the delete is part of the same transaction. It therefore rejects the transaction because it believes we are inserting into a table that already holds its maximum number of rows.

There are essentially two issues:
1) The transaction should succeed instead of failing.
2) Even if the transaction fails, it should not cause a hang.
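The failure mode described above can be modeled with a small, purely hypothetical Python sketch. It is not the actual ovsdb-idl or ovsdb-server code; the Table, Transaction, verify_buggy and verify_fixed names are invented for illustration, and the row limit is assumed to behave like OVSDB's maxRows. The sketch contrasts a check that ignores pending deletes with one that subtracts them before counting.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Table:
    """Hypothetical in-memory table with an optional row limit (modeled on OVSDB maxRows)."""
    name: str
    max_rows: Optional[int]
    rows: Set[str] = field(default_factory=set)


@dataclass
class Transaction:
    """Hypothetical pending transaction: row UUIDs to delete and to insert."""
    table: Table
    deletes: Set[str] = field(default_factory=set)
    inserts: Set[str] = field(default_factory=set)

    def delete(self, uuid: str) -> None:
        self.deletes.add(uuid)

    def insert(self, uuid: str) -> None:
        self.inserts.add(uuid)


def verify_buggy(txn: Transaction) -> bool:
    # Counts committed rows plus pending inserts, but ignores pending
    # deletes, so "delete old row + insert new row" looks like 2 rows.
    if txn.table.max_rows is None:
        return True
    return len(txn.table.rows) + len(txn.inserts) <= txn.table.max_rows


def verify_fixed(txn: Transaction) -> bool:
    # Removes rows deleted in this transaction before counting inserts.
    if txn.table.max_rows is None:
        return True
    remaining = txn.table.rows - txn.deletes
    return len(remaining) + len(txn.inserts) <= txn.table.max_rows


# Model of a set-ssl style transaction: replace the single SSL row.
ssl = Table("SSL", max_rows=1, rows={"old-row"})
txn = Transaction(ssl)
txn.delete("old-row")
txn.insert("new-row")
print(verify_buggy(txn), verify_fixed(txn))
```

With the existing row deleted and a new one inserted in the same transaction, verify_buggy rejects the transaction while verify_fixed accepts it. An insert with no accompanying delete is still rejected by both, which keeps the race-condition protection the December check was meant to provide.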
I've submitted a patch upstream: https://patchwork.ozlabs.org/patch/915611/
I am moving this issue to POST.
Can you backport this fix to OVS 2.9? Since OVS is now supported in RHV, we need this fix urgently.
I have sent a message to the upstream maintainer asking them to backport this to OVS 2.9.
Ben has backported the change to the OVS 2.9 branch. I'm setting the state of the issue to MODIFIED.
The openvswitch component is delivered through the Fast Datapath channel, so it is not documented in the release notes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2432