Bug 2044621

Summary: ovsdb-server: Clustered database may send zero uuid as a last transaction id
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovsdb2.16
Version: FDP 22.A
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Ilya Maximets <i.maximets>
Assignee: Ilya Maximets <i.maximets>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, jhsiao, jishi, ralongi, tredaelli
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openvswitch2.16-2.16.0-45.el8fdp
Last Closed: 2022-03-30 16:28:58 UTC
Type: Bug

Description Ilya Maximets 2022-01-24 20:18:43 UTC
The fix for https://bugzilla.redhat.com/show_bug.cgi?id=2012949
made it possible for the transaction history to be drained completely
in some cases.  If that happens, the next monitor update sends UUID_ZERO
to the client as the 'last-id'.  On re-connection this may force the
client to re-download the whole database instead of doing a fast
re-sync.  It also blocks monitor_cond_since support for relays.

Comment 2 OvS team 2022-02-01 14:22:21 UTC
* Tue Feb 01 2022 Ilya Maximets <i.maximets> - 2.16.0-45
- ovsdb: transaction: Keep one entry in the transaction history. [RH git: 7665f42d12] (#2044621)
    commit 6e13565dd32fb2cf5517f51ca06956e2052c4bba
    Author: Ilya Maximets <i.maximets>
    Date:   Sun Dec 19 15:09:38 2021 +0100
    
        ovsdb: transaction: Keep one entry in the transaction history.
    
        If a single transaction exceeds the size of the whole database (e.g.,
        a lot of rows got removed and new ones added), transaction history will
        be drained.  This leads to sending UUID_ZERO to the clients as the last
        transaction id in the next monitor update, because monitor doesn't
        know what was the actual last transaction id.  In case of a re-connect
        that will cause re-downloading of the whole database, since the
        client's last_id will be out of sync.
    
        One solution would be to store the last transaction ID separately
        from the actual transactions, but that will require a careful
        management in cases where database gets reset and the history needs
        to be cleared.  Keeping the one last transaction instead to avoid
        the problem.  That should not be a big concern in terms of memory
        consumption, because this last transaction will be removed from the
        history once the next transaction appeared.  This is also not a concern
        for a fast re-sync, because this last transaction will not be used
        for the monitor reply; it's either client already has it, so no need
        to send, or it's a history miss.
    
        The test updated to not check the number of atoms if there is only
        one transaction in the history.
    
        Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.")
        Acked-by: Mike Pattrick <mkp>
        Acked-by: Han Zhou <hzhou>
        Signed-off-by: Ilya Maximets <i.maximets>
    
    Reported-at: https://bugzilla.redhat.com/2044621
    Signed-off-by: Ilya Maximets <i.maximets>

Comment 6 Jianlin Shi 2022-03-09 08:26:27 UTC
Hi Ilya,

Could you give some suggestions on how to test the patch? thanks

Comment 7 Ilya Maximets 2022-03-10 12:45:39 UTC
(In reply to Jianlin Shi from comment #6)
> Hi Ilya,
> 
> Could you give some suggestions on how to test the patch? thanks

Steps to reproduce:

1. Start OVN with clustered databases.
2. Enable debug logs for jsonrpc module on ovn-northd.
3. Add some amount of OVN resources in different OVN commands
   (different NbDB transactions), e.g. 20 Logical Switches.
4. Remove all these resources one by one.
5. Add a few more.
6. Remove them too.
7. Look for 'update3' notifications from the database in the
   ovn-northd.log.  They should look like this:

     jsonrpc|DBG|unix:db1.sock: received notification, method="update3",
       params=[["monid","OVN_Northbound"],"3210e3a5-94cf-4c3c-a427-652746be8c81",{<data>}]

   "3210e3a5-94cf-4c3c-a427-652746be8c81" is an example of the
   last transaction id.  There should be no update3 message where
   this uuid equals "00000000-0000-0000-0000-000000000000".
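
The check in step 7 can be sketched as a small shell helper.  This is
only an illustration: the function name count_zero_update3 is
hypothetical, and the pattern assumes each update3 notification is
logged on a single line in the format shown above.

```shell
# All-zero UUID that a clustered database should never send as the last id.
ZERO_UUID="00000000-0000-0000-0000-000000000000"

# Hypothetical helper: print how many update3 notifications in the given
# log file carry the all-zero UUID as the last transaction id.
# $1: path to the ovn-northd log file (jsonrpc debug logging enabled).
count_zero_update3() {
    # Match: method="update3", params=[[<monitor id>],"<zero uuid>"
    grep -cE "method=\"update3\", *params=\[\[[^]]*\],\"${ZERO_UUID}\"" "$1"
}
```

For example, 'count_zero_update3 /var/log/ovn/ovn-northd.log' (the log
path is an assumption and depends on the installation) should print 0
on a fixed build.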

The issue was introduced in openvswitch2.16-2.16.0-27.el8fdp.

Comment 8 Jianlin Shi 2022-03-11 07:33:43 UTC
(In reply to Ilya Maximets from comment #7)

Hi Ilya,

I tried to reproduce with your steps as follows:

1. start cluster with script on 3 machines:

ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.178.16
ip_c1=1.1.178.17
ip_c2=1.1.178.18

$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
         --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
         --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
         --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
         --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd
ovn-appctl -t ovn-northd vlog/set jsonrpc dbg

2. then add switch with:

for i in {1..20}                                                                                      
do                                                                                                    
        ovn-nbctl --wait=hv ls-add ls$i                                                               
done                                                                                                  
ovn-nbctl --wait=hv sync                                                                              
ovn-nbctl show                                                                                        
                                                                                                      
for i in {1..20}                                                                                      
do                                                                                                    
        ovn-nbctl ls-del ls$i                                                                         
done                                                                                                  
                                                                                                      
for i in {30..50}                                                                                     
do                                                                                                    
        ovn-nbctl --wait=hv ls-add ls$i                                                               
done                                                                                                  
ovn-nbctl --wait=hv sync                                                                              
for i in {30..50}                                                                                     
do                                                                                                    
        ovn-nbctl  ls-del ls$i                                                                        
done 

But I failed to reproduce the issue on openvswitch2.16-27: no message with uuid "00000000-0000-0000-0000-000000000000" was found in the log.
[root@wsfd-advnetlab16 bz2044621]# rpm -qa | grep openvswitch2.16                                     
openvswitch2.16-2.16.0-27.el8fdp.x86_64

Comment 9 Ilya Maximets 2022-03-11 13:16:39 UTC
OK.  It turned out a bit harder to reproduce with a live OVN setup.
What we're trying to do is to create a single transaction that will
be larger than the database after execution of that transaction.

The following method should work:

1. Start clustered OVN.
2. Add a single logical switch and ~10 ports in it with a single
   ovn-nbctl command.  Run ovn-nbctl with jsonrpc debug enabled
   and look for the initial monitor reply (grep 'result=\[false'
   should show it).
3. Delete the logical switch.  This is our big transaction: it will
   delete the switch record and all the ports.  The resulting database
   will be empty.
4. Repeat step 2 and check that the initial monitor reply does not have
   "00000000-0000-0000-0000-000000000000" as the last id.
   A clustered database should never send all-zeroes as a last id.

Example of a bad initial monitor reply:

  jsonrpc|DBG|unix:nb1.ovsdb: received reply,
    result=[false,"00000000-0000-0000-0000-000000000000",{<data>}], id=3
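
One way to check the initial monitor reply is to pull the last id out
of the jsonrpc debug output.  The sketch below is only an illustration:
the helper names are hypothetical, and it assumes the reply is logged
on one line as "received reply, result=[<bool>,"<uuid>",...".

```shell
# All-zero UUID that a clustered database should never send as the last id.
ZERO_UUID="00000000-0000-0000-0000-000000000000"

# Hypothetical helper: read jsonrpc debug log lines on stdin and print the
# last id of every monitor_cond_since reply, one UUID per line.
extract_reply_last_ids() {
    sed -nE 's/.*received reply, result=\[(true|false),"([0-9a-f-]{36})".*/\2/p'
}

# Hypothetical helper: succeed (exit 0) if any reply on stdin carries the
# all-zero UUID as its last id, i.e. if the bug was hit.
reply_has_zero_last_id() {
    extract_reply_last_ids | grep -qx "$ZERO_UUID"
}
```

A possible use, assuming the debug log is written to stderr, is
'ovn-nbctl -vjsonrpc:dbg show 2>&1 | reply_has_zero_last_id && echo BAD'.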

Comment 10 Jianlin Shi 2022-03-11 15:26:24 UTC
Tested with the following steps, based on comment 9:

1. Start clustered OVN.
2. Add a single logical switch and ~10 ports in it with a single
   ovn-nbctl command.  Run ovn-nbctl with jsonrpc debug enabled: 
ovn-nbctl -vjsonrpc:dbg ls-add ls1 \
        -- lsp-add ls1 ls1p1 \
        -- lsp-add ls1 ls1p2 \
        -- lsp-add ls1 ls1p3 \
        -- lsp-add ls1 ls1p4 \
        -- lsp-add ls1 ls1p5 \
        -- lsp-add ls1 ls1p6 \
        -- lsp-add ls1 ls1p7 \
        -- lsp-add ls1 ls1p8 \
        -- lsp-add ls1 ls1p9 \
        -- lsp-add ls1 ls1p10
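
The single-transaction command can also be generated for an arbitrary
number of ports instead of being written out by hand.  The helper below
is a hypothetical sketch (the name build_ls_with_ports_args is not part
of OVN); it only prints the arguments and does not run ovn-nbctl itself.

```shell
# Hypothetical helper: print the ovn-nbctl arguments that create one
# logical switch and N ports on it within a single NB transaction.
# $1: logical switch name, $2: number of ports.
build_ls_with_ports_args() {
    bl_ls_name=$1
    bl_port_count=$2
    bl_i=1
    printf 'ls-add %s' "$bl_ls_name"
    # Chain one "-- lsp-add" per port so that everything is committed together.
    while [ "$bl_i" -le "$bl_port_count" ]; do
        printf ' -- lsp-add %s %sp%d' "$bl_ls_name" "$bl_ls_name" "$bl_i"
        bl_i=$((bl_i + 1))
    done
    printf '\n'
}
```

With names that contain no spaces, the output can be passed on unquoted,
e.g. 'ovn-nbctl -vjsonrpc:dbg $(build_ls_with_ports_args ls1 10)'.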

reproduced on openvswitch2.16-27:

2022-03-11T15:19:22Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name","parent_name","tag_request"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T15:19:22Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T15:19:22Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"00000000-0000-0000-0000-000000000000",{"NB_Global":{"aa969934-68e6-4896-a6e6-9cbcd1b78060":{"initial":{}}}}], id=3

Verified on openvswitch2.16-58:

[root@wsfd-advnetlab16 bz2044621]# rpm -qa | grep -E "openvswitch2.16|ovn-2021"
ovn-2021-21.12.0-30.el8fdp.x86_64
ovn-2021-host-21.12.0-30.el8fdp.x86_64
openvswitch2.16-2.16.0-58.el8fdp.x86_64
ovn-2021-central-21.12.0-30.el8fdp.x86_64

2022-03-11T15:22:25Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name","parent_name","tag_request"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T15:22:25Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T15:22:25Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"9ddd60d2-3e71-455e-8744-2306fe632511",{"NB_Global":{"dbe7eaef-e3fb-43f3-af3f-478295c5bbf7":{"initial":{}}}}], id=3

Comment 11 Jianlin Shi 2022-03-11 16:48:29 UTC
Also finished the remaining steps from comment 9 on openvswitch2.16-58:

[root@wsfd-advnetlab16 bz2044621]# ovn-nbctl -vjsonrpc:dbg ls-del ls1
2022-03-11T16:46:40Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T16:46:40Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T16:46:40Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"ebdfe97f-5501-4caf-a67f-b095c2598933"

[root@wsfd-advnetlab16 bz2044621]# bash -x rep.sh
+ ovn-nbctl -vjsonrpc:dbg ls-add ls1 -- lsp-add ls1 ls1p1 -- lsp-add ls1 ls1p2 -- lsp-add ls1 ls1p3 -- lsp-add ls1 ls1p4 -- lsp-add ls1 ls1p5 -- lsp-add ls1 ls1p6 -- lsp-add ls1 ls1p7 -- lsp-add ls1 ls1p8 -- lsp-add ls1 ls1p9 -- lsp-add ls1 ls1p10
2022-03-11T16:46:46Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name","parent_name","tag_request"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T16:46:46Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T16:46:46Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"f3c0ae30-63e6-4fbd-ac7b-99925822503d",{"NB_Global":{"b4726857-0395-483b-a185-fa8c1a5061a4":{"initial":{}}}}], id=3

Comment 12 Ilya Maximets 2022-03-11 16:57:27 UTC
Thanks, @jishi. LGTM.

Comment 13 Jianlin Shi 2022-03-14 02:11:44 UTC
set Verified per comment 10 and comment 11

Comment 15 errata-xmlrpc 2022-03-30 16:28:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.16 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1146