Bug 2044621
| Summary: | ovsdb-server: Clustered database may send zero uuid as a last transaction id | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Ilya Maximets <i.maximets> |
| Component: | ovsdb2.16 | Assignee: | Ilya Maximets <i.maximets> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | FDP 22.A | CC: | ctrautma, jhsiao, jishi, ralongi, tredaelli |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openvswitch2.16-2.16.0-45.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-03-30 16:28:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Ilya Maximets
2022-01-24 20:18:43 UTC
Patch posted for review:
https://patchwork.ozlabs.org/project/openvswitch/patch/20211219140941.2279071-3-i.maximets@ovn.org/

* Tue Feb 01 2022 Ilya Maximets <i.maximets> - 2.16.0-45
- ovsdb: transaction: Keep one entry in the transaction history. [RH git: 7665f42d12] (#2044621)
commit 6e13565dd32fb2cf5517f51ca06956e2052c4bba
Author: Ilya Maximets <i.maximets>
Date: Sun Dec 19 15:09:38 2021 +0100
ovsdb: transaction: Keep one entry in the transaction history.
If a single transaction exceeds the size of the whole database (e.g.,
a lot of rows got removed and new ones added), transaction history will
be drained. This leads to sending UUID_ZERO to the clients as the last
transaction id in the next monitor update, because monitor doesn't
know what was the actual last transaction id. In case of a re-connect
that will cause re-downloading of the whole database, since the
client's last_id will be out of sync.
One solution would be to store the last transaction ID separately
from the actual transactions, but that will require a careful
management in cases where database gets reset and the history needs
to be cleared. Keeping the one last transaction instead to avoid
the problem. That should not be a big concern in terms of memory
consumption, because this last transaction will be removed from the
history once the next transaction appeared. This is also not a concern
for a fast re-sync, because this last transaction will not be used
for the monitor reply; it's either client already has it, so no need
to send, or it's a history miss.
The test updated to not check the number of atoms if there is only
one transaction in the history.
Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.")
Acked-by: Mike Pattrick <mkp>
Acked-by: Han Zhou <hzhou>
Signed-off-by: Ilya Maximets <i.maximets>
Reported-at: https://bugzilla.redhat.com/2044621
Signed-off-by: Ilya Maximets <i.maximets>
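
The trimming rule described in the commit message can be illustrated with a toy model. This is only a sketch in POSIX shell, not the ovsdb-server code: the function name trim_history, the "txnid:atoms" entry format, and the MIN_KEEP parameter are hypothetical and exist only for illustration. With MIN_KEEP=0 the model reproduces the drained history (an all-zero last id); with MIN_KEEP=1 it models keeping the newest transaction.

```shell
#!/bin/sh
# Toy model of the history trimming rule from the commit message.
# NOT the ovsdb-server implementation; all names here are hypothetical.

# trim_history DB_ATOMS MIN_KEEP ENTRY...
#   DB_ATOMS  number of atoms in the database after the last transaction
#   MIN_KEEP  minimum number of history entries to keep (0 = old, 1 = fixed)
#   ENTRY     "txnid:atoms", listed oldest first
# Drops the oldest entries while the history holds more atoms than the
# database, and prints the id of the newest remaining entry, or the
# all-zero UUID if nothing is left.
# Note: POSIX sh has no local variables, so the names below are global.
trim_history() {
    db_atoms=$1
    min_keep=$2
    shift 2

    # Sum the atoms of all history entries.
    total=0
    for e in "$@"; do
        total=$(( total + ${e##*:} ))
    done

    # Remove the oldest entries until the history fits, honoring MIN_KEEP.
    while [ "$#" -gt "$min_keep" ] && [ "$total" -gt "$db_atoms" ]; do
        total=$(( total - ${1##*:} ))
        shift
    done

    if [ "$#" -eq 0 ]; then
        echo "00000000-0000-0000-0000-000000000000"
    else
        # The newest entry is the last positional parameter.
        for e in "$@"; do last=$e; done
        echo "${last%%:*}"
    fi
}

# A 5-atom transaction followed by a 40-atom deletion that empties the DB.
trim_history 0 0 txn1:5 txn2:40   # history drained, zero UUID
trim_history 0 1 txn1:5 txn2:40   # newest transaction kept
```

In this model the single kept entry is dropped again as soon as a newer transaction arrives and the history still exceeds the database size, which matches the memory argument made in the commit message.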
Hi Ilya,

Could you give some suggestions on how to test the patch? thanks

(In reply to Jianlin Shi from comment #6)
> Hi Ilya,
>
> Could you give some suggestions on how to test the patch? thanks

Steps to reproduce:

1. Start OVN with clustered databases.
2. Enable debug logs for jsonrpc module on ovn-northd.
3. Add some amount of OVN resources in different OVN commands
   (different NbDB transactions), e.g. 20 Logical Switches.
4. Remove all these resources one by one.
5. Add a few more.
6. Remove them too.
7. Look for 'update3' notifications from the database in the
   ovn-northd.log. They should look like this:

   jsonrpc|DBG|unix:db1.sock: received notification, method="update3",
   params=[["monid","OVN_Northbound"],"3210e3a5-94cf-4c3c-a427-652746be8c81",{<data>}]

   "3210e3a5-94cf-4c3c-a427-652746be8c81" is an example of the
   last transaction id. There should be no update3 message where
   this uuid equals "00000000-0000-0000-0000-000000000000".

The issue was introduced in openvswitch2.16-2.16.0-27.el8fdp.

(In reply to Ilya Maximets from comment #7)

Hi Ilya,

I tried to reproduce with your steps as follows:

1. start cluster with script on 3 machines:

   ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
   ip_s=1.1.178.16
   ip_c1=1.1.178.17
   ip_c2=1.1.178.18
   $ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
       --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
       --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
       --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
       --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd
   ovn-appctl -t ovn-northd vlog/set jsonrpc dbg

2. then add switch with:

   for i in {1..20}
   do
       ovn-nbctl --wait=hv ls-add ls$i
   done
   ovn-nbctl --wait=hv sync
   ovn-nbctl show
   for i in {1..20}
   do
       ovn-nbctl ls-del ls$i
   done
   for i in {30..50}
   do
       ovn-nbctl --wait=hv ls-add ls$i
   done
   ovn-nbctl --wait=hv sync
   for i in {30..50}
   do
       ovn-nbctl ls-del ls$i
   done

but failed to reproduce the issue on openvswitch2.16-27.
no such uuid "00000000-0000-0000-0000-000000000000" found in the log.

[root@wsfd-advnetlab16 bz2044621]# rpm -qa | grep openvswitch2.16
openvswitch2.16-2.16.0-27.el8fdp.x86_64

OK. It turned out a bit harder to reproduce with a live OVN setup.
What we're trying to do is to create a single transaction that will
be larger than the database after execution of that transaction.
The following method should work:
1. Start clustered OVN.
2. Add a single logical switch and ~10 ports in it with a single
ovn-nbctl command. Run ovn-nbctl with jsonrpc debug enabled
and look for the initial monitor reply (grep 'result=\[false'
should show it).
3. Delete the logical switch. This is our big transaction: it will
delete the switch record and all the ports. The resulting database
will be empty.
4. Repeat step 2 and check that the initial monitor reply does not have
"00000000-0000-0000-0000-000000000000" as a last id.
Clustered database should never send all-zeroes as a last id.
Example of a bad initial monitor reply:
jsonrpc|DBG|unix:nb1.ovsdb: received reply,
result=[false,"00000000-0000-0000-0000-000000000000",{<data>}], id=3
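
The log checks described above can be scripted. The following is a minimal sketch, assuming the jsonrpc debug lines have the shape shown in this report and that the monitor id in update3 notifications is a JSON array such as ["monid","OVN_Northbound"]. The function name find_zero_last_id and the variable names are hypothetical. The patterns deliberately ignore "send request" lines, because a client that has no previous state sends the all-zero UUID in its monitor_cond_since request and that is expected.

```shell
#!/bin/sh
# Sketch: find jsonrpc debug lines where the server reports an all-zero
# last transaction id. Names and patterns are illustrative assumptions.

ZERO_UUID='00000000-0000-0000-0000-000000000000'

# Initial monitor reply: result=[<found>,"<last id>",{<data>}]
REPLY_RE='received reply, result=\[(true|false),"'"$ZERO_UUID"'"'

# update3 notification: params=[[<monitor id>],"<last id>",{<data>}]
UPDATE_RE='method="update3", params=\[\[[^]]*],"'"$ZERO_UUID"'"'

# find_zero_last_id [FILE...]
# Prints the offending lines from FILE (or stdin if no file is given).
# Exit status is 0 if at least one bad line was found, 1 otherwise.
find_zero_last_id() {
    grep -E -e "$REPLY_RE" -e "$UPDATE_RE" "$@"
}
```

For example, `ovn-nbctl -vjsonrpc:dbg show 2>&1 | find_zero_last_id` should print nothing on a fixed build, assuming the debug output goes to the console as in the logs in this report.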
Tested with the following steps based on comment 9:

1. Start clustered OVN.
2. Add a single logical switch and ~10 ports in it with a single
   ovn-nbctl command. Run ovn-nbctl with jsonrpc debug enabled:

   ovn-nbctl -vjsonrpc:dbg ls-add ls1 \
       -- lsp-add ls1 ls1p1 \
       -- lsp-add ls1 ls1p2 \
       -- lsp-add ls1 ls1p3 \
       -- lsp-add ls1 ls1p4 \
       -- lsp-add ls1 ls1p5 \
       -- lsp-add ls1 ls1p6 \
       -- lsp-add ls1 ls1p7 \
       -- lsp-add ls1 ls1p8 \
       -- lsp-add ls1 ls1p9 \
       -- lsp-add ls1 ls1p10

Reproduced on openvswitch2.16-27:

2022-03-11T15:19:22Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name","parent_name","tag_request"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T15:19:22Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T15:19:22Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"00000000-0000-0000-0000-000000000000",{"NB_Global":{"aa969934-68e6-4896-a6e6-9cbcd1b78060":{"initial":{}}}}], id=3

Verified on openvswitch2.16-58:

[root@wsfd-advnetlab16 bz2044621]# rpm -qa | grep -E "openvswitch2.16|ovn-2021"
ovn-2021-21.12.0-30.el8fdp.x86_64
ovn-2021-host-21.12.0-30.el8fdp.x86_64
openvswitch2.16-2.16.0-58.el8fdp.x86_64
ovn-2021-central-21.12.0-30.el8fdp.x86_64

2022-03-11T15:22:25Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name","parent_name","tag_request"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T15:22:25Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T15:22:25Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"9ddd60d2-3e71-455e-8744-2306fe632511",{"NB_Global":{"dbe7eaef-e3fb-43f3-af3f-478295c5bbf7":{"initial":{}}}}], id=3

Also finished the following steps in comment 9 on openvswitch2.16-58:

[root@wsfd-advnetlab16 bz2044621]# ovn-nbctl -vjsonrpc:dbg ls-del ls1
2022-03-11T16:46:40Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T16:46:40Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T16:46:40Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"ebdfe97f-5501-4caf-a67f-b095c2598933"

[root@wsfd-advnetlab16 bz2044621]# bash -x rep.sh
+ ovn-nbctl -vjsonrpc:dbg ls-add ls1 -- lsp-add ls1 ls1p1 -- lsp-add ls1 ls1p2 -- lsp-add ls1 ls1p3 -- lsp-add ls1 ls1p4 -- lsp-add ls1 ls1p5 -- lsp-add ls1 ls1p6 -- lsp-add ls1 ls1p7 -- lsp-add ls1 ls1p8 -- lsp-add ls1 ls1p9 -- lsp-add ls1 ls1p10
2022-03-11T16:46:46Z|00006|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="monitor_cond_since", params=["OVN_Northbound",["monid","OVN_Northbound"],{"NB_Global":[{"columns":[]}],"Logical_Router":[{"columns":["name","ports"]}],"Logical_Switch":[{"columns":["name","ports"]}],"Logical_Switch_Port":[{"columns":["name","parent_name","tag_request"]}],"Logical_Router_Port":[{"columns":["name"]}]},"00000000-0000-0000-0000-000000000000"], id=3
2022-03-11T16:46:46Z|00007|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: send request, method="set_db_change_aware", params=[true], id=4
2022-03-11T16:46:46Z|00008|jsonrpc|DBG|unix:/var/run/ovn/ovnnb_db.sock: received reply, result=[false,"f3c0ae30-63e6-4fbd-ac7b-99925822503d",{"NB_Global":{"b4726857-0395-483b-a185-fa8c1a5061a4":{"initial":{}}}}], id=3

Thanks, @jishi. LGTM.

Set Verified per comment 10 and comment 11.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (openvswitch2.16 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1146