Bug 1838334
Summary: | [OVN][Scale] SB-DB raft cluster doesn't recover at higher election time and 500+ nodes
---|---
Product: | Red Hat Enterprise Linux Fast Datapath
Component: | OVN
Status: | CLOSED DUPLICATE
Severity: | high
Priority: | high
Version: | RHEL 7.7
Hardware: | All
OS: | All
Reporter: | Anil Vishnoi <avishnoi>
Assignee: | Ilya Maximets <i.maximets>
QA Contact: | Jianlin Shi <jishi>
CC: | anivanov, ctrautma, dcbw, i.maximets, jhsiao, jtaleric, mmichels, qding, ralongi
Target Milestone: | ---
Target Release: | ---
Type: | Bug
Last Closed: | 2020-09-10 10:44:38 UTC
Attachments: | sb-db raft cluster nodes logs (attachment 1690416)
Description
Anil Vishnoi
2020-05-20 23:04:17 UTC
Created attachment 1690416 [details]
sb-db raft cluster nodes logs
I think this might be the same issue as in: https://bugzilla.redhat.com/show_bug.cgi?id=1834838

Looking at the logs I see high memory consumption that mostly comes from the jsonrpc backlog:

2020-05-20T19:51:48Z|03350|memory|INFO|peak resident set size grew 147% in last 2126.7 seconds, from 5128996 kB to 12655768 kB
2020-05-20T19:51:48Z|03351|memory|INFO|backlog:3947087447 cells:2120596 monitors:2 sessions:114 triggers:1

The lines above show that ovsdb-server consumes ~12 GB of system memory, ~4 GB of which are messages waiting to be sent on the database (probably monitor) connections inside jsonrpc. I think most of the other 8 GB are also in the jsonrpc backlog, but for the raft connections. I'm working on a patch to add raft memory consumption to the memory reports; this will give a clearer picture.

In general, it looks like a huge number of messages lands in the raft RPC layer and the destination ovsdb-server is simply not able to receive and process them in time. This mostly happens because of the huge install_snapshot_request messages, which contain the whole database. By the time the remote ovsdb-server replies to votes or other control messages, they are already far outdated, so this server is never able to leave the 'candidate' state. The same probably happens to followers, which are not able to elect a new leader due to continuously outdated votes and the inability to communicate normally. It's also possible that this condition will not recover by itself, because each outdated vote generates a vote_reply that lands in the same backlog and eventually generates further control messages.

One possible workaround might be to monitor the size of the backlog and simply drop the connection if it reaches some critical value, but we need to think about how to avoid such conditions in general.

--

I changed the component to OVN, as we usually open raft/OVN related bugs against the OVN component. Feel free to change it back if needed.
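As an aside, the backlog figure in those memory report lines can already be tracked from the outside, which is one way to watch for the condition described above. Below is a minimal, hypothetical Python sketch that parses ovsdb-server log lines in the format quoted above and prints the backlog over time; the default log file name is an assumption and should be adjusted for the deployment.

```python
#!/usr/bin/env python3
"""Rough sketch: track the jsonrpc backlog reported by ovsdb-server.

Parses lines such as:
  2020-05-20T19:51:48Z|03351|memory|INFO|backlog:3947087447 cells:2120596 ...
and prints the backlog in a human-readable form so its growth is visible.
The default log path below is only an example; adjust it for the deployment.
"""
import re
import sys

# Matches the key:value pairs in a "memory|INFO|" report line.
REPORT_RE = re.compile(r'\|memory\|INFO\|(?P<body>(?:\w+:\d+\s*)+)$')


def parse_report(line):
    """Return a dict like {'backlog': 3947087447, 'cells': 2120596, ...} or None."""
    m = REPORT_RE.search(line.strip())
    if not m:
        return None
    return {key: int(value) for key, value in
            (item.split(':') for item in m.group('body').split())}


def main(path):
    for line in open(path, errors='replace'):
        report = parse_report(line)
        if report and 'backlog' in report:
            timestamp = line.split('|', 1)[0]
            print('%s backlog=%.1f MB sessions=%s' %
                  (timestamp, report['backlog'] / (1024 * 1024),
                   report.get('sessions', '?')))


if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else 'ovsdb-server-sb.log')
```

For the report line quoted above, this would print a backlog of roughly 3765 MB, i.e. the ~4 GB figure mentioned in the analysis.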
Anil, I prepared a patch to collect information about raft memory consumption:
https://patchwork.ozlabs.org/project/openvswitch/patch/20200522173630.417106-1-i.maximets@ovn.org/

It's Acked, but not merged yet. Could you please run your tests with it and collect the output of 'ovs-appctl memory/show' while the system is unstable? This way we could identify whether it's the same issue as in BZ 1834838.

Sure, I will get to it sometime tomorrow. Do you have a GitHub branch for this patch that I can pull to build my local ovn-kubernetes image?

(In reply to Anil Vishnoi from comment #4)
> Sure, I will get to it sometime tomorrow. Do you have a GitHub branch for
> this patch that I can pull to build my local ovn-kubernetes image?

This patch is in the upstream master branch already.

BTW, in a couple of hours the second patch (which hopefully fixes the issue) will be in the upstream master branch too:
https://patchwork.ozlabs.org/project/openvswitch/patch/20200523173412.477681-1-i.maximets@ovn.org/

For your convenience, here is the branch with the raft memory report and without the fix: https://github.com/igsilya/ovs/tree/tmp-raft-memory-report

(In reply to Ilya Maximets from comment #5)
> This patch is in the upstream master branch already.
>
> BTW, in a couple of hours the second patch (which hopefully fixes the issue)
> will be in the upstream master branch too:
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200523173412.477681-1-i.maximets@ovn.org/
>
> For your convenience, here is the branch with the raft memory report and
> without the fix: https://github.com/igsilya/ovs/tree/tmp-raft-memory-report

Looks like both patches are merged upstream. I will try the master branch sometime this week and report back.

Hi, Anil. Did you have a chance to test with the patches applied?

Ilya's patches are in openvswitch2.13-2.13.0-29.el7fdp, which has been tagged into OCP 4.6 for the last week or so.
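For anyone collecting the 'ovs-appctl memory/show' output requested earlier in this bug, a small capture loop like the following sketch may help. This is only an illustration: the control socket path, output file name, and interval below are assumptions and need to be adapted to the deployment.

```python
#!/usr/bin/env python3
"""Rough sketch: periodically capture 'ovs-appctl memory/show' from the SB DB.

The control socket path is an assumption (it is typically passed to
ovs-appctl with -t) and differs between deployments.
"""
import subprocess
import time
from datetime import datetime, timezone

# Assumed path; adjust to wherever the SB ovsdb-server control socket lives.
OVN_SB_CTL = '/var/run/ovn/ovnsb_db.ctl'
INTERVAL_S = 30


def capture_once():
    """Run memory/show against the SB DB and return its output (or the error)."""
    try:
        return subprocess.run(
            ['ovs-appctl', '-t', OVN_SB_CTL, 'memory/show'],
            capture_output=True, text=True, timeout=10).stdout.strip()
    except (OSError, subprocess.TimeoutExpired) as err:
        return 'collection failed: %s' % err


def main():
    with open('sb-memory-show.log', 'a') as out:
        while True:
            stamp = datetime.now(timezone.utc).isoformat()
            out.write('%s %s\n' % (stamp, capture_once()))
            out.flush()
            time.sleep(INTERVAL_S)


if __name__ == '__main__':
    main()
```

Appending timestamped snapshots to a single file keeps the collection cheap and makes it straightforward to correlate backlog growth with the memory report lines in ovsdb-server.log.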