Bug 2030239
| Summary: | named consumed too much memory and failed to reload. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Xueying Nie <xnie> | ||||||
| Component: | bind | Assignee: | Petr Menšík <pemensik> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Petr Sklenar <psklenar> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 8.3 | CC: | psklenar | ||||||
| Target Milestone: | rc | Keywords: | Triaged | ||||||
| Target Release: | --- | Flags: | pm-rhel: mirror+ | ||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | bind-9.11.36-3.el8 | Doc Type: | If docs needed, set a value | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2022-05-10 15:29:44 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
Description
Xueying Nie
2021-12-08 09:33:32 UTC
We won't switch back to --tuning=small, if that is what is being requested. Some parameters can be changed on the command line. For example, OPTIONS+="-S 4096" in /etc/sysconfig/named would use fewer buffers. But I think this parameter does not significantly change the resources used.

However, createview in lib/dns/client.c uses the parameter RESOLVER_NTASKS, which is higher with the --tuning=large configure parameter. I expect that is what is responsible for the higher memory usage instead. This parameter has no command line alternative. In bind 9.11, it does not scale. bind 9.16+ multiplies a similar constant by the number of CPUs used. If the customer limited the number of CPUs used (-n 4), it would also limit the amount of memory used.

We are preparing a new bind9.16 package with a newer version of bind for RHEL 8.6. Could that work for the customer?

Created attachment 1860358 [details]
gen-views.sh
Simple generator of a large number of small views. It is used to generate a file included from named.conf, which creates enough separate views. The views use just the predefined zones for simplicity; the zone content does not matter much.
The above script shows significant differences with as few as 20 views. bind-9.11.36-2.el8.x86_64 reports Memory: 518.0M with just 20 views. It rises to Memory: 998.6M in systemctl status named after rndc reload.
Done on a VM with 1 CPU.
Statistics created by rndc stats report only small memory consumption:
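The attached gen-views.sh is not reproduced in this report. As a rough illustration of what such a generator does, here is a minimal Python sketch. It assumes one view per loopback address (the statistics in this report show a view named 127.0.0.2) and that the predefined zones come from /etc/named.rfc1912.zones. The function names, the match-destinations layout and the zone file path are assumptions, not taken from the attachment.

```python
# Hypothetical sketch of a view generator; NOT the attached gen-views.sh.
# Assumption: each view is named after, and matched by, one 127.0.0.N address.

ZONES_FILE = "/etc/named.rfc1912.zones"  # assumed location of predefined zones


def view_stanza(index: int) -> str:
    """Return one named.conf view bound to the address 127.0.0.<index>."""
    addr = f"127.0.0.{index}"
    return (
        f'view "{addr}" {{\n'
        f"    match-destinations {{ {addr}; }};\n"
        f"    recursion yes;\n"
        f'    include "{ZONES_FILE}";\n'
        f"}};\n"
    )


def generate_views(count: int, first: int = 2) -> str:
    """Return `count` view stanzas, starting at 127.0.0.<first>."""
    if count < 1 or first < 1 or first + count - 1 > 254:
        raise ValueError("view addresses must stay within 127.0.0.1-254")
    return "\n".join(view_stanza(i) for i in range(first, first + count))


if __name__ == "__main__":
    # Redirect the output to a file and include that file from named.conf.
    print(generate_views(20), end="")
```

Calling generate_views(30) or generate_views(50) would give the larger view counts used for comparison; named also has to listen on the generated addresses for the views to be reachable.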
++ Cache Statistics ++
[View: 127.0.0.2 (Cache: 127.0.0.2)]
0 cache hits
78 cache misses
0 cache hits (from query)
0 cache misses (from query)
0 cache records deleted due to memory exhaustion
0 cache records deleted due to TTL expiration
0 cache database nodes
64 cache database hash buckets
279512 cache tree memory total
21624 cache tree memory in use
21680 cache tree highest memory in use
262144 cache heap memory total
1024 cache heap memory in use
1024 cache heap highest memory in use
bind-9.16.23-1.el9.x86_64 has much better consumption. It reports Memory: 160.6M after restart and Memory: 201.4M after rndc reload.
That is even though its cache statistics report much higher memory usage:
++ Cache Statistics ++
[View: 127.0.0.2 (Cache: 127.0.0.2)]
0 cache hits
26 cache misses
0 cache hits (from query)
0 cache misses (from query)
0 cache records deleted due to memory exhaustion
0 cache records deleted due to TTL expiration
0 cache database nodes
524288 cache database hash buckets
4478184 cache tree memory total
4219136 cache tree memory in use
4219264 cache tree highest memory in use
262144 cache heap memory total
1088 cache heap memory in use
1088 cache heap highest memory in use
Fedora Rawhide has optimized memory usage further: build bind-9.16.25-2.fc36.x86_64 uses only Memory: 140.0M after restart and Memory: 180.6M after reload. But it has introduced issues with bind-dyndb-ldap, so a rebase is not possible right now.
Used memory rises approximately linearly with the number of views. With just 30 views, 9.11 reports 769.8M and 1.4G of used memory without a single external query. RHEL9 with 50 views uses 389.1M and 487.1M.
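The linear growth can be checked with simple arithmetic on the figures reported here. The sketch below assumes that the first number of each pair is the value after restart, and that the bind 9.16 measurement of 160.6M was taken with the same 20 views. Neither is stated explicitly, so treat the result as a rough estimate.

```python
# Rough per-view memory cost, derived from the "Memory:" values in this report.
# Assumption: values are in megabytes as printed by systemctl status named,
# and each figure used here is the one measured after restart.


def per_view_cost(views_a: int, mem_a: float, views_b: int, mem_b: float) -> float:
    """Slope of memory use between two measurements, in M per view."""
    return (mem_b - mem_a) / (views_b - views_a)


# bind 9.11: 20 views -> 518.0M, 30 views -> 769.8M
slope_911 = per_view_cost(20, 518.0, 30, 769.8)

# bind 9.16 on RHEL 9: 20 views -> 160.6M (assumed), 50 views -> 389.1M
slope_916 = per_view_cost(20, 160.6, 50, 389.1)

if __name__ == "__main__":
    print(f"9.11: {slope_911:.2f}M per view")
    print(f"9.16: {slope_916:.2f}M per view")
```

Under these assumptions each additional view costs about 25M on 9.11 and under 8M on 9.16, which matches the observation that the growth is roughly proportional to the number of views.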
The change appeared in MR 3067 [1], commit 0d80266f. I guess we can backport such a change to the 9.11 branch as well. It should help on machines with few CPUs. It might raise consumption on machines with a high CPU count, however. The already prepared bind9.16 would help without additional changes.

[1] https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/3067

Adding also the upstream issue link to the refused runtime tuning change, from bug #1578051.

Created attachment 1860395 [details]
candidate patch
Modified upstream change. It uses a per-CPU count of tasks, but sets an upper limit on the number of tasks used. It starts with a lower number of tasks, but ensures that machines with 16+ CPUs use at most the original amount of memory.
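The idea of a per-CPU task count with an upper limit can be sketched as follows. The actual patch is C code against bind 9.11 and is not shown in this report. The constants below are purely illustrative, chosen only so that the limit is reached at 16 CPUs, and the function name is hypothetical.

```python
# Illustrative sketch of "per-CPU task count with an upper limit".
# TASKS_PER_CPU and MAX_TASKS are hypothetical values, not taken from the patch.

TASKS_PER_CPU = 32   # assumed per-CPU multiplier
MAX_TASKS = 512      # assumed original fixed task count, reached at 16 CPUs


def resolver_tasks(ncpus: int,
                   tasks_per_cpu: int = TASKS_PER_CPU,
                   max_tasks: int = MAX_TASKS) -> int:
    """Scale the task count with the CPU count, but never above max_tasks."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return min(ncpus * tasks_per_cpu, max_tasks)


if __name__ == "__main__":
    for cpus in (1, 4, 16, 64):
        print(cpus, resolver_tasks(cpus))
```

With a clamp like this, a 1 CPU VM allocates only a fraction of the tasks per view, while a machine with 16 or more CPUs stays at the previous fixed amount instead of growing further.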
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: bind security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:2092