Bug 2030239

Summary: named consumed too much memory and failed to reload.
Product: Red Hat Enterprise Linux 8
Reporter: Xueying Nie <xnie>
Component: bind
Assignee: Petr Menšík <pemensik>
Status: CLOSED ERRATA
QA Contact: Petr Sklenar <psklenar>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 8.3
CC: psklenar
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: bind-9.11.36-3.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-10 15:29:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
gen-views.sh       none
candidate patch    none

Description Xueying Nie 2021-12-08 09:33:32 UTC
Description of problem:

- named consumed much more memory and failed to reload after updating from RHEL6.5 to RHEL8.3 with the same configuration.

- As the number of view definitions increases, the memory used tends to increase linearly.

Version-Release number of selected component (if applicable):

bind-9.11.26-6

How reproducible:

Steps to Reproduce:

1. Configure more than 10000 views in "named.conf".
2. Run `systemctl reload named-chroot`.
3. Run `systemctl reload named-chroot` again.

Actual results:

named failed to reload at step 3.
named consumed about 15G of memory.

Expected results:

Reduce the memory consumption of named.

Additional info:

- The issue seems to be caused by the "--with-tuning=large" option, which was added in bind 9.10.

- A similar issue on RHEL7 was reported in Bugzilla 1578051, but it was closed with "CLOSED WONTFIX" status.

Comment 2 Petr Menšík 2022-02-10 11:39:13 UTC
We won't switch back to --tuning=small if that is requested.

Some parameters can be changed on the command line. For example, OPTIONS+="-S 4096" in /etc/sysconfig/named would use fewer buffers. But I think this parameter does not significantly change the resources used.

However, createview in lib/dns/client.c uses the RESOLVER_NTASKS parameter, which is higher with the --tuning=large configure parameter. I expect that this is what is responsible for the higher memory use. This parameter has no command line alternative.

In bind 9.11, it does not scale with the number of CPUs. bind 9.16+ multiplies a similar constant by the number of CPUs used. If the customer limited the number of CPUs used (-n 4), that would also limit the amount of memory used. We are preparing a new bind9.16 package with a newer version of bind for RHEL 8.6. Could that work for the customer?

Comment 3 Petr Menšík 2022-02-10 14:10:07 UTC
Created attachment 1860358
gen-views.sh

A simple generator of a large number of small views. It is used to generate a file included from named.conf, which creates enough separate views. It uses only predefined zones for simplicity; the zone content does not matter much.
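
For readers without access to the attachment, here is a minimal sketch of such a generator. It is not the attached gen-views.sh. The view names follow the loopback addresses visible in the rndc stats output in this report. The match-destinations clause, the include path, and the function name generate_views are assumptions made for illustration.

```python
import ipaddress


def generate_views(count, first_address="127.0.0.2",
                   zones_include="/etc/named.rfc1912.zones"):
    """Return named.conf text that defines `count` small views.

    Each view is named after one address, starting at `first_address`.
    The zones file path is an assumed default, not taken from the
    attached script.
    """
    start = ipaddress.IPv4Address(first_address)
    blocks = []
    for i in range(count):
        addr = start + i  # IPv4Address supports integer addition
        blocks.append(
            f'view "{addr}" {{\n'
            f'\tmatch-destinations {{ {addr}; }};\n'
            f'\tinclude "{zones_include}";\n'
            f'}};\n'
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Write 20 views to stdout; redirect to a file included from named.conf.
    print(generate_views(20), end="")
```

Redirecting the output to a file and including that file from named.conf should give a configuration comparable in shape to the one used for the measurements in this bug, provided no zones are defined outside the views.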

Comment 4 Petr Menšík 2022-02-10 14:35:25 UTC
The script above shows significant differences with only 20 views. bind-9.11.36-2.el8.x86_64 reports Memory: 518.0M with just 20 views. It rises to Memory: 998.6M in systemctl status named after rndc reload.

Done on VM with 1 CPU.

The statistics written by rndc stats report only small memory consumption.

++ Cache Statistics ++
[View: 127.0.0.2 (Cache: 127.0.0.2)]
                   0 cache hits
                  78 cache misses
                   0 cache hits (from query)
                   0 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
                   0 cache records deleted due to TTL expiration
                   0 cache database nodes
                  64 cache database hash buckets
              279512 cache tree memory total
               21624 cache tree memory in use
               21680 cache tree highest memory in use
              262144 cache heap memory total
                1024 cache heap memory in use
                1024 cache heap highest memory in use

bind-9.16.23-1.el9.x86_64 has much lower consumption. It reports Memory: 160.6M after restart and Memory: 201.4M after rndc reload.
That is the case even though its cache statistics report much higher memory usage:
++ Cache Statistics ++
[View: 127.0.0.2 (Cache: 127.0.0.2)]
                   0 cache hits
                  26 cache misses
                   0 cache hits (from query)
                   0 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
                   0 cache records deleted due to TTL expiration
                   0 cache database nodes
              524288 cache database hash buckets
             4478184 cache tree memory total
             4219136 cache tree memory in use
             4219264 cache tree highest memory in use
              262144 cache heap memory total
                1088 cache heap memory in use
                1088 cache heap highest memory in use

Fedora Rawhide has optimized memory usage further. With build bind-9.16.25-2.fc36.x86_64 it uses only Memory: 140.0M after restart and Memory: 180.6M after reload. But it has introduced issues with bind-dyndb-ldap, so a rebase is not possible right now.

Used memory rises approximately linearly with the number of views. With just 30 views, 769.8M (after restart) and 1.4G (after reload) of used memory are reported on 9.11 without a single external query. On RHEL9 with 50 views, 389.1M and 487.1M are used.
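
As a rough cross-check of the linear trend, the after-restart figures quoted in this comment can be turned into a per-view increment. This is only a sketch: it assumes that the 160.6M figure for bind 9.16 was taken with the same 20 views as the 9.11 measurement, and that growth between the two sample points is linear. The function name per_view_growth is illustrative.

```python
def per_view_growth(views_a, mem_a, views_b, mem_b):
    """Slope of memory (in M) per additional view between two samples."""
    return (mem_b - mem_a) / (views_b - views_a)


# After-restart figures from systemctl status, as quoted in this comment.
bind_911 = per_view_growth(20, 518.0, 30, 769.8)   # bind 9.11 on RHEL 8
bind_916 = per_view_growth(20, 160.6, 50, 389.1)   # bind 9.16 on RHEL 9

print(f"9.11: {bind_911:.1f} M per view")
print(f"9.16: {bind_916:.1f} M per view")
```

Under those assumptions the increment is roughly 25 M per view on 9.11 and under 8 M per view on 9.16, measured on the 1-CPU VM.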

Comment 5 Petr Menšík 2022-02-10 14:46:59 UTC
The change appeared in MR 3067 [1], commit 0d80266f. I guess we can backport such a change to the 9.11 branch as well. It should help on machines with few CPUs. It might raise consumption on machines with a high CPU count, however. The already prepared bind9.16 would help without additional changes.

1. https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/3067

Comment 6 Petr Menšík 2022-02-10 14:55:38 UTC
Also adding a link to the upstream issue about the refused runtime tuning change, from bug #1578051.

Comment 7 Petr Menšík 2022-02-10 18:40:31 UTC
Created attachment 1860395
candidate patch

Modified upstream change. It uses a per-CPU count of tasks, but sets an upper limit on the number of tasks used. It starts with a lower number of tasks, but ensures that machines with 16+ CPUs use at most the original amount of memory.
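
The capping idea can be sketched as follows. This is not the patch itself: the constant values and the names ORIGINAL_NTASKS, CPU_CAP, TASKS_PER_CPU and resolver_tasks are hypothetical and chosen only so that the cap is reached at 16 CPUs, as described above.

```python
# Hypothetical values for illustration; the real constants live in the patch.
ORIGINAL_NTASKS = 512                        # assumed fixed count before the change
CPU_CAP = 16                                 # CPU count at which the cap is reached
TASKS_PER_CPU = ORIGINAL_NTASKS // CPU_CAP   # 32 with the values above


def resolver_tasks(ncpus):
    """Scale the task count with CPUs, never exceeding the original count."""
    ncpus = max(ncpus, 1)  # guard against a zero or negative CPU count
    return min(ncpus * TASKS_PER_CPU, ORIGINAL_NTASKS)


for cpus in (1, 4, 16, 64):
    print(cpus, resolver_tasks(cpus))
```

With these illustrative numbers, a 1-CPU machine gets one sixteenth of the original task count, while machines with 16 or more CPUs stay at the original count and so should not use more memory than before.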

Comment 18 errata-xmlrpc 2022-05-10 15:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: bind security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:2092