1374917 – named-chroot unexpected memory usage with kernel 4.7.2

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	bind
Sub Component:
Version:	24
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Petr Menšík
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-09-10 11:56 UTC by Sam Varshavchik
Modified:	2017-08-08 17:16 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-08-08 17:16:04 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
Disable datasize option (718 bytes, application/mbox) 2016-09-19 17:18 UTC, Laura Abbott

Description Sam Varshavchik 2016-09-10 11:56:28 UTC

Description of problem:

After updating kernel-4.7.2-201.fc24.x86_64, named-chroot segfaults and fails to start.

Booting the previous kernel, kernel-4.6.7-300.fc24.x86_64, named-chroot starts up fine.

Version-Release number of selected component (if applicable):

bind-9.10.4-1.P2.fc24.src.rpm

How reproducible:

Always

Steps to Reproduce:
1. update and boot kernel-4.7.2-201.fc24.x86_64

Actual results:


Sep 10 07:38:23 shorty named[1651]: found 4 CPUs, using 4 worker threads
Sep 10 07:38:23 shorty named[1651]: using 3 UDP listeners per interface
Sep 10 07:38:23 shorty named[1651]: using up to 21000 sockets
Sep 10 07:38:23 shorty named[1651]: loading configuration from '/etc/named.conf'
Sep 10 07:38:23 shorty systemd: Started Process Core Dump (PID 1658/UID 0).
Sep 10 07:38:23 shorty audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1658-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 10 07:38:23 shorty systemd-coredump: Failed to compress /var/lib/systemd/coredump/.#core.named.25.ca634135c99040fb85a11c067afb8b57.1651.1473507503000000000000.lz47f8ef984309bb6f4: Invalid argument
Sep 10 07:38:23 shorty systemd-coredump: Failed to generate stack trace: invalid `Elf' handle
Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of user 25 dumped core.
Sep 10 07:38:23 shorty audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=named-chroot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Sep 10 07:38:23 shorty audit: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1658-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 10 07:38:23 shorty audit: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=named-chroot-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 10 07:38:23 shorty systemd: named-chroot.service: Control process exited, code=exited status=1
Sep 10 07:38:23 shorty systemd: Failed to start Berkeley Internet Name Domain (DNS).
Sep 10 07:38:23 shorty systemd: named-chroot.service: Unit entered failed state.
Sep 10 07:38:23 shorty systemd: named-chroot.service: Failed with result 'exit-code'.

Expected results:

named should start.

Additional info:

Comment 1 Sam Varshavchik 2016-09-10 15:07:38 UTC

I was able to reproduce the segfault on another server. It appears that removing the following directive from "options" in named.conf gets rid of the segfault:

datasize 20M;

I have had this directive set for quite a while. Not sure what it is about 4.7.2 that messes it up.

Comment 2 Sam Varshavchik 2016-09-13 22:32:47 UTC

Freshly started named-chroot:

Kernel 4.6.7:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
23257 named     20   0  399284  73460   6556 S  13.3  0.9   0:00.04 named       

Kernel 4.7.2:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
31823 named     20   0  634392  80808   6484 S  33.3  2.0   0:00.15 named

VIRT has grown by roughly 60% (399284 to 634392), RES is about 10% larger, SHR is slightly lower.
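
A quick check of the figures in the two `top` samples above, as a minimal sketch. The numbers are copied from the samples; the `growth` helper is a hypothetical name introduced only for this illustration, and the values are assumed to be in KiB (top's default unit).

```python
# Values copied from the two `top` samples in this comment.
# Assumption: top is reporting in its default unit, KiB.
samples = {
    "4.6.7": {"VIRT": 399284, "RES": 73460, "SHR": 6556},
    "4.7.2": {"VIRT": 634392, "RES": 80808, "SHR": 6484},
}


def growth(old, new):
    """Return the relative change from old to new as a percentage."""
    return (new - old) / old * 100.0


# Percentage change per column when moving from kernel 4.6.7 to 4.7.2.
changes = {
    column: growth(samples["4.6.7"][column], samples["4.7.2"][column])
    for column in ("VIRT", "RES", "SHR")
}

for column, pct in changes.items():
    print(f"{column}: {pct:+.1f}%")
```

Only VIRT moves substantially between the two kernels; RES and SHR stay close to their previous values.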

Comment 3 Sam Varshavchik 2016-09-13 22:43:39 UTC

I dumped /proc/pid/maps of named, running on both kernels.

All mappings of shared libraries are identical. The difference is entirely in the anonymous "rw-p" data mappings. named is using a lot more memory under kernel 4.7.2.

Comment 4 Sam Varshavchik 2016-09-13 22:47:45 UTC

Slight correction. The different mappings are "rw-p" and "---p" mappings. "rw-p" mappings are slightly larger in size, under 4.7.2. The "---p" mappings are significantly larger in size under 4.7.2.
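
The comparison described in comments 3 and 4 could be automated roughly as follows. This is a sketch, not the method actually used in this report: the function name and the sample text are hypothetical, and it counts only mappings with no pathname, so `[heap]` and `[stack]` are skipped along with file-backed libraries.

```python
from collections import defaultdict


def anonymous_bytes_by_perms(maps_text):
    """Sum the sizes of pathless mappings, grouped by permission string."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        # Fields: address-range perms offset dev inode [pathname]
        fields = line.split()
        if len(fields) < 5:
            continue
        if len(fields) > 5:
            # Has a pathname: file-backed or special mapping, not counted here.
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        totals[fields[1]] += end - start
    return dict(totals)


# Hypothetical sample in /proc/<pid>/maps format, not taken from this report.
SAMPLE = """\
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
7f0000021000-7f0004000000 ---p 00000000 00:00 0
7f0004000000-7f0004001000 r-xp 00000000 fd:01 1234 /usr/lib64/libexample.so
"""

sample_totals = anonymous_bytes_by_perms(SAMPLE)
for perms, size in sorted(sample_totals.items()):
    print(f"{perms}: {size} bytes")
```

Running the function on the contents of /proc/<pid>/maps captured under each kernel and comparing the "rw-p" and "---p" totals would show where the extra address space goes.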

Comment 5 Brent 2016-09-15 19:05:53 UTC

Same issue for me.  Needed to either remove the "datasize 20M;" directive in named.conf, or increase it until it would start.  For me, I needed to go to 110M to be able to start without error.  This was happening starting in the 4.7 kernel releases and is present on the latest stable Fedora kernel, 4.7.3-200.fc24.x86_64.
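
Rather than raising the value by trial and error, the data size the kernel accounts for a running named can be read from the VmData line of /proc/<pid>/status. The sketch below assumes that line's usual "VmData: <n> kB" layout; both function names are hypothetical. The result is only a lower bound, since usage during startup or under load may be higher.

```python
def vmdata_kib(status_text):
    """Return the VmData value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmData:"):
            # The line looks like "VmData:     26920 kB".
            return int(line.split()[1])
    return None


def minimum_datasize_mib(kib):
    """Round a KiB figure up to the next whole MiB."""
    return -(-kib // 1024)


# Hypothetical status excerpt, used here only to exercise the parser.
SAMPLE_STATUS = "Name:\tnamed\nVmData:\t   26920 kB\nVmStk:\t     132 kB\n"

kib = vmdata_kib(SAMPLE_STATUS)
print(f"VmData is {kib} KiB, so datasize must be at least {minimum_datasize_mib(kib)}M")
```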

Comment 6 Jason Tibbitts 2016-09-16 01:33:55 UTC

Some observations:

This has nothing to do with named-chroot.  The issue presents with just plain old named.

I noticed this in dmesg the first time named fails to start:

[    9.243186] mmap: named (593): VmData 27566080 exceed data ulimit 20971520. Update limits or use boot option ignore_rlimit_data

I booted to the oldest F24 kernel I had on hand (4.5.5) and it does indeed start, but when it does, I see this in dmesg:

[    9.344632] mmap: named (664): VmData 27566080 exceed data ulimit 20971520. Will be forbidden soon.

So... the warning appeared somewhere around 4.5 as far as I can figure, and the actual enforcement of the limit was turned on in 4.7.  Before this the kernel wasn't actually enforcing the limit.

I've no idea whether the kernel is accounting correctly, if named is allocating too much memory, or something else.  I don't have any old machines to check to see what the memory footprint of a running named actually is.  But currently it's quite a bit bigger than 20M regardless, and I suspect that it's been that way for some time.
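
The numbers in the kernel message can be checked directly. The sketch below parses the dmesg line quoted above; the function name is hypothetical. It shows that the reported limit is exactly 20 MiB, matching the "datasize 20M;" directive from comment 1, and how far over the limit named was at that point.

```python
import re

# Matches the "VmData <bytes> exceed data ulimit <bytes>" part of the message.
PATTERN = re.compile(r"VmData (\d+) exceed data ulimit (\d+)")

MIB = 1024 * 1024


def parse_rlimit_message(line):
    """Extract VmData and the data limit (both in bytes) from a kernel log line."""
    match = PATTERN.search(line)
    if match is None:
        return None
    vmdata, limit = int(match.group(1)), int(match.group(2))
    return {"vmdata": vmdata, "limit": limit, "over_by": vmdata - limit}


# Line copied from the dmesg output quoted in this comment.
LINE = (
    "[    9.243186] mmap: named (593): VmData 27566080 exceed data ulimit "
    "20971520. Update limits or use boot option ignore_rlimit_data"
)

info = parse_rlimit_message(LINE)
print(f"limit:   {info['limit'] / MIB:.2f} MiB")
print(f"VmData:  {info['vmdata'] / MIB:.2f} MiB")
print(f"over by: {info['over_by']} bytes")
```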

Comment 7 Jason Tibbitts 2016-09-16 01:38:11 UTC

(Note that I wasn't having this issue at all because I wasn't using datasize; I added it in order to obtain the messages I showed.  For me there hasn't been any significant change in the running footprint of named (as shown by the 'ps' command) in quite some time, and I certainly can't see any difference when I boot between kernel versions (tried 4.7.3, 4.7.2, 4.5.5 and 4.6.7).  I can try some older kernels tomorrow but it's getting late now.)

Comment 8 Laura Abbott 2016-09-19 17:18:07 UTC

Created attachment 1202559
Disable datasize option

I reported this to the kernel upstream and the conclusion was that while yes this is technically a regression it's so easy to work around that the change isn't going to be reverted. Ideally this should get fixed in upstream bind. I reported a bug to the upstream bind project as well but the response seemed to be lukewarm and I don't know how quickly it's going to be fixed. I'd propose just disabling the datasize option completely for now, something like the attached patch. I'll leave this up to the bind maintainers to fully figure out what should be done though.

Comment 9 Tomáš Hozza 2016-09-20 07:36:02 UTC

(In reply to Laura Abbott from comment #8)
> Created attachment 1202559
> Disable datasize option
> 
> I reported this to the kernel upstream and the conclusion was that while yes
> this is technically a regression it's so easy to work around that the change
> isn't going to be reverted. Ideally this should get fixed in upstream bind.
> I reported a bug to the upstream bind project as well but the response
> seemed to be lukewarm and I don't know how quickly it's going to be fixed.
> I'd propose just disabling the datasize option completely for now, something
> like the attached patch. I'll leave this up to the bind maintainers to fully
> figure out what should be done though.

Thank you for reporting this to BIND upstream. Can you please provide the ISC Bug # you've received in the automatic reply from their ticketing system? ISC's bug tracking system is not public, so there is no way for me to find the report by myself.

Thank you.

Comment 10 Laura Abbott 2016-09-20 21:54:18 UTC

ISC-Bugs #43220

Comment 11 Fedora Admin XMLRPC Client 2016-12-01 14:20:52 UTC

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 12 Fedora End Of Life 2017-07-25 22:57:20 UTC

This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2017-08-08 17:16:04 UTC

Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
