Description of problem:
After updating to kernel-4.7.2-201.fc24.x86_64, named-chroot segfaults and fails to start. When booting the previous kernel, kernel-4.6.7-300.fc24.x86_64, named-chroot starts up fine.

Version-Release number of selected component (if applicable):
bind-9.10.4-1.P2.fc24.src.rpm

How reproducible:
Always

Steps to Reproduce:
1. update and boot kernel-4.7.2-201.fc24.x86_64

Actual results:
Sep 10 07:38:23 shorty named[1651]: found 4 CPUs, using 4 worker threads
Sep 10 07:38:23 shorty named[1651]: using 3 UDP listeners per interface
Sep 10 07:38:23 shorty named[1651]: using up to 21000 sockets
Sep 10 07:38:23 shorty named[1651]: loading configuration from '/etc/named.conf'
Sep 10 07:38:23 shorty systemd: Started Process Core Dump (PID 1658/UID 0).
Sep 10 07:38:23 shorty audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1658-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 10 07:38:23 shorty systemd-coredump: Failed to compress /var/lib/systemd/coredump/.#core.named.25.ca634135c99040fb85a11c067afb8b57.1651.1473507503000000000000.lz47f8ef984309bb6f4: Invalid argument
Sep 10 07:38:23 shorty systemd-coredump: Failed to generate stack trace: invalid `Elf' handle
Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of user 25 dumped core.
Sep 10 07:38:23 shorty audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=named-chroot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Sep 10 07:38:23 shorty audit: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1658-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 10 07:38:23 shorty audit: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=named-chroot-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 10 07:38:23 shorty systemd: named-chroot.service: Control process exited, code=exited status=1
Sep 10 07:38:23 shorty systemd: Failed to start Berkeley Internet Name Domain (DNS).
Sep 10 07:38:23 shorty systemd: named-chroot.service: Unit entered failed state.
Sep 10 07:38:23 shorty systemd: named-chroot.service: Failed with result 'exit-code'.

Expected results:
named should start.

Additional info:
I was able to reproduce the segfault on another server. It appears that removing the following directive from "options" in named.conf gets rid of the segfault:

datasize 20M;

I have had this directive set for quite a while. I'm not sure what it is about 4.7.2 that breaks it.
Freshly started named-chroot:

Kernel 4.6.7:
  PID USER  PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
23257 named 20  0 399284 73460 6556 S 13.3  0.9 0:00.04 named

Kernel 4.7.2:
  PID USER  PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
31823 named 20  0 634392 80808 6484 S 33.3  2.0 0:00.15 named

VIRT is roughly 60% larger (399284 vs. 634392), RES is slightly larger, SHR is slightly lower.
I dumped /proc/pid/maps of named while running on each kernel. All mappings of shared libraries are identical. The difference is entirely in the anonymous "rw-p" data mappings. named is using a lot more memory under kernel 4.7.2.
Slight correction: the mappings that differ are the "rw-p" and "---p" mappings. The "rw-p" mappings are slightly larger under 4.7.2. The "---p" mappings are significantly larger under 4.7.2.
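For anyone who wants to repeat the comparison above, here is a minimal sketch that totals mapping sizes per permission string from a saved copy of /proc/<pid>/maps. The function name summarize_maps is hypothetical (it is not part of any tool mentioned in this bug), and the sketch treats every mapping without a filesystem path, including pseudo-entries such as [heap] and [stack], as anonymous.

```python
from collections import defaultdict


def summarize_maps(maps_text):
    """Total mapped bytes per (permissions, kind) from /proc/<pid>/maps text.

    kind is "file" when the mapping has a pathname starting with "/",
    and "anon" otherwise (this lumps [heap], [stack], etc. in with
    truly anonymous mappings; that is an assumption of this sketch).
    """
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_hex, end_hex = fields[0].split("-")
        size = int(end_hex, 16) - int(start_hex, 16)
        has_path = len(fields) >= 6 and fields[5].startswith("/")
        kind = "file" if has_path else "anon"
        totals[(fields[1], kind)] += size
    return dict(totals)
```

Running it on the maps text captured under each kernel and comparing the ("rw-p", "anon") and ("---p", "anon") totals should show where the extra address space comes from.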
Same issue for me. Needed to either remove the "datasize 20M;" directive in named.conf, or increase it until it would start. For me, I needed to go to 110M to be able to start without error. This was happening starting in the 4.7 kernel releases and is present on the latest stable Fedora kernel, 4.7.3-200.fc24.x86_64.
Some observations:

This has nothing to do with named-chroot. The issue presents with just plain old named.

I noticed this in dmesg the first time named fails to start:

[ 9.243186] mmap: named (593): VmData 27566080 exceed data ulimit 20971520. Update limits or use boot option ignore_rlimit_data

I booted to the oldest F24 kernel I had on hand (4.5.5) and it does indeed start, but when it does, I see this in dmesg:

[ 9.344632] mmap: named (664): VmData 27566080 exceed data ulimit 20971520. Will be forbidden soon.

So... the warning appeared somewhere around 4.5 as far as I can figure, and the actual enforcement of the limit was turned on in 4.7. Before this the kernel wasn't actually enforcing the limit.

I've no idea whether the kernel is accounting correctly, if named is allocating too much memory, or something else. I don't have any old machines to check to see what the memory footprint of a running named actually is. But currently it's quite a bit bigger than 20M regardless, and I suspect that it's been that way for some time.
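To make the arithmetic in those dmesg lines explicit, here is a small sketch that parses the kernel message and compares it with a configured datasize value. The helper names parse_datasize and parse_vmdata_warning are hypothetical, and the sketch assumes the K/M/G suffixes are powers of 1024, which matches "datasize 20M;" showing up as a limit of 20971520 in the messages above.

```python
import re

# Matches the message text quoted from dmesg in this bug.
MSG_RE = re.compile(
    r"mmap: (\S+) \((\d+)\): VmData (\d+) exceed data ulimit (\d+)"
)


def parse_datasize(value):
    """Convert a size such as '20M' to bytes (assumes 1024-based suffixes)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip().upper()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def parse_vmdata_warning(line):
    """Extract process name, PID, VmData, and limit from a dmesg line.

    Returns None when the line is not a VmData/data-ulimit message.
    """
    match = MSG_RE.search(line)
    if match is None:
        return None
    comm, pid, vmdata, limit = match.groups()
    return {
        "comm": comm,
        "pid": int(pid),
        "vmdata": int(vmdata),
        "limit": int(limit),
        "over_by": int(vmdata) - int(limit),
    }
```

Applied to the 4.7 message above, this reports that named wanted 27566080 bytes of data against a 20971520-byte limit, so any datasize value has to be at least that large for the mapping to succeed. The figure only reflects the allocation that tripped the limit at that moment, not the eventual footprint of a running named.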
(Note that I wasn't having this issue at all because I wasn't using datasize; I added it in order to obtain the messages I showed.) For me there hasn't been any significant change in the running footprint of named (as shown by the 'ps' command) in quite some time, and I certainly can't see any difference when I boot between kernel versions (tried 4.7.3, 4.7.2, 4.6.7 and 4.5.5). I can try some older kernels tomorrow, but it's getting late now.
Created attachment 1202559 [details]
Disable datasize option

I reported this to the kernel upstream, and the conclusion was that while this is technically a regression, it is so easy to work around that the change isn't going to be reverted. Ideally this should get fixed in upstream bind. I reported a bug to the upstream bind project as well, but the response seemed lukewarm and I don't know how quickly it's going to be fixed.

I'd propose just disabling the datasize option completely for now, something like the attached patch. I'll leave it up to the bind maintainers to fully figure out what should be done, though.
(In reply to Laura Abbott from comment #8)
> Created attachment 1202559 [details]
> Disable datasize option
>
> I reported this to the kernel upstream and the conclusion was that while yes
> this is technically a regression it's so easy to work around that the change
> isn't going to be reverted. Ideally this should get fixed in upstream bind.
> I reported a bug to the upstream bind project as well but the response
> seemed to be lukewarm and I don't know how quickly it's going to be fixed.
> I'd propose just disabling the datasize option completely for now, something
> like the attached patch. I'll leave this up to the bind maintainers to fully
> figure out what should be done though.

Thank you for reporting this to BIND upstream. Can you please provide the ISC Bug # you've received in the automatic reply from their ticketing system? ISC's bug tracking system is not public, so there is no way for me to find the report by myself.

Thank you.
ISC-Bugs #43220
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.