Created attachment 1091262 [details]
Kernel OOPS
Description of problem:
Restarting bind in a chroot environment generates a kernel oops (see attached screenshot).
Commands left hanging afterwards:
root 12186 0.0 0.8 119636 3044 ? Ss 14:19 0:00 /bin/bash /usr/libexec/setup-named-chroot.sh /var/named/chroot on
root 12220 0.0 0.4 129908 1676 ? D 14:19 0:00 mount --bind --make-private /var/named /var/named/chroot/var/named
Version-Release number of selected component (if applicable):
4.2.5-300.fc23.x86_64
bind-chroot-9.10.3-2.fc23.x86_64
How reproducible:
systemctl restart named-chroot
Reproducible on 2 machines.
Steps to Reproduce:
1. systemctl restart named-chroot
Actual results:
Kernel panic
Expected results:
No kernel panic
Additional info:
See screenshot
Any update on this? Really ugly bug, crashes the machine on latest update. Even on dist upgrade this is really ugly.
I've been getting it as well, ever since upgrading to f23, or in fact the first one was while upgrading to f23.
There is a similar bug reported for Ubuntu, which has been pushed up as an upstream kernel issue (not surprising). You can find the details here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1439849 In their case they have isolated it to having two chroot sessions doing a mount within them, although I haven't tried to duplicate that.
Happens also with 4.3.3 series: 4.3.3-301.fc23.x86_64
Okay, I've now created a VM that reproduces this (basically a clone of my server), so I can at least test it. I've noticed a few other things.
Firstly, and not surprisingly, you only need to run:
setup-named-chroot.sh /var/named/chroot on
setup-named-chroot.sh /var/named/chroot off
to cause it, although it may need a slight pause between them.
Secondly, if you don't mount the directories as "private" then you don't crash (that is probably a quick temporary fix).
Finally, and this may give more indication of the issue, running without "private" has trouble unmounting /var/named/chroot/var/named, although rerunning it unmounts it fine.
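For anyone who wants to try the on/off sequence above, here is a minimal sketch of it as a script. It assumes the script path shown in the process list of the original report; the DRY_RUN and PAUSE variables and the function names are my own additions for illustration. By default it only prints the commands, since running them for real may hang or panic an affected machine.

```shell
#!/bin/sh
# Reproducer sketch. WARNING: with DRY_RUN=0 this may hang or panic an
# affected kernel, so only run it for real as root on a throwaway VM.
SCRIPT=/usr/libexec/setup-named-chroot.sh
CHROOT=/var/named/chroot
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands
PAUSE=${PAUSE:-1}       # hypothetical: seconds to wait between "on" and "off"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set up the chroot bind mounts, pause briefly, then tear them down again.
toggle_chroot() {
    run "$SCRIPT" "$CHROOT" on
    run sleep "$PAUSE"
    run "$SCRIPT" "$CHROOT" off
}

toggle_chroot
```

Set DRY_RUN=0 in the environment to actually execute the commands; a larger PAUSE may be needed if the crash does not trigger.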
Created attachment 1118749 [details]
Fix for setup-named-chroot.sh
While the issue does appear to be a kernel bug, this patch reorders the mounts to avoid triggering the bug. In particular, the umount of /var/named should be last, as it also includes /var/named/chroot and any other bind mounts.
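To illustrate the ordering idea only (this is not the attached patch), here is a rough sketch that moves one entry to the end of a mount list so that it is unmounted last. The helper name umount_order and the short mount list are made up for the example; the real list lives in setup-named-chroot.sh.

```shell
#!/bin/sh
# Hypothetical helper: reads mount sources on stdin (one per line) and prints
# them with "$1" moved to the end, so it is the last one to be unmounted.
umount_order() {
    last_entry=$1
    seen=0
    while IFS= read -r entry; do
        if [ "$entry" = "$last_entry" ]; then
            seen=1
        else
            printf '%s\n' "$entry"
        fi
    done
    # Only emit the entry if it was actually present in the input.
    if [ "$seen" -eq 1 ]; then
        printf '%s\n' "$last_entry"
    fi
}

# Dry run: the list below is illustrative, not the script's real mount list.
# The umount commands are printed, not executed.
printf '%s\n' /var/named /etc/named.conf /run/named |
    umount_order /var/named |
    while IFS= read -r src; do
        echo "umount /var/named/chroot$src"
    done
```

The dry run prints the umount for /var/named/chroot/var/named after the other two.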
For the kernel maintainers: the problem seems to be that the private bind mount for /var/named under /var/named/chroot (i.e. mount --bind --make-private /var/named /var/named/chroot/var/named) failed to unmount because other mounts still existed, and when an attempt was made to perform the same mount again, it caused the kernel panic. Why this doesn't occur in all cases, I don't know. It may be related to other system mounts and, in particular in my case, to "/var" being a separate mount point, so we end up with "/var/named/chroot/var/named/chroot/.....", but I haven't explored it.
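Since the propagation type (private versus shared) seems to matter here, it may help to check what each mount under /var/named actually ended up with. The sketch below reads lines in /proc/self/mountinfo format and prints the mount point with its propagation flags. The function name is my own; it assumes the standard mountinfo layout, where the optional fields (shared:N, master:N, ...) start at field 7 and end at a lone "-", and a mount with no optional fields is private.

```shell
#!/bin/sh
# Hypothetical helper: reads mountinfo-format lines on stdin and prints
# "<mount point> <propagation>" for each mount.
propagation_of() {
    awk '{
        flags = ""
        # Optional fields run from field 7 up to the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++)
            flags = flags (flags == "" ? "" : ",") $i
        # No optional fields means the mount is private.
        if (flags == "") flags = "private"
        print $5, flags
    }'
}

# Typical use on a live system (not run here):
#   propagation_of < /proc/self/mountinfo | grep /var/named
```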
Still not fixed in:
Linux vps01 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Can this bug be fixed, or the patch integrated into bind, so that updates to fc24 work well?
(In reply to Gerhard Wiesinger from comment #8)
> Still not fixed in: Linux vps01 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4
> 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> Can that bug be fixed/patch integrated to bind, so that updates in fc24 work
> well?
I included the patch from comment #6 in bind-9.10.3-14.P4.fc2{3,4,5}
Thnx
Out of interest, do you know if the underlying kernel bug has been reported anywhere?
Apart from this report and the link above (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1439849), I'm not aware of any further bug reports. After searching I found the following (not exactly the same issue):
https://lkml.org/lkml/2016/5/9/124
5ec0811d30378ae104f250bfc9b3640242d81e3f
https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.10
(In reply to Gerhard Wiesinger from comment #12) Thanks, I'll try it out sometime on a 4.4.10 or later kernel.
Kernel 4.5.4 also includes 5ec0811d30378ae104f250bfc9b3640242d81e3f and is already available via:
dnf --enablerepo updates-testing update kernel\*
https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.5.4
Will try it, too.
Seems to be fixed in kernel-4.5.4-200.fc23.x86_64. Can someone else verify, too, please?
I cannot boot kernel-4.5.4-200.fc23.x86_64; kernel-4.4.9-300.fc23.x86_64 boots OK.
kernel-4.5.5-201.fc23.x86_64 is ok
Please add update instructions for FC24 update: Either kernel or bind-chroot must be at the latest version.
Thanks to all of you for your diligence on this bug.
Only the workaround is in place; the kernel OOPS still happens even with kernel 4.7.2 from Fedora. To reproduce, use the following systemd config (foreground mode) and restart several times:
Type=simple
ExecStart=/usr/sbin/named -f -u named -t /var/named/chroot $OPTIONS
See also the attachment.
Created attachment 1198019 [details]
Unmount crash
See previous comment
4.7.2-201.fc24.x86_64
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
Is this still an issue on the latest 4.8 kernels?
Yes, still a problem when the patch is reverted (attachment #1118749 [details]: Fix for setup-named-chroot.sh, for bug #1279188) and the service is restarted several times:
systemctl restart named-chroot
This stalls the machine for minutes.
Message from syslogd@maschine at Dec 20 20:57:56 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [perl:5394]
Message from syslogd@maschine at Dec 20 20:57:56 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:23602]
Message from syslogd@maschine at Dec 20 20:58:51 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [rs:main Q:Reg:1479]
Message from syslogd@maschine at Dec 20 20:58:51 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [umount:23618]
Message from syslogd@maschine at Dec 20 20:58:51 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [rs:main Q:Reg:1479]
Message from syslogd@maschine at Dec 20 20:58:51 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:23618]
[92472.070062] [<ffffffff8b272da7>] __lookup_mnt_last+0x17/0x70
[92472.070062] [<ffffffff8b281f5f>] propagate_umount+0x17f/0x290
[92472.070062] [<ffffffff8b2718ca>] umount_tree+0x28a/0x2a0
[92472.070062] [<ffffffff8b272963>] do_umount+0x2c3/0x330
[92472.070062] [<ffffffff8b2737ae>] SyS_umount+0x10e/0x120
[92472.070062] [<ffffffff8b803b72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[92472.067020] [<ffffffff8b25e7bb>] path_lookupat+0x1b/0x120
[92472.067020] [<ffffffff8b260de1>] filename_lookup+0xb1/0x180
[92472.067020] [<ffffffff8b125b34>] ? do_futex+0x2c4/0xaf0
[92472.067020] [<ffffffff8b2265b3>] ? kmem_cache_alloc+0xe3/0x1b0
[92472.067020] [<ffffffff8b2609df>] ? getname_flags+0x4f/0x1f0
[92472.067020] [<ffffffff8b260f86>] user_path_at_empty+0x36/0x40
[92472.067020] [<ffffffff8b24e4a4>] SyS_access+0xb4/0x220
[92472.067020] [<ffffffff8b251d79>] ? SyS_write+0x79/0xc0
[92472.067020] [<ffffffff8b803b72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
...
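To see at a glance which tasks the watchdog reports as stuck, a small filter over the log can count them per task name. This is only a sketch: the function name is my own, and it assumes one "soft lockup" message per line in the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and prints
# "<count> <task name>" for each task named in a soft lockup message,
# most frequent first.
lockup_summary() {
    # Extract the task name from "[name:pid]"; the name itself may contain
    # colons and spaces (e.g. "rs:main Q:Reg"), so only the last ":pid" is cut.
    sed -n 's/.*soft lockup - CPU#[0-9]* stuck for [0-9]*s! \[\(.*\):[0-9]*].*/\1/p' |
        sort | uniq -c | sort -k1,1nr -k2 | sed 's/^ *//'
}

# Typical use on a live system (not run here):
#   dmesg | lockup_summary
```

Fed the six watchdog messages above, it should list umount first, which fits the umount path seen in the first stack trace.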
4.8.15-300.fc25.x86_64 #1 SMP Thu Dec 15 23:10:23 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs. Fedora 25 has now been rebased to 4.10.9-100.fc24. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26. If you experience different issues, please open a new bug report for those.
No kernel developer has acted on this, so the bug is still active.
This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.