1279188 – bind-chroot causes kernel to crash on restart (mount with bind option)

Summary: bind-chroot causes kernel to crash on restart (mount with bind option)

Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	24
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance

Reported:	2015-11-08 13:43 UTC by Gerhard Wiesinger
Modified:	2017-08-08 12:22 UTC
CC List:	11 users
Last Closed:	2017-08-08 12:22:26 UTC
Type:	Bug

Attachments
Kernel OOPS (224.26 KB, image/png) 2015-11-08 13:43 UTC, Gerhard Wiesinger
Fix for setup-named-chroot.sh (604 bytes, patch) 2016-01-27 10:26 UTC, Frank Crawford
Unmount crash (203.54 KB, image/png) 2016-09-05 19:45 UTC, Gerhard Wiesinger

Description Gerhard Wiesinger 2015-11-08 13:43:36 UTC

Created attachment 1091262
Kernel OOPS

Description of problem:
Restarting BIND in a chroot environment triggers a kernel oops (see the attached screenshot).
Commands left hanging afterwards:
root     12186  0.0  0.8 119636  3044 ?        Ss   14:19   0:00 /bin/bash /usr/libexec/setup-named-chroot.sh /var/named/chroot on
root     12220  0.0  0.4 129908  1676 ?        D    14:19   0:00 mount --bind --make-private /var/named /var/named/chroot/var/named
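The second process above is in state `D` (uninterruptible sleep), meaning it is typically blocked inside the kernel. As a minimal sketch for spotting such tasks on an affected host, the Python below reads `/proc/<pid>/stat`, whose third field is the process state; the helper names `parse_stat_state` and `find_d_state_tasks` are hypothetical and not part of any Fedora tool. The command name is wrapped in parentheses and may itself contain spaces or `)`, so the parser splits after the last `)` rather than on whitespace.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so locate the *last* ')' instead of splitting on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state_tasks(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_d_state_tasks():
        print(pid, comm)
```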

Version-Release number of selected component (if applicable):
4.2.5-300.fc23.x86_64
bind-chroot-9.10.3-2.fc23.x86_64

How reproducible:
systemctl restart named-chroot
Reproducible on 2 machines.

Steps to Reproduce:
1. systemctl restart named-chroot

Actual results:
Kernel panic

Expected results:
No kernel panic

Additional info:
See screenshot

Comment 1 Gerhard Wiesinger 2015-12-20 11:18:14 UTC

Any update on this?

Really ugly bug: it crashes the machine when the latest update is applied, and it is just as ugly when it happens during a distribution upgrade.

Comment 2 Frank Crawford 2015-12-24 05:59:14 UTC

I've been hitting it as well ever since upgrading to F23; in fact, the first occurrence was during the upgrade to F23 itself.

Comment 3 Frank Crawford 2016-01-01 03:20:36 UTC

There is a similar bug reported for Ubuntu, which has been escalated as an upstream kernel issue (not surprisingly).

You can find the details here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1439849

In their case they have isolated it to two chroot sessions each performing a mount inside the chroot, although I haven't tried to reproduce it myself.

Comment 4 Gerhard Wiesinger 2016-01-24 07:16:40 UTC

Also happens with the 4.3.3 series: 4.3.3-301.fc23.x86_64

Comment 5 Frank Crawford 2016-01-27 06:59:03 UTC

Okay, now that I've created a VM on which I can reproduce this (basically a clone of my server), I can at least test it, and I've noticed a few other things.

Firstly, and not surprisingly, you only need to run:

setup-named-chroot.sh /var/named/chroot on
setup-named-chroot.sh /var/named/chroot off

to cause it, although it may need a slight pause between them.
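A minimal reproduction helper for this on/off cycle, sketched in Python. It assumes the script lives at `/usr/libexec/setup-named-chroot.sh` (the path shown in the `ps` output in the description); the helper names and the default pause length are hypothetical guesses. Because the sequence can crash the kernel, it defaults to a dry run that only prints the commands; it should only be run for real, as root, in a disposable VM like the one used here.

```python
import subprocess
import time

# Path taken from the ps output in the original report; adjust if it differs.
SETUP_SCRIPT = "/usr/libexec/setup-named-chroot.sh"
CHROOT_DIR = "/var/named/chroot"


def chroot_cycle_commands(cycles: int) -> list[list[str]]:
    """Build the 'on' then 'off' command pairs for the given number of cycles."""
    commands = []
    for _ in range(cycles):
        commands.append([SETUP_SCRIPT, CHROOT_DIR, "on"])
        commands.append([SETUP_SCRIPT, CHROOT_DIR, "off"])
    return commands


def run_cycles(cycles: int = 1, pause: float = 2.0, dry_run: bool = True) -> None:
    """Run (or, by default, only print) the on/off sequence with a pause between steps."""
    for cmd in chroot_cycle_commands(cycles):
        if dry_run:
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, check=True)  # requires root; may crash an affected kernel
        time.sleep(pause)  # a short pause between steps may be needed to trigger it


if __name__ == "__main__":
    run_cycles(cycles=3)
```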

Secondly, if you don't mount the directories as "private", then you don't get the crash (that is probably a quick temporary fix).

Finally, and this may give more insight into the issue: when running without "private", there are problems unmounting /var/named/chroot/var/named, although rerunning the unmount succeeds.
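To check which propagation type the chroot bind mounts actually ended up with, one can read `/proc/self/mountinfo` (format documented in proc(5)): the optional fields between the mount options and the lone `-` separator carry `shared:N`, `master:N` (a slave mount) or `unbindable`, and a mount with none of these tags is private. A minimal Python sketch with hypothetical helper names; it does not unescape octal sequences such as `\040` in mount points. util-linux's `findmnt -o TARGET,PROPAGATION` reports the same information.

```python
def propagation_of(mountinfo_line: str) -> tuple[str, str]:
    """Return (mount_point, propagation) for one /proc/self/mountinfo line."""
    fields = mountinfo_line.split()
    mount_point = fields[4]
    # Optional fields start at index 6 and end at the lone "-" separator.
    separator = fields.index("-", 6)
    kinds = []
    for tag in fields[6:separator]:
        if tag.startswith("shared:"):
            kinds.append("shared")
        elif tag.startswith("master:"):
            kinds.append("slave")
        elif tag == "unbindable":
            kinds.append("unbindable")
    # No propagation tag at all means the mount is private.
    return mount_point, ",".join(kinds) if kinds else "private"


def propagation_under(prefix: str, path: str = "/proc/self/mountinfo") -> dict[str, str]:
    """Map each mount point at or below `prefix` to its propagation type."""
    root = prefix.rstrip("/")
    result = {}
    with open(path) as f:
        for line in f:
            mount_point, kind = propagation_of(line)
            if mount_point == root or mount_point.startswith(root + "/"):
                result[mount_point] = kind
    return result


if __name__ == "__main__":
    for mount_point, kind in propagation_under("/var/named/chroot").items():
        print(kind, mount_point)
```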

Comment 6 Frank Crawford 2016-01-27 10:26:29 UTC

Created attachment 1118749
Fix for setup-named-chroot.sh

While the issue does appear to be a kernel bug, this patch reorders the mounts to avoid triggering it.

In particular, the umount of /var/named should be last, as it also includes /var/named/chroot and any other bind mounts.
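The general rule behind this reordering is that a mount point can only be cleanly unmounted once nothing is mounted beneath it. The Python sketch below is a generic illustration of that rule, not the contents of attachment 1118749: given mount points in the order they were mounted, it keeps those at or below a prefix and orders them deepest first, breaking ties in reverse mount order. The function name and the example paths are hypothetical.

```python
def unmount_order(mounted_in_order: list[str], prefix: str) -> list[str]:
    """Return mount points at or below `prefix` in a safe unmount order."""
    root = prefix.rstrip("/")
    under = [m for m in mounted_in_order
             if m == root or m.startswith(root + "/")]
    # Reverse first so that, among equally deep paths, the most recently
    # mounted one is unmounted first; sorted() is stable, so this survives.
    under.reverse()
    # A mount nested inside another always has more path components, so
    # deepest-first guarantees children are unmounted before their parents.
    return sorted(under, key=lambda m: m.rstrip("/").count("/"), reverse=True)


if __name__ == "__main__":
    mounts = [
        "/var/named/chroot/var/named",
        "/var/named/chroot/etc/named",
        "/var/named/chroot/var/named/chroot/etc/named",
        "/proc",
    ]
    for m in unmount_order(mounts, "/var/named/chroot"):
        print(m)
```

With these example mounts, the nested copy under the /var/named bind mount is unmounted first, `/proc` is ignored because it lies outside the prefix, and `/var/named/chroot/var/named` comes last.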

Comment 7 Frank Crawford 2016-01-27 10:34:45 UTC

For the kernel maintainers: the problem seems to be that the private bind mount of /var/named under /var/named/chroot (i.e. mount --bind --make-private /var/named /var/named/chroot/var/named) failed to unmount because other mounts still existed beneath it, and a subsequent attempt to perform the same mount again caused the kernel panic.

Why this doesn't occur in all cases, I don't know. It may be related to other system mounts, in particular, in my case, the fact that "/var" is a separate mount point, so we end up with "/var/named/chroot/var/named/chroot/.....", but I haven't explored it.
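The repeating pattern in that path is consistent with the bind source, /var/named, containing its own target, /var/named/chroot/var/named. A minimal Python sketch of the path arithmetic only (hypothetical helper names; it models paths, not kernel behaviour): `target_inside_source` flags such a self-containing bind mount, and `nested_paths` lists where the target would reappear at each level if the tree kept showing up beneath itself.

```python
import posixpath


def target_inside_source(source: str, target: str) -> bool:
    """True if `target` lies strictly below `source`, so the bind mount contains itself."""
    src = posixpath.normpath(source)
    tgt = posixpath.normpath(target)
    return tgt != src and posixpath.commonpath([src, tgt]) == src


def nested_paths(source: str, target: str, levels: int) -> list[str]:
    """Paths at which `target` would reappear, one per nesting level."""
    src = posixpath.normpath(source)
    tgt = posixpath.normpath(target)
    rel = posixpath.relpath(tgt, src)  # e.g. "chroot/var/named"
    paths = [tgt]
    for _ in range(levels - 1):
        paths.append(posixpath.join(paths[-1], rel))
    return paths


if __name__ == "__main__":
    src, tgt = "/var/named", "/var/named/chroot/var/named"
    print(target_inside_source(src, tgt))
    for p in nested_paths(src, tgt, 3):
        print(p)
```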

Comment 8 Gerhard Wiesinger 2016-05-18 18:15:40 UTC

Still not fixed in: Linux vps01 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Can that bug be fixed/patch integrated to bind, so that updates in fc24 work well?

Comment 9 Tomáš Hozza 2016-05-20 14:22:18 UTC

(In reply to Gerhard Wiesinger from comment #8)
> Still not fixed in: Linux vps01 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4
> 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> 
> Can that bug be fixed/patch integrated to bind, so that updates in fc24 work
> well?

I included the patch from comment #6 in bind-9.10.3-14.P4.fc2{3,4,5}

Comment 10 Gerhard Wiesinger 2016-05-20 15:29:27 UTC

Thnx

Comment 11 Frank Crawford 2016-05-21 04:58:16 UTC

Out of interest, do you know if the underlying kernel bug has been reported anywhere?

Comment 12 Gerhard Wiesinger 2016-05-21 05:06:39 UTC

Except here and the link above (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1439849) I'm not aware of any further bug reports.

After searching I found (not exactly the same):
https://lkml.org/lkml/2016/5/9/124
5ec0811d30378ae104f250bfc9b3640242d81e3f
https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.10

Comment 13 Frank Crawford 2016-05-21 05:46:43 UTC

(In reply to Gerhard Wiesinger from comment #12)

Thanks, I'll try it out sometime on a 4.4.10 or later kernel.

Comment 14 Gerhard Wiesinger 2016-05-21 06:24:00 UTC

Kernel 4.5.4 also includes 5ec0811d30378ae104f250bfc9b3640242d81e3f and is already available via:
dnf --enablerepo updates-testing update kernel\*
https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.5.4

Will try it, too.

Comment 15 Gerhard Wiesinger 2016-05-21 06:31:18 UTC

Seems to be fixed in kernel-4.5.4-200.fc23.x86_64.

Can someone else please verify this too?

Comment 16 Flóki Pálsson 2016-05-27 21:11:17 UTC

I cannot boot kernel-4.5.4-200.fc23.x86_64.

kernel-4.4.9-300.fc23.x86_64 boots OK.

Comment 17 Flóki Pálsson 2016-05-27 21:21:05 UTC

kernel-4.5.5-201.fc23.x86_64 is ok

Comment 18 Gerhard Wiesinger 2016-05-28 06:10:14 UTC

Please add update instructions for the FC24 update:
either the kernel or bind-chroot must be at the latest version.

Comment 19 Josh Boyer 2016-06-02 13:08:01 UTC

Thanks to all of you for your diligence on this bug.

Comment 20 Gerhard Wiesinger 2016-09-05 19:43:13 UTC

Only the workaround is in place; the kernel oops still happens, even with Fedora's 4.7.2 kernel.

To reproduce, use the following systemd configuration (running named in the foreground) and restart the service several times:

Type=simple
ExecStart=/usr/sbin/named -f -u named -t /var/named/chroot $OPTIONS

See also the attachment.

Comment 21 Gerhard Wiesinger 2016-09-05 19:45:24 UTC

Created attachment 1198019
Unmount crash

See previous comment

Comment 22 Gerhard Wiesinger 2016-09-05 19:46:12 UTC

4.7.2-201.fc24.x86_64

Comment 23 Fedora End Of Life 2016-12-20 15:31:47 UTC

Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 24 Laura Abbott 2016-12-20 17:56:22 UTC

Is this still an issue on the latest 4.8 kernels?

Comment 25 Gerhard Wiesinger 2016-12-20 20:01:41 UTC

Yes, it is still a problem when the patch (attachment #1118749: Fix for setup-named-chroot.sh for bug #1279188) is reverted and the service is restarted several times with:
systemctl restart named-chroot

This stalls the machine for minutes:
Message from syslogd@maschine at Dec 20 20:57:56 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [perl:5394]

Message from syslogd@maschine at Dec 20 20:57:56 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:23602]

Message from syslogd@maschine at Dec 20 20:58:51 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [rs:main Q:Reg:1479]

Message from syslogd@maschine at Dec 20 20:58:51 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [umount:23618]

Message from syslogd@maschine at Dec 20 20:58:51 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [rs:main Q:Reg:1479]

Message from syslogd@maschine at Dec 20 20:58:51 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:23618]
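When triaging a burst of these watchdog messages, it can help to tabulate which CPU and command each lockup names; here the `umount` tasks on CPU#1 stand out. A minimal Python sketch with hypothetical helper names; the regular expression is written against the message format shown above and may need adjusting for other kernel versions. The command name is matched greedily up to the last `:` so that names containing colons, such as `rs:main Q:Reg`, are kept whole.

```python
import re
from collections import Counter

# Matches e.g. "soft lockup - CPU#1 stuck for 22s! [umount:23602]".
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_lockups(log_text: str) -> list[tuple[int, int, str, int]]:
    """Return (cpu, seconds, command, pid) for each soft-lockup message."""
    return [
        (int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
        for m in LOCKUP_RE.finditer(log_text)
    ]


def lockups_per_command(log_text: str) -> Counter:
    """Count how often each command name appears in a soft-lockup message."""
    return Counter(comm for _, _, comm, _ in parse_lockups(log_text))
```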

[92472.070062]  [<ffffffff8b272da7>] __lookup_mnt_last+0x17/0x70
[92472.070062]  [<ffffffff8b281f5f>] propagate_umount+0x17f/0x290
[92472.070062]  [<ffffffff8b2718ca>] umount_tree+0x28a/0x2a0
[92472.070062]  [<ffffffff8b272963>] do_umount+0x2c3/0x330
[92472.070062]  [<ffffffff8b2737ae>] SyS_umount+0x10e/0x120
[92472.070062]  [<ffffffff8b803b72>] entry_SYSCALL_64_fastpath+0x1a/0xa4

[92472.067020]  [<ffffffff8b25e7bb>] path_lookupat+0x1b/0x120
[92472.067020]  [<ffffffff8b260de1>] filename_lookup+0xb1/0x180
[92472.067020]  [<ffffffff8b125b34>] ? do_futex+0x2c4/0xaf0
[92472.067020]  [<ffffffff8b2265b3>] ? kmem_cache_alloc+0xe3/0x1b0
[92472.067020]  [<ffffffff8b2609df>] ? getname_flags+0x4f/0x1f0
[92472.067020]  [<ffffffff8b260f86>] user_path_at_empty+0x36/0x40
[92472.067020]  [<ffffffff8b24e4a4>] SyS_access+0xb4/0x220
[92472.067020]  [<ffffffff8b251d79>] ? SyS_write+0x79/0xc0
[92472.067020]  [<ffffffff8b803b72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
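Both traces list the innermost frame first, so the first one reads, from the bottom up, as a `umount` system call descending through `do_umount`, `umount_tree` and `propagate_umount` into `__lookup_mnt_last`, i.e. the mount-propagation part of unmounting. Frames marked `?` (as in the second trace) are addresses the unwinder found on the stack but could not confirm as part of the call chain, so they are usually treated with caution. A minimal Python sketch with hypothetical helper names and a regex written against the frame format shown here, which extracts the frames and prints the reliable chain outermost first:

```python
import re

# Matches e.g. "[<ffffffff8b272da7>] __lookup_mnt_last+0x17/0x70",
# with an optional "? " marking a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text: str) -> list[dict]:
    """Extract function, offset, size and reliability for each frame."""
    return [
        {
            "func": m["func"],
            "offset": int(m["off"], 16),
            "size": int(m["size"], 16),
            "reliable": m["unreliable"] is None,
        }
        for m in FRAME_RE.finditer(trace_text)
    ]


def call_chain(frames: list[dict]) -> str:
    """Reliable frames joined outermost first (traces print innermost first)."""
    return " -> ".join(f["func"] for f in reversed(frames) if f["reliable"])
```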

...
4.8.15-300.fc25.x86_64 #1 SMP Thu Dec 15 23:10:23 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Comment 26 Justin M. Forbes 2017-04-11 15:01:13 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 24 has now been rebased to 4.10.9-100.fc24.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Comment 27 Gerhard Wiesinger 2017-04-11 18:02:16 UTC

No kernel developer has done anything about it yet, so this is still an issue.

Comment 28 Fedora End Of Life 2017-07-25 19:27:18 UTC

This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 29 Fedora End Of Life 2017-08-08 12:22:26 UTC

Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
