Bug 1432211 - open_count / holders not being decremented at unmount of private nested mount points
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michal Sekletar
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Duplicates: 1286242 1489552
Depends On:
Blocks: 1263648 1549617 1551061
Reported: 2017-03-14 19:14 UTC by John Pittman
Modified: 2021-12-10 14:57 UTC (History)
CC List: 20 users

Fixed In Version: systemd-219-58.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-30 11:32:10 UTC
Target Upstream Version:
Embargoed:


Attachments
3 test cases showing issue (6.85 KB, application/x-gzip)
2017-03-22 16:25 UTC, John Pittman


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2961861 0 None None None 2017-03-16 14:59:22 UTC

Description John Pittman 2017-03-14 19:14:34 UTC
Description of problem:

mapped_device->open_count / mapped_device->holders not being decremented at unmount of private mount point.  I have not been able to root cause why they are not decremented at unmount.  The result is logical volumes that are left open even though the filesystems are unmounted.  A reboot is needed to get the logical volumes to a non-open state.

Issue happens with nested mounts, --make-private run on the parent directory

Version-Release number of selected component (if applicable):

kernel-3.10.0-514.10.2.el7

How reproducible:

[root@host ~]# mount /dev/mapper/testvg-testlv1 /mnt
[root@host ~]# mount /dev/mapper/testvg-testlv2 /mnt/dir1/dir2/
[root@host ~]# mount --make-private /www/
[root@host ~]# umount -f /mnt/dir1/dir2/
[root@host ~]# umount -f /mnt

Verify logical volumes are still open with 'lvs -o +devices'
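
A minimal sketch of how the "still open" check could be scripted, assuming the usual lv_attr layout in which the sixth character is "o" while the device is open. The helper name open_lvs is hypothetical, and it only filters text on stdin, so it can be tried without touching LVM:

```shell
# List logical volumes whose lv_attr marks the device as open.
# Expects "lv_name lv_attr" pairs on stdin, for example from:
#   lvs --noheadings -o lv_name,lv_attr testvg
# Assumption: the sixth lv_attr character is "o" when the device is open.
open_lvs() {
    awk 'NF >= 2 && substr($2, 6, 1) == "o" { print $1 }'
}
```

After both umount commands, any volume name still printed by this filter is being held open somewhere.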

Actual results:

Logical volumes remain open

Expected results:

dm_blk_close() should be called to decrement &md->open_count and &md->holders

Additional info:

Added logging to a test kernel to see when dm_blk_close() and dm_blk_open() are called, and found that when --make-private is used, the functions are not called at unmount.

Patch and results found at:  https://gist.github.com/jtpittman195/c5bf1a4d82cc6c670ed96522ab5a0f34

Comment 4 John Pittman 2017-03-16 15:33:22 UTC
Failed to add versions earlier:

RHEL 6.8 - 7.1 all behave as expected
RHEL 7.2 & 7.3 issue is reproducible

Changed version to 7.2 and added regression keyword

Comment 6 Miklos Szeredi 2017-03-17 10:29:52 UTC
(In reply to John Pittman from comment #0)
> Description of problem:
> 
> mapped_device->open_count / mapped_device->holders not being decremented at
> unmount of private mount point.  I have not been able to root cause why they
> are not decremented at unmount.

The (very likely) reason is that the mount has already been propagated to another mount namespace.  Making the parent mount private *after* that will result in the mount surviving in the other mount ns, keeping the ref on the superblock.

Solution: Mark the parent mount private *before* mounting anything on it.
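
The ordering suggested above can be sketched as follows, using the device names and paths from the reproducer in comment #0. The function name mount_nested_private and the RUN variable are hypothetical; by default the function only prints the commands (dry run) so the order can be inspected first:

```shell
# Mount a parent, mark it private, and only then mount the child on it.
# RUN defaults to "echo", so the commands are printed rather than executed.
# Set RUN to an empty string (as root) to actually run them.
mount_nested_private() {
    parent_dev=$1; parent_dir=$2; child_dev=$3; child_dir=$4
    ${RUN-echo} mount "$parent_dev" "$parent_dir"
    ${RUN-echo} mount --make-private "$parent_dir"
    ${RUN-echo} mount "$child_dev" "$child_dir"
}
```

Example: mount_nested_private /dev/mapper/testvg-testlv1 /mnt /dev/mapper/testvg-testlv2 /mnt/dir1/dir2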

Comment 7 John Pittman 2017-03-17 14:29:11 UTC
Tested and marking the parent mount private before mounting anything on it does indeed work around the issue.

Comment 8 Miklos Szeredi 2017-03-17 14:37:30 UTC
Thanks for confirming.

In this case there's no bug to speak of in the kernel, because it's working as expected.

Not sure if making mounts in the initial namespace shared by default is a good idea, but that's not a kernel issue.

Comment 9 John Pittman 2017-03-17 14:43:47 UTC
Sure, glad to help.  Unfortunately I need to reopen.  Once this issue occurs, there is no remediation other than a reboot.  This escalates severity within the relevant environment.

Requesting a check for this condition, along with an error return and message.  If this needs to be re-routed please feel free.

Comment 10 Miklos Szeredi 2017-03-17 15:10:07 UTC
There's some process which is running in a cloned namespace.  If that process is killed, then that will release the superblock.  So no need to reboot.

Again, this is how it should work.  There's no way to fix this as it's not a bug.

To understand better try the following:

mount /dev/mapper/testvg-testlv1 /mnt
mkdir /tmp/mnt
mount --bind /mnt /tmp/mnt
cd /tmp/mnt
umount -l .
umount /mnt

There will be no mounts referring to this superblock, yet it will still be accessible by the shell.

Similar thing happens when it is mounted into a shared mount, which results in the new mount being cloned into any namespace which also shares the parent mount.

When you remove that sharing by --make-private, you stop the umount from propagating to other namespaces.

So you shouldn't do that.

BTW there are plenty of other ways to shoot yourself in the foot with root privs.  In much much worse ways...
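
One possible way to look for a process "running in a cloned namespace" is to compare each process's /proc/PID/ns/mnt link with that of PID 1. This is only a sketch: the function name pids_in_other_mnt_ns is hypothetical, the proc root is a parameter purely so the function can be exercised against a fake directory tree, and reading other processes' namespace links normally requires root.

```shell
# Print the PIDs whose mount namespace differs from that of PID 1.
# $1 is the proc root (default: /proc).
pids_in_other_mnt_ns() {
    proc=${1:-/proc}
    init_ns=$(readlink "$proc/1/ns/mnt") || return 1
    for d in "$proc"/[0-9]*; do
        # Skip entries whose namespace link cannot be read.
        ns=$(readlink "$d/ns/mnt" 2>/dev/null) || continue
        [ "$ns" != "$init_ns" ] && echo "${d##*/}"
    done
    return 0
}
```

A PID printed here is only a candidate; whether it still holds the mount has to be checked in its mount table.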

Comment 11 Jacob Shivers 2017-03-17 18:24:17 UTC
(In reply to Miklos Szeredi from comment #6)
> (In reply to John Pittman from comment #0)
> > Description of problem:
> > 
> > mapped_device->open_count / mapped_device->holders not being decremented at
> > unmount of private mount point.  I have not been able to root cause why they
> > are not decremented at unmount.
> 
> The (very likely) reason is that the mount has already been propagated to
> another mount namespace.  Making the parent mount private *after* that will
> result in the mount surviving in the other mount ns, keeping the ref on the
> superblock.
> 
> Solution: Mark the parent mount private *before* mounting anything on it.

When I was first testing this issue, this was my original suspicion.
However, the above actions are not necessary on RHEL 7.0 and 7.1.

Behavior changed within the major version. If the above is seemingly a "feature" why was the feature not present at the 7 GA or even at the 7.1 y release?

Comment 12 Miklos Szeredi 2017-03-17 18:52:07 UTC
The behavior regarding shared mounts hasn't changed since the feature was introduced (which was a *long* time ago).

I'm quite sure the difference in behavior can be explained with some userspace changes:

 - no sharing by default on older releases
 - no private mount namespace used by system utils on older releases

Comment 13 John Pittman 2017-03-20 13:20:48 UTC
Hi Miklos,

Here is the listing of processes on my virtual system during the issue; which ones should we kill?  I assumed the ext4/jbd2 threads were the ones:

root      2271  0.0  0.0      0     0 ?        S    08:55   0:00 [jbd2/dm-2-8]
root      2272  0.0  0.0      0     0 ?        S<   08:55   0:00 [ext4-rsv-conver]
root      2274  0.0  0.0      0     0 ?        S    08:55   0:00 [jbd2/dm-3-8]
root      2275  0.0  0.0      0     0 ?        S<   08:55   0:00 [ext4-rsv-conver]

But I have been unable to kill these with 'kill -9'.

ps_listing:  https://gist.github.com/jtpittman195/fdf8e580d19b04c32f6d69ca7ab0e8cf

Any suggestions are appreciated.

Also, just as a test I installed the 7.0 and 7.1 versions of util-linux (listed below) on my 7.3 system and the behavior persisted.

util-linux-2.23.2-16.el7.x86_64.rpm
util-linux-2.23.2-21.el7.x86_64.rpm

Comment 14 Miklos Szeredi 2017-03-20 15:15:57 UTC
You need to look at /proc/*/task/*/mountinfo.  You can see in there if the offending mount is in the task's mount list or not.
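
A rough sketch of that search, assuming the fifth field of each mountinfo line is the mount point. For brevity it walks /proc/PID/mountinfo rather than every task, and the function name pids_holding_mount is hypothetical; the proc root is a parameter only so the function can be tried on a fake tree.

```shell
# Print the PIDs whose mount table still lists the given mount point.
# $1 is the mount point, $2 the proc root (default: /proc).
pids_holding_mount() {
    mnt=$1; proc=${2:-/proc}
    for f in "$proc"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        # Field 5 of a mountinfo line is the mount point.
        if awk -v m="$mnt" '$5 == m { found = 1 } END { exit !found }' "$f" 2>/dev/null; then
            pid=${f%/mountinfo}
            echo "${pid##*/}"
        fi
    done
}
```

Example: pids_holding_mount /mnt/dir1/dir2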

You can also see in /proc/self/mountinfo how the mounts in the initial namespace are set up.  Look for the shared:NN field on mounts.
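
As an illustration, the propagation tags can be pulled out of mountinfo with a few lines of awk. The sketch assumes the optional fields (shared:NN, master:NN and so on) sit between the sixth field and the "-" separator, and it reports a mount with no such tag as private. The function name mount_propagation is hypothetical.

```shell
# Print "mount-point<TAB>propagation" for every line of a mountinfo file.
# $1 is the file to read (default: /proc/self/mountinfo).
mount_propagation() {
    awk '{
        tags = ""
        # Optional fields run from field 7 up to the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            sep = (tags == "") ? "" : ","
            tags = tags sep $i
        }
        if (tags == "") tags = "private"
        print $5 "\t" tags
    }' "${1:-/proc/self/mountinfo}"
}
```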

Comment 15 John Pittman 2017-03-20 16:31:48 UTC
Downgraded systemd to the 7.1 level.  With that version the issue no longer occurs.

Comment 16 John Pittman 2017-03-20 20:01:11 UTC
Miklos,

You were exactly right!  By searching for the mount points in /proc/$pid/mountinfo, I was able to find the 3 processes still holding on.

root       536  0.0  0.5  46716  5016 ?        Ss   14:28   0:00 /usr/lib/systemd/systemd-udevd
root       702  0.0  1.1 587028 10180 ?        Ssl  14:28   0:00 /usr/sbin/NetworkManager --no-daemon
root       815  0.0  1.7 112880 15828 ?        S    14:28   0:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-ens3.pid -lf /var/lib/NetworkManager/dhclient-cb4ca6f9-274e-498d-867f-229dcc6c71b7-ens3.lease -cf /var/lib/NetworkManager/dhclient-ens3.conf ens3

I got rid of 536 and 702, restarted 815 and the logical volumes were released.

Looking further into the systemd portion.  It seems strange networking processes would be involved.

Comment 17 Jacob Shivers 2017-03-21 17:28:38 UTC
Miklos,

Thank you _very_ much for pointing us in the right direction.
I was able to direct the customer on how to identify the errant process(es) whose mount namespace still includes the backing logical volume for the recently unmounted child directory of the private parent directory.

They can then either restart the errant process or kill it if necessary.

Comment 19 John Pittman 2017-03-22 16:24:44 UTC
Could someone from the systemd team please have a quick look at this?  

- Is it evident what is causing this issue?
- Is this working as designed?
  - If yes, we would like to request a warning as remediation is troublesome and varies per environment.

I will attach a few test runs (tests.tgz) with debug udev/systemd logging enabled that show the different results/remediation.  Miklos's remediation procedure worked in 2 of the 3 tests.

Comment 20 John Pittman 2017-03-22 16:25:35 UTC
Created attachment 1265434
3 test cases showing issue

Comment 21 Michal Sekletar 2017-05-05 13:52:58 UTC
(In reply to John Pittman from comment #19)
> Could someone from the systemd team please have a quick look at this?  
> 
> - Is it evident what is causing this issue?

Udev runs in its own mount namespace with propagation set to slave, so it receives mount events from the initial mount namespace. Note that systemd sets propagation in the initial mount namespace to shared. This configuration has been in place for a long time and is not new in RHEL-7.2 or RHEL-7.3. Slave mount propagation for udev was introduced in RHEL-7.2. However, the default is shared, so mounts were propagated to udev's mount table in RHEL-7.1 as well.

> - Is this working as designed?

Honestly, I don't know. I have no idea why the devices are kept around. Udev certainly doesn't use the /www directory for anything, so it shouldn't keep the device busy. Unmount events should also be propagated to its namespace, so the kernel should remove/unref this mount from udev's mount table once the mount is gone from the initial mount namespace.

Comment 32 Miklos Szeredi 2017-09-27 06:42:01 UTC
*** Bug 1489552 has been marked as a duplicate of this bug. ***

Comment 37 Michal Sekletar 2018-01-17 09:17:22 UTC
(In reply to Miklos Szeredi from comment #6)

> Making the parent mount private *after* that will
> result in the mount surviving in the other mount ns, keeping the ref on the
> superblock.

You are right, but I believe the reason why the parent stays around in the other namespace is *not* --make-private itself (that disables the propagation of mount/umount events for all *immediate sub-mounts* of the private mount and does not influence propagation of the mount itself, which is governed by its parent).

The kernel determines whether or not to propagate an unmount event by looking at the parent mount. Hence, if you have two mounts, /X and /X/Y/Z, mark /X as private, and then unmount them, they will *both* stay around in e.g. udev's mount namespace. The /X/Y/Z unmount event is not propagated because /X is private. The /X unmount event should have been propagated to the namespace because / is shared (usually a slave of peer group 1, and the parent of /X); however, the kernel can't get rid of /X because /X/Y/Z is still present.

The net result is that both mounts remain visible in the namespace after the unmounts in the initial namespace: /X/Y/Z because propagation was disabled by --make-private on the parent (/X), and /X because the kernel can't unmount it while it still has mounted submounts.
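
To make the "look at the parent mount" rule concrete, here is a small sketch that reports the propagation tags of a mount's parent from a mountinfo file. It assumes the first two fields are the mount ID and parent ID and the fifth is the mount point; the function name parent_propagation is hypothetical, and a parent with no optional tags is reported as private.

```shell
# For mount point $1, print "parent-mount-point<TAB>parent-propagation".
# $2 is the mountinfo file to read (default: /proc/self/mountinfo).
parent_propagation() {
    awk -v m="$1" '
        {
            tags = ""
            for (i = 7; i <= NF && $i != "-"; i++) {
                sep = (tags == "") ? "" : ","
                tags = tags sep $i
            }
            if (tags == "") tags = "private"
            point[$1] = $5      # mount ID -> mount point
            prop[$1] = tags     # mount ID -> propagation tags
            if ($5 == m) parent = $2
        }
        END {
            if (parent in point) print point[parent] "\t" prop[parent]
        }' "${2:-/proc/self/mountinfo}"
}
```

In the /X and /X/Y/Z example, a private result for the parent of /X/Y/Z would mean its unmount is not propagated to other namespaces.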

Comment 38 Miklos Szeredi 2018-01-17 09:44:19 UTC
(In reply to Michal Sekletar from comment #37)
> (In reply to Miklos Szeredi from comment #6)
> 
> > Making the parent mount private *after* that will
> > result in the mount surviving in the other mount ns, keeping the ref on the
> > superblock.
> 
> You are right, but I believe that reason why parent stays around in the
> other namespace is *not* due to --make-private (as that disables the
> propagation of mount/umount events for all *immediate sub-mounts* of the
> private mount and does not influence propagation of the mount itself as that
> is governed by its parent). 
> 
> Kernel determines whether or not to propagate unmount event by looking at
> parent mount. Hence if you have two mounts /X, /X/Y/Z  and you mark /X as
> private and then unmount them, they will *both* stay around in e.g. udev's
> mount namespace. This is because /X/Y/Z umount event won't be propagated
> because /X is private. /X unmount event should have been propagated to
> namespace because / is shared (slave of peer group 1 usually and parent of
> /X) however kernel can't get rid of the /X because /X/Y/Z is still present.
> 
> The net result is that both mounts will be visible in namespace after
> unmounts in the initial namespace. /X/Y/Z because of disabled propagation by
> --make-private on the parent (/X) and /X because kernel can't unmount it
> while there are mounted submounts.

Correct.

Comment 46 Michal Sekletar 2018-01-19 11:58:41 UTC
https://github.com/lnykryn/systemd-rhel/pull/185

Comment 49 Lukáš Nykrýn 2018-06-19 13:41:27 UTC
fix merged to staging branch -> https://github.com/lnykryn/systemd-rhel/pull/185 -> post

Comment 52 Lukáš Nykrýn 2018-10-29 13:17:15 UTC
*** Bug 1286242 has been marked as a duplicate of this bug. ***

Comment 54 errata-xmlrpc 2018-10-30 11:32:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3245

