Bug 1432211
Summary: open_count / holders not being decremented at unmount of private nested mount points
Product: Red Hat Enterprise Linux 7
Reporter: John Pittman <jpittman>
Component: systemd
Assignee: Michal Sekletar <msekleta>
Status: CLOSED ERRATA
QA Contact: Frantisek Sumsal <fsumsal>
Severity: high
Priority: high
Version: 7.2
CC: ableisch, alexng, aviro, bxue, dkinkead, esandeen, fdeutsch, fsumsal, jbrassow, jopoulso, jpittman, jshivers, linuxkidd, mmatsuya, mowens, msekleta, mszeredi, systemd-maint-list, vkuznets, zlang
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: systemd-219-58.el7
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-10-30 11:32:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 1263648, 1549617, 1551061

Description
John Pittman
2017-03-14 19:14:34 UTC
Failed to add versions earlier:

RHEL 6.8 - 7.1: all behave as expected
RHEL 7.2 & 7.3: issue is reproducible

Changed version to 7.2 and added regression keyword.

(In reply to John Pittman from comment #0)
> Description of problem:
>
> mapped_device->open_count / mapped_device->holders not being decremented at
> unmount of private mount point. I have not been able to root cause why they
> are not decremented at unmount.

The (very likely) reason is that the mount has already been propagated to another mount namespace. Making the parent mount private *after* that will result in the mount surviving in the other mount ns, keeping the ref on the superblock.

Solution: Mark the parent mount private *before* mounting anything on it.

Tested: marking the parent mount private before mounting anything on it does indeed work around the issue.

Thanks for confirming. In this case there's no bug to speak of in the kernel, because it's working as expected. I'm not sure that making mounts in the initial namespace shared by default is a good idea, but that's not a kernel issue.

Sure, glad to help. Unfortunately I need to reopen. Once this issue occurs, there is no remediation short of a reboot. This escalates severity within the relevant environment. Requesting a check for this condition, along with an error return and message. If this needs to be re-routed, please feel free.

There's some process which is running in a cloned namespace. If that process is killed, then that will release the superblock. So no need to reboot.

Again, this is how it should work. There's no way to fix this as it's not a bug. To understand better, try the following:

    mount /dev/mapper/testvg-testlv1 /mnt
    mkdir /tmp/mnt
    mount --bind /mnt /tmp/mnt
    cd /tmp/mnt
    umount -l .
    umount /mnt

There will be no mounts referring to this superblock, yet it will still be accessible by the shell.

A similar thing happens when it is mounted into a shared mount, which results in the new mount being cloned into any namespace which also shares the parent mount. When you remove that sharing by --make-private, you stop the umount from propagating to other namespaces. So you shouldn't do that.

BTW there are plenty of other ways to shoot yourself in the foot with root privs. In much, much worse ways...

(In reply to Miklos Szeredi from comment #6)
> (In reply to John Pittman from comment #0)
> > Description of problem:
> >
> > mapped_device->open_count / mapped_device->holders not being decremented at
> > unmount of private mount point. I have not been able to root cause why they
> > are not decremented at unmount.
>
> The (very likely) reason is that the mount has already been propagated to
> another mount namespace. Making the parent mount private *after* that will
> result in the mount surviving in the other mount ns, keeping the ref on the
> superblock.
>
> Solution: Mark the parent mount private *before* mounting anything on it.

When I was first testing this issue, this was my original suspicion. However, the above actions are not necessary on RHEL 7.0 and 7.1. Behavior changed within the major version. If the above is seemingly a "feature", why was the feature not present at the 7 GA or even at the 7.1 y release?

The behavior regarding shared mounts hasn't changed since the feature was introduced (which was a *long* time ago). I'm quite sure the difference in behavior can be explained with some userspace changes:

- no sharing by default on older releases
- no private mount namespace used by system utils on older releases

Hi Miklos,

Here is the listing of processes on my virtual system during the issue; which ones should we kill? I assumed the ext4/jbd2 threads were the ones:

root 2271 0.0 0.0 0 0 ? S 08:55 0:00 [jbd2/dm-2-8]
root 2272 0.0 0.0 0 0 ? S< 08:55 0:00 [ext4-rsv-conver]
root 2274 0.0 0.0 0 0 ? S 08:55 0:00 [jbd2/dm-3-8]
root 2275 0.0 0.0 0 0 ? S< 08:55 0:00 [ext4-rsv-conver]

But I have been unable to kill these with 'kill -9'.

ps_listing: https://gist.github.com/jtpittman195/fdf8e580d19b04c32f6d69ca7ab0e8cf

Any suggestions are appreciated.

Also, just as a test I installed the 7.0 and 7.1 versions of util-linux (listed below) on my 7.3 system and the behavior persisted.

util-linux-2.23.2-16.el7.x86_64.rpm
util-linux-2.23.2-21.el7.x86_64.rpm

You need to look at /proc/*/task/*/mountinfo. You can see in there if the offending mount is in the task's mount list or not.

You can also see in /proc/self/mountinfo how the mounts in the initial namespace are set up. Look for the shared:NN field on mounts.

Downgraded systemd to the 7.1 level. Issue did not persist.

Miklos,

You were exactly right! By searching for the mount points in /proc/$pid/mountinfo, I was able to find the 3 processes still holding on.

root 536 0.0 0.5 46716 5016 ? Ss 14:28 0:00 /usr/lib/systemd/systemd-udevd
root 702 0.0 1.1 587028 10180 ? Ssl 14:28 0:00 /usr/sbin/NetworkManager --no-daemon
root 815 0.0 1.7 112880 15828 ? S 14:28 0:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-ens3.pid -lf /var/lib/NetworkManager/dhclient-cb4ca6f9-274e-498d-867f-229dcc6c71b7-ens3.lease -cf /var/lib/NetworkManager/dhclient-ens3.conf ens3

I got rid of 536 and 702, restarted 815, and the logical volumes were released. Looking further into the systemd portion. It seems strange that networking processes would be involved.

Miklos,

Thank you _very_ much for pointing us in the right direction. I was able to direct the customer on how to identify the errant process(es) whose view of the mount namespace still includes the backing logical volume for the recently unmounted child directory of the private parent directory. They can then either restart the errant process or kill it if necessary.

Could someone from the systemd team please have a quick look at this?

- Is it evident what is causing this issue?
- Is this working as designed?
- If yes, we would like to request a warning, as remediation is troublesome and varies per environment.

I will attach a few test runs (tests.tgz) with debug udev/systemd logging enabled that show the different results/remediation. Miklos's remediation procedure worked in 2 of the 3 tests.

Created attachment 1265434
3 test cases showing issue

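The check suggested above (scan each process's mountinfo for the offending mount and look at the shared:NN field) could be scripted roughly as follows. This is a minimal sketch, not a tool from this bug: the function names are hypothetical, matching on the mount source string is an assumption (the major:minor field could be compared instead), and it only reads /proc/<pid>/mountinfo rather than every task.

```python
import glob
import os


def parse_mountinfo_line(line):
    """Return (mount_point, propagation_tags, source) for one mountinfo line, or None."""
    fields = line.split()
    if "-" not in fields[6:]:
        return None  # blank or malformed line
    sep = fields.index("-", 6)  # the "-" separator ends the optional fields
    if len(fields) < sep + 3:
        return None
    return fields[4], tuple(fields[6:sep]), fields[sep + 2]


def propagation(tags):
    """Summarise optional fields such as shared:NN or master:NN; none means private."""
    kinds = {tag.split(":")[0] for tag in tags}
    if "unbindable" in kinds:
        return "unbindable"
    names = [name for key, name in (("shared", "shared"), ("master", "slave")) if key in kinds]
    return "+".join(names) if names else "private"


def pids_holding(source, mountinfo_by_pid):
    """Return sorted PIDs whose mount table still lists the given mount source."""
    holders = []
    for pid, text in mountinfo_by_pid.items():
        entries = filter(None, map(parse_mountinfo_line, text.splitlines()))
        if any(entry[2] == source for entry in entries):
            holders.append(pid)
    return sorted(holders)


def read_all_mountinfo(proc_root="/proc"):
    """Read /proc/<pid>/mountinfo for every process that can be read."""
    result = {}
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "mountinfo")):
        pid = int(path.split(os.sep)[-2])
        try:
            with open(path) as handle:
                result[pid] = handle.read()
        except OSError:
            continue  # process exited or access was denied
    return result
```

Run as root after the unmount, `pids_holding("/dev/mapper/testvg-testlv1", read_all_mountinfo())` would list the processes to inspect; any PID returned still has the device in its mount table.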
(In reply to John Pittman from comment #19)
> Could someone from the systemd team please have a quick look at this?
>
> - Is it evident what is causing this issue?

Udev runs in its own mount namespace with propagation set to slave. Hence it receives mount events from the initial mount namespace. Note that the initial mount namespace has propagation set to shared by systemd. This configuration has been in place for a long time and isn't anything new in RHEL-7.2 or RHEL-7.3. Slave mount propagation for udev was introduced in RHEL-7.2. However, the default is shared, so mounts were propagated to udev's table in RHEL-7.1 as well.

> - Is this working as designed?

Honestly, I don't know. I have no idea why devices are kept around. Udev certainly doesn't use the /www directory for anything, so it shouldn't keep the device busy. Also, unmount events should be propagated to its namespace as well, hence it (the kernel) should remove/unref this mount from udev's mount table once the mount is gone from the initial mount namespace.

*** Bug 1489552 has been marked as a duplicate of this bug. ***

(In reply to Miklos Szeredi from comment #6)
> Making the parent mount private *after* that will
> result in the mount surviving in the other mount ns, keeping the ref on the
> superblock.

You are right, but I believe that reason why parent stays around in the other namespace is *not* due to --make-private (as that disables the propagation of mount/umount events for all *immediate sub-mounts* of the private mount and does not influence propagation of the mount itself as that is governed by its parent).

Kernel determines whether or not to propagate unmount event by looking at parent mount. Hence if you have two mounts /X, /X/Y/Z and you mark /X as private and then unmount them, they will *both* stay around in e.g. udev's mount namespace. This is because /X/Y/Z umount event won't be propagated because /X is private. /X unmount event should have been propagated to namespace because / is shared (slave of peer group 1 usually and parent of /X) however kernel can't get rid of the /X because /X/Y/Z is still present.

The net result is that both mounts will be visible in namespace after unmounts in the initial namespace. /X/Y/Z because of disabled propagation by --make-private on the parent (/X) and /X because kernel can't unmount it while there are mounted submounts.

(In reply to Michal Sekletar from comment #37)
> (In reply to Miklos Szeredi from comment #6)
>
> > Making the parent mount private *after* that will
> > result in the mount surviving in the other mount ns, keeping the ref on the
> > superblock.
>
> You are right, but I believe that reason why parent stays around in the
> other namespace is *not* due to --make-private (as that disables the
> propagation of mount/umount events for all *immediate sub-mounts* of the
> private mount and does not influence propagation of the mount itself as that
> is governed by its parent).
>
> Kernel determines whether or not to propagate unmount event by looking at
> parent mount. Hence if you have two mounts /X, /X/Y/Z and you mark /X as
> private and then unmount them, they will *both* stay around in e.g. udev's
> mount namespace. This is because /X/Y/Z umount event won't be propagated
> because /X is private. /X unmount event should have been propagated to
> namespace because / is shared (slave of peer group 1 usually and parent of
> /X) however kernel can't get rid of the /X because /X/Y/Z is still present.
>
> The net result is that both mounts will be visible in namespace after
> unmounts in the initial namespace. /X/Y/Z because of disabled propagation by
> --make-private on the parent (/X) and /X because kernel can't unmount it
> while there are mounted submounts.

Correct.

Fix merged to staging branch -> https://github.com/lnykryn/systemd-rhel/pull/185 -> post

*** Bug 1286242 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3245
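
The /X and /X/Y/Z scenario from comment #37 can be replayed with a toy model. This is only a sketch under strong assumptions: it tracks one receiving namespace (standing in for udev's), applies just the two rules stated in that comment (an unmount is propagated only if the parent mount is shared, and a propagated unmount is dropped while submounts remain), and is not kernel code. The class and function names are hypothetical.

```python
class ToyNamespaces:
    """Toy model: an initial namespace plus one namespace receiving propagation."""

    def __init__(self):
        self.initial = {"/": "shared"}  # mount point -> propagation of that mount
        self.replica = {"/"}            # mount points visible in the other namespace

    @staticmethod
    def _parent(mount_point, mounts):
        # Nearest enclosing mount point that is itself a mount.
        candidates = [m for m in mounts
                      if m != mount_point and (m == "/" or mount_point.startswith(m + "/"))]
        return max(candidates, key=len)

    def mount(self, mount_point):
        parent = self._parent(mount_point, self.initial)
        # Assumption: a new mount inherits the propagation type of its parent.
        self.initial[mount_point] = self.initial[parent]
        if self.initial[parent] == "shared":
            self.replica.add(mount_point)  # mount event propagated

    def make_private(self, mount_point):
        self.initial[mount_point] = "private"

    def umount(self, mount_point):
        # Assumption: the caller unmounts children before their parents.
        parent = self._parent(mount_point, self.initial)
        del self.initial[mount_point]
        if self.initial[parent] != "shared" or mount_point not in self.replica:
            return  # umount event is not propagated
        if any(m.startswith(mount_point + "/") for m in self.replica):
            return  # submounts still present in the replica, so the mount stays
        self.replica.discard(mount_point)


def simulate(private_before_submount):
    """Return the mounts left in the replica after both unmounts."""
    ns = ToyNamespaces()
    ns.mount("/X")
    if private_before_submount:
        ns.make_private("/X")
    ns.mount("/X/Y/Z")
    if not private_before_submount:
        ns.make_private("/X")
    ns.umount("/X/Y/Z")
    ns.umount("/X")
    return ns.replica
```

In this model, `simulate(False)` leaves both /X and /X/Y/Z in the replica, which is the reported symptom, while `simulate(True)` leaves only /, which corresponds to the workaround from comment #6 of marking the parent private before mounting anything on it.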