Bug 189535
Summary: | /net automount of systems exporting more than 1 mountpoint do not unmount | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Brendan Lynch <brendanplynch> |
Component: | autofs | Assignee: | Ian Kent <ikent> |
Status: | CLOSED WONTFIX | QA Contact: | Brock Organ <borgan> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5 | CC: | jkf385, jmoyer, k.georgiou, triage |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | bzcl34nup | ||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2008-05-06 15:49:11 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Brendan Lynch
2006-04-20 19:43:29 UTC
(In reply to comment #0)
> Description of problem:
> Looking at the syslog output from the autofs client (with debug logging turned
> on) one sees every minute or so:
>
> Apr 20 15:37:42 megamama automount[1675]: sig 14 switching from 1 to 2
> Apr 20 15:37:42 megamama automount[1675]: get_pkt: state 1, next 2
> Apr 20 15:37:42 megamama automount[1675]: st_expire(): state = 1
> Apr 20 15:37:42 megamama automount[1675]: expire_proc: exp_proc=5528
> Apr 20 15:37:42 megamama automount[1675]: handle_packet: type = 2
> Apr 20 15:37:42 megamama automount[1675]: handle_packet_expire_multi: token 214, name gamester
> Apr 20 15:37:42 megamama automount[5529]: expiring path /net/gamester
> Apr 20 15:37:42 megamama automount[5529]: umount_multi: path=/net/gamester incl=1
> Apr 20 15:37:42 megamama automount[5529]: umount_multi: unmounting dir=/net/gamester/sbin
> Apr 20 15:37:42 megamama automount[5529]: umount_multi: unmounting dir=/net/gamester/var
> Apr 20 15:37:42 megamama automount[5529]: rm_unwanted: /net/gamester/var
> Apr 20 15:37:42 megamama automount[5529]: rm_unwanted: /net/gamester/sbin
> Apr 20 15:37:42 megamama automount[5529]: rm_unwanted: /net/gamester
> Apr 20 15:37:42 megamama automount[5529]: expired /net/gamester
> Apr 20 15:37:42 megamama automount[1675]: handle_child: got pid 5529, sig 0 (0), stat 0
> Apr 20 15:37:42 megamama automount[1675]: sig_child: found pending iop pid 5529: signalled 0 (sig 0), exit status 0
> Apr 20 15:37:42 megamama automount[1675]: send_ready: token=214
> Apr 20 15:37:42 megamama automount[1675]: handle_packet: type = 0
> Apr 20 15:37:42 megamama automount[1675]: handle_packet_missing: token 215, name gamester
> Apr 20 15:37:42 megamama automount[1675]: attempting to mount entry /net/gamester
> Apr 20 15:37:42 megamama automount[5532]: lookup(program): looking up gamester
> Apr 20 15:37:42 megamama automount[5532]: lookup(program): gamester -> -fstype=nfs,tcp,hard,intr,nodev,nosuid /sbin gamester:/sbin /var gamester:/var
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): expanded entry: -fstype=nfs,tcp,hard,intr,nodev,nosuid /sbin gamester:/sbin /var gamester:/var
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): dequote("fstype=nfs,tcp,hard,intr,nodev,nosuid") -> fstype=nfs,tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): gathered options: fstype=nfs,tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): dequote("/sbin") -> /sbin
> Apr 20 15:37:42 megamama automount[1675]: handle_child: got pid 5528, sig 0 (0), stat 0
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): dequote("gamester:/sbin") -> gamester:/sbin
> Apr 20 15:37:42 megamama automount[1675]: sigchld: exp 5528 finished, switching from 2 to 1
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): dequote("/var") -> /var
> Apr 20 15:37:42 megamama automount[1675]: get_pkt: state 2, next 1
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): dequote("gamester:/var") -> gamester:/var
> Apr 20 15:37:42 megamama automount[1675]: st_ready(): state = 2
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): multimount: gamester:/var on /var with options fstype=nfs,tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:42 megamama automount[5532]: parse(sun): mounting root /net/gamester, mountpoint var, what gamester:/var, fstype nfs, options tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:42 megamama automount[5532]: mount(nfs): root=/net/gamester name=var what=gamester:/var, fstype=nfs, options=tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:42 megamama automount[5532]: mount(nfs): nfs options="tcp,hard,intr,nodev,nosuid", nosymlink=0, ro=0
> Apr 20 15:37:42 megamama automount[5532]: mount(nfs): is_local_mount: gamester:/var
> Apr 20 15:37:42 megamama automount[5532]: mount(nfs): from gamester:/var elected gamester:/var
> Apr 20 15:37:42 megamama automount[5532]: mount(nfs): calling mkdir_path /net/gamester/var
> Apr 20 15:37:42 megamama automount[5532]: mount(nfs): calling mount -t nfs -s -o tcp,hard,intr,nodev,nosuid gamester:/var /net/gamester/var
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): mounted gamester:/var on /net/gamester/var
> Apr 20 15:37:43 megamama automount[5532]: parse(sun): multimount: gamester:/sbin on /sbin with options fstype=nfs,tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:43 megamama automount[5532]: parse(sun): mounting root /net/gamester, mountpoint sbin, what gamester:/sbin, fstype nfs, options tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): root=/net/gamester name=sbin what=gamester:/sbin, fstype=nfs, options=tcp,hard,intr,nodev,nosuid
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): nfs options="tcp,hard,intr,nodev,nosuid", nosymlink=0, ro=0
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): is_local_mount: gamester:/sbin
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): from gamester:/sbin elected gamester:/sbin
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): calling mkdir_path /net/gamester/sbin
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): calling mount -t nfs -s -o tcp,hard,intr,nodev,nosuid gamester:/sbin /net/gamester/sbin
> Apr 20 15:37:43 megamama automount[5532]: mount(nfs): mounted gamester:/sbin on /net/gamester/sbin
> Apr 20 15:37:43 megamama automount[1675]: handle_child: got pid 5532, sig 0 (0), stat 0
> Apr 20 15:37:43 megamama automount[1675]: sig_child: found pending iop pid 5532: signalled 0 (sig 0), exit status 0
> Apr 20 15:37:43 megamama automount[1675]: send_ready: token=215

This log trace is typical of what we see when remounting occurs due to file system scanning by a GUI. Can you set up a test system that has this problem and take it to run level 3 (i.e. init 3) and see if the problem still occurs, please.
In the past, when I have had this problem I have identified the file that holds the list of directories to be scanned and set its permissions to 000. Strangely enough, this doesn't seem to make a difference to general operation.

Ian

I'm afraid it was running at level 3; the logs above were from an init 3 run (the default for machine "megamama"). No program should have been referencing the "/net/gamester" mount, and the forced unmount stopped the behavior, suggesting that indeed no program was trying to access the mount.

Brendan

The problem is that right after the mount expires and the filesystems are unmounted, the check_rm_dirs function is called, which references the filesystem again and causes it to be remounted.

I'm not exactly sure what check_rm_dirs is trying to accomplish, so I can't suggest a fix. However, the following change to automount.c (from the autofs module) does stop the remounting from occurring, but it should only be considered a workaround and not a fix.

*** automount.c.orig	2006-04-22 19:36:03.000000000 -0700
--- automount.c	2006-04-22 19:38:39.000000000 -0700
***************
*** 297,304 ****
  	free_mnt_list(mntlist);

  	/* Delete detritus like unwanted mountpoints and symlinks */
  	if (left == 0)
! 		check_rm_dirs(path, incl);

  	return left;
  }
--- 297,314 ----
  	free_mnt_list(mntlist);

  	/* Delete detritus like unwanted mountpoints and symlinks */
+
+ 	/*
+ 	 * doing this references the mount we just umounted
+ 	 * and causes it to be remounted, so this needs to be
+ 	 * done in a different way
+ 	 * -john foderaro 4/22/06
  	if (left == 0)
! 		check_rm_dirs(path, incl);
!
! 	*/
!
! 	return left;
  }

(In reply to comment #3)
> The problem is that right after the mount expires and the filesystems are
> unmounted, the check_rm_dirs function is called which references the filesystem
> again and causes it to be remounted.
>
> I'm not exactly sure what check_rm_dirs is trying to accomplish so I can't
> suggest a fix.
> However the following change to automount.c (from the autofs module) does stop
> the remounting from occurring but should only be considered a workaround and not
> a fix.

That's very strange. The autofs4 module shouldn't be doing that at all, because filesystem requests from processes in the same process group should be recognised and mount requests not sent. That's fundamental to normal operation. I'll have a look and see if I can work out what's causing this.

Ian

I can confirm this bug with FC5 and autofs-4.1.4-16.2.2.

automount[13554]: attempting to mount entry /net/merlin
automount[13627]: rm_unwanted: /net/merlin
automount[13627]: expired /net/merlin
automount[13554]: attempting to mount entry /net/merlin

A mount request still follows an expired umount, not triggered by any access to this mount point. See also bug 186454, where the same problem is being discussed.

I can verify that this bug and 186454 are indeed duplicates of each other; my problem goes away when I stop hald.

Looking at the code for hald I see in hald/linux2/blockdev.c:blockdev_refresh_mount_state():

204         /* loop over /proc/mounts */
205         while ((mnte = getmntent_r (f, &mnt, buf, sizeof(buf))) != NULL) {
206                 struct stat statbuf;
207
208                 /* check the underlying device of the mount point */
209                 if (stat (mnt.mnt_dir, &statbuf) != 0)
210                         continue;
211                 if (major(statbuf.st_dev) == 0)
212                         continue;
213
214                 HAL_INFO (("* found mounts dev %s (%i:%i)",
                                mnt.mnt_fsname, major(statbuf.st_dev), minor(statbuf.st_dev)));

which means hald will stat the mountpoint of *any* filesystem found in /proc/mounts, which will cause a being-unmounted automount filesystem to be remounted. This will be called from hald/linux2/osspec.c:mount_tree_changed_event() every time a filesystem is unmounted; so in the case of a /net system reference with two or more directories mounted, the second directory will be probed as the first is being unmounted by autofs.
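The scan described above can be reproduced outside hald as a small standalone sketch. This is illustrative only, not hald code: scan_mounts() is a hypothetical helper that applies the same two filters as the quoted loop (stat() the mountpoint, skip anonymous-device filesystems), and the stat() call is exactly the step that can walk through an autofs mount trigger and retrigger a mount.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Illustrative sketch of the quoted hald loop (hypothetical helper, not
 * hald code). Returns how many entries in `table` survive the two
 * filters hald applies, or -1 if the table can't be opened. */
static int scan_mounts(const char *table)
{
    struct mntent mnt, *mnte;
    char buf[4096];
    int considered = 0;

    FILE *f = setmntent(table, "r");
    if (f == NULL)
        return -1;

    while ((mnte = getmntent_r(f, &mnt, buf, sizeof(buf))) != NULL) {
        struct stat statbuf;

        /* This stat() is the problematic step: on a path managed by
         * autofs it can walk through a mount trigger and cause the
         * just-expired filesystem to be remounted. */
        if (stat(mnt.mnt_dir, &statbuf) != 0)
            continue;

        /* Skip filesystems on an anonymous device (major 0). */
        if (major(statbuf.st_dev) == 0)
            continue;

        considered++;
    }
    endmntent(f);
    return considered;
}
```

Run against /proc/mounts on a host with active /net automounts, each stat() of a submount path below a trigger point is a candidate for the remount race described in this bug.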
I think hald needs to look at the fs type of the mount entry and ignore FS types it does not care about (particularly NFS).

(In reply to comment #7)
> I can verify that this bug and 186454 are indeed duplicates of each other; my
> problem goes away when I stop hald.
>
> Looking at the code for hald I see in
> hald/linux2/blockdev.c:blockdev_refresh_mount_state():
>
> 204         /* loop over /proc/mounts */
> 205         while ((mnte = getmntent_r (f, &mnt, buf, sizeof(buf))) != NULL) {
> 206                 struct stat statbuf;
> 207
> 208                 /* check the underlying device of the mount point */
> 209                 if (stat (mnt.mnt_dir, &statbuf) != 0)
> 210                         continue;
> 211                 if (major(statbuf.st_dev) == 0)
> 212                         continue;
> 213
> 214                 HAL_INFO (("* found mounts dev %s (%i:%i)",
>                                 mnt.mnt_fsname, major(statbuf.st_dev), minor(statbuf.st_dev)));

Yes. I subscribed to the HAL mailing list and they mentioned they stat mount points.

The cause of what we are seeing is a little more subtle, though.

stat()ing an autofs mount point shouldn't cause it to be mounted; however, stat()ing an intermediate mount trigger will. This is the case with /net mounts (e.g. stat()ing /net/host/mp will trigger /net/host, but only when it expires, as that is the time it will be able to mount it again).

I duplicated the problem in version 5 and posted the debug log on the HAL list, following which all went quiet. I've added the requesting process pid to the debug output in v5 now, so it was quite obvious hald was doing it.

Sorry I can't get further with this.

(In reply to comment #8)
> Yes. I subscribed to the HAL mailing list and they mentioned
> they stat mount points.
>
> The cause of what we are seeing is a little more subtle though.
>
> Stat()ing an autofs mount point shouldn't cause it to be mounted,
> however, stat()ing an intermediate mount trigger will. This is
> the case with /net mounts (eg. stat()ing /net/host/mp will
> trigger /net/host but only when it expires as that is the time it
> will be able to mount it again).
You are correct, but this is guaranteed to happen in the case of a "/net" mount of a machine exporting more than one nfs mount, or with indirect maps if you ever move to something more complex like autofsNG.

The /proc/mounts entries created for a /net-mounted system will by definition appear below the trigger point, and as soon as one such mount is removed you hit a race (which hald normally wins): hald reads /proc/mounts before autofs has unmounted all the other nfs mount points for the system and removed them from /proc/mounts. And when hald tries to stat the still-existing mount point, the kernel walks through the trigger point and remounts the entire exported set again.

Looking at the HAL specification attached to the source code, it is pretty clear that the HAL definition of "block devices" was not intended to include network filesystem mounts:

"block namespace
Device objects representing addressable block devices, such as drives and partitions, will have info.bus set to block and will export a number of properties in the block namespace."

As you say, this is an issue that needs to be solved upstream in HAL. However, I can report that adding code to disregard network filesystem mount types in blockdev_refresh_mount_state() fixes the problem without causing any side effects I can notice in hald.

Thanks

Brendan,

Can you check to see if this is still a problem with the latest version of the HAL package, please.

Ian

The latest package in testing, hal-0.5.7-3.fc5.3, no longer exhibits this problem with the default "auto.net" file or, in fact, with any nfs automount filesystem. However, the fix fails if you use "nfs4" automounted filesystems (as we do, since all our Unix and Linux boxes are nfs4-capable). The behavior stays the same. You would see the same problem were you to use an autofs map to mount a multiple-entry "cifs" filesystem (as one could) or any other autofs-mountable filesystem.
While this could theoretically be almost any filesystem type, in practice it is normally a network-mounted filesystem, and IMHO hal has no need to examine network-mounted filesystems.

The check for the "nfs" filesystem type in hal-0.5.7-fix-for-nfs-and-autofs.patch:

@@ -205,6 +206,43 @@
 	while ((mnte = getmntent_r (f, &mnt, buf, sizeof(buf))) != NULL) {
 		struct stat statbuf;

+		/* If this is a nfs mount or autofs
+		 * (fstype == 'nfs' || fstype == 'autofs')
+		 * ignore the mount. Reason:
+		 * 1. we don't list nfs devices in HAL
+		 * 2. more problematic: stat on mountpoints with
+		 *    'stale nfs handle' never come
+		 *    back and block complete HAL and all applications
+		 *    using HAL fail.
+		 * 3. autofs and HAL butt heads causing drives to never
+		 *    be unmounted
+		 */
+		if (strcmp(mnt.mnt_type, "nfs") == 0 ||
+		    strcmp(mnt.mnt_type, "autofs") == 0)
+			continue;
+

should I think instead read:

+		if (strcmp(mnt.mnt_type, "nfs") == 0 ||
+		    strcmp(mnt.mnt_type, "nfs4") == 0 ||
+		    strcmp(mnt.mnt_type, "cifs") == 0 ||
+		    strcmp(mnt.mnt_type, "autofs") == 0)
+			continue;
+

As an aside, below the code fragment I give above there follows this:

+		/* If this is an autofs mount (fstype == 'autofs')
+		 * store the mount in a list for later use.
+		 * On mounts managed by autofs accessing files below the mount
+		 * point cause the mount point to be remounted after an
+		 * unmount. We keep the list so we do not check for
+		 * the .created-by-hal file on mounts under autofs mount points
+		 */
+		if (strcmp(mnt.mnt_type, "autofs") == 0) {
+			char *mnt_dir;
+
+			if (mnt.mnt_dir[strlen (mnt.mnt_dir) - 1] != '/')
+				mnt_dir = g_strdup_printf ("%s/", mnt.mnt_dir);
+			else
+				mnt_dir = g_strdup (mnt.mnt_dir);
+
+			autofs_mounts = g_slist_append (autofs_mounts,
+							mnt_dir);
+
+			continue;
+		}

but this code will never be executed, since the strcmp(mnt.mnt_type, "autofs") == 0 quoted above will already have matched and the continue statement will have completely skipped this second test.
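For clarity, the ordering problem in that patch can be sketched as a small classification function: the autofs branch has to be tested before the generic network-filesystem blacklist, or the branch that records autofs mountpoints is dead code. This is not the hald patch itself; classify_mount and the enum are hypothetical names for illustration.

```c
#include <string.h>

/* Hypothetical sketch of the corrected ordering (not the actual hald
 * patch): decide per /proc/mounts entry what the scan should do. */
enum mnt_action {
    MNT_COLLECT_AUTOFS,  /* record the mountpoint for later use, then skip */
    MNT_IGNORE,          /* network filesystem: skip entirely */
    MNT_CHECK_DEVICE     /* fall through to the stat()/major() checks */
};

static enum mnt_action classify_mount(const char *fstype)
{
    /* autofs must be tested before the blacklist; if "autofs" also
     * appears in the blacklist below, this branch is unreachable. */
    if (strcmp(fstype, "autofs") == 0)
        return MNT_COLLECT_AUTOFS;

    /* Network filesystem types hald has no need to examine. */
    if (strcmp(fstype, "nfs") == 0 ||
        strcmp(fstype, "nfs4") == 0 ||
        strcmp(fstype, "cifs") == 0)
        return MNT_IGNORE;

    return MNT_CHECK_DEVICE;
}
```

With this split, removing "autofs" from the blacklist (as requested in the discussion) is just a matter of keeping the two checks in this order.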
The first strcmp(mnt.mnt_type, "autofs") == 0 needs to come out. Ian, I am at GUADEC until July 3rd and won't be able to get around to fixing it. Can you patch and build HAL and then test to see if this fixes the issue? If not, the quick workaround would be to add the blacklist above, but I would rather have a more generic solution. Thanks.

(In reply to comment #12)
> The first strcmp(mnt.mnt_type, "autofs") == 0 needs to come out. Ian, I am at
> GUADEC until July 3rd and won't be able to get around to fixing it. Can you
> patch and build HAL and then test to see if this fixes the issue? If not the
> quick work around would be to add the blacklist above but I would rather have a
> more generic solution. Thanks.

I'll have a go at it, but I can't guarantee it will be what you're after.

Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.
The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers

This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.