Description of problem:
The amd automounter fails to start all mount points properly with 2.6.22.1-41.fc7. Tested on i686 and x86_64. I have tried udev-113-8.fc7 on i686 with no change. amd works fine on both architectures with 2.6.21-1.3194.fc7.

Version-Release number of selected component (if applicable):
2.6.22.1-41.fc7
udev-113-8.fc7
udev-106-4.1.fc7
am-utils-6.1.5-6.fc7

How reproducible:
Always with 2.6.22.1-41.fc7. Works fine with 2.6.21-1.3194.fc7.

Steps to Reproduce:
1. Install am-utils and 2.6.22.1-41.fc7.
2. Boot the kernel. Start amd (with two or more mount points) after the kernel has booted or from the rc scripts.
3. Run amq and observe the strange mount information.

Actual results:
Only one automount point seems to be created properly.

Expected results:
All mount points should be created.

Additional info:
Created attachment 160585
log of bad amd

Attached is a log with a simple configuration. My actual amd configuration is quite complex. However, the problem is reproducible with two simple /net and /net2 mount points. Copy the default /etc/amd.net file to /etc/amd.net2 and add the following lines to the bottom of your default /etc/amd.conf file:

    [ /net2 ]
    map_name = amd.net2
    map_type = file
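For anyone wanting to repeat the setup, the two steps above can be sketched as a small shell function. This is only a sketch: the function name `add_second_map` and its arguments are made up for illustration, and the paths are passed in so it can be tried on copies before touching /etc.

```shell
# Sketch: set up a second amd map for the reproduction case.
# add_second_map is a hypothetical helper, not part of am-utils.
add_second_map() {
    conf=$1      # configuration file, e.g. /etc/amd.conf
    src_map=$2   # existing map file, e.g. /etc/amd.net
    dst_map=$3   # new map file, e.g. /etc/amd.net2

    # Copy the default map file to the new name.
    cp "$src_map" "$dst_map" || return 1

    # Append the /net2 section to the bottom of the configuration file.
    cat >> "$conf" <<EOF

[ /net2 ]
map_name = $(basename "$dst_map")
map_type = file
EOF
}
```

As root this would be called as `add_second_map /etc/amd.conf /etc/amd.net /etc/amd.net2`, followed by a restart of amd.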
Using the same amd setup with 2.6.21-1.3194.fc7, the simple test in the above attachment looks like:

alan ~ # amq
amq: localhost: RPC: Program not registered
alan ~ # service amd start
Starting amd: Aug 3 17:17:45 alan amd[2612]/info: using configuration file /etc/amd.conf [ OK ]
alan ~ # amq
/ root "root"
/net toplvl /etc/amd.net /net
/net2 toplvl /etc/amd.net2 /net2
alan ~ # uname -a
Linux alan.une.edu.au 2.6.21-1.3194.fc7 #1 SMP Wed May 23 22:35:01 EDT 2007 i686 i686 i386 GNU/Linux
An additional data point, my laptop has a 2.6.22.1-33.fc7 kernel and the above simple amd configuration works, just like 2.6.21-1.3194.fc7
Created attachment 160723
syslog messages with a bad amd startup - kernel 2.6.22.1-33.fc7

The above messages show the output of amq after a good amd start. This is the output after a bad start:

[root@rig etc]# amq
amq: localhost: RPC: Program not registered
[root@rig etc]# service amd start
Starting amd: Aug 6 11:49:19 rig amd[3533]/info: using configuration file /etc/amd.conf [ OK ]
[root@rig etc]# amq
/ root "root"
/net error . //nil//
/net2 toplvl /etc/amd.net /net2

If you wait a minute or two, the amq output changes to:

[root@rig etc]# amq
/ root "root"
/net2 toplvl /etc/amd.net /net2
[root@rig etc]#

Attached is the syslog output of amd during a bad startup.
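Since the failed entry first shows up as type "error" and then drops out of the list altogether, a check on the amq output has to look for expected mount points that are not listed as "toplvl". A minimal sketch, assuming the column layout shown above (mount point first, type second); `amq_missing_toplvl` is a hypothetical helper name:

```shell
# Sketch: read amq output on stdin and print every expected mount point
# (given as arguments) that is not listed with type "toplvl".
amq_missing_toplvl() {
    # Capture the listing once so it can be scanned per mount point.
    listing=$(cat)
    for mp in "$@"; do
        # awk exits 0 only if a line has this mount point with type toplvl.
        printf '%s\n' "$listing" |
            awk -v mp="$mp" '$1 == mp && $2 == "toplvl" { found = 1 } END { exit !found }' ||
            printf '%s\n' "$mp"
    done
}
```

Usage would be along the lines of `amq | amq_missing_toplvl /net /net2`; no output means every expected mount point came up.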
(In reply to comment #3)
> An additional data point, my laptop has a 2.6.22.1-33.fc7 kernel and the above
> simple amd configuration works, just like 2.6.21-1.3194.fc7

Hm, I could be accused of smoking something! I can't reproduce a working amd on 2.6.22.1-33.fc7. It seems to fail now. It still works on 2.6.21-1.3194.fc7.
Out of interest I tried 2.6.23-0.74.rc2.git1.fc8 fresh out of Koji on my F7 laptop. It has the same am-utils problem as above. So in summary, so far I have:

With 2.6.21-1.3194.fc7, am-utils works on several i686 systems and an x86_64 system that I have available.

On the same systems, with 2.6.22.1-33.fc7 and 2.6.22.1-41.fc7, am-utils does not work.

On an i686 laptop with 2.6.23-0.74.rc2.git1.fc8, am-utils also does not work.

Would love some feedback here. Does anyone else use am-utils in F7? Is this a kernel bug or an am-utils bug? I have posted a message to the am-utils list but no one replied there either. Is there any information anyone would like that I have not provided? What do other people smoke?
It's not just you -- I use am-utils on F7 and have hit the same bug.
(In reply to comment #6)
> Out of interest I tried 2.6.23-0.74.rc2.git1.fc8 fresh out of Koji on my F7
> laptop. It has the same am-utils problem as above. So in summary, so far I have:
>
> 2.6.21-1.3194.fc7 works on several i686 and a x86_64 that I have available.
>
> On the same systems, 2.6.22.1-33.fc7, 2.6.22.1-41.fc7, am-utils does not work.
>
> On a i686 laptop 2.6.23-0.74.rc2.git1.fc8, am-utils also does not work.
>
> Would love some feedback here. Does any one else use am-utils in F7? Is this a
> kernel bug or an am-utils bug? I have posted a message to the am-utils list but
> no one replied there either. Is there any information anyone would like that I
> have not provided? What do other people smoke?

I've not had much to do with amd at all so I don't know how to collect debug information from it. Is there some way to get messages with increased logging into syslog? Can we see some please?

I'm also not sure how amd actually works but I suspect it makes multiple (internal) mounts into the same file system and uses them to trigger automounts. If this is the case you may be seeing a problem with a recent patch included in the Fedora kernel. I know it's a hassle but, if possible, try obtaining the kernel srpm, comment out the line "Patch1030: linux-2.6-nfs-nosharecache.patch", try building and installing it and see if the problem disappears.

Ian
(In reply to comment #8)
> I've not had much to do with amd at all so I don't know how
> to collect debug information from it. Is there some way
> to get messages with an increased logging into syslog?
> Can we see some please?

I'll attach some straces of a good and a bad amd startup that I did a few days ago. That may be enough for you. The difference seems to be in the return from the mount() system call. The startup was not with the test setup above but with a more complex set of maps. I'll look into more syslog info.

> I'm also not sure how amd actually works but I suspect it
> makes multiple (internal) mounts into the same file system
> and uses them to trigger automounts. If this is the case
> you may be seeing a problem with a recent patch included
> in the Fedora kernel. I know it's a hassle but, if possible,
> try obtaining the kernel srpm, comment out the line
> "Patch1030: linux-2.6-nfs-nosharecache.patch", try building
> and installing it and see if the problem disappears.

This will take me a day or two. Thanks for looking at this problem.
Created attachment 161037
strace of a bad amd startup

strace -f -o /tmp/amd-bad.strace /usr/sbin/amd -F /etc/amd.conf
Created attachment 161038
strace of a good amd startup

strace -f -o /tmp/amd-good.strace /usr/sbin/amd -F /etc/amd.conf

2.6.21-1.3194.fc7 #1 SMP Wed May 23 22:35:01 EDT 2007 i686 i686 i386 GNU/Linux
I seem to remember that amd uses NFS mounts to do its automounting and the trace appears to confirm that. The traces show mount returning EBUSY for the fail case and 0 for the success case.

This may well be the issue I mentioned above as that patch introduces an EBUSY return during mount if certain conditions are met. Still, there are other places where EBUSY is returned from NFS but I don't see how they could happen during a mount.

So my recommendation remains the same. Build a kernel without the patch above. I don't think there are any other dependencies on it.

Ian
2751 <... mount resumed> ) = -1 EBUSY (Device or resource busy)
2743 <... mount resumed> ) = 0
2751 close(8 <unfinished ...>
2740 <... mount resumed> ) = -1 EBUSY (Device or resource busy)
2743 open("/etc/mtab", O_RDWR|O_CREAT, 0644 <unfinished ...>
2751 <... close resumed> ) = 0
2743 <... open resumed> ) = 9
2740 close(8 <unfinished ...>
2732 <... mount resumed> ) = -1 EBUSY (Device or resource busy)

This sure looks like the NFS nosharecache patch is the problem! And here is the snippet from the amd log provided above:

Aug 6 11:49:19 rig amd[3534]: creating mountpoint directory '/net2'
Aug 6 11:49:19 rig amd[3534]: creating mountpoint directory '/net'
Aug 6 11:49:19 rig amd[3534]: initializing amd.conf map /etc/amd.net of type file
Aug 6 11:49:19 rig amd[3534]: first time load of map /etc/amd.net succeeded
Aug 6 11:49:19 rig amd[3534]: /etc/amd.net mounted fstype toplvl on /net2
Aug 6 11:49:19 rig amd[3534]: /net2 set to never timeout
Aug 6 11:49:19 rig amd[3536]: '/net': mount: Device or resource busy

So it definitely looks like the second mount fails. I'm pretty confident that removing the nosharecache patch should resolve this problem.
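To compare the good and bad traces quickly, the EBUSY returns from mount() can be counted with grep. A rough sketch, assuming strace -f output in the form shown above; `count_mount_ebusy` is a hypothetical helper name, and the pattern deliberately skips umount() calls:

```shell
# Sketch: count mount() calls in an strace log that returned EBUSY.
# Matches one-line calls ("mount(...) = -1 EBUSY") as well as
# "<... mount resumed>" continuations written by strace -f.
count_mount_ebusy() {
    grep -E -c '((^|[^[:alnum:]_])mount\(|<\.\.\. mount resumed>).*= -1 EBUSY' "$1"
}
```

For example, `count_mount_ebusy /tmp/amd-bad.strace` against `count_mount_ebusy /tmp/amd-good.strace`. Note that grep exits non-zero when the count is 0, so the exit status should not be treated as an error here.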
Unless I've done something wrong with rebuilding the kernel, removing the nosharecache patch did not allow me to start more than one nfs mount point.

I followed the bouncing ball on http://fedoraproject.org/wiki/Docs/CustomKernel and got the kernel source rpm, edited the SPEC file:

diff kernel-2.6.spec kernel-2.6.spec-old
15c15
< %define buildid .amnfs
---
> #% define buildid .local
625c625
< #Patch1030: linux-2.6-nfs-nosharecache.patch
---
> Patch1030: linux-2.6-nfs-nosharecache.patch

rpmbuild -bb --with baseonly --without debuginfo --target=`uname -m` kernel-2.6.spec
rpm --oldpackage -hiv kernel-2.6.22.1-41.amnfs.fc7.i686.rpm

And still amd starts with only one mount point.

However, there is another workaround. If, in /etc/amd.conf, I put:

    mount_type = autofs

then amd will start all the mount points (as autofs instead of nfs). What can't be done now is to restart amd when some mount points are busy. The amd release notes (README.autofs in /usr/share/doc/am-utils*) say:

- Implement the restarting of autofs mount points. This is already doable on Solaris; on Linux, the kernel needs to be patched to allow it.
(In reply to comment #14)
> Unless I've done something wrong with rebuilding the kernel, removing the
> nosharecache patch did not allow me to start more than one nfs mount point.

Maybe not, as I think I gave incomplete instructions above, sorry.

>
> I followed the bouncing ball on http://fedoraproject.org/wiki/Docs/CustomKernel
> and got the kernel source rpm, edited the SPEC file:
>
> diff kernel-2.6.spec kernel-2.6.spec-old
> 15c15
> < %define buildid .amnfs
> ---
> > #% define buildid .local
> 625c625
> < #Patch1030: linux-2.6-nfs-nosharecache.patch
> ---
> > Patch1030: linux-2.6-nfs-nosharecache.patch

Commenting out this line and not the
ApplyPatch linux-2.6-nfs-nosharecache.patch
line probably leads to the patch still being applied.

If the old "%patch1030" form was used to apply the patch, the build would have failed.

Ian
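Both lines therefore need to be commented out before rebuilding. A minimal sketch with sed, assuming both lines start in the first column of the spec file as quoted above; `disable_nosharecache` is a hypothetical helper name, and it filters stdin to stdout so the original spec is left alone:

```shell
# Sketch: comment out both the Patch1030 declaration and the matching
# ApplyPatch line. Reads a spec file on stdin, writes the result to stdout.
disable_nosharecache() {
    sed -e 's/^Patch1030:/#&/' \
        -e 's/^ApplyPatch linux-2\.6-nfs-nosharecache\.patch/#&/'
}
```

Usage would be something like `disable_nosharecache < kernel-2.6.spec > kernel-2.6.spec.new`, then checking the result with diff before running rpmbuild on it.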
(In reply to comment #15)
> > > Patch1030: linux-2.6-nfs-nosharecache.patch
>
> Commenting out this line and not the
> ApplyPatch linux-2.6-nfs-nosharecache.patch
> line probably leads to the patch still being applied.

That was it. It was there in the instructions on the Fine Wiki Page, I just missed it.

So now amd will work as before if the kernel is rebuilt without the nfs-nosharecache patch.

However, I'm sensing that this patch will be in mainline and Fedora in the future and that autofs is the way we should be going anyway. I'm happy to switch my systems over to autofs.

It would also be good if reloading of autofs maps could be implemented sometime. Perhaps the fix for this bug is a change to the default /etc/amd.conf file in the am-utils package?
(In reply to comment #16)
> (In reply to comment #15)
> > > > Patch1030: linux-2.6-nfs-nosharecache.patch
> >
> > Commenting out this line and not the
> > ApplyPatch linux-2.6-nfs-nosharecache.patch
> > line probably leads to the patch still being applied.
>
> That was it. It was there in the instructions on the Fine Wiki Page, I just
> missed it.
>
> So now amd will work as before if the kernel is rebuilt without the
> nfs-nosharecache patch.

Great.

>
> However, I'm sensing that this patch will be in mainline and fedora in the
> future and that autofs is the way we should be going anyway. I'm happy to switch
> my systems over to autofs.

There's certainly a problem that needs to be solved but I don't think the patch in its current form is entirely the right way to go. There's also a mount option that can be used to restore the previous behavior, which was inadvertently not added to nfs-utils when the patch was included in the kernel.

>
> It would also be good if reloading of autofs maps could be implemented sometime.

Perhaps, but I'm not an amd person, I'm the autofs person.

Ian
Hello folks,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

The following commit is from 2.6.23-rc5:

commit e89a5a43b95cdc4305b7c8e8121a380f02476636
Author: Trond Myklebust <Trond.Myklebust>
Date: Fri Aug 31 10:45:17 2007 -0400

    NFS: Fix the mount regression

    This avoids the recent NFS mount regression (returning EBUSY when
    mounting the same filesystem twice with different parameters).

Please could you test this with the latest kernel from development and see if this fixes the problem for you.

I am aware the patch mentioned above is still included in the current 2.6.22 kernel, so updating to the latest Fedora 7 kernel is probably not a solution.
Hello, I just installed the test kernel 2.6.23 and am-utils is now working with two maps. Thanks a lot for the help!
(In reply to comment #19)
> Hello,
>
> I just installed the test kernel 2.6.23 and am-utils is now working with two maps.
> Thanks a lot for the help!

Okay, thanks for the update. Closing then as this appears resolved.