Bug 250716 - amd (am-utils) not working with 2.6.22.1-41.fc7
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-08-03 06:40 UTC by Norman Gaywood
Modified: 2007-12-12 16:27 UTC

Fixed In Version: 2.6.23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-12-12 16:27:54 UTC
Type: ---
Embargoed:


Attachments
log of bad amd (1.54 KB, text/plain)
2007-08-03 07:01 UTC, Norman Gaywood
syslog messages with a bad amd startup - kernel 2.6.22.1-33.fc7 (3.49 KB, text/plain)
2007-08-06 02:05 UTC, Norman Gaywood
strace of a bad amd startup (144.19 KB, text/plain)
2007-08-10 04:44 UTC, Norman Gaywood
strace of a good amd startup (145.13 KB, text/plain)
2007-08-10 04:46 UTC, Norman Gaywood

Description Norman Gaywood 2007-08-03 06:40:06 UTC
Description of problem:

amd automounter fails to start all mount points properly with 2.6.22.1-41.fc7.
Tested on i686 and x86_64. Have tried udev-113-8.fc7 on i686 with no change.

amd works fine on both architectures with 2.6.21-1.3194.fc7.

Version-Release number of selected component (if applicable):
2.6.22.1-41.fc7
udev-113-8.fc7
udev-106-4.1.fc7
am-utils-6.1.5-6.fc7


How reproducible:
Always with 2.6.22.1-41.fc7.
Works fine with 2.6.21-1.3194.fc7

Steps to Reproduce:
1. install am-utils and 2.6.22.1-41.fc7
2. Boot the kernel, then start amd (with two or more mount points), either
after the kernel has booted or from the rc scripts.
3. Run amq and observe the strange mount information.
  
Actual results: Only one automount point seems to be created properly.


Expected results: All mount points should be created.


Additional info:

Comment 1 Norman Gaywood 2007-08-03 07:01:50 UTC
Created attachment 160585
log of bad amd

Attached is a log with a simple configuration. My actual amd configuration is
quite complex. However the problem is reproducible with two simple /net and
/net2 mount points.

Copy the default /etc/amd.net file to /etc/amd.net2 and add the following
lines at the bottom of your default /etc/amd.conf file:

[ /net2 ]
map_name =		amd.net2
map_type =		file

Comment 2 Norman Gaywood 2007-08-03 07:12:40 UTC
Using the same amd setup with 2.6.21-1.3194.fc7, the simple test in the above
attachment looks like:


alan ~ # amq
amq: localhost: RPC: Program not registered
alan ~ # service amd start
Starting amd: Aug  3 17:17:45 alan amd[2612]/info:  using configuration file
/etc/amd.conf
                                                           [  OK  ]
alan ~ # amq
/      root    "root"         
/net   toplvl  /etc/amd.net   /net
/net2  toplvl  /etc/amd.net2  /net2
alan ~ # uname -a
Linux alan.une.edu.au 2.6.21-1.3194.fc7 #1 SMP Wed May 23 22:35:01 EDT 2007 i686
i686 i386 GNU/Linux


Comment 3 Norman Gaywood 2007-08-03 07:26:12 UTC
An additional data point, my laptop has a 2.6.22.1-33.fc7 kernel and the above
simple amd configuration works, just like 2.6.21-1.3194.fc7

Comment 4 Norman Gaywood 2007-08-06 02:05:23 UTC
Created attachment 160723
syslog messages with a bad amd startup - kernel 2.6.22.1-33.fc7

The above messages show the output of amq after a good amd start. This is
output after a bad start:

[root@rig etc]# amq
amq: localhost: RPC: Program not registered
[root@rig etc]# service amd start
Starting amd: Aug  6 11:49:19 rig amd[3533]/info:  using configuration file
/etc/amd.conf
                                                           [  OK  ]
[root@rig etc]# amq
/      root    "root"
/net   error   .             //nil//
/net2  toplvl  /etc/amd.net  /net2

If you wait a minute or two, the amq output changes to:

[root@rig etc]# amq
/      root    "root"
/net2  toplvl  /etc/amd.net  /net2
[root@rig etc]#

Attached is the syslog output of amd during a bad startup.
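
For anyone who wants to check for this state automatically, here is a minimal Python sketch (not part of the original report). It assumes the four-column amq layout shown above (mount point, type, info, mounted-on) and simply reports mount points whose type column reads "error". The function name failed_mounts is made up for illustration.

```python
def failed_mounts(amq_output):
    """Return the mount points that amq lists with type 'error'."""
    failed = []
    for line in amq_output.splitlines():
        fields = line.split()
        # Assumed layout: <mount point> <type> <info> <mounted on>
        if len(fields) >= 2 and fields[0].startswith("/") and fields[1] == "error":
            failed.append(fields[0])
    return failed


# Sample taken from the bad startup shown in this comment.
sample = """\
/      root    "root"
/net   error   .             //nil//
/net2  toplvl  /etc/amd.net  /net2
"""

print(failed_mounts(sample))
```

Note that, as described above, the error entry disappears from the amq output after a minute or two, so a check like this would have to run shortly after amd starts.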

Comment 5 Norman Gaywood 2007-08-06 06:37:02 UTC
(In reply to comment #3)
> An additional data point, my laptop has a 2.6.22.1-33.fc7 kernel and the above
> simple amd configuration works, just like 2.6.21-1.3194.fc7

Hm, I could be accused of smoking something! I can't reproduce a working amd on
2.6.22.1-33.fc7. It seems to fail now.

It still works on 2.6.21-1.3194.fc7

Comment 6 Norman Gaywood 2007-08-08 05:30:44 UTC
Out of interest I tried 2.6.23-0.74.rc2.git1.fc8 fresh out of Koji on my F7
laptop. It has the same am-utils problem as above. So in summary, so far I have:

2.6.21-1.3194.fc7 works on several i686 and an x86_64 that I have available.

On the same systems, 2.6.22.1-33.fc7, 2.6.22.1-41.fc7, am-utils does not work.

On an i686 laptop 2.6.23-0.74.rc2.git1.fc8, am-utils also does not work.

Would love some feedback here. Does anyone else use am-utils in F7? Is this a
kernel bug or an am-utils bug? I have posted a message to the am-utils list but
no one replied there either. Is there any information anyone would like that I
have not provided? What do other people smoke?

Comment 7 Martin Simmons 2007-08-09 19:34:45 UTC
It's not just you -- I use am-utils on F7 and have hit the same bug.

Comment 8 Ian Kent 2007-08-10 03:05:45 UTC
(In reply to comment #6)
> Out of interest I tried 2.6.23-0.74.rc2.git1.fc8 fresh out of Koji on my F7
> laptop. It has the same am-utils problem as above. So in summary, so far I have:
> 
> 2.6.21-1.3194.fc7 works on several i686 and a x86_64 that I have available.
> 
> On the same systems, 2.6.22.1-33.fc7, 2.6.22.1-41.fc7, am-utils does not work.
> 
> On a i686 laptop 2.6.23-0.74.rc2.git1.fc8, am-utils also does not work.
> 
> Would love some feedback here. Does any one else use am-utils in F7? Is this a
> kernel bug or an am-utils bug? I have posted a message to the am-utils list but
> no one replied there either. Is there any information anyone would like that I
> have not provided? What do other people smoke?

I've not had much to do with amd at all so I don't know how
to collect debug information from it. Is there some way
to get more verbose log messages from it into syslog?
Can we see some, please?

I'm also not sure how amd actually works but I suspect it
makes multiple (internal) mounts into the same file system
and uses them to trigger automounts. If this is the case
you may be seeing a problem with a recent patch included
in the Fedora kernel. I know it's a hassle but, if possible,
try obtaining the kernel srpm, comment out the line
"Patch1030: linux-2.6-nfs-nosharecache.patch", try building
and installing it and see if the problem disappears.

Ian

Comment 9 Norman Gaywood 2007-08-10 04:42:02 UTC
(In reply to comment #8)
> I've not had much to do with amd at all so I don't know how
> to collect debug information from it. Is there some way
> to get messages with an increased logging into syslog?
> Can we see some please?

I'll attach some straces of a good and a bad amd startup I did a few days ago.
That may be enough for you. The difference seems to be in the return value of
the mount() system call. The startup was not with the test setup above but with
a more complex set of maps.

I'll look into more syslog info.

> I'm also not sure how amd actually works but I suspect it
> makes multiple (internal) mounts into the same file system
> and uses them to trigger automounts. If this is the case
> you may be seeing a problem with a recent patch included
> in the Fedora kernel. I know it's a hassle but, if possible,
> try obtaining the kernel srpm, comment out the line
> "Patch1030: linux-2.6-nfs-nosharecache.patch", try building
> and installing it and see if the problem disappears.

This will take me a day or two.

Thanks for looking at this problem.

Comment 10 Norman Gaywood 2007-08-10 04:44:49 UTC
Created attachment 161037
strace of a bad amd startup

strace -f -o /tmp/amd-bad.strace /usr/sbin/amd -F /etc/amd.conf

Comment 11 Norman Gaywood 2007-08-10 04:46:50 UTC
Created attachment 161038
strace of a good amd startup

strace -f -o /tmp/amd-good.strace /usr/sbin/amd -F /etc/amd.conf

2.6.21-1.3194.fc7 #1 SMP Wed May 23 22:35:01 EDT 2007 i686 i686 i386 GNU/Linux

Comment 12 Ian Kent 2007-08-10 05:49:27 UTC
I seem to remember that amd uses NFS mounts to do its
automounting and the trace appears to confirm that.

The traces show mount returning EBUSY for the fail case
and 0 for the success case.

This may well be the issue I mentioned above as that patch
introduces an EBUSY return during mount if certain conditions
are met. Still there are other places where EBUSY is returned
from NFS but I don't see how they could happen during a mount.

So my recommendation remains the same.
Build a kernel without the patch above.
I don't think there are any other dependencies on it.

Ian


Comment 13 Jeff Moyer 2007-08-10 14:15:54 UTC
2751  <... mount resumed> )             = -1 EBUSY (Device or resource busy)
2743  <... mount resumed> )             = 0
2751  close(8 <unfinished ...>
2740  <... mount resumed> )             = -1 EBUSY (Device or resource busy)
2743  open("/etc/mtab", O_RDWR|O_CREAT, 0644 <unfinished ...>
2751  <... close resumed> )             = 0
2743  <... open resumed> )              = 9
2740  close(8 <unfinished ...>
2732  <... mount resumed> )             = -1 EBUSY (Device or resource busy)

This sure looks like the NFS nosharecache patch is the problem!  And here is the
snippet from the amd log provided above:

Aug  6 11:49:19 rig amd[3534]: creating mountpoint directory '/net2'
Aug  6 11:49:19 rig amd[3534]: creating mountpoint directory '/net'
Aug  6 11:49:19 rig amd[3534]: initializing amd.conf map /etc/amd.net of type file
Aug  6 11:49:19 rig amd[3534]: first time load of map /etc/amd.net succeeded
Aug  6 11:49:19 rig amd[3534]: /etc/amd.net mounted fstype toplvl on /net2
Aug  6 11:49:19 rig amd[3534]: /net2 set to never timeout
Aug  6 11:49:19 rig amd[3536]: '/net': mount: Device or resource busy

So it definitely looks like the second mount fails.  I'm pretty confident that
removing the nosharecache patch should resolve this problem.
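
To pull the same information out of a full strace -f log, something like the following Python sketch could be used (an illustration, not part of the original comment). It assumes the log uses the line format seen in the excerpt above, where each line starts with a PID and a completed call ends in ") = <return value> [ERRNO ...]". The helper name mount_results is made up.

```python
import re

# Matches completed mount() calls, either in one piece ("mount(...) = 0")
# or resumed after an interruption ("<... mount resumed> ) = -1 EBUSY ...").
MOUNT_RESULT = re.compile(
    r"^(?P<pid>\d+)\s+(?:<\.\.\. mount resumed>|mount\()"
    r".*\)\s+=\s+(?P<ret>-?\d+)(?:\s+(?P<errno>E[A-Z0-9]+))?"
)


def mount_results(lines):
    """Return (pid, return value, errno name or None) for each finished mount()."""
    results = []
    for line in lines:
        m = MOUNT_RESULT.match(line)
        if m:
            results.append((m.group("pid"), int(m.group("ret")), m.group("errno")))
    return results


# Lines copied from the strace excerpt quoted in this comment.
trace = """\
2751  <... mount resumed> )             = -1 EBUSY (Device or resource busy)
2743  <... mount resumed> )             = 0
2751  close(8 <unfinished ...>
2740  <... mount resumed> )             = -1 EBUSY (Device or resource busy)
2743  open("/etc/mtab", O_RDWR|O_CREAT, 0644 <unfinished ...>
2751  <... close resumed> )             = 0
2743  <... open resumed> )              = 9
2740  close(8 <unfinished ...>
2732  <... mount resumed> )             = -1 EBUSY (Device or resource busy)
""".splitlines()

for pid, ret, errno in mount_results(trace):
    print(pid, ret, errno)
```

On this excerpt only PID 2743 gets a successful mount; the other three return EBUSY, which matches the single working mount point seen in amq.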

Comment 14 Norman Gaywood 2007-08-15 06:55:18 UTC
Unless I've done something wrong with rebuilding the kernel, removing the
nosharecache patch did not allow me to start more than one nfs mount point.

I followed the bouncing ball on http://fedoraproject.org/wiki/Docs/CustomKernel
and got the kernel source rpm, edited the SPEC file:

diff kernel-2.6.spec kernel-2.6.spec-old
15c15
< %define buildid .amnfs
---
> #% define buildid .local
625c625
< #Patch1030: linux-2.6-nfs-nosharecache.patch
---
> Patch1030: linux-2.6-nfs-nosharecache.patch

rpmbuild -bb --with baseonly --without debuginfo --target=`uname -m` kernel-2.6.spec

rpm --oldpackage -hiv kernel-2.6.22.1-41.amnfs.fc7.i686.rpm

And still amd starts with only one mount point.

However, there is another work around.

If I put the following in /etc/amd.conf:

mount_type = autofs

then amd will start all the mount points (as autofs instead of nfs).
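
A rough Python sketch for checking which mount_type a given amd.conf ends up with (illustrative only, not from the original comment). It assumes the "[ section ]" / "key = value" layout seen in comment 1, that "#" starts a comment, and that mount_type lives in the [ global ] section; the comment above does not say which section was used, so treat that placement as an assumption. The function name read_amd_conf is made up.

```python
def read_amd_conf(text):
    """Parse amd.conf-style text into {section name: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
    return sections


# Hypothetical config combining the workaround with the test maps from comment 1.
sample = """\
[ global ]
mount_type =    autofs    # assumed placement of the workaround

[ /net ]
map_name =      amd.net
map_type =      file

[ /net2 ]
map_name =      amd.net2
map_type =      file
"""

conf = read_amd_conf(sample)
# Fall back to "nfs" when the option is absent, since nfs-style mounts are
# what amd used here without the setting.
print(conf.get("global", {}).get("mount_type", "nfs"))
```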

What can't be done now is to restart amd when some mount points are busy. The
amd release notes (README.autofs in /usr/share/doc/am-utils*) say:

- Implement the restarting of autofs mount points. This is already
  doable on Solaris; on Linux, the kernel needs to be patched to allow it.


Comment 15 Ian Kent 2007-08-15 07:52:58 UTC
(In reply to comment #14)
> Unless I've done something wrong with rebuilding the kernel, removing the
> nosharecache patch did not allow me to start more than one nfs mount point.

Maybe not, as I think I gave incomplete instructions above,
sorry.

> 
> I followed the bouncing ball on http://fedoraproject.org/wiki/Docs/CustomKernel
> and got the kernel source rpm, edited the SPEC file:
> 
> diff kernel-2.6.spec kernel-2.6.spec-old
> 15c15
> < %define buildid .amnfs
> ---
> > #% define buildid .local
> 625c625
> < #Patch1030: linux-2.6-nfs-nosharecache.patch
> ---
> > Patch1030: linux-2.6-nfs-nosharecache.patch

Commenting out this line and not the
ApplyPatch linux-2.6-nfs-nosharecache.patch
line probably leads to the patch still being applied.

If the old "%patch1030" form was used to apply the
patch the build would have failed.

Ian
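
The mismatch described above (Patch line commented out, ApplyPatch line still active) can be spotted with a small script. This is a Python sketch for illustration, not part of the original comment; it assumes the spec file declares patches as "PatchNNNN: <file>" and applies them with "ApplyPatch <file>", as quoted in this thread. The second patch name in the sample is hypothetical.

```python
import re


def still_applied(spec_text):
    """Return patch files whose Patch line is commented out but which
    are still named on an active ApplyPatch line."""
    declared_off = set()
    applied = set()
    for line in spec_text.splitlines():
        stripped = line.strip()
        m = re.match(r"#\s*Patch\d+:\s*(\S+)", stripped)
        if m:
            declared_off.add(m.group(1))
            continue
        # A commented-out ApplyPatch line starts with '#', so it does not match.
        m = re.match(r"ApplyPatch\s+(\S+)", stripped)
        if m:
            applied.add(m.group(1))
    return sorted(declared_off & applied)


# Spec excerpt mirroring the edit from comment 14; the "example" patch is made up.
spec = """\
#Patch1030: linux-2.6-nfs-nosharecache.patch
Patch1031: linux-2.6-example.patch
ApplyPatch linux-2.6-nfs-nosharecache.patch
ApplyPatch linux-2.6-example.patch
"""

print(still_applied(spec))
```

An empty result would mean every commented-out patch has also been removed from the ApplyPatch lines.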


Comment 16 Norman Gaywood 2007-08-15 11:12:01 UTC
(In reply to comment #15)
> > > Patch1030: linux-2.6-nfs-nosharecache.patch
> 
> Commenting out this line and not the
> ApplyPatch linux-2.6-nfs-nosharecache.patch
> line probably leads to the patch still being applied.
 
That was it. It was there in the instructions on the Fine Wiki Page, I just
missed it.

So now amd will work as before if the kernel is rebuilt without the
nfs-nosharecache patch.

However, I'm sensing that this patch will be in mainline and fedora in the
future and that autofs is the way we should be going anyway. I'm happy to switch
my systems over to autofs.

It would also be good if reloading of autofs maps could be implemented sometime.

Perhaps the fix for this bug is a change to the default /etc/amd.conf file in
the am-utils package?




Comment 17 Ian Kent 2007-08-15 12:35:33 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > > > Patch1030: linux-2.6-nfs-nosharecache.patch
> > 
> > Commenting out this line and not the
> > ApplyPatch linux-2.6-nfs-nosharecache.patch
> > line probably leads to the patch still being applied.
>  
> That was it. It was there in the instructions on the Fine Wiki Page, I just
> missed it.
> 
> So now amd will work as before if the kernel is rebuilt without the
> nfs-nosharecache patch.

Great.

> 
> However, I'm sensing that this patch will be in mainline and fedora in the
> future and that autofs is the way we should be going anyway. I'm happy to switch
> my systems over to autofs.

There's certainly a problem that needs to be solved
but I don't think the patch in its current form is
entirely the right way to go.

There's also a mount option that can be used to restore
the previous behavior which was inadvertently not added
to nfs-utils when the patch was included in the kernel.

> 
> It would also be good if reloading of autofs maps could be implemented sometime.

Perhaps, but I'm not an amd person, I'm the autofs person.

Ian

Comment 18 Christopher Brown 2007-09-23 21:00:59 UTC
Hello folks,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

The following commit is from 2.6.23-rc5

commit e89a5a43b95cdc4305b7c8e8121a380f02476636
Author: Trond Myklebust <Trond.Myklebust>
Date:   Fri Aug 31 10:45:17 2007 -0400

    NFS: Fix the mount regression
    
    This avoids the recent NFS mount regression (returning EBUSY when
    mounting the same filesystem twice with different parameters).
    
Please could you test this with the latest kernel from development and see if
this fixes the problem for you. I am aware the patch mentioned above is still
included in the current 2.6.22 kernel so updating to the latest Fedora 7 kernel
is probably not a solution.

Comment 19 Martha 2007-10-19 21:47:17 UTC
Hello,

I just installed the test kernel 2.6.23 and am-utils is now working with two maps.
Thanks a lot for the help!


Comment 20 Christopher Brown 2007-12-12 16:27:54 UTC
(In reply to comment #19)
> Hello,
> 
> I just installed the test kernel 2.6.23 and am-utils is now working with two maps.
> Thanks a lot for the help!

Okay, thanks for the update. Closing then as this appears resolved.


