Bug 219383

Summary: /net /auto.net stopped working after update from beta1 to beta2
Product: Red Hat Enterprise Linux 5
Reporter: Jeff Needle <jneedle>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.0
CC: jmoyer
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: autofs-5.0.1-rc2.39
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-02-05 12:36:24 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                   Flags
Debug log                                     none
Fix nonstrict multi-mount failure handling    none

Comment 2 RHEL Program Management 2006-12-12 21:20:31 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 5 Ian Kent 2006-12-13 03:58:49 UTC
(In reply to comment #4)
> Address is 10.12.32.46, which is covered by 10.12.0.0/15.

Sure but this is the first mount that's attempted

> > mount: bigpapi.boston.redhat.com:/vol/engineering failed, reason given by
> > server: Permission denied

> > /vol/engineering          172.16.76.11

Which fails due to access and is removed because of this
when using the "-hosts" map.

Since the "nonstrict" option is the default it should continue
regardless of the fail but it looks like the whole mount request
thread just goes away. Strange, but as I say, we've seen this before
and it was a problem with mount so I want to check that out before
going further.

Ian


Comment 6 Jeff Needle 2006-12-13 15:03:13 UTC
These are the versions I have:

util-linux-2.13-0.43.el5
nfs-utils-1.0.9-10.el5
nfs-utils-lib-1.0.8-7.2

These are the versions from the latest build.

util-linux-2.13-0.43.2.el5.x86_64.rpm
nfs-utils-1.0.9-14.el5.x86_64.rpm
nfs-utils-lib-1.0.8-7.2.i386.rpm

So I updated nfs-utils and util-linux and still see the same behavior.  See
debug log with latest packages.

Comment 9 Jeff Needle 2006-12-13 21:13:31 UTC
No core dumps.

Comment 10 Ian Kent 2006-12-14 09:58:15 UTC
(In reply to comment #9)
> No core dumps.

Thought so.

Another big ask.
Could you get a tethereal dump of a failed mount attempt,
please?

Ian


Comment 13 Ian Kent 2006-12-15 08:45:46 UTC
Unfortunately I found nothing obvious in the packet traces
that you posted.

Before I start pulling my hair out over this, could you please
update your autofs to a recent snapshot and see if that resolves
the issue? autofs-5.0.1-0.rc.30 is present in the 12/7 build.

Ian


Comment 14 Jeff Needle 2006-12-19 16:07:40 UTC
I updated all packages on the machine to 20061218.1 and the problematic behavior
persists.

Comment 15 Ian Kent 2006-12-19 17:15:35 UTC
(In reply to comment #14)
> I updated all packages on the machine to 20061218.1 and the problematic behavior
> persists.

I've been able to duplicate this behaviour.
I do get all the log messages so I don't think it is a problem
with mount any more.

Not having access to the export /vol/engineering is preventing
the "on demand" mounting of the export tree from continuing
further down.

This doesn't happen with the "-hosts" map because it prunes
exports for which the system doesn't have access when it first
reads the export list.

I'm not sure this can be overcome without compromising our
goal of multi-vendor environment compatibility.

Let me think about this for a while and experiment.

Ian
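
As an illustration of the pruning described above, here is a minimal Python sketch. It is not the autofs implementation: the function names and the /vol/example path are hypothetical, and only literal addresses and CIDR networks are handled (real export lists may also contain hostnames, netgroups and wildcards). The client address and the two access entries are the ones quoted earlier in this bug.

```python
import ipaddress


def client_has_access(client_ip, allowed):
    """Return True if client_ip matches any entry in an export's client list.

    Assumption: every entry is a literal address or a CIDR network.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in allowed:
        # strict=False lets a bare host address be read as a /32 network.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False


def prune_exports(client_ip, exports):
    """Keep only the export paths the client is allowed to mount."""
    return [path for path, allowed in exports.items()
            if client_has_access(client_ip, allowed)]


# Export list modelled on the entries quoted in this bug.
exports = {
    "/vol/engineering": ["172.16.76.11"],
    "/vol/example": ["10.12.0.0/15"],  # hypothetical path, for illustration
}

print(prune_exports("10.12.32.46", exports))
```

With pruning done up front, /vol/engineering never becomes part of the multi-mount entry, so its "Permission denied" failure cannot interrupt the remaining mounts.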

Comment 16 Ian Kent 2006-12-19 17:18:52 UTC
(In reply to comment #15)
> 
> I'm not sure this can be overcome without compromising our
> goal of multi-vendor environment compatibility.
> 
> Let me think about this for a while and experiment.

Of course I could enhance the example program map, auto.net,
to take account of exports the system doesn't have access to.
Could be a bit messy but possible.

Ian


Comment 17 Jeff Moyer 2006-12-20 22:36:55 UTC
Jeff, I seem to recall you saying that autofs *did* work with auto.net prior to
the most recent update you performed.  Is that right?  If so, this is a
regression from previous behaviour.  If not, we should probably just put this
behind us, as it seems that program maps in general still function as designed.
 As Ian mentions, this may simply be a bug in the auto.net script.

Comment 18 Jeff Needle 2006-12-20 23:30:59 UTC
This was definitely new behavior that I saw after updating systems from beta1 to
beta2.  Can't blame it on the nfs server changing because I did the yum update
using /net/, then after the yum update an ls of the directory I used to do the
update from failed.  Not sure exactly when between beta1 and beta2 the behavior
was introduced.  I can try to reinstall a b1 system if there would be value in that.

Comment 19 Jeff Moyer 2006-12-20 23:46:19 UTC
No need to reinstall the entire system.  Autofs, util-linux, and nfs-utils are
the three packages worth downgrading.  I'd probably start with autofs, and then
move util-linux and nfs-utils in lock-step.

If this is easy for you to do, then by all means give it a try.  If not, I'll
install a system with Beta1 and see how things go.

Comment 21 Ian Kent 2006-12-21 10:16:05 UTC
Here are my findings on investigating this problem.

The change in behaviour came about because, along the way, I broke
the way autofs handles mount failures for multi-mounts.
This made multi-mounts, like the entry returned from auto.net,
stop as soon as a mount failed, where previously they continued.

The other thing I discovered is that, I believe, the expire
in this specific case (failure of a mount corresponding to a
nesting point) has never worked and no one (including me) has
noticed it.

So I have a patch.

It appears to function correctly, and no regressions were
identified when testing the many different cases in our test suite.

I'll do a little more testing as well.

Ian
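
The difference between the two behaviours can be modelled with a short Python sketch. This is a simplified illustration, not the autofs code or the attached patch: `mount_multi`, `fake_mount` and the /vol/example paths are hypothetical, and the real handling of nested offsets and expiry is more involved.

```python
def mount_multi(offsets, try_mount, nonstrict=True):
    """Attempt each offset of a multi-mount entry in order.

    Returns (mounted, failed). With nonstrict=True a failed mount is
    recorded and the remaining offsets are still attempted; with
    nonstrict=False processing stops at the first failure.
    """
    mounted, failed = [], []
    for offset in offsets:
        if try_mount(offset):
            mounted.append(offset)
            continue
        failed.append(offset)
        if not nonstrict:
            break
    return mounted, failed


def fake_mount(path):
    """Stand-in for a real mount call: only /vol/engineering is refused."""
    return path != "/vol/engineering"


# /vol/engineering is the export from this bug; the others are hypothetical.
offsets = ["/vol/engineering", "/vol/example1", "/vol/example2"]

print(mount_multi(offsets, fake_mount, nonstrict=True))
print(mount_multi(offsets, fake_mount, nonstrict=False))
```

The first call models the intended default ("nonstrict"): the refused export is skipped and the rest are mounted. The second call models the broken behaviour reported here, where the first failure ends the whole request.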

Comment 22 Ian Kent 2006-12-21 10:19:55 UTC
Created attachment 144173 [details]
Fix nonstrict multi-mount failure handling

It would be good if you could apply this to the source
rpm and test it, JeffN.

I'll try to come up with variations to test also.

Ian

Comment 23 RHEL Program Management 2006-12-21 11:05:41 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 25 Jeff Needle 2006-12-21 12:58:41 UTC
I can confirm that after applying this patch, using auto.net behaves as
expected.  Let me know when there's an official package and I'll give it a run.

Comment 26 Ian Kent 2006-12-21 13:39:34 UTC
(In reply to comment #25)
> I can confirm that after applying this patch, using auto.net behaves as
> expected.  Let me know when there's an official package and I'll give it a run.

OK.

I'll try a little more testing and then add it to CVS assuming
all goes as expected.
Thanks
Ian

Comment 27 Ian Kent 2006-12-27 04:53:37 UTC
The patch included here has been added to CVS.
It is in autofs-5.0.1-rc2.39.