199977 – Autofs5 requested in RHEL4

Summary: Autofs5 requested in RHEL4

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Ian Kent
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:	notFC5
Depends On:	138363
Blocks:	198694 211071 231166 233547

Reported:	2006-07-24 18:26 UTC by Rod Nayfield
Modified:	2008-08-12 04:29 UTC
CC List:	8 users
Fixed In Version:	RHBA-2007-0304
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-05-08 02:54:38 UTC
Target Upstream Version:
Embargoed:

Attachments
Connectathon test log from unpatched RHEL4 kernel. (43.23 KB, patch) 2006-10-30 12:23 UTC, Ian Kent
Connectathon test log from autofs v5 patched RHEL4 kernel. (43.23 KB, patch) 2006-10-30 12:25 UTC, Ian Kent
Connectathon test log from autofs-5.0.1-0.rc2.20 against v5 patched RHEL4 kernel. (41.71 KB, patch) 2006-10-30 12:51 UTC, Ian Kent

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0304	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5	2007-04-28 18:58:50 UTC

Comment 1 Rod Nayfield 2006-07-24 18:28:10 UTC

autofs5 direct map and other enhancements requested in RHEL4.

Comment 3 Jeff Moyer 2006-08-09 18:01:07 UTC

Ian and I think that this request should wait until autofs v5 is stabilized in
RHEL 5 before an attempt is made to backport it to RHEL 4.  We will evaluate
this request after RHEL 5 GA.

Comment 5 Ian Kent 2006-08-10 03:39:06 UTC

(In reply to comment #4)
> Jeff, this request is for RHEL 4.5 which is planned to be released after 5 GA.
> Waiting till 4.6 would mean waiting for roughly one year from now (unless we
> find a different release vehicle).
> 
> Perhaps we can keep it in the planning, do the work if feasible and decide about
> support vs. tech preview later?

Fact is that I've done some of the work on porting the patches already.
I took the opportunity to check how much effort would be
needed when preparing a patch for another issue. This went quite
well but there will be quite a bit of time needed to verify correctness
and to test functionality.

I'll put a little effort into this as time permits so the patches
will be ready in "short order" when the time comes.

But I must stress that I'm having some difficulty resolving issues
exposed by running the Connectathon tests. Until autofs performs
solidly under these stress conditions these issues are the priority.

When I have something, what can we do for testing, keeping in mind
that this task is low priority?

Ian

Comment 9 Rod Nayfield 2006-08-10 18:56:08 UTC

(In reply to comment #7)
>> Does autofs4 pass the connectathon tests?
>No.

Is there a BZ open on this?

Comment 10 Ian Kent 2006-08-10 18:59:53 UTC

(In reply to comment #7)
> (In reply to comment #6)
> > Does autofs4 pass the connectathon tests?
>
> No.

v4, no it doesn't.
v5 basically does but that was the easy bit.

>
> > Lehman is very concerned about stability here.  They are willing to go to
> > autofs5 to get failover r/o mounts but their biggest driver is stability /
> > reliable restartability.

I'm focused on this now using the Connectathon test bed to stress
autofs and I'm finding quite a bit to fix. It's very difficult but
it's very useful in terms of improving stability and reliability
of autofs. I'll be stressing it in as many ways as I can think of.

>
> Autofs v5 does not give them failover of readonly mounts.  If their concern is
> stability, then I'd question making such a large change in their production
> environment.  Autofs v5 is a quite new and unproven code base.  Autofs v5 does
> not guarantee /reliable/ restartability yet, either (Ian, correct me if I'm
> wrong).

Basically yes.

v5 will allow you to shutdown and startup the daemon when mounts
are active but we don't have evidence of the possible effects of
this in a production setting yet. The method used at the moment is
to unlink umount active mounts that have been left at shutdown and
mount the autofs mounts. This actually appears to work well.

There are a couple of concerns:

1) applications are not able to access new mounts during the restart
   time window. There is basically no way around this as when
   there is no daemon to answer mount requests they simply can't
   happen. Existing open files on mounts left at shutdown remain
   until not in use and new requests go to the new mounts as expected.

2) There is a finite window between when the filesystem is unlink
   umounted and when the new filesystem is mounted. Similar to above
   but the daemon is running. Ideally we would add a remount option
   to the kernel module and this window would disappear but I haven't
   yet worked out how to do this in the userspace daemon. Basically I
   can't get hold of what I need when the NFS (or other) mount is
   atop the autofs mount.

3) As Jeff has pointed out recently we don't know the effects of
   unlink umounting on file operations such as locking but this
   should be ok.
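
As a rough illustration of the restart step described above, here is a
minimal Python sketch that picks out the mounts left at or below an
autofs mount point and builds lazy ("unlink") umount commands for them,
deepest first. It is only a sketch, not the daemon's code: the function
names and the sample mount table are hypothetical, and the only real
interfaces assumed are the /proc/mounts line layout and the -l option
of umount(8).

```python
# Hypothetical sample in /proc/mounts layout: device, mount point, type, ...
SAMPLE_MOUNTS = """\
automount(pid2101) /misc autofs rw,fd=4,pgrp=2101,minproto=2,maxproto=4 0 0
server:/export/data /misc/data nfs rw 0 0
server:/export/data/sub /misc/data/sub nfs rw 0 0
/dev/sda1 / ext3 rw 0 0
"""


def mounts_under(mount_table, root):
    """Return mount points at or below `root`, deepest first.

    Octal escapes that /proc/mounts uses for spaces in paths are not
    decoded here; that is a simplification of this sketch.
    """
    prefix = root if root.endswith("/") else root + "/"
    found = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        if mount_point == root or mount_point.startswith(prefix):
            found.append(mount_point)
    # Children must be detached before their parents.
    return sorted(found, key=lambda path: path.count("/"), reverse=True)


def lazy_umount_commands(mount_table, root):
    """Build (but do not run) one `umount -l` command per leftover mount."""
    return [["umount", "-l", mp] for mp in mounts_under(mount_table, root)]


if __name__ == "__main__":
    for command in lazy_umount_commands(SAMPLE_MOUNTS, "/misc"):
        print(" ".join(command))
```

The commands are only built, not executed, so the window in concern 2
would be the time between running the last of them and mounting the new
autofs filesystem.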

So when autofs has finished the test cycle we are going to have to
give it some very specific testing to clarify the impact of these
possible problem areas.

I'd like to think that v5 will be more stable and reliable than v4
from the outset but the fact is that a lot of work has been done on
this in v4 and there's a lot of new code in v5.

Ian

Comment 12 Jeff Moyer 2006-08-10 19:11:15 UTC

(In reply to comment #9)
> (In reply to comment #7)
> >> Does autofs4 pass the connectathon tests?
> >No.
> 
> Is there a BZ open on this?

No.  The connectathon test suite does not give just one PASS/FAIL result.  It is
a set of tests that stresses different parts of the automounter.  Much of this
has to do with the parsing of maps.  As the parser has been historically
fragile, I've been hesitant to make changes to it as it can have unforeseen ill
effects on real customer installations.

The fact of the matter is that we don't 100% comply with the Sun map format.  We
don't claim to (this is something that is hopefully being addressed in v5). 
Changing v4 to be compliant *will* break existing installations.  It's simply
not a good idea to fix things just so the tests pass.
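
For readers unfamiliar with the map format under discussion, here is a
minimal Python sketch of parsing the simplest kind of Sun-style indirect
map entry, "key [-options] location". It is an illustration only and
not the autofs parser: the function name is hypothetical, and
multi-mount entries, macro and wildcard substitution, quoting and line
continuation are all deliberately left out.

```python
def parse_map_entry(line):
    """Parse one simplified map line of the form `key [-opts] location...`.

    Returns None for blank lines and for comment lines starting with '#'.
    Raises ValueError when an entry has a key but no location.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None

    fields = text.split()
    key, rest = fields[0], fields[1:]

    options = []
    if rest and rest[0].startswith("-"):
        # Options are a single comma-separated field with a leading dash.
        options = rest[0][1:].split(",")
        rest = rest[1:]

    if not rest:
        raise ValueError("map entry %r has no location" % key)

    return {"key": key, "options": options, "locations": rest}


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    print(parse_map_entry("data -ro,soft server:/export/data"))
```

Even this small subset shows why parser changes are risky: any change
to how the option field or the locations are split alters how existing
maps are interpreted.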

Comment 13 Ian Kent 2006-08-10 19:24:16 UTC

(In reply to comment #8)
> Lehman doesn't care about direct map support short term.  They have already
> created two distinct maps worlds within lehman.  They would like to get back to
> one for solaris and linux, but that can't happen for a year or so anyway, so
> it's a soft desire.
>
> The most important piece is stability.  They have issues restarting autofs (big
> regression from autofs3 to autofs4 here).  They also have issues with program
> maps failing (suddenly autofs starts passing the entire path not the
> mountpoint).  If a daemon is hung in the "D" state they need to be able to have
> a new one start up.  Of course if autofs never failed they would not be so
> concerned about how to handle failures.

I wish!

The issue regarding the paths is a bad one.

It could still happen in v5 if a mount(8) fails but returns a success
code back to the daemon. So we may have to remove the "sloppy" mount
option to make sure bad mounts do return a fail to guard against this.
Virtually all the NFS options that are needed are now supported by
Linux mount so the "sloppy" mount option may not be so crucial any
more.

The daemon getting into a "D" state is much more of a worry and I'd
really like to get more information on it. I've heard about it from
time to time over the years but no one has been willing to help gather
information on it so I can try and resolve it. To be honest I thought
it had gone away at some point with bug fixes made to the 2.6 kernel
module. So, more info please.

>
> They are very interested in having failover work (not just at initial mount
> time, but at server failure).  When did our plans change?  I have email from Ian
> on 6/14 saying that he was working on it...

Indeed I did.

And I have put time into it and I believe I can implement it but this
is an NFS issue and autofs proper has the priority. When I'm happy
with the stability of autofs I'll be able to spend more time on it.

I must point out that while I'm familiar with the NFS code I've not
spent time on enhancements before so it will be a fairly slow process.

Very sorry.

>
> So if the failover is not in the plans for autofs5 I will close this ticket as
> all Lehman wants from what we're giving is reliablity in autofs4.  I can then
> open a new ticket against autofs4 for each issue they have.

It's on the plans but it's not possible for autofs to provide this
function. It has to be done in the NFS client kernel module.

Once again, sorry, I wish I could get through this stuff more quickly
so I could do this but that's really almost always the way.

Ian

Comment 14 Daniel Riek 2006-08-14 23:05:06 UTC

Downgrading priority to "high" in order to better reflect priorities which are:
1) make autofs5 stable in RHEL5
2) investigate a backport to RHEL4

Comment 15 Ian Kent 2006-09-04 11:28:52 UTC

(In reply to comment #14)
> Downgrading priority to "high" in order to better reflect priorities which are:
> 1) make autofs5 stable in RHEL5
> 2) investigate a backport to RHEL4

I now have a set of kernel patches for RHEL4 2.6.9-42.2.EL.
As yet untested.

Ian

Comment 16 Ian Kent 2006-10-30 12:20:14 UTC

(In reply to comment #15)
> (In reply to comment #14)
> > Downgrading priority to "high" in order to better reflect priorities which are:
> > 1) make autofs5 stable in RHEL5
> > 2) investigate a backport to RHEL4
> 
> I now have a set of kernel patches for RHEL4 2.6.9-42.2.EL.
> As yet untested.

I've updated my RHEL4 kernel (2.6.9-42.20.EL) and run the
connectathon tests with autofs-4.1.4-197 against the standard
and patched kernels.

The results were the same for both which is an indication
that adding the autofs version 5 kernel patches won't introduce
regressions for autofs version 4.

More testing needs to be done with a patched kernel against
version 4 in an actual usage environment before we can really
be confident that we won't be introducing regressions though.

Any thoughts as to who would be able to help with this testing?

Ian

Comment 17 Ian Kent 2006-10-30 12:23:32 UTC

Created attachment 139707 [details]
Connectathon test log from unpatched RHEL4 kernel.

Comment 18 Ian Kent 2006-10-30 12:25:26 UTC

Created attachment 139708 [details]
Connectathon test log from autofs v5 patched RHEL4 kernel.

Comment 19 Ian Kent 2006-10-30 12:51:47 UTC

Created attachment 139711 [details]
Connectathon test log from autofs-5.0.1-0.rc2.20 against v5 patched RHEL4 kernel.

These test results are as expected from autofs version 5.

Comment 21 Ian Kent 2006-11-09 11:16:57 UTC

(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > Downgrading priority to "high" in order to better reflect priorities which are:
> > > 1) make autofs5 stable in RHEL5
> > > 2) investigate a backport to RHEL4
> > 
> > I now have a set of kernel patches for RHEL4 2.6.9-42.2.EL.
> > As yet untested.
> 
> I've updated my RHEL4 kernel (2.6.9-42.20.EL) and run the
> connectathon tests with autofs-4.1.4-197 against the standard
> and patched kernels.
> 
> The results were the same for both which is an indication
> that adding the autofs version 5 kernel patches won't introduce
> regressions for autofs version 4.
> 
> More testing needs to be done with a patched kernel against
> version 4 in an actual usage environment before we can really
> be confident that we won't be introducing regressions though.
> 

Hi all,

I have created a CVS private kernel branch for this and built
a test kernel into dust-4E-scratch. I've tested against autofs
versions 4 and 5 and all appears fine.

So, initially we need those interested in testing version 5 to
install this kernel and verify that their test machines still
function as expected. While this is done I will put together a
test plan. For the impatient who wish to install autofs version
5 prior to receiving the test plan I recommend
autofs-5.0.1-0.rc2.23 or above.

Feedback is welcome.

Ian

Comment 26 Ian Kent 2007-01-16 02:15:48 UTC

(In reply to comment #25)
> Ian, the 4.5 beta kernel has been built.
> Why is this bugzilla still in POST?

Sorry. I didn't merge the patches but I did check the merge.
Setting to MODIFIED.

Ian

Comment 28 Mike Gahagan 2007-02-28 22:58:59 UTC

It doesn't look like autofs 5 has made it into 4.5 at least as of 49.EL. I see
autofs 4 and another autofs directory in the kernel source which appears to be
the old autofs which we do not build.

Comment 29 Ian Kent 2007-03-01 02:09:01 UTC

(In reply to comment #28)
> It doesn't look like autofs 5 has made it into 4.5 at least as of 49.EL. I see
> autofs 4 and another autofs directory in the kernel source which appears to be
> the old autofs which we do not build. 

You're mistaken.

The kernel module "autofs4" supports "autofs kernel protocols
3,4 and 5", not to be confused with application version 4 or 5.

The reason it was done this way is, first there are one
too many autofs modules in the kernel already. I think the
autofs4 module should be renamed to autofs and the existing
autofs module removed. While I'd like to start pushing that
I'm still reluctant to do so because of the potential for
disruption and, until recently, there was a long standing
unresolved bug. I'll have to wait for a while and see if
that known problem has gone away, as the first question
will almost certainly be about stability.
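
To illustrate the distinction between module name and protocol version,
here is a minimal Python sketch of range-based protocol selection: the
daemon offers a minimum and maximum protocol version and the module
picks the highest one both sides support. The kernel range of 3 to 5
comes from the statement above; the function name and the selection
rule are assumptions made for illustration, and the actual kernel logic
may differ.

```python
# Protocol range of the "autofs4" module as stated above (3, 4 and 5).
KERNEL_MIN_PROTO = 3
KERNEL_MAX_PROTO = 5


def negotiate_protocol(daemon_min, daemon_max,
                       kernel_min=KERNEL_MIN_PROTO,
                       kernel_max=KERNEL_MAX_PROTO):
    """Return the highest protocol version both sides support, or None.

    Hypothetical helper: None means the daemon's range and the module's
    range do not overlap, so the mount should be refused.
    """
    if daemon_min > daemon_max:
        raise ValueError("daemon_min must not exceed daemon_max")
    if daemon_max < kernel_min or daemon_min > kernel_max:
        return None
    return min(daemon_max, kernel_max)


if __name__ == "__main__":
    # A daemon that speaks up to protocol 4 and one that speaks up to 5
    # can both use the same module.
    print(negotiate_protocol(3, 4))
    print(negotiate_protocol(3, 5))
```

Under this scheme an older daemon keeps working against the same module
because the selected version never exceeds what the daemon offered.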

Ian

Comment 31 Tom Coughlan 2007-03-05 19:33:25 UTC

Ian, 

I see that the draft release notes for RHEL 4.5 do not mention Autofs v5. Would
you kindly draft something for the release notes that describes what is new in
RHEL 4.5 autofs, anything users should be aware of, and answers the questions
about nomenclature and version numbers mentioned above.

You can post a draft here. I've set requires_release_note=? to get Don involved. 

Tom

Comment 36 Red Hat Bugzilla 2007-05-08 02:54:38 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
