Bug 808019

Summary: Condor's MOUNT_UNDER_SCRATCH and autofs do not combine
Product: [Fedora] Fedora
Component: condor
Version: 16
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Bert DeKnuydt <Bert.Deknuydt>
Assignee: Timothy St. Clair <tstclair>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bbockelm, ikent, jehan.procaccia, matt, steved, tomspur, tstclair
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2012-08-31 17:22:05 EDT
Attachments:
  /proc/self/mountinfo on one of the compute nodes (flags: none)

Description Bert DeKnuydt 2012-03-29 07:07:04 EDT
Description of problem:

When MOUNT_UNDER_SCRATCH is set in condor's configuration,
directories that need to be automounted to run the user's job do not
automount; access fails with 'Too many levels of symbolic links'.

Version-Release number of selected component (if applicable):

condor-7.7.5-0.2.fc16.x86_64
autofs-5.0.6-5.fc16.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Set MOUNT_UNDER_SCRATCH to any existing directory, then run condor_reconfig.
2. Run a job that needs to automount (over NFS here) anything apart from
   your home directory.
3. Save the job's error output
  
Actual results:

In the job's error log:
execvp: Too many levels of symbolic links 

Expected results:

The job should simply be run.

Additional info:

* It does not matter what directory is specified as MOUNT_UNDER_SCRATCH

* The directory _is_ actually mounted, but new mounts are not seen by
  the user's job.  Automounted directories that were already mounted
  before job creation are visible.
  (So users report that a job has to be run twice before it succeeds.)
Comment 1 Brian Bockelman 2012-03-29 09:15:55 EDT
Hi Bert,

Thanks for reporting the bug.

Unfortunately, I think this is an issue with autofs.

The MOUNT_UNDER_SCRATCH feature works by creating a separate mount namespace for each job, then bind mounting the sysadmin-specified directories (such as /tmp or /var/tmp) within the namespace.  For more info, see http://osgtech.blogspot.com/2012/02/file-isolation-using-bind-mounts-and.html.  Call the system's namespace A and the job's namespace B.
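
A minimal Python sketch of that mechanism, for illustration only; it is not Condor's actual code. It assumes a Linux host, root privileges, and a C library reachable through ctypes. The function name bind_scratch_in_new_namespace and the example paths are hypothetical.

```python
import ctypes
import ctypes.util
import os

CLONE_NEWNS = 0x00020000  # <sched.h>: unshare the mount namespace
MS_BIND = 4096            # <sys/mount.h>: create a bind mount


def load_libc():
    """Load the C library and declare the mount(2) argument types."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p]
    return libc


def check(ret, what):
    """Raise OSError carrying errno if a libc call reported failure."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def bind_scratch_in_new_namespace(scratch_dir, target, libc=None):
    """Give the calling process its own mount namespace ("B"), then bind
    scratch_dir over target inside it.

    Caveat: if the mounts were shared in the parent namespace ("A"), the
    bind mount would propagate back to A unless the propagation type is
    changed first. This sketch omits that step.
    """
    libc = libc or load_libc()
    check(libc.unshare(CLONE_NEWNS), "unshare")
    check(libc.mount(scratch_dir.encode(), target.encode(),
                     None, MS_BIND, None), "mount")
```

Hypothetical usage, as root, before exec'ing the job: bind_scratch_in_new_namespace("/scratch/job1/tmp", "/tmp").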

Autofs appears to work by having a stub mount of type autofs; when it sees filesystem activity inside its mount, it replaces the stub mount with the real mount.

The problem is that autofs sees activity in namespace B, but mounts the filesystem in namespace A - invisible to the job.

There are two ways to fix this (that I can think of, at least):
1) Autofs can mark its mounts as "shared sub-tree".  That means changes in namespace A automatically show up in namespace B.
2) Autofs can use the new "setns" call (available in Linux kernel 3.0) to temporarily associate itself with namespace B and do the mount only in namespace B.
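
At the system-call level, the two options might look roughly like the Python sketch below. This is not autofs code: the helper names are hypothetical, root privileges and Linux are assumed, and how the daemon would learn the job's PID is left open. Note also that, according to the setns(2) man page, joining a mount namespace (CLONE_NEWNS) is supported only since Linux 3.8, later than the release in which setns itself appeared.

```python
import ctypes
import ctypes.util
import os

CLONE_NEWNS = 0x00020000  # <sched.h>: mount namespace
MS_SHARED = 1 << 20       # <sys/mount.h>: shared propagation type


def load_libc():
    """Load the C library and declare the mount(2) argument types."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p]
    return libc


def check(ret, what):
    """Raise OSError carrying errno if a libc call reported failure."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def make_shared(mount_point, libc=None):
    """Option 1: mark an existing mount as shared, comparable to
    `mount --make-shared <mount_point>`. The source, filesystem type and
    data arguments are ignored for a propagation change."""
    libc = libc or load_libc()
    check(libc.mount(b"none", mount_point.encode(), None, MS_SHARED, None),
          "mount")


def mount_ns_path(pid):
    """Path of the file that represents a process's mount namespace."""
    return f"/proc/{pid}/ns/mnt"


def join_mount_namespace(ns_fd, libc=None):
    """Option 2: move the calling thread into the mount namespace that
    the open file descriptor ns_fd refers to."""
    libc = libc or load_libc()
    check(libc.setns(ns_fd, CLONE_NEWNS), "setns")
```

A caller would open mount_ns_path(job_pid) read-only, pass the descriptor to join_mount_namespace, perform the mount, and then switch back the same way.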

However, I don't know enough about the autofs to speculate which approach would be doable upstream.

Matt - meta question: do you know how to CC the autofs maintainers on this ticket?

Brian
Comment 2 Ian Kent 2012-04-26 04:54:36 EDT
(In reply to comment #1)
> Hi Bert,
> 
> Thanks for reporting the bug.
> 
> Unfortunately, I think this is an issue with autofs.
> 
> The MOUNT_UNDER_SCRATCH feature works by creating a separate mount namespace
> for each job, then bind mounting the sysadmin-specified directories (such as
> /tmp or /var/tmp) within the namespace.  For more info, see
> http://osgtech.blogspot.com/2012/02/file-isolation-using-bind-mounts-and.html. 
> Call the system's namespace A and the job's namespace B.
> 
> Autofs appears to work by having a stub mount of type autofs; when it sees
> filesystem activity inside its mount, it replaces the stub mount with the real
> mount.

Actually, it mounts over the top of the trigger.

> 
> The problem is that autofs sees activity in namespace B, but mounts the
> filesystem in namespace A - invisible to the job.
> 
> There are two ways to fix this (that I can think of, at least):
> 1) Autofs can mark its mounts as "shared sub-tree".  That means changes in
> namespace A automatically show up in namespace B.

Not sure about that, I'll have to look around and see if I
can evaluate the implications of that.

> 2) Autofs can use the new "setns" call (available in Linux kernel 3.0) to
> temporarily associate itself with namespace B and do the mount only in
> namespace B.

Sounds good but my initial impression is that autofs is not
aware of the different namespace when it needs to be.
Again I'll need to look around.

> 
> However, I don't know enough about the autofs to speculate which approach would
> be doable upstream.

The question of namespace support within autofs is actually
pretty difficult AFAICT from previous interactions with folks
involved with the implementation.

What's really needed is for me to consult with one of the
namespace code maintainers, but I'm not sure who that should
be.

Ian
Comment 3 Brian Bockelman 2012-04-26 08:27:31 EDT
If the autofs folks are against namespace support (either due to a difficult implementation, or a technical disagreement), I think Condor could itself mark the mount directory as shared-subtree.  Perhaps it's inevitable in order to properly support our local RHEL6 boxes.

We already have to parse the relevant files, as "/" itself is a shared-subtree in F16 and we want to have a per-job "/tmp".  The approach is pretty distasteful, as I am mucking around with mounts that don't belong to us, but I think it should actually work.
Comment 4 Brian Bockelman 2012-04-26 21:48:07 EDT
A ticket has been filed upstream:

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2959

Patch is attached there.
Comment 5 Ian Kent 2012-04-27 04:28:15 EDT
(In reply to comment #4)
> A ticket has been filed upstream:
> 
> https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2959
> 
> Patch is attached there.

Are you sure this will work around the problem?

I thought the shared mount attribute was propagated to child
mounts from the parent mount and systemd marks "/" as shared
already IIRC.
Comment 6 Brian Bockelman 2012-04-27 07:31:01 EDT
I was able to reproduce (not out of the box though - I had to manually make the autofs mounts non-shared), and this patch fixed it.

Your comment about "/" being shared already is correct (on RHEL6 and prior Fedoras, this is not true).

As mentioned on the Condor ticket, I also noticed a buildsys issue that should have broken the detection of "/" as shared - meaning that, out of the box on F16, the MOUNT_UNDER_SCRATCH feature should have been broken.

So, I'm now curious about how the original bug reporter got their box into this state.  It would be nice to see /proc/self/mountinfo from Bert's box.
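
For reference, the propagation type of each mount can be read from the optional fields of /proc/self/mountinfo (the tags between the mount options and the "-" separator, as documented in proc(5)). A small Python sketch follows. The function names are hypothetical and the sample lines are made up for illustration; they are not taken from Bert's machine.

```python
# Made-up sample in /proc/self/mountinfo format (not from the reporter's box).
SAMPLE = """\
15 1 253:0 / / rw,relatime shared:1 - ext4 /dev/mapper/vg-root rw
30 15 0:27 / /software rw,relatime - autofs /etc/auto.software rw,fd=7
"""


def parse_mountinfo_line(line):
    """Split one mountinfo line into the fields of interest."""
    fields = line.split()
    sep = fields.index("-")          # separator after the optional fields
    return {
        "mount_point": fields[4],
        "tags": fields[6:sep],       # e.g. ["shared:1"] or []
        "fstype": fields[sep + 1],
    }


def propagation(entry):
    """Classify a parsed entry. A mount that is both shared and a slave
    is reported as "shared" here for simplicity."""
    tags = entry["tags"]
    if any(tag.startswith("shared:") for tag in tags):
        return "shared"
    if any(tag.startswith("master:") for tag in tags):
        return "slave"
    if "unbindable" in tags:
        return "unbindable"
    return "private"


def propagation_by_mount_point(mountinfo_text):
    """Map each mount point to its propagation type. When a mount point
    is listed more than once, the later line wins."""
    result = {}
    for line in mountinfo_text.splitlines():
        if line.strip():
            entry = parse_mountinfo_line(line)
            result[entry["mount_point"]] = propagation(entry)
    return result
```

On a live system the text would come from reading /proc/self/mountinfo; an entry for "/" without a shared: tag would match the non-shared state discussed here.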
Comment 7 Ian Kent 2012-04-27 07:50:00 EDT
(In reply to comment #6)
> I was able to reproduce (not out of the box though - I had to manually make the
> autofs mounts non-shared), and this patch fixed it.
> 
> Your comment about "/" being shared already is correct (on RHEL6 and prior
> Fedoras, this is not true).

That's true, and I had to make a change to F16 autofs to make
autofs work with the shared subtree root filesystem, which is
not present in RHEL-6. So, keep that in mind too.
Comment 8 Bert DeKnuydt 2012-04-27 08:08:14 EDT
Created attachment 580747
/proc/self/mountinfo on one of the compute nodes

This is as seen from outside all things condor.
Comment 9 Brian Bockelman 2012-04-27 08:27:37 EDT
Hi Bert,

Did you do anything special to make "/" not shared?  From your attachment, it seems we have diagnosed the bug correctly.

I think we're going to do a new build for F16/F17/rawhide today; I'll try to make sure this patch gets reviewed and added.

Brian
Comment 10 Bert DeKnuydt 2012-04-27 09:57:50 EDT
Hi Brian,

Nope, nothing special there, neither /etc/fstab nor automount configs.  Might be
a side effect of something, but what?

Bert.
Comment 11 Fedora Update System 2012-04-27 17:09:35 EDT
condor-7.9.0-0.1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/condor-7.9.0-0.1.fc16
Comment 12 Fedora Update System 2012-04-27 17:10:38 EDT
condor-7.9.0-0.1.fc17.2 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/condor-7.9.0-0.1.fc17.2
Comment 13 Fedora Update System 2012-04-27 19:35:46 EDT
Package condor-7.9.0-0.1.fc17.2:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing condor-7.9.0-0.1.fc17.2'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-6880/condor-7.9.0-0.1.fc17.2
then log in and leave karma (feedback).
Comment 14 Bert DeKnuydt 2012-05-02 06:25:35 EDT
Hi all,

I just tried condor-7.9.0-0.1.fc16 from updates-testing. The results are nearly the same as with
condor-7.7.5-0.2.fc16:

(stderr)
/users/visics/deknuydt/condor/testscript.sh: line 3: cd: /software/matlab/2012a/bin: Too many levels of symbolic links
ls: cannot access /software/matlab/2012a/bin/matlab: Too many levels of symbolic links

Which is the outcome of a simple:
> cd  /software/matlab/2012a/bin

From inside the condor job, there are no automounts except those that were already available:
Filesystem                               1K-blocks       Used Available Use% Mounted on
rootfs                                     8362320    2402964   5539928  31% /
[...]
asgard:/visics/homedirs/visics/deknuydt 2113787904 1124507648 881906688  57% /users/visics/deknuydt
/dev/mapper/vg_yingchang-LogVol02        698257668   10453424 652855536   2% /tmp
/dev/mapper/vg_yingchang-LogVol02        698257668   10453424 652855536   2% /var/tmp

From outside the condor job, you see the thing properly mounted:

> df
Filesystem                               1K-blocks       Used Available Use% Mounted on
rootfs                                     8362320    2402960   5539932  31% /
[...]
tmpfs                                      3108924          0   3108924   0% /tmp
tmpfs                                      3108924          0   3108924   0% /var/tmp
[...]
asgard:/visics/homedirs/visics/deknuydt 2113787904 1124228096 882186240  57% /users/visics/deknuydt
bocq:/softw/matlab                       264224768  253955072   7584768  98% /software/matlab

Do you get any wiser with the outputs of /proc/self/mountinfo?
Comment 15 Brian Bockelman 2012-05-10 20:14:10 EDT
Hi Bert,

I see what's wrong.  Look at my latest update here:

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2959,0

I fat-fingered the creation of a patch from my git branch, causing the fix to be #ifdef'd out.

Working on it.

Brian
Comment 16 Timothy St. Clair 2012-05-11 09:04:58 EDT
Updated, will be in next spin.
Comment 17 Bert DeKnuydt 2012-06-28 07:46:07 EDT
Any movement to be expected here?  Fedora 1{6,7} are still at 7.7.5,
while more recent stuff is out upstream ...
Comment 18 Fedora Update System 2012-08-16 17:37:46 EDT
condor-7.9.1-0.1.fc17.2 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/condor-7.9.1-0.1.fc17.2
Comment 19 Fedora Update System 2012-08-17 21:27:50 EDT
Package condor-7.9.1-0.1.fc17.2:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing condor-7.9.1-0.1.fc17.2'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-12127/condor-7.9.1-0.1.fc17.2
then log in and leave karma (feedback).
Comment 20 Fedora Update System 2012-08-31 17:22:05 EDT
condor-7.9.1-0.1.fc17.2 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 21 procaccia 2012-10-06 05:08:28 EDT
I have a similar problem, "Too many levels of symbolic links" (not using condor!);
cf. https://bugzilla.redhat.com/show_bug.cgi?id=833535

In comment #7 above, Ian Kent said
"I had to make a change to F16 autofs to make
autofs work with the shared subtree root filesystem"

I would like to try that.
How do you configure autofs to work with a shared subtree?

Thanks .
Comment 22 Ian Kent 2012-10-09 21:34:26 EDT
(In reply to comment #21)
> I have a similar problem "Too many levels of symbolic links" (not using
> condor !) 
> cf https://bugzilla.redhat.com/show_bug.cgi?id=833535
> 
> in comment #7 above,  Ian kent said
> "I had to make a change to F16 autofs to make
> autofs work with the shared subtree root filesystem"
> 
> I would like to try that, 
> how do you configure autofs to work with shared subtree !?

Use autofs-5.0.6-3 or later.
Comment 23 procaccia 2012-10-10 06:34:53 EDT
I am already on:
autofs-5.0.6-22.fc17.i686, kernel-PAE-3.5.4-1.fc17.i686, Fedora 17

But maybe there's an option, variable or whatever to set (in /etc/sysconfig/autofs?) to force the use of a shared subtree?
How can I check whether I am using a shared subtree?

Thanks .
Comment 24 Ian Kent 2012-10-10 08:37:40 EDT
(In reply to comment #23)
> I am already on : 
> autofs-5.0.6-22.fc17.i686 , kernel-PAE-3.5.4-1.fc17.i686 , fedora17
> 
> but maybe there's an option, variable or whatever to set (in
> /etc/sysconfig/autofs ?) to force the use of shared subtree ?
> how can I check if I use shared subtree ?

There isn't; the issue was that systemd sets "/" as shared and
that caused a problem with the construction of mount trees.

When the containing file system is shared, moving the mount
(tree) to its final location after construction wouldn't work.
But this mechanism isn't needed with the vfs-automount
infrastructure, so it was removed (actually just disabled in
fc17).

Just what is causing the problem you're seeing is puzzling. No
matter how I look at it, it has to mean that there's a call
to perform a mount that returns success but no mount is actually
present.

I haven't ruled out the possibility that this is related in some way
to the fact that "/" is shared. If you don't rely on this, you
could "mount --make-private /", stop autofs and make sure there's
no autofs stuff mounted, then start it and see if the problem
remains.
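
One way to check that no autofs mounts are left after stopping the daemon is to scan /proc/self/mountinfo for entries whose filesystem type is autofs. The Python sketch below assumes the mountinfo layout documented in proc(5); the function name and the sample text are hypothetical and made up for illustration.

```python
# Made-up sample in /proc/self/mountinfo format, for illustration only.
SAMPLE = """\
15 1 253:0 / / rw,relatime shared:1 - ext4 /dev/mapper/vg-root rw
30 15 0:27 / /software rw,relatime shared:5 - autofs /etc/auto.software rw,fd=7
41 30 0:33 / /software/matlab rw,relatime shared:9 - nfs server:/export rw
"""


def autofs_mount_points(mountinfo_text):
    """Return the mount points of all autofs-type mounts in the text."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue                      # blank or malformed line
        # The filesystem type is the first field after the "-" separator.
        fstype = fields[fields.index("-") + 1]
        if fstype == "autofs":
            points.append(fields[4])      # field 5 is the mount point
    return points
```

Fed the contents of /proc/self/mountinfo, an empty result would suggest that no autofs trigger mounts remain.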
Comment 25 Ian Kent 2012-10-10 08:42:29 EDT
I might also add I have a patch that is yet to be posted that
solves a problem with recurring mount requests but I can't see
how that would cause this either, but maybe.

I'll get to posting that as soon as I can and then see about a
test kernel.

Probably the better place to continue this is bug 833535
rather than hijacking this bug.