800204 – cron.hourly job finds no automounted file

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	autofs
Sub Component:
Version:	16
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Ian Kent
QA Contact:	Fedora Extras Quality Assurance

Reported:	2012-03-05 23:36 UTC by Peter Glassenbury
Modified:	2013-02-13 14:50 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-02-13 14:50:16 UTC
Type:	---
Embargoed:
Dependent Products:

Description Peter Glassenbury 2012-03-05 23:36:23 UTC

Description of problem:
We have a cron.hourly file (/etc/cron.hourly/changes):
#!/bin/bash
exec /netfs/share/bin/changes-f16

We have a couple of hundred machines that run this command at the same time. The directory /netfs/share/bin is automounted from a Red Hat
server (Linux hostname 2.6.18-274.18.1.el5 #1 SMP Fri Jan 20 15:11:18 EST 2012 x86_64 GNU/Linux).

Most of the time it works. Occasionally we get a mail message from the cron daemon:
/etc/cron.hourly/changes:
/etc/cron.hourly/changes: line 2: /netfs/share/bin/changes-f16: No such file or directory

This is an annoying bug (it has been present in F16 and F14, and possibly earlier),
but we have been unable to pin down where the issue is or what causes it.

Version-Release number of selected component (if applicable):
We update nightly, so all versions of whichever component you think it might be, in both F14 and F16.

How reproducible:
Not very. It happens often (daily), but on different machines and at different hours, and it doesn't usually happen to the same machine the next hour.

Actual results:
/etc/cron.hourly/changes:
/etc/cron.hourly/changes: line 2: /netfs/share/bin/changes-f16: No such file or directory

Expected results:
The directory is automounted and the script runs.

Additional info:
We saw it happen on one machine and logged in as quickly as possible (as root, which doesn't mount those directories), but the share was there and the command could be run by hand. We have tried putting "cd /netfs/share/bin; sleep 1; /netfs/share/bin/changes-f16" in the script, but this had no effect. We are currently trying a random sleep, in case 200 machines doing an automount all at once means some "miss out".
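The random-sleep idea could be combined with a bounded retry, roughly as sketched below. This is only an illustration and not a confirmed fix: the function names wait_for_executable and run_when_mounted are hypothetical, the retry count, delay and jitter values are arbitrary, and whether retrying helps at all depends on why the lookup fails in the first place.

```shell
#!/bin/bash
# Hypothetical helpers for /etc/cron.hourly/changes (a sketch, not a fix).

# wait_for_executable PATH [TRIES] [DELAY]
# Poll PATH until it exists and is executable. Each lookup walks the path
# again, which gives autofs another chance to mount the directory.
# Returns 0 on success, 1 if PATH never appeared.
wait_for_executable() {
    local path=$1 tries=${2:-5} delay=${3:-2} i
    for ((i = 1; i <= tries; i++)); do
        [ -x "$path" ] && return 0
        # Do not sleep after the final attempt.
        if ((i < tries)); then sleep "$delay"; fi
    done
    return 1
}

# run_when_mounted PATH [TRIES] [DELAY] [MAX_JITTER]
# Optionally sleep a random 0..MAX_JITTER-1 seconds so that many clients
# do not all hit the server in the same second, then run PATH once it
# is visible.
run_when_mounted() {
    local path=$1 tries=${2:-5} delay=${3:-2} jitter=${4:-0}
    if ((jitter > 0)); then sleep $((RANDOM % jitter)); fi
    if wait_for_executable "$path" "$tries" "$delay"; then
        "$path"
    else
        echo "$path: still missing after $tries attempts" >&2
        return 1
    fi
}
```

With these definitions in the cron.hourly file, the last line would be something like "run_when_mounted /netfs/share/bin/changes-f16 5 2 60" in place of the exec. A failure after all retries still produces a cron mail, so the problem is not hidden.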

Comment 1 Marcela Mašláňová 2012-03-06 11:53:33 UTC

If I understand correctly, you have one RHEL-5 server and ~200 clients on Fedora 16.
It is possible that you have a small outage, so cron can't access the script in that particular minute.

Cronie can't be blamed, because it only runs scripts, nothing more. It's more probable that it's a bug in automount. Could you provide the versions of autofs (or whatever is taking care of it) on the clients and the server?

Comment 2 Ian Kent 2012-03-06 12:26:18 UTC

(In reply to comment #0)
> Additional info:
> We saw it happen on one machine and as quick as possible logged in (root that
> doesn't mount those directories)  but the share was there and the command could
> be run by hand. Have put a "cd /netfs/share/bin; sleep 1;
> /netfs/share/bin/changes-f16" but this had no effect. currently trying a random
> sleep in case 200 machines doing an automount all at once means some "miss
> out".

It's quite possible the server isn't responding quickly enough.

Pick a machine (or even a few if you are willing) and enable debug
logging on it and wait until you get a failure then post the log.
That should at least tell us what's happening from the daemon's
POV; getting kernel info is another story.

To enable debug logging you need to ensure that syslog is sending
daemon.* to a log file (it doesn't by default) and also set
LOGGING="debug" in /etc/sysconfig/autofs.
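As a rough sketch, the two settings described above might look like this on a client. The log file name /var/log/autofs-debug.log is an assumption chosen for illustration, as is the use of /etc/rsyslog.conf as the syslog configuration file; adjust both to the local setup.

```
# /etc/rsyslog.conf (assumed location; log file name is an assumption)
daemon.*    /var/log/autofs-debug.log

# /etc/sysconfig/autofs
LOGGING="debug"
```

Both the syslog daemon and autofs would need to be restarted before the new settings take effect.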

Comment 3 Ian Kent 2012-03-06 12:29:03 UTC

(In reply to comment #2)
> 
> It's quite possible the server isn't responding quickly enough.

Or the server isn't available at that particular time.

But neither of these possibilities is likely if you're
not doing maintenance, or shouldn't be anyway.

Comment 4 Peter Glassenbury 2012-03-07 01:27:25 UTC

Could the small outage be nfsd not answering autofs in time?
Our machines are synced with NTP, so their clocks are accurate, and the cron job runs
exactly on the hour, so they were all running at the same time.
The server, clients, Ethernet etc. all stay up, so it isn't a
"hardware unavailable" reason. Ian's and Marcela's comments seem to suggest this should move to autofs, and I'm quite happy with that. "cron" was a best guess because that is where we see the error messages, in the cron mail. It hasn't been reported in normal interactive usage.

Are there ways of extending the timeouts in autofs or nfsd before it reports "disk not found"?

I am loath to turn debugging on yet. Of the 24 hourly runs across 200-odd machines, on average only one or two fail. I would have to put all machines into debugging for a day or two to be sure of catching one. Is that reasonable, or will it generate too much debug output per machine? If it isn't going to fill the disk in that time, I can change all of them.

versions below
RHEL 5 -
Name : nfs-utils
Arch : x86_64
Epoch : 1
Version : 1.0.9
Release : 54.el5

Fedora 16 --
Name : autofs
Arch : x86_64
Epoch : 1
Version : 5.0.6
Release : 5.fc16

Fedora 14 --
Name : autofs Relocations: (not relocatable)
Version : 5.0.5 Vendor: Fedora Project
Release : 31.fc14 Build Date: Fri 27 Aug 2010 16:03:37 NZST
Build Host: x86-04.phx2.fedoraproject.org
Group : System Environment/Daemons Source RPM: autofs-5.0.5-31.fc14.src.rpm

Comment 5 Tomas Mraz 2012-03-07 08:15:42 UTC

My guess, based on the F14->F16 changes, is that it might somehow be related to the cgroups support provided by systemd, which starts the individual daemons (autofs and crond) in separate cgroups.

Comment 6 Fedora End Of Life 2013-01-16 13:54:36 UTC

This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Fedora End Of Life 2013-02-13 14:50:18 UTC

Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
