Description of problem:

We have a cron.hourly file:

    #!/bin/bash
    exec /netfs/share/bin/changes-f16

We have a couple of hundred machines that will run this command at the same time. The directory /netfs/share/bin is an automounted directory from a RedHat server (Linux hostname 2.6.18-274.18.1.el5 #1 SMP Fri Jan 20 15:11:18 EST 2012 x86_64 GNU/Linux). Most of the time it works. Occasionally we get a mail message from the cron daemon:

    /etc/cron.hourly/changes:
    /etc/cron.hourly/changes: line 2: /netfs/share/bin/changes-f16: No such file or directory

This is an annoying bug (that has been in F16 and F14 and possibly earlier), but we have been unable to tie down where the issue is or what it is.

Version-Release number of selected component (if applicable):

We update nightly, so all versions of whatever components you think it might be in F14 and F16.

How reproducible:

Not very ... it happens often (daily) but on different machines and at different hours. And it doesn't usually happen to the same machine the next hour.

Steps to Reproduce:
1.
2.
3.

Actual results:

    /etc/cron.hourly/changes:
    /etc/cron.hourly/changes: line 2: /netfs/share/bin/changes-f16: No such file or directory

Expected results:

Automount the directory and run the script.

Additional info:

We saw it happen on one machine and logged in as quickly as possible (as root, which doesn't mount those directories), but the share was there and the command could be run by hand. We have put in a "cd /netfs/share/bin; sleep 1; /netfs/share/bin/changes-f16" but this had no effect. We are currently trying a random sleep in case 200 machines doing an automount all at once means some "miss out".
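For illustration, the random-sleep workaround mentioned above could be sketched roughly as follows. This is only a sketch under assumptions, not the reporter's actual script: the helper names `jitter_seconds` and `wait_for_executable`, the 300-second window, and the retry counts are all hypothetical. It spreads the start times and retries the path lookup (which is what triggers the automount) a few times before giving up. Whether this avoids the failure is not established anywhere in this report.

```shell
#!/bin/bash
# Hypothetical helpers for /etc/cron.hourly/changes; names and values are assumptions.

# Print a random delay in the range [0, max) seconds.
# /dev/urandom is used so that NTP-synced machines do not all pick the same value.
jitter_seconds() {
    local max=$1
    local n
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
    echo $(( n % max ))
}

# Check up to "attempts" times whether "path" is an executable file,
# sleeping "delay" seconds between checks. Each check performs a lookup
# on the path, which is what asks the automounter to mount the share.
wait_for_executable() {
    local path=$1 attempts=$2 delay=$3
    local i=1
    while [ "$i" -le "$attempts" ]; do
        [ -x "$path" ] && return 0
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$(( i + 1 ))
    done
    return 1
}

# Intended use in the cron script (left commented out so the sketch has no side effects):
#   sleep "$(jitter_seconds 300)"
#   wait_for_executable /netfs/share/bin/changes-f16 5 2 \
#       && exec /netfs/share/bin/changes-f16
```

The retry loop is there so that a single failed lookup does not immediately produce the "No such file or directory" mail. If all attempts fail, the function returns non-zero and the script is not run.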
If I understand correctly, you have one RHEL-5 server and ~200 clients on Fedora 16. It is possible that you have a small outage, so cron can't access the script at that particular minute. Cronie can't be blamed, because it only runs scripts, nothing more. It's more probable that this is a bug in automount. Could you provide the autofs versions (or the versions of whatever is taking care of it) on the clients and the server?
(In reply to comment #0)
> Additional info:
> We saw it happen on one machine and as quick as possible logged in (root that
> doesn't mount those directories) but the share was there and the command could
> be run by hand. Have put a "cd /netfs/share/bin; sleep 1;
> /netfs/share/bin/changes-f16" but this had no effect. currently trying a random
> sleep in case 200 machines doing an automount all at once means some "miss
> out".

It's quite possible the server isn't responding quickly enough.

Pick a machine (or even a few if you are willing), enable debug logging on it, wait until you get a failure, and then post the log. That should at least tell us what's happening from the daemon's POV; getting kernel info is another story.

To enable debug logging you need to ensure that syslog is sending daemon.* to a log file (it doesn't by default) and also set LOGGING="debug" in /etc/sysconfig/autofs.
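As a rough illustration of the two settings described above, the fragment below shows what the configuration might look like. The LOGGING value comes from the comment itself. The log file path in the syslog rule and the restart command are assumptions (a client using rsyslog under systemd) and should be adapted to the machine.

```shell
# /etc/sysconfig/autofs
# Setting named in the comment above: switch the autofs daemon to debug logging.
LOGGING="debug"

# /etc/rsyslog.conf (assumption: the client uses rsyslog)
# Add a rule that sends the daemon facility to a file. The file name is a
# hypothetical choice; the line is commented out here because it is syslog
# syntax, not shell:
#   daemon.*    /var/log/daemon-debug.log

# Both services need a restart to pick up the changes, for example
# (assumption: systemd unit names as on Fedora 16):
#   systemctl restart rsyslog.service autofs.service
```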
(In reply to comment #2)
> It's quite possible the server isn't responding quickly enough.

Or the server isn't available at that particular time. But neither of these possibilities is likely if you're not doing maintenance, or at least they shouldn't be.
Could the small outage be nfsd not answering autofs in time? We have our machines synced with NTP, so the time is accurate, and the cron job runs exactly on the hour, so they were all running at the same time. Server, clients, ethernet etc. all stay up, so it isn't a "hardware unavailable" reason.

Ian's and Marcela's comments seem to suggest this should move to an autofs issue, and I'm quite happy with that. "cron" was a best guess because that is where we see the errors: we get the cron mail messages. In normal interactive usage we haven't had it reported.

Are there ways of extending timeouts in autofs or nfsd before it reports "disk not found"?

I am loath to put the debugging on yet -- of the 24 hourly runs, over 200-odd machines, we would on average get one or two failures. I would have to put all machines into debugging for a day or two to be sure of catching one. Is this going to be reasonable, or will it generate too much debugging output per machine? If it isn't going to fill the disk in that time, I can change all of them.

Versions below.

RHEL 5 --
Name    : nfs-utils
Arch    : x86_64
Epoch   : 1
Version : 1.0.9
Release : 54.el5

Fedora 16 --
Name    : autofs
Arch    : x86_64
Epoch   : 1
Version : 5.0.6
Release : 5.fc16

Fedora 14 --
Name        : autofs
Relocations : (not relocatable)
Version     : 5.0.5
Vendor      : Fedora Project
Release     : 31.fc14
Build Date  : Fri 27 Aug 2010 16:03:37 NZST
Build Host  : x86-04.phx2.fedoraproject.org
Group       : System Environment/Daemons
Source RPM  : autofs-5.0.5-31.fc14.src.rpm
Based on the F14->F16 changes, my guess is that it might somehow be related to the cgroups support provided by systemd, which starts the individual daemons (autofs and crond) in separate cgroups.
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.