After using a Fedora 34 workstation with automounted filesystems for 1-2 days, new mounts become unresponsive with autofs running at high CPU. The syslog shows this soon after the last successful mount:

May 28 09:37:10 clpc80 automount[202813]: do_init: parse(sun): init gathered global options: nosuid,intr
May 28 09:37:10 clpc80 automount[202813]: read_one_map: map read not needed, so not done
May 28 09:37:10 clpc80 automount[202813]: lookup_prune_one_cache: removed map entry detected, mark negative
May 28 09:37:10 clpc80 automount[202813]: lookup_prune_one_cache: removed map entry detected, mark negative
May 28 09:37:10 clpc80 automount[202813]: lookup_prune_one_cache: prune posponed, /auto/users/imc mounted
May 28 09:37:10 clpc80 automount[202813]: lookup_prune_one_cache: removed map entry detected, mark negative
May 28 09:37:10 clpc80 automount[202813]: lookup_prune_one_cache: prune posponed, /auto/users/imc mounted
May 28 09:37:10 clpc80 automount[202813]: lookup_prune_one_cache: removed map entry detected, mark negative

with the last two entries repeated ad infinitum. The thread that is looping appears to be at:

#3  log_debug (logopt=<optimized out>, msg=msg@entry=0x560a9e10f058 "%s: removed map entry detected, mark negative") at /usr/src/debug/autofs-5.1.7-12.fc34.x86_64/lib/log.c:262
#4  0x0000560a9e109719 in lookup_prune_one_cache (ap=ap@entry=0x560a9fb82100, mc=0x560a9fb8f550, age=age@entry=233729) at /usr/src/debug/autofs-5.1.7-12.fc34.x86_64/daemon/lookup.c:1406
#5  0x0000560a9e109843 in lookup_prune_cache (ap=0x560a9fb82100, age=233729) at /usr/src/debug/autofs-5.1.7-12.fc34.x86_64/daemon/lookup.c:1504
#6  0x0000560a9e10a52f in do_readmap (arg=0x7fd5cc000dc0) at /usr/src/debug/autofs-5.1.7-12.fc34.x86_64/daemon/state.c:462
#7  0x00007fd5f8e0d299 in start_thread () from /lib64/libpthread.so.0

and the thread is in the loop which starts at lookup.c:1335 ("while (me) {") and is hitting the "continue" at line 1419 without ever changing the value of "me".
Arrgh, yes, I moved a statement and didn't properly account for it. I'll sort that out straight away, thanks for reporting it.
Btw, what does your map look like, does it have a wildcard entry by any chance?
The maps (including auto.master) are managed by IPA and there is no wildcard entry that I am aware of. So each entry has an explicit key and info.

$ ipa automountkey-find cs auto-users
--------------------------
656 automount keys matched
--------------------------
...
Key: imc
Mount information: nfs-server:/homes/staff/imc
...
BTW on a tangentially related topic, there's a spelling error in "posponed" in the debug log exhibited above. :-)
(In reply to Ian Collier from comment #4) > BTW on a tangentially related topic, there's a spelling error in "posponed" > in the debug log exhibited above. :-) Thanks, I'll fix that while I'm there. At the moment I'm trying to write a simple reproducer script. It's easiest to reproduce using a wildcard map entry. It looks like maps that have explicit <key, location> entries would need a map entry removal (or a memory allocation failure) for the loop behavior to trigger.
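For anyone wanting to try the wildcard reproduction path mentioned above, a minimal wildcard map might look like the following (the paths and server name are placeholders, not taken from this report):

```
# auto.master entry (hypothetical paths):
/auto/users  /etc/auto.users

# /etc/auto.users -- a single wildcard entry; '&' expands to the
# matched key, so any key looked up under /auto/users gets a mount:
*  nfs-server:/homes/staff/&
```

Mounts created this way have keys that are never present in the map itself, which is why they are candidates for pruning on every map re-read.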
Hmm, so there might be another bug somewhere. Although we do remove map entries from time to time, that hasn't happened in the last couple of days as far as I am aware. Could a map entry be considered stale if the system had been left idle for several hours and then resumed activity? Let me know if there's something I should look at next time it crashes.
Or would it be considered stale if any one entry in the map changes? Because we are certainly doing that at the moment on a few accounts (not necessarily ones that are currently mounted, though).
(In reply to Ian Collier from comment #7)
> Or would it be considered stale if any one entry in the map changes? Because
> we are certainly doing that at the moment on a few accounts (not necessarily
> ones that are currently mounted, though).

I'm still thinking about it. Essentially, what happens is that the map entries have a time stamp that is set when the map is read. The cache prune that occurs after reading the map should pass over entries with a time stamp that matches the one used for the current map read. Also, entries younger than 2 minutes are passed over (I've considered making that 2 minutes configurable so people can tune cache behavior, but haven't done so yet).

Since entries that have been created due to a wildcard map entry aren't actually present in the map, a wildcard map is the easiest way to reproduce the problem. An attempt to prune them will be made on a map read once they are older than 2 minutes. If they have an active mount on them they are left alone.

For now I'm most concerned with fixing that looping behavior.
(In reply to Ian Kent from comment #8) > (In reply to Ian Collier from comment #7) > > Or would it be considered stale if any one entry in the map changes? Because > > we are certainly doing that at the moment on a few accounts (not necessarily > > ones that are currently mounted, though). > > I'm still thinking about it, essentially what happens is the map entries > have a time stamp that is set when the map is read. The cache prune that > occurs after reading the map should pass over entries with a time stamp > that matches the one used for the current map read. Also entries younger > than 2 minutes are passed over (I've considered making that 2 minutes > configurable so people can tune cache behavior, but haven't done so yet). > > Since entries that have been created due to a wild card map entry aren't > actually present in the map a wildcard map is the easiest way to reproduce > the problem. An attempt to prune them will be made on a map read once they > are older than 2 minutes. If they have an active mount on them they are > left alone. > > For now I'm most concerned with fixing that looping behavior. I've had a look around and there's a lot more to consider when re-loading the map. So, for now, I'm just going to fix the regression introduced by commit 256963d6b (autofs-5.1.7 - refactor lookup_prune_one_cache() a bit) and look at the map re-read code later.
FEDORA-2021-99a8333987 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-99a8333987
FEDORA-2021-99a8333987 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-99a8333987` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-99a8333987 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-99a8333987 has been pushed to the Fedora 34 stable repository. If the problem still persists, please make note of it in this bug report.