Description of problem:
I have moved house and in the new house the internal network is different (no NFS available yet). I have a Nautilus bookmark for one of the old NFS mounts. This seems to be causing automount to flail around looking for the old mount, taking up most of the CPU time (according to top) as a result.

Version-Release number of selected component (if applicable):
autofs-5.0.2-12

How reproducible:
Every time

Steps to Reproduce:
1. Login via GDM
2.
3.

Actual results:
automount process hogs the CPU

Expected results:

Additional info:
I haven't made any alterations to /etc/auto.master or any other files related to this.
(In reply to comment #0)
> Description of problem:
> I have moved house and in the new house the internal network is different (no
> NFS available yet). I have a Nautilus bookmark for one of the old NFS mounts.
> This seems to be causing automount to flail around looking for the old mount,
> taking up most of the CPU time (according to top) as a result.

Or causing mount to wait. Is the mount you refer to here configured in your autofs maps?

> Version-Release number of selected component (if applicable):
> autofs-5.0.2-12

How long have you been using this revision? Did the problem occur after your move, without any update having been done?

> How reproducible:
> Every time
>
> Steps to Reproduce:
> 1. Login via GDM

I do that every day myself but I don't have this problem. I'll need more information about your configuration to reproduce it. All your maps to begin with, please, even if they are the default configuration.

> Actual results:
> automount process hogs the CPU
>
> Additional info:
> I haven't made any alterations to /etc/auto.master or any other files related
> to this.

Alterations to the default installed configuration or to your existing configuration?

Ian
No, it's not referred to in anything other than the Nautilus bookmark.

I only just updated to this revision of autofs; the problem did occur before that (e.g. at the weekend).

I didn't mean to be facetious with the 'steps to reproduce' thing... just trying to be comprehensive.

I'll attach the relevant files. This laptop used to run F7, but I wiped all but /home and put Rawhide on it. Nothing related to this has been altered.
Created attachment 178861 [details] auto.master
Created attachment 178881 [details] auto.net
Created attachment 178901 [details] auto.misc
Created attachment 178921 [details] auto.smb
> Didn't mean to be facetious with the 'steps to reproduce' thing... just trying
> to be comprehensive.

And I didn't mean to be sarcastic; sorry, couldn't help it.

So it's totally the default configuration, and no version of autofs was previously installed? What was the bookmark? I'll probably have to install a new instance of Rawhide with a similar bookmark to try and duplicate this.

But first it might be useful to get a debug log from autofs in case there's something obvious in it. See http://people.redhat.com/jmoyer for info on getting one.

Ian
The bookmark is:

file:///net/bloch/maxtor250/adam/...etc.

- it goes down several subdirectories.

Getting debug output now, which I'll attach separately.
Created attachment 178981 [details] /var/log/debug output
(In reply to comment #8)
> The bookmark is:
>
> file:///net/bloch/maxtor250/adam/...etc.
>
> - it goes down several subdirectories

OK, I'll set up something like that.

Ian
(In reply to comment #9)
> Created an attachment (id=178981) [edit]
> /var/log/debug output

Was this taken when the problem happened? I can't even see a request coming to the daemon. Hopefully I'll find out when I start trying to duplicate the problem.

Ian
Yes, I restarted autofs once I'd activated the debug log and immediately 'automount' went to the top of the 'top' chart.
(In reply to comment #12)
> Yes, I restarted autofs once I'd activated the debug log and immediately
> 'automount' went to the top of the 'top' chart.

That's bizarre. The only thing I'm aware of in recent times that could cause this behavior was a kernel change, which I'm not too clear on except that it caused pthread condition wait to spin instead of wait. But I thought I had fixed that in the daemon and that it was fixed in the kernel.

If you have a chance, could you get an strace of this happening? Sorry to keep coming back and asking for stuff.

Ian
This is what happens when I try to strace the CPU-hogging automount process:

[root@vaio ~]# strace -p 2425 -f
Process 2433 attached with 5 threads - interrupt to quit
[pid 2425] rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],

then the process seems to lose priority until I exit strace.

If I need to run with other options, let me know.
(In reply to comment #14)
> This is what happens when I try to strace the CPU-hogging automount process:
>
> [root@vaio ~]# strace -p 2425 -f
> Process 2433 attached with 5 threads - interrupt to quit
> [pid 2425] rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],
>
> then the process seems to lose priority until I exit strace.
>
> If I need to run with other options, let me know.

Something like "strace -tf -p <pid> -otraced.txt" should get what I need.

Ian
(In reply to comment #15)
> Something like "strace -tf -p <pid> -otraced.txt" should get
> what I need.

Oh, be careful: if automount is spinning on the condition waits there could be a lot of output, so maybe 5-10 seconds is plenty.

Ian
Aaah... I'm having trouble duplicating this. I've updated to the latest Rawhide and set up a bookmark, then shut it down, shut down the server, and started up again. I can't see any problem. In fact Nautilus doesn't show the bookmark when the server is down. Perhaps I haven't created the bookmark deep enough; I'll keep trying.

In the meantime, what kernel version are you using?

Ian
*** Bug 279331 has been marked as a duplicate of this bug. ***
I'm reproducing this consistently on a fresh install of rawhide x86_64 without any NFS at all.
strace -tf -p 2503 -otraced.txt for 10 seconds...

2503 19:16:10 rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],  <unfinished ...>

kernel-2.6.23-0.164.rc5.fc8
autofs-5.0.2-14
(In reply to comment #20)
> strace -tf -p 2503 -otraced.txt for 10 seconds...
> 2503 19:16:10 rt_sigtimedwait(~[CHLD CONT RTMIN RT_1], <unfinished ...>
>
> kernel-2.6.23-0.164.rc5.fc8
> autofs-5.0.2-14

Mmm... I've been trying to duplicate this without success so far.
It seems to happen with regularity on my laptop prior to a successful connection to the Internet.
(In reply to comment #22)
> It seems to happen with regularity on my laptop prior to a successful
> connection to the Internet.

OK, I can try that. Do you mean "no network available" or "local network available but no outgoing connection"?

The problem is that, as far as I know so far, the daemon doesn't actually get any mount requests; this apparently happens at startup. A complete debug log might tell me more about what's going on and when. See Jeff's page at http://people.redhat.com/jmoyer in case you need any info on what needs to be done.
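In case that page is unavailable, roughly what's needed is the following (from memory, so treat the exact option names as approximate; the page has the authoritative steps):

# In /etc/sysconfig/autofs, turn on daemon debug output:
LOGGING="debug"

# Have syslog capture daemon-level debug messages, e.g. in /etc/syslog.conf:
daemon.*        /var/log/debug

# Restart both services so the changes take effect:
service syslog restart
service autofs restart

Ian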
I've been away at a conference since the weekend. While there I had to disable the autofs service to stop this... Now I'm back and it's happening again.

The odd thing is that automounting itself does seem to be working, but I'm still seeing the high CPU usage.

Current versions in use:
kernel-2.6.23-0.164.rc5.fc8
autofs-5.0.2-14
Could someone install the autofs-debuginfo package and, when automount is taking gobs of CPU, run "gdb -p <automount pid> /usr/sbin/automount", issue a backtrace (bt) command, and post the output here please?

If that's not possible, could someone do a "top -H -p <automount pid>" and post the output? At least that will give me some idea of the thread in which this is happening.

Ian
#0  0x00002aaaaaf0a2a8 in do_sigwait () from /lib64/libpthread.so.0
#1  0x00002aaaaaf0a34d in sigwait () from /lib64/libpthread.so.0
#2  0x00002aaaaaab2d1d in statemachine (arg=<value optimized out>) at automount.c:1052
#3  0x00002aaaaaab3c4e in main (argc=1, argv=<value optimized out>) at automount.c:1695
There were some errors using gdb though, even though I have the debuginfo RPM installed:

warning: Missing the separate debug info file:
/usr/lib/debug/.build-id/09/8a6785767cca8ae88be4ff876487b6eb6db111.debug
done.
[Thread debugging using libthread_db enabled]
[New Thread 46912510610176 (LWP 5729)]
Error while reading shared library symbols:
Can't attach LWP 5737 (No child processes) - possible SELinux denial, check your /var/log/messages for `avc: denied'
Also, after running automount through gdb, it's still running, and running 'service autofs stop' fails. There are several sets of these messages in /var/log/debug:

Sep 8 13:55:48 vaio automount[5729]: do_notify_state: signal 15
Sep 8 13:55:48 vaio automount[5729]: master_notify_state_change: sig 15 switching /misc from 1 to 5
Sep 8 13:55:48 vaio automount[5729]: master_notify_state_change: sig 15 switching /net from 1 to 5
Sep 8 13:55:51 vaio automount[5729]: do_notify_state: signal 15
Sep 8 13:55:51 vaio automount[5729]: master_notify_state_change: sig 15 switching /misc from 1 to 5
Sep 8 13:55:51 vaio automount[5729]: master_notify_state_change: sig 15 switching /net from 1 to 5
(In reply to comment #26)
> #0 0x00002aaaaaf0a2a8 in do_sigwait () from /lib64/libpthread.so.0
> #1 0x00002aaaaaf0a34d in sigwait () from /lib64/libpthread.so.0
> #2 0x00002aaaaaab2d1d in statemachine (arg=<value optimized out>) at
> automount.c:1052
> #3 0x00002aaaaaab3c4e in main (argc=1, argv=<value optimized out>) at
> automount.c:1695

This is what it should be. It would indicate that pthread sigwait is not waiting for signals and is just spinning. That doesn't sound right, as I'd expect many other applications to have the same problem.

Ian
(In reply to comment #27)
> There were some errors using gdb though, even though I have the debuginfo RPM
> installed:
> warning: Missing the separate debug info file:
> /usr/lib/debug/.build-id/09/8a6785767cca8ae88be4ff876487b6eb6db111.debug
> done.
> [Thread debugging using libthread_db enabled]
> [New Thread 46912510610176 (LWP 5729)]
> Error while reading shared library symbols:
> Can't attach LWP 5737 (No child processes) - possible SELinux denial, check
> your /var/log/messages for `avc: denied'

You mean just attaching to the automount process? Did you try setting SELinux to permissive mode first?

Ian
Figured out what's going on here... thanks to Jeff Moyer and Chuck Ebbert.

- Run 2.6.23-0.170.rc5.git1.fc8 with clocksource=hpet, where Bug #262481 causes the system time to run forward rapidly.

- autofs-5.0.2/daemon/state.c:1038

    while (list_empty(head)) {
        status = pthread_cond_timedwait(&cond, &mutex, &wait);

  pthread_cond_timedwait() is timing out immediately because the clock is jumping forward rapidly on x86_64, due to a bug where values above 32 bits in the DSO are not masked.

Temporary Workaround
====================
echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource

Fix in Bug #262481
==================
cebbert is building the kernel fix into rawhide kernel 171+.

Test Case
=========
jmoyer is writing a simple test case based upon this finding so that we may diagnose this sooner if it happens again.
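In the meantime, here is a minimal sketch of what such a test might look like (an illustration only, not jmoyer's actual test case). It mirrors the pthread_cond_timedwait() loop from daemon/state.c: it waits on a condition that is never signalled, with an absolute timeout one second ahead. On a healthy system each wait blocks for about a real second; if the clocksource is racing forward, the waits time out almost instantly and the program prints far faster than once per second.

/* Build with: gcc -o condwait-test condwait-test.c -lpthread */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
        pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
        struct timespec wait;
        struct timeval now;
        int i, status;

        pthread_mutex_lock(&mutex);
        for (i = 0; i < 60; i++) {
                /* Absolute timeout: one second from "now" per CLOCK_REALTIME,
                 * the clock pthread_cond_timedwait() uses by default. */
                gettimeofday(&now, NULL);
                wait.tv_sec = now.tv_sec + 1;
                wait.tv_nsec = now.tv_usec * 1000;

                /* Nothing ever signals cond, so ETIMEDOUT is expected;
                 * what matters is how long it takes to arrive. */
                status = pthread_cond_timedwait(&cond, &mutex, &wait);
                if (status == ETIMEDOUT)
                        printf("wait %d timed out (should be ~1s apart)\n", i);
        }
        pthread_mutex_unlock(&mutex);
        return 0;
}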
Adam, could you please verify the above workaround and/or test a newer kernel? If this diagnosis is correct then automount is actually behaving properly; it cannot help that the kernel is giving it bad time. At least we got a test case out of this.
I was only able to find the 170 kernel online, so I've just tried that with the workaround. That does prevent the automount process from running away as it did before. Thanks for persevering with this.
Based on the date this bug was created, it appears to have been reported during the development of Fedora 8. In order to refocus our efforts as a project we are changing the version of this bug to '8'.

If this bug still exists in rawhide, please change the version back to rawhide. (If you're unable to change the bug's version, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize for the interruption.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue, and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug, and we are sorry it could not be fixed.