Bug 262701 - automount taking up lots of CPU time
Summary: automount taking up lots of CPU time
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: autofs
Version: 8
Hardware: x86_64
OS: All
medium
medium
Target Milestone: ---
Assignee: Ian Kent
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: bzcl34nup
: 279331 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-08-29 08:00 UTC by Adam Huffman
Modified: 2009-01-09 07:13 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-09 07:13:43 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
auto.master (501 bytes, text/plain)
2007-08-29 12:14 UTC, Adam Huffman
no flags Details
auto.net (1.21 KB, text/plain)
2007-08-29 12:18 UTC, Adam Huffman
no flags Details
auto.misc (524 bytes, text/plain)
2007-08-29 12:19 UTC, Adam Huffman
no flags Details
auto.smb (660 bytes, text/plain)
2007-08-29 12:20 UTC, Adam Huffman
no flags Details
/var/log/debug output (12.91 KB, text/plain)
2007-08-29 13:12 UTC, Adam Huffman
no flags Details

Description Adam Huffman 2007-08-29 08:00:33 UTC
Description of problem:
I have moved house and in the new house the internal network is different (no
NFS available yet).  I have a Nautilus bookmark for one of the old NFS mounts. 
This seems to be causing automount to flail around looking for the old mount,
taking up most of the CPU time (according to top) as a result.

Version-Release number of selected component (if applicable):

autofs-5.0.2-12

How reproducible:
Every time

Steps to Reproduce:
1. Login via GDM
2.
3.
  
Actual results:

automount process hogs the CPU

Expected results:


Additional info:

I haven't made any alterations to /etc/auto.master or any other files related to
this.

Comment 1 Ian Kent 2007-08-29 10:16:39 UTC
(In reply to comment #0)
> Description of problem:
> I have moved house and in the new house the internal network is different (no
> NFS available yet).  I have a Nautilus bookmark for one of the old NFS mounts. 
> This seems to be causing automount to flail around looking for the old mount,
> taking up most of the CPU time (according to top) as a result.

Or causing mount to wait.
Is the mount you refer to here configured in your
autofs maps?

> 
> Version-Release number of selected component (if applicable):
> 
> autofs-5.0.2-12

How long have you been using this revision?
Did the problem occur after your move without any
update having been done?

> 
> How reproducible:
> Every time
> 
> Steps to Reproduce:
> 1. Login via GDM
> 2.
> 3.

I do that every day myself but I don't have this
problem. I'll need more information about your
configuration to reproduce the problem. All your
maps to begin with please, even if they are the
default configuration.

>   
> Actual results:
> 
> automount process hogs the CPU
> 
> Expected results:
> 
> 
> Additional info:
> 
> I haven't made any alterations to /etc/auto.master or any other files related to
> this.

Alterations to the default installed configuration or
your existing configuration?

Ian


Comment 2 Adam Huffman 2007-08-29 12:10:55 UTC
No, it's not referred to in anything other than the Nautilus bookmark.

Only just updated to this revision of autofs.  The problem did occur before that
(e.g. at the weekend).

Didn't mean to be facetious with the 'steps to reproduce' thing... just trying
to be comprehensive.

I'll attach the relevant files.

This laptop used to run F7 but I wiped all but /home and put rawhide on it. 
Nothing related to this has been altered.

Comment 3 Adam Huffman 2007-08-29 12:14:41 UTC
Created attachment 178861 [details]
auto.master

Comment 4 Adam Huffman 2007-08-29 12:18:33 UTC
Created attachment 178881 [details]
auto.net

Comment 5 Adam Huffman 2007-08-29 12:19:23 UTC
Created attachment 178901 [details]
auto.misc

Comment 6 Adam Huffman 2007-08-29 12:20:06 UTC
Created attachment 178921 [details]
auto.smb

Comment 7 Ian Kent 2007-08-29 12:51:40 UTC
>Didn't mean to be facetious with the 'steps to reproduce' thing... just trying
>to be comprehensive.

And I didn't mean to be sarcastic, sorry couldn't help
it.

So it's totally the default and no version of autofs
was previously installed?

What was the bookmark?

I'll probably have to install an new instance of
Rawhide with a similar bookmark to try and duplicate
this. But first it might be useful to get a debug log
from autofs in case there's something obvious in it.
See http://people.redhat.com/jmoyer for info. on getting
one.

Ian


Comment 8 Adam Huffman 2007-08-29 13:05:56 UTC
The bookmark is:

file:///net/bloch/maxtor250/adam/...etc.

 - it goes down several subdirectories

Getting debug output now, which I'll attach separately.

Comment 9 Adam Huffman 2007-08-29 13:12:53 UTC
Created attachment 178981 [details]
/var/log/debug output

Comment 10 Ian Kent 2007-08-29 13:53:48 UTC
(In reply to comment #8)
> The bookmark is:
> 
> file:///net/bloch/maxtor250/adam/...etc.
> 
>  - it goes down several subdirectories
> 
> Getting debug output now, which I'll attach separately.

OK, I'll set up something like that.

Ian


Comment 11 Ian Kent 2007-08-29 13:55:59 UTC
(In reply to comment #9)
> Created an attachment (id=178981) [edit]
> /var/log/debug output
> 

Was this taken when the problem happened?
I can't even see a request coming to the daemon.
Hopefully I'll find out when I start trying to duplicate
the problem.

Ian


Comment 12 Adam Huffman 2007-08-29 14:59:36 UTC
Yes, I restarted autofs once I'd activated the debug log and immediately
'automount' went to the top of the 'top' chart.

Comment 13 Ian Kent 2007-08-29 15:37:08 UTC
(In reply to comment #12)
> Yes, I restarted autofs once I'd activated the debug log and immediately
> 'automount' went to the top of the 'top' chart.

That's bizar.
The only thing that I'm aware of in recent times that
could cause this behavior was a kernel change which I'm
not to clear on except that it caused pthread condition
wait to spin instead of wait. But I thought I fixed
that in the daemon and that it was fixed in the kernel.

If you have a chance could you get an strace of this
happening. Sorry to keep coming back and asking for
stuff.

Ian

Comment 14 Adam Huffman 2007-08-29 21:18:38 UTC
This is what happens when I try to strace the CPU-hogging automount process:

[root@vaio ~]# strace -p 2425 -f
Process 2433 attached with 5 threads - interrupt to quit
[pid  2425] rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],

then the process seems to lose priority until I exit strace.

If I need to run with other options, let me know.

Comment 15 Ian Kent 2007-08-30 01:57:02 UTC
(In reply to comment #14)
> This is what happens when I try to strace the CPU-hogging automount process:
> 
> [root@vaio ~]# strace -p 2425 -f
> Process 2433 attached with 5 threads - interrupt to quit
> [pid  2425] rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],
> 
> then the process seems to lose priority until I exit strace.
> 
> If I need to run with other options, let me know.

Something like "strace -tf -p <pid> -otraced.txt" should get
what I need.

Ian



Comment 16 Ian Kent 2007-08-30 01:59:36 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > This is what happens when I try to strace the CPU-hogging automount process:
> > 
> > [root@vaio ~]# strace -p 2425 -f
> > Process 2433 attached with 5 threads - interrupt to quit
> > [pid  2425] rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],
> > 
> > then the process seems to lose priority until I exit strace.
> > 
> > If I need to run with other options, let me know.
> 
> Something like "strace -tf -p <pid> -otraced.txt" should get
> what I need.

Oh be careful as if automount is spinning on the condition waits
then there could be a lot of output so maybe 5-10 seconds is
plenty.

Ian


Comment 17 Ian Kent 2007-09-04 04:36:04 UTC
Aaah .. I'm having trouble duplicating this.

I've updated to the latest Rawhide and setup a bookmark,
and shut it down, shutdown the server and started up.
Can't see any problem. In fact Nautilus doesn't show
the bookmark when the server is down. Perhaps I haven't
created the bookmark deep enough, I'll keep trying.

In the meantime what kernel version are you using?

Ian


 

Comment 18 Jeff Moyer 2007-09-05 20:28:28 UTC
*** Bug 279331 has been marked as a duplicate of this bug. ***

Comment 19 Warren Togami 2007-09-05 21:06:02 UTC
I'm reproducing this consistently on a fresh install of rawhide x86_64 without
any NFS at all.

Comment 20 Warren Togami 2007-09-05 21:10:52 UTC
strace -tf -p 2503 -otraced.txt for 10 seconds...
2503  19:16:10 rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],  <unfinished ...>

kernel-2.6.23-0.164.rc5.fc8
autofs-5.0.2-14

Comment 21 Ian Kent 2007-09-06 02:52:38 UTC
(In reply to comment #20)
> strace -tf -p 2503 -otraced.txt for 10 seconds...
> 2503  19:16:10 rt_sigtimedwait(~[CHLD CONT RTMIN RT_1],  <unfinished ...>
> 
> kernel-2.6.23-0.164.rc5.fc8
> autofs-5.0.2-14

Mmm .. I've been trying to duplicate this without success
so far.




Comment 22 Warren Togami 2007-09-06 15:06:19 UTC
It seems to happen with regularity on my laptop prior to a successful connection
to the Internet.

Comment 23 Ian Kent 2007-09-06 15:56:59 UTC
(In reply to comment #22)
> It seems to happen with regularity on my laptop prior to a successful connection
> to the Internet.

OK .. I can try that.
Do you mean "no network available" or "local network
available but no outgoing connection"?

Problem is that as far as I know so far the daemon
doesn't actually get any mount requests. This apparently
happens at startup. A complete debug log might tell me
more about what's going on and when. See Jeffs
http://people.redhat.com/jmoyer in case you need any
info. on what needs to be done?

Ian


Comment 24 Adam Huffman 2007-09-06 18:49:55 UTC
I've been away at a conference since the weekend.  While there I had to disable
the autofs service to stop this...

Now I'm back and it's happening again.  The odd thing is that automounting
itself does seem to be working, but I'm still seeing the high CPU usage.

Current versions in use:

kernel-2.6.23-0.164.rc5.fc8
autofs-5.0.2-14

Comment 25 Ian Kent 2007-09-07 04:39:07 UTC
Could someone install the autofs-debuginfo package
and when automount is taking gobs of CPU run
"gdb -p <automount pid> /usr/sbin/automount"
and do a backtrace (bt) command and post the
output here please.

If that's not possible could someone do a
"top -H -p <automount pid>" and post the output.
At least tht will give me some idea of the thread
in which this is happening.

Ian

Comment 26 Adam Huffman 2007-09-08 12:47:47 UTC
#0  0x00002aaaaaf0a2a8 in do_sigwait () from /lib64/libpthread.so.0
#1  0x00002aaaaaf0a34d in sigwait () from /lib64/libpthread.so.0
#2  0x00002aaaaaab2d1d in statemachine (arg=<value optimized out>) at
automount.c:1052
#3  0x00002aaaaaab3c4e in main (argc=1, argv=<value optimized out>) at
automount.c:1695


Comment 27 Adam Huffman 2007-09-08 12:51:08 UTC
There were some errors using gdb though, even though I have the debuginfo RPM
installed:
warning: Missing the separate debug info file:
/usr/lib/debug/.build-id/09/8a6785767cca8ae88be4ff876487b6eb6db111.debug
done.
[Thread debugging using libthread_db enabled]
[New Thread 46912510610176 (LWP 5729)]
Error while reading shared library symbols:
Can't attach LWP 5737 (No child processes) - possible SELinux denial, check your
/var/log/messages for `avc: denied'



Comment 28 Adam Huffman 2007-09-08 12:56:56 UTC
Also, after running automount through gdb, it's still running and running
'service autofs stop' fails.  There are several sets of these messages in
/var/log/debug:

Sep  8 13:55:48 vaio automount[5729]: do_notify_state: signal 15
Sep  8 13:55:48 vaio automount[5729]: master_notify_state_change: sig 15
switching /misc from 1 to 5
Sep  8 13:55:48 vaio automount[5729]: master_notify_state_change: sig 15
switching /net from 1 to 5
Sep  8 13:55:51 vaio automount[5729]: do_notify_state: signal 15
Sep  8 13:55:51 vaio automount[5729]: master_notify_state_change: sig 15
switching /misc from 1 to 5
Sep  8 13:55:51 vaio automount[5729]: master_notify_state_change: sig 15
switching /net from 1 to 5


Comment 29 Ian Kent 2007-09-10 03:10:45 UTC
(In reply to comment #26)
> #0  0x00002aaaaaf0a2a8 in do_sigwait () from /lib64/libpthread.so.0
> #1  0x00002aaaaaf0a34d in sigwait () from /lib64/libpthread.so.0
> #2  0x00002aaaaaab2d1d in statemachine (arg=<value optimized out>) at
> automount.c:1052
> #3  0x00002aaaaaab3c4e in main (argc=1, argv=<value optimized out>) at
> automount.c:1695
> 

This is what it should be.
This would indicate that pthread sigwait is not waiting for
signals and just spinning. That doesn't sound right as I'd
expect may other applications to have the same problem.

Ian



Comment 30 Ian Kent 2007-09-10 03:14:45 UTC
(In reply to comment #27)
> There were some errors using gdb though, even though I have the debuginfo RPM
> installed:
> warning: Missing the separate debug info file:
> /usr/lib/debug/.build-id/09/8a6785767cca8ae88be4ff876487b6eb6db111.debug
> done.
> [Thread debugging using libthread_db enabled]
> [New Thread 46912510610176 (LWP 5729)]
> Error while reading shared library symbols:
> Can't attach LWP 5737 (No child processes) - possible SELinux denial, check your
> /var/log/messages for `avc: denied'

You mean just attaching to the automount process?
Did you try setting selinux to permissive mode first?

Ian



Comment 31 Warren Togami 2007-09-10 20:46:57 UTC
Figured out what's going on here... thanks to Jeff Moyer and Chuck Ebbert.

- Run 2.6.23-0.170.rc5.git1.fc8 with clocksource=hpet, where Bug #262481 causes
the system time to run forward rapidly.
- autofs-5.0.2/daemon/state.c:1038
                while (list_empty(head)) {
                        status = pthread_cond_timedwait(&cond, &mutex, &wait);

pthread_cond_timedwait() is timing out immediately because the clock is jumping
forward rapidly on x86_64 because of a bug where values above 32bit in the DSO
are not masked.

Temporary Workaround
====================
echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource


Fix in Bug #262481
==================
cebbert is building the kernel fix into rawhide kernel 171+.\


Test Case
=========
jmoyer is writing a simple test-case based upon this finding so that we may
diagnose this sooner if it happens again.

Comment 32 Warren Togami 2007-09-10 20:54:48 UTC
Adam, could you please verify the above workaround and/or test a newer kernel?

If this diagnosis is correct, then automount is actually behaving properly, it
cannot help that the kernel is giving it bad time.  At least we got a test case
out of this.

Comment 33 Adam Huffman 2007-09-10 22:05:07 UTC
Was only able to find 170 kernel online, so I've just tried that with the
workaround.  That does prevent the automount process from running away as it did
before.

Thanks for persevering with this.

Comment 34 Bug Zapper 2008-04-04 13:42:43 UTC
Based on the date this bug was created, it appears to have been reported
during the development of Fedora 8. In order to refocus our efforts as
a project we are changing the version of this bug to '8'.

If this bug still exists in rawhide, please change the version back to
rawhide.
(If you're unable to change the bug's version, add a comment to the bug
and someone will change it for you.)

Thanks for your help and we apologize for the interruption.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 35 Bug Zapper 2008-11-26 07:43:43 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 36 Bug Zapper 2009-01-09 07:13:43 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.