Bug 247711

Summary: automount never ends, high cpu rate
Product: [Fedora] Fedora Reporter: Darwin H. Webb <thethirddoorontheleft>
Component: autofsAssignee: Jeff Moyer <jmoyer>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: low    
Version: 7CC: espenas, ikent, jik, masaki.chikama, staubach
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 5.0.1-18 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-07-19 16:46:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
strace output from high-cpu automount process
none
Patch to try not to mix usage of hi and lo res time sources
none
Patch to try not to mix usage of hi and lo res time sources (5.0.2) none

Description Darwin H. Webb 2007-07-10 21:54:57 UTC
Description of problem:
automount is loaded at boot time but never ends.
Uses about 30-40% of cpu

Version-Release number of selected component (if applicable):
autofs-5.0.1-11 
I assume this is the package as there is no automount in the bugzilla list and
it is not installed on the system.

How reproducible:
Every time on both a Pent IV ht I865BGF and a Pent III 800 Tayn S1854 via apollo-133

F7 with all updates 

Steps to Reproduce:
1. Boot up and check top for automount
2.
3.
  
Actual results:
uses a lot of cpu

Expected results:
I don't really know but maybe ask to mount the windows share after a few minutes
when Nautilus is open? But it hasn't done this for some time, probably because
it is broke.

Additional info:

Comment 1 Jeff Moyer 2007-07-10 22:05:51 UTC
Is this a fresh install of F7?  I'm assuming you haven't made any modifications
to the automounter configuration.  If you have, then please post your config
files (/etc/auto.master and any referenced maps, /etc/nsswitch.conf,
/etc/sysconfig/autofs).

Thanks!


Comment 2 Jonathan Kamens 2007-07-16 13:52:38 UTC
I am seeing the same behavior, with current devel packages (e.g., autofs-5.0.2-
4).  Automount consistently consumes 10% or more of my CPU, and often as high 
as 50% or 60% for a prolonged period of time.

I will attach the output of strace -f -F -o /tmp/strace.out for a few seconds 
on the automount process.

"sudo rpm --verify autofs" shows no changes.



Comment 3 Jonathan Kamens 2007-07-16 13:53:26 UTC
Created attachment 159320 [details]
strace output from high-cpu automount process

Comment 4 Ian Kent 2007-07-16 14:30:06 UTC
And what about the information requested in comment #1?

There seems to be an assumption that this is easy to
reproduce and an obvious problem.

It's not, and we need information about the configuration
so can someone post what they're using please.

Ian


Comment 5 Jonathan Kamens 2007-07-16 14:33:36 UTC
I'm confused.  I believe I've already provided all the information you asked 
for.  As I said, I'm using Fedora devel packages, and as I said, I haven't 
changed any of the files you listed -- they've all got their default contents --
 so I don't see anything to be gained by attaching them to the ticket.  Could 
you be more specific about what you want me to provide?


Comment 6 Ian Kent 2007-07-16 14:46:47 UTC
(In reply to comment #5)
> I'm confused.  I believe I've already provided all the information you asked 
> for.  As I said, I'm using Fedora devel packages, and as I said, I haven't 
> changed any of the files you listed -- they've all got their default contents --
>  so I don't see anything to be gained by attaching them to the ticket.  Could 
> you be more specific about what you want me to provide?
> 

So you this is either a fresh install or you didn't previuosly
have autofs installed, is that right?

It probably won't be of use but nsswitch.conf can be modified
by other package installs so I'd like to see that.

What may be useful is knowing which thread is eating the CPU
cycles. If you could get a thread listing from gdb (preferably
with the debuginfo package installed) that would be great.

gdb -p <automount pid> /usr/sbin/automount
(gdb) info threads

Ian


Comment 7 Ian Kent 2007-07-16 14:51:19 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > I'm confused.  I believe I've already provided all the information you asked 
> > for.  As I said, I'm using Fedora devel packages, and as I said, I haven't 
> > changed any of the files you listed -- they've all got their default contents --
> >  so I don't see anything to be gained by attaching them to the ticket.  Could 
> > you be more specific about what you want me to provide?
> > 
> 
> So you this is either a fresh install or you didn't previuosly
> have autofs installed, is that right?
> 
> It probably won't be of use but nsswitch.conf can be modified
> by other package installs so I'd like to see that.
> 
> What may be useful is knowing which thread is eating the CPU
> cycles. If you could get a thread listing from gdb (preferably
> with the debuginfo package installed) that would be great.
> 
> gdb -p <automount pid> /usr/sbin/automount
> (gdb) info threads

Oh, sorry and some indication of which thread it is with
top -H -p <automount pid>

Ian


Comment 8 Jonathan Kamens 2007-07-16 15:04:37 UTC
I've had autofs installed forever; I don't know exactly when this problem 
started.  /etc/nsswitch.conf is unmodified from glibc (2.6-4).  Here's the 
thread listing:

(gdb) info threads
  5 Thread -1208689776 (LWP 26047)  0x0012d402 in __kernel_vsyscall ()
  4 Thread -1208808560 (LWP 26048)  0x0012d402 in __kernel_vsyscall ()
  3 Thread -1208878192 (LWP 26052)  0x0012d402 in __kernel_vsyscall ()
  2 Thread -1209930864 (LWP 26055)  0x0012d402 in __kernel_vsyscall ()
  1 Thread -1208804656 (LWP 26046)  0x0012d402 in __kernel_vsyscall ()

The thread that's consuming all the CPU is 26048.


Comment 9 Jonathan Kamens 2007-07-16 15:06:27 UTC
According to strace, that thread is calling futex, clock_gettime, time, futex, 
clock_gettime, time, futex, clock_gettime, time ...


Comment 10 Ian Kent 2007-07-16 15:36:36 UTC
(In reply to comment #9)
> According to strace, that thread is calling futex, clock_gettime, time, futex, 
> clock_gettime, time, futex, clock_gettime, time ...
> 

Ya .. but more than one of those threads will be doing that
but I would expect something like

futex(0x5555557868c4, FUTEX_WAIT, 5321099, {0, 998091311}) = -1 ETIMEDOUT
(Connection timed out)

as well so it looks like one of the threads is always waiting
for a time in the past.

In the thread list there will be one thread listening for
signals, that should be the first thread (probably the
lowest numbered pid 26046). Then two other threads are
created, the first is the alarm handler and the other is
the state queue handler, probably 26047 and 26048, respectively.
Finally the last two threads handle the autofs mount points.
I'm assuming the default config which has two automount points.
So what I need to do (since we don't have the debuginfo package
installed) is confirm that 26048 is in fact the state
queue handler, so could you get an strace of 26047 and post
that please.

Of course I still don't know what's causing it but at least
if I can stare at the right bit of code it's a start.

Ian



Comment 11 Jonathan Kamens 2007-07-16 18:02:18 UTC
I was unable to get the strace you wanted because for some reason automount 
entered the stopped state (i.e., showed up as "T" in the ps output), and 
sending it a CONT signal wouldn't wake it up.  I ended up having to kill and 
restart it.  Now here are the threads that top -H shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
14363 root      20   0  5128 1064  852 S   35  0.1   0:08.76 automount          
14361 root      20   0  5128 1064  852 S    0  0.1   0:00.00 automount          
14362 root      20   0  5128 1064  852 S    0  0.1   0:00.00 automount          
14366 root      20   0  5128 1064  852 S    0  0.1   0:00.00 automount          
14369 root      20   0  5128 1064  852 S    0  0.1   0:00.00 automount          

Here's what I get for strace for some of the threads shown above:

14363: time / futex / clock_gettime ...

14361: rt_sigtimedwait(~[CHLD CONT RTMIN RT_1], 

14362: restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT 
(Connection timed out)
time(NULL)                              = 1184608874
futex(0x80034c00, FUTEX_WAKE, 1)        = 0
clock_gettime(CLOCK_REALTIME, {1184608875, 156378582}) = 0
futex(0x80034c24, FUTEX_WAIT, 772659, {15, 843621418}

Then a burst of clock_gettime / time / futex, then this:

write(7, "\2\0\0\0", 4)                 = 4
futex(0x817afc5c, FUTEX_WAKE, 1)        = 1
time(NULL)                              = 1184608892
clock_gettime(CLOCK_REALTIME, {1184608892, 431499109}) = 0



Comment 12 CHIKAMA Masaki 2007-07-17 11:24:35 UTC
I also have the same problem.
It doesn't happen with kernel-2.6.21-1.3228.fc7.
It happens with kernel-2.6.22.1-20.fc7 (in testing).

I hope this makes some help.


Comment 13 Espen Stefansen 2007-07-17 11:31:05 UTC
I can confirm this. I have one laptop with rawhide installed, and automount 
goes from about 1% to about 40% every other second. The same thing happened 
when i updated my fc7-box today as well. When kernel kernel-2.6.22.1-20.fc7 
from updates-testing got installed, automount went haywire.

Comment 14 Ian Kent 2007-07-17 12:01:55 UTC
I believe this is due to kernel changes realting to mixing
system calls that use a low resolution timer with those
that use a high resolution timer.

I have a suggestion to try and I'll be posting a patch
a bit later. For various reasons I'm not able to test
here myself.

Hopefully someone will be able to apply it and give it
a test.

Ian


Comment 15 Ian Kent 2007-07-17 14:36:17 UTC
Created attachment 159441 [details]
Patch to try not to mix usage of hi and lo res time sources

This patch should apply against the F7 source rpm.
Could someone please try it out.

Ian

Comment 16 Jonathan Kamens 2007-07-17 16:14:18 UTC
Sorry, I've got 5.0.2, not 5.0.1, and your patch doesn't apply cleanly to 5.0.2.


Comment 17 CHIKAMA Masaki 2007-07-17 16:53:26 UTC
(In reply to comment #15)
Appling the patch to autofs-5.0.1-17 (in update-testing) fix my problem.
Thank you.



Comment 18 Ian Kent 2007-07-17 17:07:08 UTC
Created attachment 159454 [details]
Patch to try not to mix usage of hi and lo res time sources (5.0.2)

This patch should apply to 5.0.2.
Please try it out.

Ian

Comment 19 Darwin H. Webb 2007-07-17 20:11:42 UTC
http://www.redhat.com/archives/fedora-test-list/2007-July/msg00368.html

This shows it was 2.6.22 (which just made it to FC6)
and that was what I was running also. (fc8 kerenel in F7)

I now use 2.6.23 (fc8 in F7 and autofs-5.0.1-17
and it is not happening anymore.

Darwin

Comment 20 Paul Johnson 2007-07-18 18:51:17 UTC
I ran into this problem with the new F7 test kernels in the 2.6.22 series.  I
applied your patch and it DID solve the problem.  In case other people want to
try, I posted the RPMs here:

http://pj.freefaculty.org/Fedora/7/i386/autofs-5.0.1-12pj.i386.rpm
http://pj.freefaculty.org/Fedora/7/i386/autofs-debuginfo-5.0.1-12pj.i386.rpm


Comment 21 Fedora Update System 2007-07-18 20:55:01 UTC
autofs-5.0.1-18 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.

Comment 22 Ian Kent 2007-07-19 02:41:24 UTC
(In reply to comment #20)
> I ran into this problem with the new F7 test kernels in the 2.6.22 series.  I
> applied your patch and it DID solve the problem.  In case other people want to
> try, I posted the RPMs here:
> 
> http://pj.freefaculty.org/Fedora/7/i386/autofs-5.0.1-12pj.i386.rpm
> http://pj.freefaculty.org/Fedora/7/i386/autofs-debuginfo-5.0.1-12pj.i386.rpm

I've applied the change to the FC6, F7 and Rawhide.
It should be available on the mirrors soon.
The F7 change has been pushed to testing but I request that
be moved to stable today.

Thanks for the feedback.
Ian


Comment 23 Mihai Harpau 2007-07-19 06:21:13 UTC
autofs-5.0.1-18 from Fedora 7 testing repository does not resolve the problem.
but autofs-5.0.1-19 from here
http://koji.fedoraproject.org/koji/buildinfo?buildID=11351 solve the problem.

Comment 24 Ian Kent 2007-07-19 09:25:36 UTC
(In reply to comment #23)
> autofs-5.0.1-18 from Fedora 7 testing repository does not resolve the problem.
> but autofs-5.0.1-19 from here
> http://koji.fedoraproject.org/koji/buildinfo?buildID=11351 solve the problem.

That's right, as revision 19 has the patch that's meant
to fix the problem and 18 is for an earlier bug fix.

Ian


Comment 25 Fedora Update System 2007-07-19 16:46:08 UTC
autofs-5.0.1-18 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 26 dominique 2007-07-19 18:57:53 UTC
I try autofs-5.0.1-18 with 2.6.22.1-27.fc7
 and the bug still persist

Comment 27 dominique 2007-07-19 19:18:46 UTC
But autofs-5.0.1-19 with 2.6.22.1-27.fc7 solve the problem

Comment 28 Peter Staubach 2007-07-19 19:23:37 UTC
Does /net work with that combination of autofs and kernel?

Comment 29 Ian Kent 2007-07-20 03:07:14 UTC
(In reply to comment #28)
> Does /net work with that combination of autofs and kernel?

It should, with the patch.
The problem had nothing to do with mounting, it was
related to setting a time to wait that was 1 second
away but not accounting for the fraction of the second
already passed. The behavior change was introduced by
the CFS scheduler patch in recent kernels. In spite of
the feeling that this needs to be fixed anyway I'm
going to leave the autofs change in place because it
seems sensible anyway.

Ian


Comment 30 Peter Staubach 2007-07-20 11:48:42 UTC
With what patch?  autofs-5.0.1-19 with 2.6.22.1-23 does not mount
file systems in /net and leave them mounted long enough to be usable.
Neither does autofs-5.0.2-6 with that same kernel.

Comment 31 Ian Kent 2007-07-20 12:37:16 UTC
(In reply to comment #30)
> With what patch?  autofs-5.0.1-19 with 2.6.22.1-23 does not mount
> file systems in /net and leave them mounted long enough to be usable.
> Neither does autofs-5.0.2-6 with that same kernel.

I was referring to the patch posted in comment #18.

I can't seem to find kernel 1.23 kernel.
All I could find was 1.24 and my machine was not usable
with it.

As I said the problem described in this thread was due
to the CFS scheduler v19 patch.

Ian


Comment 32 Peter Staubach 2007-07-20 12:52:17 UTC
I have now also tested autofs-5.0.1-18 and kernel-2.6.21-1.3228.fc7
and /net does not work correctly there either.  These are the latest
rpms available for F-7.  I have "yum update"'d and these were the
rpms installed.

Comment 33 Darwin H. Webb 2007-08-22 20:51:08 UTC
This bug can be closed.
autofs and kernel updates fixed problem.

thanks,

Darwin