Bug 247711
Summary: | automount never ends, high cpu rate | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Darwin H. Webb <thethirddoorontheleft> |
Component: | autofs | Assignee: | Jeff Moyer <jmoyer> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 7 | CC: | espenas, ikent, jik, masaki.chikama, staubach |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 5.0.1-18 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-07-19 16:46:14 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Darwin H. Webb
2007-07-10 21:54:57 UTC
Is this a fresh install of F7? I'm assuming you haven't made any modifications to the automounter configuration. If you have, then please post your config files (/etc/auto.master and any referenced maps, /etc/nsswitch.conf, /etc/sysconfig/autofs). Thanks! I am seeing the same behavior, with current devel packages (e.g., autofs-5.0.2- 4). Automount consistently consumes 10% or more of my CPU, and often as high as 50% or 60% for a prolonged period of time. I will attach the output of strace -f -F -o /tmp/strace.out for a few seconds on the automount process. "sudo rpm --verify autofs" shows no changes. Created attachment 159320 [details]
strace output from high-cpu automount process
And what about the information requested in comment #1? There seems to be an assumption that this is easy to reproduce and an obvious problem. It's not, and we need information about the configuration so can someone post what they're using please. Ian I'm confused. I believe I've already provided all the information you asked for. As I said, I'm using Fedora devel packages, and as I said, I haven't changed any of the files you listed -- they've all got their default contents -- so I don't see anything to be gained by attaching them to the ticket. Could you be more specific about what you want me to provide? (In reply to comment #5) > I'm confused. I believe I've already provided all the information you asked > for. As I said, I'm using Fedora devel packages, and as I said, I haven't > changed any of the files you listed -- they've all got their default contents -- > so I don't see anything to be gained by attaching them to the ticket. Could > you be more specific about what you want me to provide? > So you this is either a fresh install or you didn't previuosly have autofs installed, is that right? It probably won't be of use but nsswitch.conf can be modified by other package installs so I'd like to see that. What may be useful is knowing which thread is eating the CPU cycles. If you could get a thread listing from gdb (preferably with the debuginfo package installed) that would be great. gdb -p <automount pid> /usr/sbin/automount (gdb) info threads Ian (In reply to comment #6) > (In reply to comment #5) > > I'm confused. I believe I've already provided all the information you asked > > for. As I said, I'm using Fedora devel packages, and as I said, I haven't > > changed any of the files you listed -- they've all got their default contents -- > > so I don't see anything to be gained by attaching them to the ticket. Could > > you be more specific about what you want me to provide? > > > > So you this is either a fresh install or you didn't previuosly > have autofs installed, is that right? > > It probably won't be of use but nsswitch.conf can be modified > by other package installs so I'd like to see that. > > What may be useful is knowing which thread is eating the CPU > cycles. If you could get a thread listing from gdb (preferably > with the debuginfo package installed) that would be great. > > gdb -p <automount pid> /usr/sbin/automount > (gdb) info threads Oh, sorry and some indication of which thread it is with top -H -p <automount pid> Ian I've had autofs installed forever; I don't know exactly when this problem started. /etc/nsswitch.conf is unmodified from glibc (2.6-4). Here's the thread listing: (gdb) info threads 5 Thread -1208689776 (LWP 26047) 0x0012d402 in __kernel_vsyscall () 4 Thread -1208808560 (LWP 26048) 0x0012d402 in __kernel_vsyscall () 3 Thread -1208878192 (LWP 26052) 0x0012d402 in __kernel_vsyscall () 2 Thread -1209930864 (LWP 26055) 0x0012d402 in __kernel_vsyscall () 1 Thread -1208804656 (LWP 26046) 0x0012d402 in __kernel_vsyscall () The thread that's consuming all the CPU is 26048. According to strace, that thread is calling futex, clock_gettime, time, futex, clock_gettime, time, futex, clock_gettime, time ... (In reply to comment #9) > According to strace, that thread is calling futex, clock_gettime, time, futex, > clock_gettime, time, futex, clock_gettime, time ... > Ya .. but more than one of those threads will be doing that but I would expect something like futex(0x5555557868c4, FUTEX_WAIT, 5321099, {0, 998091311}) = -1 ETIMEDOUT (Connection timed out) as well so it looks like one of the threads is always waiting for a time in the past. In the thread list there will be one thread listening for signals, that should be the first thread (probably the lowest numbered pid 26046). Then two other threads are created, the first is the alarm handler and the other is the state queue handler, probably 26047 and 26048, respectively. Finally the last two threads handle the autofs mount points. I'm assuming the default config which has two automount points. So what I need to do (since we don't have the debuginfo package installed) is confirm that 26048 is in fact the state queue handler, so could you get an strace of 26047 and post that please. Of course I still don't know what's causing it but at least if I can stare at the right bit of code it's a start. Ian I was unable to get the strace you wanted because for some reason automount entered the stopped state (i.e., showed up as "T" in the ps output), and sending it a CONT signal wouldn't wake it up. I ended up having to kill and restart it. Now here are the threads that top -H shows: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 14363 root 20 0 5128 1064 852 S 35 0.1 0:08.76 automount 14361 root 20 0 5128 1064 852 S 0 0.1 0:00.00 automount 14362 root 20 0 5128 1064 852 S 0 0.1 0:00.00 automount 14366 root 20 0 5128 1064 852 S 0 0.1 0:00.00 automount 14369 root 20 0 5128 1064 852 S 0 0.1 0:00.00 automount Here's what I get for strace for some of the threads shown above: 14363: time / futex / clock_gettime ... 14361: rt_sigtimedwait(~[CHLD CONT RTMIN RT_1], 14362: restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection timed out) time(NULL) = 1184608874 futex(0x80034c00, FUTEX_WAKE, 1) = 0 clock_gettime(CLOCK_REALTIME, {1184608875, 156378582}) = 0 futex(0x80034c24, FUTEX_WAIT, 772659, {15, 843621418} Then a burst of clock_gettime / time / futex, then this: write(7, "\2\0\0\0", 4) = 4 futex(0x817afc5c, FUTEX_WAKE, 1) = 1 time(NULL) = 1184608892 clock_gettime(CLOCK_REALTIME, {1184608892, 431499109}) = 0 I also have the same problem. It doesn't happen with kernel-2.6.21-1.3228.fc7. It happens with kernel-2.6.22.1-20.fc7 (in testing). I hope this makes some help. I can confirm this. I have one laptop with rawhide installed, and automount goes from about 1% to about 40% every other second. The same thing happened when i updated my fc7-box today as well. When kernel kernel-2.6.22.1-20.fc7 from updates-testing got installed, automount went haywire. I believe this is due to kernel changes realting to mixing system calls that use a low resolution timer with those that use a high resolution timer. I have a suggestion to try and I'll be posting a patch a bit later. For various reasons I'm not able to test here myself. Hopefully someone will be able to apply it and give it a test. Ian Created attachment 159441 [details]
Patch to try not to mix usage of hi and lo res time sources
This patch should apply against the F7 source rpm.
Could someone please try it out.
Ian
Sorry, I've got 5.0.2, not 5.0.1, and your patch doesn't apply cleanly to 5.0.2. (In reply to comment #15) Appling the patch to autofs-5.0.1-17 (in update-testing) fix my problem. Thank you. Created attachment 159454 [details]
Patch to try not to mix usage of hi and lo res time sources (5.0.2)
This patch should apply to 5.0.2.
Please try it out.
Ian
http://www.redhat.com/archives/fedora-test-list/2007-July/msg00368.html This shows it was 2.6.22 (which just made it to FC6) and that was what I was running also. (fc8 kerenel in F7) I now use 2.6.23 (fc8 in F7 and autofs-5.0.1-17 and it is not happening anymore. Darwin I ran into this problem with the new F7 test kernels in the 2.6.22 series. I applied your patch and it DID solve the problem. In case other people want to try, I posted the RPMs here: http://pj.freefaculty.org/Fedora/7/i386/autofs-5.0.1-12pj.i386.rpm http://pj.freefaculty.org/Fedora/7/i386/autofs-debuginfo-5.0.1-12pj.i386.rpm autofs-5.0.1-18 has been pushed to the Fedora 7 testing repository. If problems still persist, please make note of it in this bug report. (In reply to comment #20) > I ran into this problem with the new F7 test kernels in the 2.6.22 series. I > applied your patch and it DID solve the problem. In case other people want to > try, I posted the RPMs here: > > http://pj.freefaculty.org/Fedora/7/i386/autofs-5.0.1-12pj.i386.rpm > http://pj.freefaculty.org/Fedora/7/i386/autofs-debuginfo-5.0.1-12pj.i386.rpm I've applied the change to the FC6, F7 and Rawhide. It should be available on the mirrors soon. The F7 change has been pushed to testing but I request that be moved to stable today. Thanks for the feedback. Ian autofs-5.0.1-18 from Fedora 7 testing repository does not resolve the problem. but autofs-5.0.1-19 from here http://koji.fedoraproject.org/koji/buildinfo?buildID=11351 solve the problem. (In reply to comment #23) > autofs-5.0.1-18 from Fedora 7 testing repository does not resolve the problem. > but autofs-5.0.1-19 from here > http://koji.fedoraproject.org/koji/buildinfo?buildID=11351 solve the problem. That's right, as revision 19 has the patch that's meant to fix the problem and 18 is for an earlier bug fix. Ian autofs-5.0.1-18 has been pushed to the Fedora 7 stable repository. If problems still persist, please make note of it in this bug report. I try autofs-5.0.1-18 with 2.6.22.1-27.fc7 and the bug still persist But autofs-5.0.1-19 with 2.6.22.1-27.fc7 solve the problem Does /net work with that combination of autofs and kernel? (In reply to comment #28) > Does /net work with that combination of autofs and kernel? It should, with the patch. The problem had nothing to do with mounting, it was related to setting a time to wait that was 1 second away but not accounting for the fraction of the second already passed. The behavior change was introduced by the CFS scheduler patch in recent kernels. In spite of the feeling that this needs to be fixed anyway I'm going to leave the autofs change in place because it seems sensible anyway. Ian With what patch? autofs-5.0.1-19 with 2.6.22.1-23 does not mount file systems in /net and leave them mounted long enough to be usable. Neither does autofs-5.0.2-6 with that same kernel. (In reply to comment #30) > With what patch? autofs-5.0.1-19 with 2.6.22.1-23 does not mount > file systems in /net and leave them mounted long enough to be usable. > Neither does autofs-5.0.2-6 with that same kernel. I was referring to the patch posted in comment #18. I can't seem to find kernel 1.23 kernel. All I could find was 1.24 and my machine was not usable with it. As I said the problem described in this thread was due to the CFS scheduler v19 patch. Ian I have now also tested autofs-5.0.1-18 and kernel-2.6.21-1.3228.fc7 and /net does not work correctly there either. These are the latest rpms available for F-7. I have "yum update"'d and these were the rpms installed. This bug can be closed. autofs and kernel updates fixed problem. thanks, Darwin |