Red Hat Bugzilla – Bug 439196
pidof hangs in access_process_vm
Last modified: 2013-08-05 20:43:09 EDT
Description of problem:
Occasionally, running pidof for process X hangs in access_process_vm indefinitely.
Version-Release number of selected component (if applicable):
How reproducible:
Takes a day or two, but pidof is called rarely, once per 5 minutes or so.
Steps to Reproduce:
Nokia flexiserver specific: test case is to fail over cluster NFS heads.
Failover script uses pidof to figure out already running NFS tasks. We get a
hang once every 100 runs or so.
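For readers unfamiliar with how pidof ends up in access_process_vm: it identifies processes by walking /proc and reading per-process files such as /proc/<pid>/cmdline, and in 2.6-era kernels that read is served by access_process_vm() against the target task's address space. The following is only a minimal sketch of such a scan, not the actual pidof source; the function names cmdline_matches and scan_proc are made up for illustration, and matching on the basename of argv[0] is a simplifying assumption.

```c
#include <ctype.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if the basename of argv[0] in a /proc/<pid>/cmdline buffer
 * (NUL-separated arguments) equals name, 0 otherwise. */
int cmdline_matches(const char *buf, size_t len, const char *name)
{
    const char *end;
    const char *base = buf;
    size_t arg0_len, base_len, i;

    if (len == 0)
        return 0;                       /* e.g. kernel threads: empty cmdline */

    end = memchr(buf, '\0', len);       /* end of argv[0], if terminated */
    arg0_len = end ? (size_t)(end - buf) : len;

    for (i = 0; i < arg0_len; i++)      /* skip past the last '/' */
        if (buf[i] == '/')
            base = buf + i + 1;

    base_len = arg0_len - (size_t)(base - buf);
    return strlen(name) == base_len && memcmp(base, name, base_len) == 0;
}

/* Walk /proc and print the PID of every process whose argv[0] basename
 * is name. Returns the number of matches, or -1 if /proc is unavailable. */
int scan_proc(const char *name)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int found = 0;

    if (!proc)
        return -1;

    while ((de = readdir(proc)) != NULL) {
        char path[288];
        char buf[4096];
        ssize_t n;
        int fd;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                   /* not a PID directory */

        snprintf(path, sizeof(path), "/proc/%s/cmdline", de->d_name);
        fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;                   /* process exited, or no permission */

        /* On the kernel discussed in this bug, this read needs the target
         * task's mmap_sem, so it can block if another task holds it. */
        n = read(fd, buf, sizeof(buf));
        close(fd);

        if (n > 0 && cmdline_matches(buf, (size_t)n, name)) {
            printf("%s\n", de->d_name);
            found++;
        }
    }
    closedir(proc);
    return found;
}
```

Calling scan_proc("nfsd") from a small main() would print matching PIDs one per line. The point of the sketch is that the scan touches every task in turn, so a single task whose mmap_sem cannot be taken is enough to stall the whole lookup.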
Created attachment 299335
crash backtrace of pidof while hanging
A slow console and the hw watchdog kept us from getting any sysrq-t output from this
hardware, but that is now solved. We should get a complete task list from the next occurrence.
OK, thanks. BTW, is this reproducible on my system? The system is
obviously stuck here:
struct mm_struct *mm;
struct vm_area_struct *vma;
struct page *page;
void *old_buf = buf;
mm = get_task_mm(tsk);
But there are hundreds of other down_write(...->mmap_sem); calls
on that architecture that could cause this problem...
Can you get an AltSysrq-T when this happens so I can see what the
process that holds the semaphore is doing???
Sysrq-t is in the works. With any luck we get it tomorrow morning. I'll try to
make a reproducer testcase on one of the rhts systems. I'm feeling lucky..
The most basic testcase imaginable (killing/starting hordes of processes and running
pidof against them) does not seem to reproduce it. I'm betting this may have
something to do with the mount being done just prior to checking the existence
of leftover nfs tasks. Just a guess though.
Created attachment 300059
This is tricky. WD is not the cause of the reset, it has to be something else.
We can only get partial sysrq's.
Created attachment 300244
Probably complete backtrace from Crash
Created attachment 300249
NOTE: attachment 300249 is verified to be from a NON-RECOVERABLE occurrence.
Umm, one of the tasks on top of NFS holds the semaphore that is required for NFS
to start up :) ?
I suspect this problem was introduced in linux-2.6.9-futex.patch:
* Thu May 10 2007 Jason Baron <firstname.lastname@example.org> [2.6.9-55.2]
-fix for futex()/FUTEX_WAIT race condition (Ernie Petrides) 
Can you try kernel-2.6.9-55.1 and see if the problem goes away???
Yeah, no (obvious) luck with the nfs guess.
Hmm, imho my initial guess may still be valid. It may be that 11748 holds
mmap_sem for writing in sys_mmap, blocking just about everyone else. That task may
not be making progress because NFS is not up, and since NFS startup is itself stuck
in 'pidof' waiting for that same semaphore, we have a deadlock.
So we'll try both cases. We'll try removing the patch Larry suggested on one
system and on another we'll move the pidof call to a point where basic NFS is
already up. I'm willing to bet Larry a cup of machine coffee on this one :)
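The suspected cycle can be written down as a tiny wait-for graph, which also shows why moving the pidof call should break it. This is only a toy model of the hypothesis above, not kernel code; the actor names and the is_stuck helper are invented for illustration, and the edges encode the guess (pidof waits on the mmap_sem holder, the holder waits on NFS, NFS startup waits on pidof), not anything verified from the backtraces.

```c
#include <stdio.h>

/* Hypothetical actors in the suspected deadlock. */
enum { PIDOF, MMAP_SEM_HOLDER, NFS_SERVER_START, N_ACTORS };

/* waits_for[a] is the actor that a is blocked on, or -1 if a can run. */

/* Suspected situation: NFS startup calls pidof before nfsd/mountd are up. */
const int before_fix[N_ACTORS] = {
    [PIDOF]            = MMAP_SEM_HOLDER,   /* needs the holder's mmap_sem */
    [MMAP_SEM_HOLDER]  = NFS_SERVER_START,  /* NFS client task needs the server */
    [NFS_SERVER_START] = PIDOF,             /* startup script blocks on pidof */
};

/* With pidof moved after nfsd/mountd start, startup no longer waits on it. */
const int after_fix[N_ACTORS] = {
    [PIDOF]            = MMAP_SEM_HOLDER,
    [MMAP_SEM_HOLDER]  = NFS_SERVER_START,
    [NFS_SERVER_START] = -1,
};

/* Follow the wait chain from start. With n actors, a chain that has not
 * reached a runnable actor after n hops must have looped, so start can
 * never make progress. Returns 1 if stuck, 0 otherwise. */
int is_stuck(const int *waits_for, int n, int start)
{
    int cur = start;
    int hops;

    for (hops = 0; hops < n; hops++) {
        cur = waits_for[cur];
        if (cur < 0)
            return 0;   /* chain ends at an actor that can run */
    }
    return 1;           /* chain never terminates: deadlock */
}
```

In this model is_stuck(before_fix, N_ACTORS, PIDOF) reports a deadlock and is_stuck(after_fix, N_ACTORS, PIDOF) does not, which is the outcome the reordering test is meant to confirm or refute on real hardware.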
It took a day to find a second cluster for testing with the 55.1 kernel, but we found
one and will set it up on Monday and start the test.
System 1 has been testing the fix that moves the pidof call to a point where
nfsd/mountd are already up, and the fault has not shown up yet. If this holds,
the implications are yet to be properly understood: it may mean that the whole
NFS failover concept is flaky, at least when the NFS client and server are on
the same node.
We have not seen this bug again since the pidof call was moved to a point where
nfsd/mountd are already up. We'll keep the test running for another day to be
The build with the 55.1 kernel is also ready but has not been installed yet.
Verified: does not show up after having moved the pidof call.
Verified: using 55.1 kernel does not resolve this issue.
OK, so what does this all mean? Is the whole failover logic flawed when NFS is
in the picture???
To me it means that, with bad luck, tasks running on top of an NFS mount may
cause the system to deadlock when the NFS server itself migrates to the same
node. I take it not too many people are doing this..
To summarize: if the NFS server migrates from an external host to the local
node while a local NFS client task is holding mmap_sem, we can have a deadlock.
The pidof calls in NFS server startup iterate over all tasks
(/proc/<pid>/cmdline) and block when they hit this task, and this task never
proceeds because the server is not coming back up. Larry, any major holes in this