Bug 439196
Summary: pidof hangs in access_process_vm
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.6
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Janne Karhunen <jkarhune>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Martin Jenner <mjenner>
CC: jbaron, lwang, pcfe, peterm, staubach
Doc Type: Bug Fix
Last Closed: 2008-04-29 13:40:29 UTC
Description
Janne Karhunen
2008-03-27 14:53:40 UTC
Created attachment 299335 [details]
crash backtrace of pidof while hanging
A slow console and the HW watchdog meant we did not get any sysrq-t output from this hardware, but that is now solved. We should get a complete task list from the next occurrence.

OK, thanks. BTW, is this reproducible on my system? The system is
obviously stuck here:
------------------------------------
int access_process_vm(...)
{
        struct mm_struct *mm;
        struct vm_area_struct *vma;
        struct page *page;
        void *old_buf = buf;

        mm = get_task_mm(tsk);
        if (!mm)
                return 0;

>>>     down_read(&mm->mmap_sem);
------------------------------------
But there are hundreds of other down_write(...->mmap_sem) calls
on that architecture that could cause this problem...
Can you get an AltSysrq-T when this happens so I can see what the
process that holds the semaphore is doing?
Larry
Sysrq-t is in the works. With any luck we'll get it tomorrow morning.

I'll try to make a reproducer testcase on one of the RHTS systems. I'm feeling lucky..

The most basic imaginable testcase (killing/starting hordes of processes and running pidof against them) does not seem to reproduce it. I'm betting this may have something to do with the mount being done just prior to checking for the existence of leftover NFS tasks. Just a guess, though.

Created attachment 300059 [details]
partial sysrq

This is tricky. The WD is not the cause of the reset; it has to be something else. We can only get partial sysrq's.

Created attachment 300244 [details]
Probably complete backtrace from Crash
Created attachment 300249 [details]
Another crash-bt
NOTE: attachment id 300249 is verified to be from a NON-RECOVERABLE occurrence.

Umm, one of the tasks on top of NFS holds the semaphore that is required for NFS to start up :) ?

I suspect this problem was introduced in linux-2.6.9-futex.patch:

* Thu May 10 2007 Jason Baron <jbaron> [2.6.9-55.2]
- fix for futex()/FUTEX_WAIT race condition (Ernie Petrides) [217067]

Can you try kernel-2.6.9-55.1 and see if the problem goes away?
Larry

Yeah, no (obvious) luck with the NFS guess.

Hmm, IMHO my initial guess may still be valid. It may be that task 11748 holds the write side of mmap_sem in sys_mmap, blocking just about everyone. That task may not be proceeding because NFS is not up, and given that NFS startup is hanging with 'pidof' waiting for that same semaphore, we have a deadlock.

So we'll try both cases. We'll remove the patch Larry suggested on one system, and on another we'll move the pidof call to a point where basic NFS is already up. I'm willing to bet Larry a cup of machine coffee on this one :)

It took a day to find a second cluster for testing with the 55.1 kernel, but we found one and will set it up on Monday and start the test. System 1 has been testing the fix that moves the pidof call to a point where nfsd/mountd are already up, and the fault has not shown up yet. The implications of this are yet to be properly understood: it may mean that the whole NFS failover concept is flaky, at least when it comes to having the NFS client and server on the same node.

We have not seen this bug again since the pidof call was moved to a point where nfsd/mountd are already up. We'll keep the test running for another day to be 'sure'. The build with the 55.1 kernel is also ready but has not been installed yet.

Verified: the problem does not show up after moving the pidof call.

Verified: using the 55.1 kernel does not resolve this issue.

OK, so what does this all mean? Is the whole failover logic flawed when NFS is in the picture?
Larry

To me it means that, with bad luck, tasks running on top of an NFS mount may cause the system to deadlock when the NFS server itself migrates to the same node. I take it not too many people are doing this..

To summarize: provided that NFS server migration from an external host to the local host occurs at the same time that a local NFS client task is holding mmap_sem, we can have a deadlock. The pidof calls in the NFS server startup iterate over all tasks (/proc/pid/cmdline), and they will stall once they hit this task: and this task never proceeds because the server is not coming back up. Larry, any major holes in this theory?