From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
The bug as I understand it is this:
- a process calls fork()
- the parent does some NFS I/O that blocks
- someone hits ^C
- the parent goes away but leaves the request for the page outstanding
- the child exits

There is no one left to reap them, and so they stay in the D state.

We found this patch on Trond's page, linux-2.4.20-16-waitq.dif. It fixes our problem and has been working in production for several days now.

http://www.fys.uio.no/~trondmy/src/2.4.20/linux-2.4.20-16-waitq.dif
from: http://www.fys.uio.no/~trondmy/src/2.4.20/

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Have a process that does NFS I/O fork a child.
2. killall the process by name.
3. Notice that the child is sometimes left in the D state.

Actual Results: process left in the D state

Expected Results: process is inherited by init

Additional info:
Created attachment 89427 [details] patch which fixed the problem for us
Note that the key thing to understand here is that the parent exits through the error path when you hit ^c.
From: Jim Garlick <garlick>
To: Ben Woodard <bwoodard>
Cc: Mike Haskell <haskell5>
Subject: Re: Kernel/131
Date: Fri, 17 Jan 2003 14:37:11 -0800 (PST)

I actually seemed to need two processes reading from two separate files to get it to happen. I attached a "sure fire" reproducer to our gnat which starts two threads reading at the same time. Run the reproducer once on an SMP system, hit ^C, then try to umount the filesystem. umount will hang, and crash will show it is waiting uninterruptibly for a lock on the page.

Jim
Sorry, I got two similar bugs confused. This is the original text:

On MCR we have a problem that seems to occur when processes doing I/O to bluearcs over NFS are interrupted with a SIGINT. Processes will block in an unkillable state, stuck in a read system call on a file on a remote bluearc. Crash says the stack for the hung process looks like this:

system_call -> sys_read -> nfs_file_read -> generic_file_read -> do_generic_file_read -> lock_page -> __lock_page -> schedule

Following the page referenced in the __lock_page argument (and correlating with pages from hung processes on several other nodes), I see the pages have the PG_locked, PG_dirty, and PG_error flags set. PG_locked explains why the system call went to sleep in TASK_UNINTERRUPTIBLE on the page's wait queue. Why the lock is never released is another question! There is no NFS client retry activity on the eip0 interface; however, the NFS server and even the file remain responsive from the node with the hung process (located by tracing the page to its inode). I guess it is no surprise that dd'ing the entire file to /dev/null on the node with the hung process causes dd to hang on the same page - there is still a PG_locked bit on it and nobody is going to release the lock!

=========Added garlick 2003-01-16

Applied the trond patch described in the audit trail and was unable to reproduce using the single-node test. Will try the test on the whole cluster. Noted another failure mode on the unpatched kernel - after aborting the reproducer prun with ctrl-C, no c2d processes were left behind but a umount hung in __wait_on_page. I *was* able to reproduce both the zombie c2d's and the umounts on MDEV. A more "sure fire" reproducer (myzmb.tgz) that just reads some big files on two CPUs is attached. Instructions are in the .c file. This one so far has always created a zombie umount.
Created attachment 89428 [details] reproducer for the bug
From: Mike Haskell <haskell5>
Reply-To: haskell5
To: Ben Woodard <bwoodard>
Cc: Jim Garlick <garlick>
Subject: Re: Kernel/131
Date: Fri, 17 Jan 2003 15:45:51 -0800

Actually, the parent goes into a 'D' state waiting for a page that will never arrive. The children finish and are waiting in do_wait() via exit() as zombies for their parent to come along and reap them. The umount hangs because of the outstanding pages the parent holds against the NFS server via the client. Why we didn't get a "umount: filesystem busy" error is unknown. It obviously thought it could be let go.
Seems to be fixed in current RHEL3 kernels - LLNL has closed the IT ticket --> closing BZ