From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.19-6.2.1 i686; en-US; Galeon) Gecko/20010216
After upgrading to nfs-utils-0.3.1-0.6.x.1 on our network (Linux RH 6.2/7.0 and Tru64), zombie lockd processes started to appear on the Linux NFS server.
Reproducible: Sometimes
Steps to Reproduce:
1. Use nfs-utils-0.3.1-0.6.x.1.
2. Wait 1-2 days. Reboot some clients in the meantime.
Actual Results: The ps afx output shows:
    root 26178 0.0 0.0 0 0 ? SW Apr24 0:08 [nfsd]
    root 26186 0.0 0.0 0 0 ? SW Apr24 0:00  \_ [lockd]
    root 26187 0.0 0.0 0 0 ? SW Apr24 0:00  \_ [rpciod]
    root 28739 0.0 0.0 0 0 ? Z  Apr24 0:00  \_ [lockd <defunct>]
Expected Results:
    root 1112 0.0 0.0 0 0 pts/0 SW 10:41 0:00 [nfsd]
    root 1120 0.0 0.0 0 0 pts/0 SW 10:41 0:00  \_ [lockd]
    root 1121 0.0 0.0 0 0 pts/0 SW 10:41 0:00  \_ [rpciod]
This is not a big deal, but having too many zombies hanging around can be slightly scary, and I don't feel like upgrading to 2.4.2 on our main server yet.
For the record: I rebooted some Linux clients and have not seen any new zombie processes since the report date (two days ago). Simply remounting disks on the Tru64 boxes does not create zombies either. This is strange in a way, because before I reported the problem I had to restart the NFS services twice to get rid of zombies. I will keep you informed.
I should add that my attempts to reproduce this behavior failed.
Okay, good to know that the zombie lockd processes are not reappearing. If you see this behaviour in the future, please reopen this bug report.
I have actually observed it again:
    root 1112 0.0 0.0 0 SW Apr25 4:01 [nfsd]
    root 1120 0.0 0.0 0 SW Apr25 0:01  \_ [lockd]
    root 1121 0.0 0.0 0 SW Apr25 0:00  \_ [rpciod]
    root 21020 0.0 0.0 0 Z May08 0:00  \_ [lockd <defunct>]
    root 1113 0.0 0.0 0 SW Apr25 4:03 [nfsd]
I left it hanging around, so if you need more information (or want to correlate it with entries in the log files, for instance), let me know.
FYI: yet another zombie process appeared a couple of days ago:
    root 1112 0.0 0.0 0 SW Apr25 4:21 [nfsd]
    root 1120 0.0 0.0 0 SW Apr25 0:01  \_ [lockd]
    root 1121 0.0 0.0 0 SW Apr25 0:00  \_ [rpciod]
    root 21020 0.0 0.0 0 Z May08 0:00  \_ [lockd <defunct>]
    root 6089 0.0 0.0 0 Z May18 0:00  \_ [lockd <defunct>]
Is it lockd that does not wait for its children? I guess I could look at the source if I knew where to start.
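For readers unfamiliar with how a <defunct> entry arises, here is a minimal user-space sketch in Python. It is only an analogy: lockd on these kernels runs as a kernel thread, and nothing in this thread confirms where the missing wait actually is. The helper names proc_state and demo are made up for illustration, and the sketch assumes a Linux system with /proc mounted. It shows that a child that has exited stays in state Z until its parent calls wait on it.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    # Parent: deliberately do not wait yet. Poll until the child shows up as a zombie.
    deadline = time.time() + 5
    while proc_state(pid) != "Z" and time.time() < deadline:
        time.sleep(0.01)
    before = proc_state(pid)
    os.waitpid(pid, 0)  # reaping the child removes its process-table entry
    after = proc_state(pid)
    return before, after
```

Calling demo() returns the child's state while unreaped and its state after the parent has waited on it. A zombie that never goes away therefore suggests that whatever is responsible for reaping it never does so.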
I have noticed a correlation: a zombie process is created when one of the client boxes is rebooted. It appears when the client tries to mount the NFS share again (or to connect to the remote lockd; I do not have sufficient time resolution to tell which).
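One way to get better time resolution on the correlation described above would be to poll the process table and timestamp each new defunct lockd entry. The following is a rough sketch in Python, not something used in this report. The function names defunct_pids and watch are hypothetical, and it assumes a procps-style ps that accepts `-e -o pid=,stat=,comm=`.

```python
import subprocess
import time


def defunct_pids(ps_text, name="lockd"):
    """Return the set of PIDs in state Z whose command name equals `name`.

    Expects lines of the form "<pid> <stat> <command...>".
    """
    pids = set()
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, stat, command = fields
        # Drop the "<defunct>" marker and any kernel-thread brackets before comparing.
        base = command.replace("<defunct>", "").strip("[] ")
        if stat.startswith("Z") and base == name:
            pids.add(int(pid))
    return pids


def watch(interval=5.0, name="lockd"):
    """Poll ps forever and print a timestamp whenever a new zombie appears."""
    seen = set()
    while True:
        out = subprocess.run(
            ["ps", "-e", "-o", "pid=,stat=,comm="],
            capture_output=True, text=True, check=True,
        ).stdout
        current = defunct_pids(out, name)
        for pid in sorted(current - seen):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "new defunct", name, pid)
        seen = current
        time.sleep(interval)
```

Running watch() on the server and comparing its timestamps against the client reboot times and the mount and lockd entries in the server logs should narrow down which client action coincides with each new zombie.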
We see the same problem on our main fileserver:
    557 ? SW 11:54 [nfsd]
    565 ? SW 0:00  \_ [lockd]
    566 ? SW 0:00  \_ [rpciod]
    12799 ? Z 0:00  \_ [lockd <defunct>]
    9876 ? Z 0:00  \_ [lockd <defunct>]
Before the update the zombies did not appear. Will it be fixed someday?
> will it be fixed someday?
There is an experimental patch for 2.4.7 kernels at http://www.fys.uio.no/~trondmy/src/2.4.7/linux-2.4.7-reclaim.dif but Linus has not yet accepted the patch into the mainstream. If/when Linus accepts the patch, a backport to the 2.2 kernel series is possible, in which case we would issue it as a kernel errata. Unfortunately, I can't offer a firm timeframe.
I think the patch made it into 2.4.10 finally...
Does it work ok on RHL9?
I cannot speak for RH9, but RH 7.3 with current updates is OK (i.e., the problem is resolved). Thanks! (Should the resolution be "current release" or "errata"?)
"Currentrelease" sounds good. Closing.