This bug was initially created as a copy of Bug #1556839
I am copying this bug because:
This bug is present in RHEL 8 (make-4.2.1-9.el8).
Description of problem:
Parallel make sometimes hangs with child processes left in the zombie state. This happens with large projects such as a kernel build.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Do a parallel build of a large project like the linux kernel (make -j8 ...)
The build sometimes hangs, and the process list shows <defunct> processes.
This appears to be a deadlock: the jobserver waits for its children to exit while at least one child is trying to read from the jobserver pipe.
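To illustrate the jobserver pipe involved in this deadlock, here is a minimal Python sketch of a token pipe. It is a simplified model and not GNU make's code: the helper names (make_jobserver, try_acquire, release) are hypothetical, and the read end is switched to non-blocking mode so that the empty-pipe case returns instead of hanging.

```python
import os

def make_jobserver(slots):
    """Create a pipe preloaded with `slots` one-byte job tokens (hypothetical helper)."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"+" * slots)
    return read_fd, write_fd

def try_acquire(read_fd):
    """Take one token without blocking; return None if no token is available."""
    os.set_blocking(read_fd, False)
    try:
        # An empty result means every write end was closed; treat it as "no token".
        return os.read(read_fd, 1) or None
    except BlockingIOError:
        # Pipe is empty. A blocking read would sleep here until a token is
        # written back, which never happens if the writer is itself waiting.
        return None

def release(write_fd, token):
    """Return a token to the pool."""
    os.write(write_fd, token)

r, w = make_jobserver(2)
first = try_acquire(r)
second = try_acquire(r)
third = try_acquire(r)    # pool exhausted
release(w, first)
fourth = try_acquire(r)   # a token is available again
```

In this model the third request finds the pipe empty. With a blocking read, that caller would sleep until some other process writes a token back, which is the situation the report describes.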
This bug seems to be known upstream:
There seems to be a fix in upstream git:
As described in the original Fedora bug report, building the kernel using make-4.2.1-9.el8 causes a number of defunct processes to be created. They are eventually reaped by the time the %build step completes. However, they can significantly prolong the build time.
The upstream patch referenced in the description section does fix the issue.
Some example timings for kernel builds with -j8 (eight-fold parallel):
On a fully up-to-date RHEL-8.1 system with make-4.2.1-9.el8.x86_64, many defunct processes are observed.
Elapsed time: 71m 4s
Elapsed time: 54m 38s
Elapsed time: 83m 49s
With the referenced patch applied to make, no defunct processes are observed.
Elapsed time: 23m 10s
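For a rough sense of scale, the timings above can be compared with a few lines of Python. This is only arithmetic on the four figures reported here (three unpatched runs, one patched run); the helper name to_seconds is made up for the example, and a single patched run is not a statistically meaningful sample.

```python
import re

def to_seconds(elapsed):
    """Convert a string such as '71m 4s' to seconds (hypothetical helper)."""
    minutes, seconds = re.fullmatch(r"(\d+)m\s*(\d+)s", elapsed).groups()
    return int(minutes) * 60 + int(seconds)

unpatched = [to_seconds(t) for t in ("71m 4s", "54m 38s", "83m 49s")]
patched = to_seconds("23m 10s")

mean_unpatched = sum(unpatched) / len(unpatched)
mean_ratio = mean_unpatched / patched   # average slowdown without the patch
best_ratio = min(unpatched) / patched   # fastest unpatched run vs patched
worst_ratio = max(unpatched) / patched  # slowest unpatched run vs patched

print(f"unpatched/patched: mean {mean_ratio:.2f}x, "
      f"range {best_ratio:.2f}x to {worst_ratio:.2f}x")
```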
The make bug attracted the attention and interest of Linus Torvalds. Quoting one of his posts on lkml.org:
[ Added DJ to the participants, since he seems to be the Fedora make
maintainer - DJ, any chance that this absolutely horrid 'make' bug can
be fixed in older versions too, not just rawhide? The bugfix is two
and a half years old by now, and the bug looks real and very serious ]
On Mon, Dec 9, 2019 at 1:54 AM Vincent Guittot
> Which version of make should I use to reproduce the problem ?
So the problematic one is "make-4.2.1-13.fc30.x86_64" in Fedora 30.
I'm assuming it's fairly plain 4.2.1, but I didn't try to look into
the source rpm or anything like that.
The working one for me was just the top of -git from
which is 4.2.92 right now.
The fix is presumably commit b552b05 ("[SV 51159] Use a non-blocking
read with pselect to avoid hangs") as per Akemi. That is indeed after
4.2.1, and it looks real.
But sadly, there's no way I can push that fair pipe wakeup thing as
long as this horribly buggy version of make is widespread.
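The commit title quoted above names the approach: a non-blocking read combined with pselect. The Python sketch below shows the general idea only and is not the upstream patch. Python has no pselect, so this version substitutes select.select with a short timeout and polls for exited children between waits; the function names acquire_or_reap and demo and the timeout value are assumptions made for the example. It is Unix-only because it uses os.fork.

```python
import os
import select

def acquire_or_reap(token_fd, timeout=0.1):
    """Wait for a job token or a child exit, whichever comes first.

    Returns ('token', byte) or ('reaped', pid). Hypothetical helper.
    """
    os.set_blocking(token_fd, False)
    while True:
        # Reap a finished child first so it does not linger as <defunct>.
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            pid = 0  # this process has no children
        if pid:
            return ("reaped", pid)
        # Sleep until a token may be readable, but never indefinitely.
        ready, _, _ = select.select([token_fd], [], [], timeout)
        if not ready:
            continue
        try:
            token = os.read(token_fd, 1)
        except BlockingIOError:
            continue  # another process took the token first
        if not token:
            raise EOFError("jobserver pipe closed")
        return ("token", token)

def demo():
    """Empty token pipe plus one exiting child, then one token written back."""
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        os._exit(0)  # child exits immediately
    first = acquire_or_reap(r)   # no token available, so the child is reaped
    os.write(w, b"+")
    second = acquire_or_reap(r)  # no children left, so the token is read
    os.close(r)
    os.close(w)
    return first, second
```

The point of the design is that the waiter never commits to a blocking read on the token pipe, so a child exit is always noticed. pselect achieves this without polling by unblocking SIGCHLD atomically for the duration of the wait.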
rawhide test build:
That test build is for Fedora. Please provide a patched make for RHEL 8 so that we can test.
Support Case #02541226.
Verified against make-4.2.1-10.el8.
Verified as SanityOnly because the reproducer is unstable. The required patch was applied successfully.
Thanks a lot, DJ.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.