Bug 1774790 - make: occasional deadlock when using parallel build
Summary: make: occasional deadlock when using parallel build
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: make
Version: 8.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: DJ Delorie
QA Contact: Michal Kolar
Docs Contact: Oss Tikhomirova
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-20 23:58 UTC by Akemi Yagi
Modified: 2021-09-17 14:37 UTC

Fixed In Version: make-4.2.1-10.el8
Doc Type: Bug Fix
Doc Text:
.`make` no longer slows down when using parallel builds
Previously, while running parallel builds, `make` sub-processes could become temporarily unresponsive when waiting for their turn to run. As a consequence, builds with high `-j` values slowed down or ran at lower effective `-j` values. With this update, the job control logic of `make` is now non-blocking. As a result, builds with high `-j` values run at full `-j` speed.
Clone Of:
Clones: 1785447
Environment:
Last Closed: 2020-04-28 17:03:19 UTC
Type: Bug
Target Upstream Version:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
GNU Savannah            51159           0        None      None    None     2019-12-19 21:51:14 UTC
Red Hat Product Errata  RHBA-2020:1911  0        None      None    None     2020-04-28 17:03:22 UTC

Description Akemi Yagi 2019-11-20 23:58:44 UTC
This bug was initially created as a copy of Bug #1556839

I am copying this bug because: 
This bug is present in RHEL 8 (make-4.2.1-9.el8).


Description of problem:
Parallel make sometimes hangs with processes left in the zombie state. This happens with large builds such as the kernel.

Version-Release number of selected component (if applicable):
make-4.2.1-4.fc27.x86_64


How reproducible:
Occasionally.

Steps to Reproduce:
1. Do a parallel build of a large project such as the Linux kernel (make -j8 ...)

Actual results:
The build sometimes hangs, and the process list shows <defunct> processes.

Expected results:
Build completes

Additional info:
This seems to be a deadlock: the jobserver waits for its children to die while at least one child is still trying to read from the jobserver pipe.
This bug seems to be known upstream:
https://savannah.gnu.org/bugs/?51159
https://savannah.gnu.org/bugs/?49014 (duplicate)

There seems to be a fix in upstream git:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=b552b05251980f693c729e251f93f5225b400714
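To illustrate the idea behind that fix, here is a minimal sketch, not make's actual code. The commit title mentions a non-blocking read with pselect; Python has no pselect, so this sketch assumes select.select with a timeout as a stand-in. The pipe, the "+" token byte, and the function name try_acquire_token are all hypothetical.

```python
import os
import select

def try_acquire_token(read_fd, timeout):
    """Wait up to `timeout` seconds for a token; never block inside read()."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return None  # timed out: the caller is free to reap finished children
    try:
        return os.read(read_fd, 1)
    except BlockingIOError:
        return None  # another process took the token first

# Hypothetical stand-in for the jobserver pipe, holding a single token.
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
os.write(write_fd, b"+")

first = try_acquire_token(read_fd, 0.1)   # gets the token
second = try_acquire_token(read_fd, 0.1)  # pipe is empty: returns None
os.write(write_fd, first)                 # release the token
third = try_acquire_token(read_fd, 0.1)   # gets the token again
```

With a plain blocking read, the second call would wait until some other process wrote a token back. Here it returns after the timeout, so the caller gets a chance to collect exited children instead of sitting in read().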

Comment 1 Akemi Yagi 2019-11-21 00:09:32 UTC
As described in the original Fedora bug report, building the kernel using make-4.2.1-9.el8 causes a number of defunct processes to be created. They are eventually reaped by the time the %build step completes. However, they can significantly prolong the build time.

The upstream patch referenced in the description section does fix the issue.
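For anyone who wants to watch for the defunct processes during a build, the following is a rough sketch that scans /proc on Linux for processes in state Z (the state that ps displays as <defunct>). The helper names parse_state and list_zombies are made up for this example.

```python
import os

def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' in the line.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]

def list_zombies(proc_root="/proc"):
    """Return a list of (pid, comm) for every process in state Z."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to scan
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                pid, comm, state = parse_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "Z":
            zombies.append((pid, comm))
    return zombies
```

Calling list_zombies() periodically while the build runs should show whether defunct children are piling up.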

Comment 2 Alan Bartlett 2019-11-22 22:54:12 UTC
Some example timings for kernel builds with -j8 (eight-fold parallel).

On a fully up-to-date RHEL-8.1 system with make-4.2.1-9.el8.x86_64, many defunct processes are observed.

Elapsed time: 71m 4s
Elapsed time: 54m 38s
Elapsed time: 83m 49s

With the referenced patch [1] applied to make, no defunct processes are observed.

Elapsed time: 23m 10s

[1] https://git.savannah.gnu.org/cgit/make.git/commit/?id=b552b05251980f693c729e251f93f5225b400714
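To put the timings above in proportion, here is a small worked calculation. It only uses the four elapsed times quoted in this comment; note that there is a single patched sample, so the ratios are indicative at best.

```python
def to_seconds(text):
    """Convert a string like '71m 4s' to seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))

unpatched = [to_seconds(t) for t in ("71m 4s", "54m 38s", "83m 49s")]
patched = to_seconds("23m 10s")

# How many times longer each unpatched run took than the patched run.
slowdowns = [round(t / patched, 2) for t in unpatched]

# Same ratio for the mean of the three unpatched runs.
mean_unpatched = sum(unpatched) / len(unpatched)
mean_slowdown = round(mean_unpatched / patched, 2)

print(slowdowns, mean_slowdown)
```

The unpatched runs took roughly 2.4 to 3.6 times as long as the patched run, or about 3 times as long on average.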

Comment 3 Akemi Yagi 2019-12-09 21:02:41 UTC
The make bug attracted the attention and interest of Linus Torvalds. Quoting one of his posts on lkml.org:

https://lkml.org/lkml/2019/12/9/674

[quote]
[ Added DJ to the participants, since he seems to be the Fedora make
maintainer - DJ, any chance that this absolutely horrid 'make' bug can
be fixed in older versions too, not just rawhide? The bugfix is two
and a half years old by now, and the bug looks real and very serious ]

On Mon, Dec 9, 2019 at 1:54 AM Vincent Guittot
<vincent.guittot> wrote:
>
> Which version of make should I use to reproduce the problem ?

So the problematic one is "make-4.2.1-13.fc30.x86_64" in Fedora 30.
I'm assuming it's fairly plain 4.2.1, but I didn't try to look into
the source rpm or anything like that.

The working one for me was just the top of -git from

    https://git.savannah.gnu.org/git/make.git

which is 4.2.92 right now.

The fix is presumably commit b552b05 ("[SV 51159] Use a non-blocking
read with pselect to avoid hangs") as per Akemi. That is indeed after
4.2.1, and it looks real.
(snip snip)
But sadly, there's no way I can push that fair pipe wakeup thing as
long as this horribly buggy version of make is widespread.

                 Linus
[/quote]

Comment 5 DJ Delorie 2019-12-10 16:39:59 UTC
rawhide test build:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1420394

Comment 6 Akemi Yagi 2019-12-10 17:14:45 UTC
That test build is for Fedora. Please provide a patched make for RHEL 8 so that we can test.

Comment 10 Akemi Yagi 2019-12-18 17:52:17 UTC
Support Case #02541226.

Comment 15 Michal Kolar 2020-02-05 13:57:35 UTC
Verified against make-4.2.1-10.el8.
SanityOnly because of unstable reproducer. Required patch was successfully applied.
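
As a rough aid for checking whether a given RHEL 8 system already has the fixed build, the sketch below compares a package string such as the one printed by `rpm -q make` against make-4.2.1-10.el8. It is a naive comparison that assumes plain numeric el8 builds, not rpm's own version comparison algorithm, and the function names are hypothetical.

```python
import re

# make-4.2.1-10.el8, taken from the "Fixed In Version" field of this bug.
FIXED = ((4, 2, 1), 10)

def parse_el8_make(nvr):
    """Parse e.g. 'make-4.2.1-9.el8' or 'make-4.2.1-9.el8.x86_64'."""
    m = re.fullmatch(r"make-(\d+(?:\.\d+)*)-(\d+)\.el8(?:\..+)?", nvr)
    if m is None:
        raise ValueError(f"not a plain el8 make build: {nvr!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))

def has_fix(nvr):
    """True if the build is at or after the fixed version and release."""
    return parse_el8_make(nvr) >= FIXED
```

For example, has_fix("make-4.2.1-9.el8.x86_64") is False and has_fix("make-4.2.1-10.el8") is True. Builds from other distributions, such as the Fedora packages mentioned earlier, are rejected with a ValueError rather than guessed at.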

Comment 20 Oss Tikhomirova 2020-03-27 00:38:40 UTC
Thank you a lot, DJ.

Comment 22 errata-xmlrpc 2020-04-28 17:03:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1911

