Bug 1774790 - make: occasional deadlock when using parallel build
Summary: make: occasional deadlock when using parallel build
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: make
Version: 8.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: DJ Delorie
QA Contact: Michal Kolar
Docs Contact: Oss Tikhomirova
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-20 23:58 UTC by Akemi Yagi
Modified: 2023-09-07 21:03 UTC

Fixed In Version: make-4.2.1-10.el8
Doc Type: Bug Fix
Doc Text:
.`make` no longer slows down when using parallel builds

Previously, while running parallel builds, `make` sub-processes could become temporarily unresponsive when waiting for their turn to run. As a consequence, builds with high `-j` values slowed down or ran at lower effective `-j` values. With this update, the job control logic of `make` is now non-blocking. As a result, builds with high `-j` values run at full `-j` speed.
Clone Of:
: 1785447 (view as bug list)
Environment:
Last Closed: 2020-04-28 17:03:19 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
GNU Savannah 51159 0 None None None 2019-12-19 21:51:14 UTC
Red Hat Product Errata RHBA-2020:1911 0 None None None 2020-04-28 17:03:22 UTC

Description Akemi Yagi 2019-11-20 23:58:44 UTC
This bug was initially created as a copy of Bug #1556839

I am copying this bug because: 
This bug is present in RHEL 8 (make-4.2.1-9.el8).


Description of problem:
Parallel make sometimes hangs, leaving processes in the zombie state. This happens with large projects, such as a kernel build.

Version-Release number of selected component (if applicable):
make-4.2.1-4.fc27.x86_64


How reproducible:
Occasionally.

Steps to Reproduce:
1. Do a parallel build of a large project like the Linux kernel (make -j8 ...)

Actual results:
The build sometimes hangs, and the process list shows <defunct> processes.

Expected results:
Build completes

Additional info:
Seems to be a deadlock where the jobserver waits for children to die but at least one child tries to read from the jobserver pipe.
This bug seems to be known upstream:
https://savannah.gnu.org/bugs/?51159
https://savannah.gnu.org/bugs/?49014 (duplicate)

There seems to be a fix in upstream git:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=b552b05251980f693c729e251f93f5225b400714
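
The deadlock described above centres on reads from the jobserver pipe. As a rough illustration only (this is not make's C code), the Python sketch below models a pipe preloaded with one-byte tokens and shows that a non-blocking read returns immediately when no token is left, where a blocking read would wait indefinitely. The helper names `make_token_pipe`, `try_acquire`, and `release`, the `+` token byte, and preloading `jobs - 1` tokens are all assumptions made for the example.

```python
import os

def make_token_pipe(jobs):
    """Hypothetical helper: create a pipe preloaded with jobs-1 one-byte tokens."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"+" * (jobs - 1))
    return read_fd, write_fd

def try_acquire(read_fd):
    """Return one token byte, or None if the pipe is currently empty.

    Assumes read_fd has been switched to non-blocking mode.
    """
    try:
        return os.read(read_fd, 1)
    except BlockingIOError:
        return None

def release(write_fd, token):
    """Hand a token back so another job can start."""
    os.write(write_fd, token)

read_fd, write_fd = make_token_pipe(jobs=3)
os.set_blocking(read_fd, False)

first = try_acquire(read_fd)    # takes the first token
second = try_acquire(read_fd)   # takes the second token
third = try_acquire(read_fd)    # pipe is empty: a blocking read would hang here
release(write_fd, first)
fourth = try_acquire(read_fd)   # succeeds again after the release
```

In this model, the hang corresponds to the third call being a blocking `read` while the only process that could return a token is itself waiting on something else.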

Comment 1 Akemi Yagi 2019-11-21 00:09:32 UTC
As described in the original Fedora bug report, building the kernel using make-4.2.1-9.el8 causes a number of defunct processes to be created. They are eventually reaped by the time the %build step completes. However, they can significantly prolong the build time.

The upstream patch referenced in the description section does fix the issue.
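
One way to watch for the defunct processes mentioned above during a build is to scan `/proc` on Linux. The following is a minimal sketch under that assumption; `parse_state` and `count_zombies` are hypothetical helper names, not part of make or any Red Hat tooling.

```python
import os

def parse_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state field.
    after_comm = stat_text.rpartition(")")[2]
    return after_comm.split()[0]

def count_zombies(proc_root="/proc"):
    """Count processes whose state is 'Z' (zombie, shown as <defunct> by ps)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as stat_file:
                if parse_state(stat_file.read()) == "Z":
                    count += 1
        except OSError:
            continue  # the process exited while we were scanning
    return count
```

Calling `count_zombies()` periodically while the build runs gives a rough picture of how many children are waiting to be reaped.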

Comment 2 Alan Bartlett 2019-11-22 22:54:12 UTC
Some example timings for kernel builds, -j8 (eight-fold parallel).

On a fully up-to-date RHEL-8.1 system with make-4.2.1-9.el8.x86_64, many defunct processes are observed.

Elapsed time: 71m 4s
Elapsed time: 54m 38s
Elapsed time: 83m 49s

With the referenced patch [1] applied to make, no defunct processes are observed.

Elapsed time: 23m 10s

[1] https://git.savannah.gnu.org/cgit/make.git/commit/?id=b552b05251980f693c729e251f93f5225b400714
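
To put the timings above in perspective, the arithmetic can be sketched as follows. The numbers are taken directly from this comment; note that there is only a single patched run, so the ratio is indicative rather than a benchmark. The helper `to_seconds` is a name introduced for the example.

```python
def to_seconds(minutes, seconds):
    """Convert an elapsed time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds

# Elapsed times reported for the unpatched make-4.2.1-9.el8
unpatched = [to_seconds(71, 4), to_seconds(54, 38), to_seconds(83, 49)]
# Single elapsed time reported with the upstream patch applied
patched = to_seconds(23, 10)

mean_unpatched = sum(unpatched) / len(unpatched)
speedup = mean_unpatched / patched
print(f"mean unpatched: {mean_unpatched:.0f} s, patched: {patched} s, "
      f"speedup: {speedup:.2f}x")
```

On these figures, the patched build finishes roughly three times faster than the average unpatched build.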

Comment 3 Akemi Yagi 2019-12-09 21:02:41 UTC
The make bug attracted the attention and interest of Linus Torvalds. Quoting one of his posts on lkml.org:

https://lkml.org/lkml/2019/12/9/674

[quote]
[ Added DJ to the participants, since he seems to be the Fedora make
maintainer - DJ, any chance that this absolutely horrid 'make' bug can
be fixed in older versions too, not just rawhide? The bugfix is two
and a half years old by now, and the bug looks real and very serious ]

On Mon, Dec 9, 2019 at 1:54 AM Vincent Guittot
<vincent.guittot> wrote:
>
> Which version of make should I use to reproduce the problem ?

So the problematic one is "make-4.2.1-13.fc30.x86_64" in Fedora 30.
I'm assuming it's fairly plain 4.2.1, but I didn't try to look into
the source rpm or anything like that.

The working one for me was just the top of -git from

    https://git.savannah.gnu.org/git/make.git

which is 4.2.92 right now.

The fix is presumably commit b552b05 ("[SV 51159] Use a non-blocking
read with pselect to avoid hangs") as per Akemi. That is indeed after
4.2.1, and it looks real.
(snip snip)
But sadly, there's no way I can push that fair pipe wakeup thing as
long as this horribly buggy version of make is widespread.

                 Linus
[/quote]
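
The commit title quoted above mentions a non-blocking read combined with pselect. As a hedged sketch of that general pattern (not a port of the upstream patch), the Python below waits for the pipe to become readable with a timeout and then performs a non-blocking read, retrying if another process took the token in between. Python's standard library does not expose pselect, so `select.select` with a timeout is substituted here; `acquire_token` and the `reap_children` callback are hypothetical names.

```python
import os
import select

def acquire_token(read_fd, timeout, reap_children):
    """Try to take one token without ever blocking indefinitely in read().

    read_fd must be in non-blocking mode. Returns the token byte, or None
    if no token became available within `timeout` seconds.
    """
    while True:
        reap_children()  # hypothetical callback: collect finished jobs first
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            return None  # timed out; the caller can do other work and retry
        try:
            return os.read(read_fd, 1)
        except BlockingIOError:
            # Another process grabbed the token between select() and read().
            continue

read_end, write_end = os.pipe()
os.set_blocking(read_end, False)

empty_result = acquire_token(read_end, 0.05, lambda: None)  # no token yet
os.write(write_end, b"+")
token = acquire_token(read_end, 0.05, lambda: None)         # token available
```

The point of the design is that the waiting process regains control regularly, so it can still notice and reap finished children instead of sitting in a read that never returns.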

Comment 5 DJ Delorie 2019-12-10 16:39:59 UTC
rawhide test build:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1420394

Comment 6 Akemi Yagi 2019-12-10 17:14:45 UTC
That test build is for Fedora. Please provide a patched make for RHEL 8 so that we can test.

Comment 10 Akemi Yagi 2019-12-18 17:52:17 UTC
Support Case #02541226.

Comment 15 Michal Kolar 2020-02-05 13:57:35 UTC
Verified against make-4.2.1-10.el8.
SanityOnly because the reproducer is unstable. The required patch was applied successfully.

Comment 20 Oss Tikhomirova 2020-03-27 00:38:40 UTC
Thank you a lot, DJ.

Comment 22 errata-xmlrpc 2020-04-28 17:03:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1911

