Bug 654822
Summary: | GNU make hanging at end of build | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Anthony Green <green> | ||||
Component: | make | Assignee: | Petr Machata <pmachata> | ||||
Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-tools-bugs | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 6.0 | CC: | mnewsome | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-11-25 12:10:35 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
Anthony Green
2010-11-18 20:11:21 UTC
Not reproducible with 5.9.4-7, the latest in fedora repository. Could you attach the problematic srpm to this bugzilla?

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.

Created attachment 472256
srpm that won't finish building
It's reproducible with that srpm. (Even on Fedora with the right make version.) What seems to be the minimal reproducer:

--checkopt-xx.def--
AutoGen Definitions options;
prog-name = check;
prog-title = "Checkout Automated Options";
flag = { name = e; };

--Makefile--
all-am:
	agen5/autogen checkopt-xx.def # $(MAKE)
	echo DONE

$ make -C doc -r -j2

Notes: the comment with $(MAKE) has to be there. No other variable will do. It has to be the new autogen that is launched. In -jN, N must be >1 (supposedly to enable the jobserver).

So the minimal stand-alone reproducer is this:

--hang.mf--
run: hang
	+./hang
	echo DONE

--hang.c--
#include <stdio.h>
#include <unistd.h>	/* fork, execl */

int main(int argc, char **argv)
{
	/* The child becomes cat and inherits every open descriptor;
	   the parent exits without waiting for it. */
	if (fork() == 0)
		execl("/bin/cat", "/bin/cat", (char *)NULL);
	return 0;
}

$ make -r -j2 -f hang.mf run
./hang
echo DONE
DONE
# and it hangs, waiting for cat to finish. Pressing C-d makes cat finish.

When you remove the initial "+", make doesn't hang. I don't know what the problem is yet, but that's the essence of the autogen build hang.

In autogen, what hangs make is the process "sh". When rpmbuild hangs, "pstree" shows a pack of sh's rooted right under "init", as the autogen process that launched them died without collecting them. Killing those sh's un-hangs the build, and rpmbuild finishes with an error. The easiest workaround is not to pass %{?_smp_mflags} to make. I'll look into fixing the make problem next.

When make sees $(MAKE) or an initial + in a recipe, it assumes that the command will recurse and therefore, in jobserver mode, leaves the jobserver pipe open in the sub-process. (That pipe is used to coordinate parallel builds across several make instances.) Before the toplevel make exits, it looks into that pipe and waits for all the synchronization tokens to turn up. But your build is stuck in some innocent "sh" that has no idea that it's supposed to be part of a recursive build and that it should close those descriptors, which it will never use anyway. So it doesn't, and the toplevel make hangs there indefinitely.
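The descriptor leak described above can be shown in isolation. The following is a minimal sketch, not make's actual code: it assumes a coordinator that puts one-byte tokens into a pipe, closes its own write end, and then reads until end-of-file. The names drain_tokens and drain_with_lingering_child and all their parameters are hypothetical, made up for this illustration. A lingering child that keeps the inherited write end open delays end-of-file until it exits, much like the stray cat or sh in the reproducers.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read one-byte tokens until EOF or error; return how many were read. */
int drain_tokens(int read_fd)
{
    char token;
    int count = 0;

    while (read(read_fd, &token, 1) == 1)
        count++;
    return count;
}

/*
 * Hypothetical demo, not taken from make: put `slots` tokens into a pipe,
 * fork a child that lingers for `linger_ms` milliseconds, then drain the
 * pipe. If `child_closes_fds` is 0, the child keeps the inherited pipe
 * descriptors open, so the drain cannot see EOF before the child exits.
 * Returns the number of tokens read (or -1 on error) and stores the time
 * spent draining, in milliseconds, in *elapsed_ms.
 */
int drain_with_lingering_child(int slots, int linger_ms,
                               int child_closes_fds, long *elapsed_ms)
{
    int fds[2];
    struct timespec start, end;

    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < slots; i++) {
        if (write(fds[1], "+", 1) != 1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: stands in for a process unaware of the token pipe. */
        if (child_closes_fds) {
            close(fds[0]);
            close(fds[1]);
        }
        struct timespec ts = { linger_ms / 1000,
                               (linger_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        _exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    close(fds[1]);                      /* drop our own write end */
    int count = drain_tokens(fds[0]);   /* blocks while any writer remains */
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    *elapsed_ms = (end.tv_sec - start.tv_sec) * 1000L
                + (end.tv_nsec - start.tv_nsec) / 1000000L;
    return count;
}
```

Calling drain_with_lingering_child(3, 500, 0, &ms) should return all three tokens only after roughly half a second, while passing 1 for child_closes_fds should return them almost immediately. In the bug, the lingering process never exits on its own, so the wait never ends.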
On the make side, dropping the master_job_slots sanity check in main.c:clean_jobserver gets rid of the problem.

On the autogen side, in doc/Makefile.in, doing something like this gets rid of the recursion trigger:

_MAKE := $(MAKE)
agdoc.texi : # self-depends upon all executables
	MAKE=$(_MAKE) ./mk-agen-texi.sh

But note that this is just working around the problem. That variable is being passed down presumably to be used in a recursive make invocation, so technically make is right to catch that.

The only upstreamable solution, I think, would be if autogen collected its children. I don't know why it doesn't; I think I've seen some comments related to SIGCHLD etc. in the code. FWIW, the shell that stays hanging is opened in agen5/agShell.c:chainOpen.
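For comparison with hang.c, here is a minimal sketch of what "collecting its children" could look like for a parent that starts a helper process. It is not autogen's code, and the contents of agShell.c:chainOpen are not shown in this report. The function name spawn_cat_and_collect is hypothetical, and /bin/cat stands in for the helper shell, as in the reproducer. The parent gives the child a pipe as stdin, closes it so the child sees end-of-file, and then waits for the child before returning.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start /bin/cat with its stdin connected to a pipe,
 * signal end-of-input by closing the pipe, and reap the child.
 * Returns the child's exit status, or -1 on error.
 */
int spawn_cat_and_collect(void)
{
    int to_child[2];

    if (pipe(to_child) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: read from the pipe instead of the inherited stdin. */
        dup2(to_child[0], STDIN_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        execl("/bin/cat", "/bin/cat", (char *)NULL);
        _exit(127);                     /* exec failed */
    }

    /* Parent: closing both ends gives the child EOF on its stdin. */
    close(to_child[0]);
    close(to_child[1]);

    /* Collect the child so nothing is left running after we return. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child has exited by the time the parent returns, any descriptors it inherited (such as a jobserver pipe) are closed as well, so nothing remains to hold the toplevel make open.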