When I was building glibc 2.5 from CVS with "make -j4" on dual processor machines, I got

make[3]: *** No rule to make target `/export/build/gnu/glibc-nptl-local/build-x86_64-linux/iconv/charmap.o', needed by `others'. Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory `/export/gnu/src/glibc/libc/iconv'
make[2]: *** [iconv/others] Error 2
make[2]: Leaving directory `/export/gnu/src/glibc/libc'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/export/build/gnu/glibc-nptl-local/build-x86_64-linux'

"make -j1" worked fine. make-3.80-10.2 from FC5 has no problem with "make -j4". It happens on both x86 and x86-64.
Hmm, so it's reproducible, right? Unfortunately 3.81-1.1 seems to do a good job for me so far; I gave it a shot on several x86-based machines and it just works.
I have seen this problem on 2-processor x86, x86-64 and ia64 machines when I use "make -j4" to build glibc 2.5. How did you build glibc? Can you show me your /proc/cpuinfo?
I even saw it with "make -j2" on a single processor x86 machine.
The command I used is make -jN PARALLELMFLAGS="-jN" where N == 2 x NUM_OF_CPUs.
To reproduce it, after the glibc build is done, run this in the glibc build directory:

[hjl@gnu-25 build-x86_64-linux]$ rm -rf iconv
[hjl@gnu-25 build-x86_64-linux]$ make -j4 PARALLELMFLAGS=-j4 > make.log
make[2]: warning: -jN forced in submake: disabling jobserver mode.
make[2]: warning: -jN forced in submake: disabling jobserver mode.
mkdir /export/build/gnu/glibc-nptl-local/build-x86_64-linux/iconv
gconv_open.c: In function ‘__gconv_open’:
gconv_open.c:59: warning: ‘ptr’ may be used uninitialized in this function
gconv_open.c: In function ‘__gconv_open’:
gconv_open.c:59: warning: ‘ptr’ may be used uninitialized in this function
make[2]: warning: -jN forced in submake: disabling jobserver mode.
[the same warning repeated many more times]
No rule to make target `/export/build/gnu/glibc-nptl-local/build-x86_64-linux/iconv/charmap.o', needed by `others'
make[1]: *** [iconv/others] Aborted (core dumped)
make: *** [all] Error 2
[hjl@gnu-25 build-x86_64-linux]$
Created attachment 139496
A patch

The problem is that when start_job_command closes job_fds, it doesn't set them to -1. The same fd number is then returned by opendir, and later it is used as a pipe again. From there, everything goes downhill.
Created attachment 139522
An updated patch

clean_jobserver should also set job_fds to -1 after closing them.
My patch breaks job server. I am looking into it now.
Got it reproduced now; the trick was the PARALLELMFLAGS="-jN" part, I think. Investigating.
Created attachment 139578
A new patch

Here is the fix. When make re-execs itself, it calls clean_jobserver, which may close job_fds. After the re-exec, make reads job_fds from jobserver_fds again and closes them when jobserver mode is disabled, so it closes the same fd numbers twice. The second time, it closes the wrong file. This patch sets jobserver_fds_invalid_flag after closing job_fds and checks it before closing job_fds.
Nice work, many thanks!
I need to understand the situation better. I'm not convinced that the patch provided is actually correct. The only time clean_jobserver() closes the FDs is when it is invoked by the master (top-level) make instance. The master make instance never has the --jobserver-fds flag in its command line, since that is an internal flag that is added by make itself when it invokes sub-makes. When the master make instance finishes building makefiles and re-execs itself, it considers itself as a brand new instance of make (which it is) and re-opens the jobserver pipes and hands those new values to its children. I've annotated the source and done some tests and verified all these things. There must be something else unexpected going on here if the problem is as you describe it. I can't find a "normal" code path that results in the behavior described in comment #10. I rather suspect it has something to do with the PARALLELMFLAGS="-jN" variable. How is this variable actually used in the makefiles?
Make handles error conditions poorly, which hides the real problem and results in misleading error messages. In this particular case, we have

    #define ENULLLOOP(_v,_c) do{ errno = 0; \
        while (((_v)=_c)==0 && errno==EINTR); }while(0)

    ENULLLOOP (d, readdir (dir->dirstream));
    if (d == 0)
      break;

Because of the bug under discussion, dir->dirstream now refers to a pipe instead of a directory, but ENULLLOOP doesn't check for ENOTDIR at all. Better error handling would make it easier to identify where the real problem is. At the very least it should do

    if (d == 0)
      {
        assert (errno == 0);
        break;
      }
The purpose of these special loops is to try to work properly on systems where SA_RESTART does not work universally (Solaris, for example, is such a system). On those systems virtually any system call can be interrupted, so this loop is intended to mask that in the code. The macros are _not_ supposed to handle general error conditions beyond EINTR. However, you're absolutely correct that the code after the macro should look for error conditions. I still fail to understand exactly what the bug is. It seems that someone must understand what's going on since a patch has been produced, but there is no description of what's happening and it's not obvious (to me) from the patch. I _especially_ don't understand how this dir.c code relates to the original bug, and how dir->dirstream could be referring to a pipe rather than a directory (!) If that's true there's something _really_ wrong internally, as far as I can make out. If someone could jot down even a high-level description of how the bug is triggered that would help me a lot. Cheers!
I can't reproduce it with a simple testcase. I only see it when building glibc with "make -jN PARALLELMFLAGS=-jN", on both single-processor and multiprocessor machines. The glibc Makefiles call make with

$(MAKE) $(PARALLELMFLAGS) ...

and

$(MAKE) -r PARALLELMFLAGS="$(PARALLELMFLAGS)" ...

Basically, every single make invocation is called with -jN. Do you have a Linux machine to build glibc on?
Hm. With GNU make 3.81beta4, which comes on my Ubuntu Dapper box, the bug doesn't happen. But with the latest CVS version of GNU make, I can reproduce it. I'll look into it.
I'm chiming in for the first time here. I think the problem is this: http://www.gnu.org/software/make/manual/make.html#Archive-Pitfalls It seems to me the solution would be to enhance make to recognize that archive members are part of the same target. I'm assuming that the current logic never schedules more than one job for the same target at the same time; if it does, that would need to be addressed too.
BTW an interim work-around for this problem is to remove the implicit archive rule and replace it with a single explicit archive step for each archive. These steps should not use the archive member syntax. Since I'm not expecting a fix tomorrow, that's what I'm doing.
I really don't think that this problem is related at all to the archive issues. It works fine in beta4 and fails in the release. The issue you discuss here has existed forever; it's even documented in the manual to work this way. Further, make has no problem per se with this: it's just your output archive that is bolloxed up. In this bug, make actually can't figure out how to build something. If this problem is fixed by the changes you're making I can only assume that it's a coincidence. If we're talking about workarounds the best one is quite simple: stop adding PARALLELMFLAGS=-jN to your make invocations! The entire point of the jobserver feature in GNU make is that all recursive instances share the same pool of jobs, so forcing every sub-make to have N jobs kind of defeats the purpose. Please understand I'm not at all saying that this behavior is not a bug or that it shouldn't be fixed. I have some debug logs and I'll look at them. I'm just saying that if you use the jobserver as designed you won't see this failure. Cheers!
Another problem is multiple targets. See http://docs.sun.com/source/806-3573/Dmake.html GNU make lacks the "+" construct available in Sun 'dmake'. I was able to work around it by placing all the multiple target rules in a separate tree and running a separate single-threaded pass for them.
(In reply to comment #12)
> When the master make instance finishes building makefiles and re-execs itself,
> it considers itself as a brand new instance of make (which it is) and re-opens
> the jobserver pipes and hands those new values to its children. I've
> annotated the source and done some tests and verified all these things.

The sequence of events is as follows:

1. start sub-make with -j2 on the command line and --jobserver-fds=3,4 in MAKEFLAGS
2. make opens lots of DIR streams (not important right now)
3. warning: -jN forced in submake: disabling jobserver mode
   => close(3); close(4)
4. open new job_fds through pipe (fds 3, 4 again, since these fds are now free)
5. make decides it has to re-exec
6. close(3); close(4) in clean_jobserver to clean up after ourselves
7. restart sub-make with -j2 on the command line and --jobserver-fds=3,4 in MAKEFLAGS
8. open lots of directories; one of them (fd 4) stays open because it hasn't been read to the end yet
9. warning: -jN forced in submake: disabling jobserver mode
   => close(3); close(4)
   => oops! 3 is invalid, 4 refers to a directory stream!
   => make[2]: *** No rule to make target etc. etc.

Does that make sense?
Created attachment 204131
Fix for this problem.

The patch leaves the rest of make's logic intact and only actually calls the problematic `close' on the first iteration of make, i.e. when the `restarts' count is zero. Opinions? Does it break anything?
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks. If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change. Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
This is fixed in rawhide and F8.