Red Hat Bugzilla – Bug 242453
kernel doesn't support correctly fork()
Last modified: 2007-11-16 20:14:55 EST
Description of problem:
The Vectis application uses the fork() call and no longer runs over
native InfiniBand with RHEL4U3 and Voltaire gridstack-4.1.5_3.
Version-Release number of selected component (if applicable):
IB: Voltaire Gridstack-4.1.5_3
Try to run Vectis with HP-MPI.
Steps to Reproduce:
1. phase5 -V 3.10p2 -np 8 -hf MPI_HOSTS piston.INP
Vectis starts, but nothing runs; all processes are waiting for something.
Some information from Ricardo:
I believe the problem on the cluster running VECTIS is due to the use
of fork() within VECTIS. If I remove all system calls from the code then
it runs ok. I've done some research on this and I have found that OFED
1.1 does not properly support fork(). Problems can be found on any
kernel version but for 2.6.12 and lower it is not supported at all.
Above 2.6.12 there is limited support.
Can you elaborate on what exactly you mean by saying "OFED does not properly
Sorry, this is the information we received from the software vendor, Ricardo.
I have no further details.
From the OFED 1.1 Release Notes:
2. Fork support from kernel 2.6.12 and above is available provided
that applications do not use threads. The fork() is supported as long
as parent process does not run before child exits or calls exec().
The former can be achieved by calling wait(childpid); the latter can be
achieved by application specific means. Posix system() call is
From the Open-MPI FAQ:
24. Can I use system() or fork() in an MPI application that uses the
The answer is, unfortunately, complicated.
* If you have a Linux kernel before version 2.6.16: no. Some
distros may provide patches for older versions (e.g, RHEL4 may
someday receive a hotfix).
* If you have a version of OFED before v1.2: sort of.
Specifically, newer kernels with OFED 1.0 and OFED 1.1 may
generally allow the use of system() and/or the use of fork() as
long as the parent does nothing until the child exits.
* If you have a Linux kernel >= v2.6.16 and OFED >= v1.2 and Open
MPI >=v1.2.1: yes.
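As a rough illustration of the kernel-version condition in the FAQ excerpt above, the following C sketch compares a kernel release string against 2.6.16. The function names release_at_least and running_kernel_meets_faq_threshold are hypothetical, introduced only for this example. The sketch looks at the kernel version alone: it does not check the OFED or Open MPI versions, and it cannot detect a distribution kernel that carries backported fork() support.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 if a release string such as "2.6.9-34.EL"
 * is at least major.minor.patch, and 0 if it is older or cannot be parsed. */
int release_at_least(const char *release, int major, int minor, int patch)
{
    int a = 0, b = 0, c = 0;

    /* Require at least "major.minor"; a missing patch level counts as 0. */
    if (release == NULL || sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return 0;
    if (a != major)
        return a > major;
    if (b != minor)
        return b > minor;
    return c >= patch;
}

/* Hypothetical helper: returns 1 if the running kernel reports a version
 * of at least 2.6.16, the threshold quoted in the FAQ; 0 otherwise. */
int running_kernel_meets_faq_threshold(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return release_at_least(u.release, 2, 6, 16);
}
```

For example, release_at_least("2.6.9-34.EL", 2, 6, 16) returns 0, while release_at_least("2.6.18-8.el5", 2, 6, 16) returns 1; both release strings are given purely for illustration.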
This sounds like an MPI/OpenFabrics limitation rather than a kernel issue.
Note the end of the FAQ entry:
"NOTE: Arbitrary fork() support is not supported in the OpenFabrics software
stack. If you use fork() in your application, you must not touch any registered
memory before calling some form of exec() to launch another process. Calling
system() is safe."
I've checked the upstream git tree and there are no changesets that look to
correct bugs specific to Open-MPI or OFED. If there is a specific upstream
commit that enhances OpenFabrics or Open-MPI we can look into backporting it,
but as it stands now, this appears to me to be an MPI/OpenFabrics limitation,
rather than a kernel bug.
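
For reference, the usage pattern described in the quoted release notes and FAQ note (the child calls exec() immediately, and the parent does nothing but wait for the child) might look like the following minimal C sketch. The function name run_and_wait is hypothetical. This only illustrates the pattern using standard POSIX calls; it does not by itself make fork() safe with registered memory on the kernel and OFED versions discussed in this report.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with arguments argv in a child process.
 * Returns the child's exit status, 127 if exec failed in the child,
 * or -1 if fork/waitpid failed or the child was killed by a signal. */
int run_and_wait(char *const argv[])
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: call exec right away and do nothing else first. */
        execvp(argv[0], argv);
        _exit(127); /* reached only if exec failed */
    }

    /* Parent: do no other work until the child has exited. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector, for example {"/bin/sh", "-c", "exit 3", NULL}, and receive the child's exit status once it has finished.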