I have been writing an article on the clone system call. A program I wrote that works under Slackware doesn't work properly on my Red Hat 6.2 system. It compiles, but then runs incorrectly. It clones a new child process; the parent is then supposed to sleep for 1 second while the child does some work. What actually happens is that both parent and child sleep for 1 second. Then the parent exits before the child can finish, and I get a core dump. I know my program works OK because it works under other distributions. Is this a kernel problem, or is it libc? Any help would be great.
It's hard to tell without your program. Can you post the source?
I'm sorry I didn't post this info originally, but I was away from my home box. The relevant version numbers are:

kernel-source-2.2.16-3
libc-5.3.12-31
glibc-2.1.3-21

The source that I was getting the errors with is:

```c
#include <stdio.h>      /* needed for printf() */
#include <unistd.h>
#include <sched.h>
#include <sys/types.h>
#include <stdlib.h>

int do_something()
{
    printf("start clone\n");
    printf("Clone closes\n");
    exit(0);
}

int main(int argc, char *argv[])
{
    pid_t chpid;
    void **child_stack;

    printf("MALLOC\n");
    child_stack = (void **) malloc(16384);
    printf("Address of stack = %p\n", (void *)child_stack);  /* %p prints a pointer portably */
    printf("CLONE\n");
    chpid = clone(do_something, child_stack, CLONE_VM|CLONE_FILES, NULL);
    printf("Parent pid = %d\n", getpid());
    printf("Child pid = %d\n", chpid);
    sleep(1);
    printf("Parent closes\n");
    return 0;
}
```

It shows the PIDs of the parent and child, and the child prints its first line. Then, when the parent sleeps, the child also sleeps. The parent wakes up, prints its last line and exits. At this point the child has not finished yet, so I get a segfault and a core dump. I know this works on other Linux distributions, so I'm not sure what to do next. Any ideas?
You cannot safely mix glibc's printf-family functions with a raw clone(). They assume threads are created through the pthread wrappers and interface. Try using write() and _exit() instead.
The problems with printf are not relevant here. The problem involves context switching between the parent and child. I have successfully used this program on another machine (winlinux2000) with kernel 2.2.13 and glibc 2.1.2. Could you please verify whether this is an actual bug or not? I do not have a second machine with Red Hat 6.2 installed to test it on. If this is not an actual bug, then my system has been damaged somehow, and I should reinstall. If it is a bug, then there is something wrong with either the kernel or the libraries shipped with Red Hat 6.2.
The problem is not that you aren't using pthreads; it's that you are using libraries that assume pthreads. printf isn't safe the way you use it, exit() isn't the right call, and the library's exit-time unwind code will drop your dynamically resolved symbols (dlsyms) before the other thread uses them. libc expects to be accessed in standards-compliant ways. (For example, rebuild it with -static and with no printf/exit calls, and you'll find it works.)
Thank you for the help. This is the very reason I'm writing this article: for people like me who don't know the ins and outs of problems like this. Do you have any suggestions for other things I should cover in this article? Is there anything else you think might trip up other beginning programmers?