Description of problem:

From IT #53712:
-----------------
Users here have noticed that Perl appears to correct for the old Linux threads model by caching the PID and PPID at the start of a perl process. It appears this is done as an attempt to make Perl programs more portable - by making Linux seem more Posix compliant.

A simple recreate to demonstrate this is as follows:

#!/usr/bin/perl

print "this is the parent $$\n";
unless (fork) {
    #this is the child
    print "this is the child $$\n";
    unless (fork) {
        # this is the grandchild
        print "this is the grandchild $$\n";
        ##################################
        # THE FOLLOWING LINE DOESN'T WORK
        ##################################
        #sleep 1 until getppid == 1;
        ##################################################################
        # SO AN ALTERNATIVE TEST WHICH SHOWS THAT getppid NEVER RETURNS 1
        ##################################################################
        while (($ppid=getppid) != 1) {
            print "getppid = $ppid\n";
            print "$$: $ppid is alive?", (kill(0, $$ppid) ? "yes" : "no"), "\n";
            sleep 1;
        }
        print "this is the grandchild going bye bye\n";
        exit(0);
    }
    print "this is the child going bye bye\n";
    exit 0;
}
print "this is the parent waiting\n";
wait;
print "this is the parent going bye bye\n";
exit (0);

There is an old Perl module Linux::PPID which works around this. However, it would seem that on Linux systems with NPTL, the correct default behavior of Perl would be to do what it does on other OSes with Posix (or nearly Posix) threads.
I downloaded a copy of the Perl 5.8.5 source and I think the offending portion is in the hints/linux.sh file:

cat > UU/usethreads.cbu <<'EOCBU'
case "$usethreads" in
$define|true|[yY]*)
        ccflags="-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS $ccflags"
        set `echo X "$libswanted "| sed -e 's/ c / pthread c /'`
        shift
        libswanted="$*"

        # Somehow at least in Debian 2.2 these manage to escape
        # the #define forest of <features.h> and <time.h> so that
        # the hasproto macro of Configure doesn't see these protos,
        # even with the -D_GNU_SOURCE.

        d_asctime_r_proto="$define"
        d_crypt_r_proto="$define"
        d_ctime_r_proto="$define"
        d_gmtime_r_proto="$define"
        d_localtime_r_proto="$define"
        d_random_r_proto="$define"

        ;;
esac
EOCBU

It appears that the THREADS_HAVE_PIDS definition is what Perl uses to set up its caching of PIDs / PPIDs.

-Ryan

-----------------------------------------
Event posted 11-11-2004 03:59pm by gavin
-----------------------------------------
Yes, the "grandchild" process should show that its parent process is "1" once the "child" process ends. Looks like some problem in perl. But "fork" has very little to do with threads, so I doubt that this problem has anything to do with NPTL or LinuxThreads, but I've been wrong before.

Also, there is a bug in the testcase that causes it to give very confusing results. The reference to "$ppid" in the call to kill should only have one "$", not two.

-----------------------------------------
Event posted 11-18-2004 03:26pm by dmaley
-----------------------------------------
Discussed this issue during the weekly call today and requested that Ryan further explain why they believe this is related to moving to NPTL in RHEL3. This was in response to the post above by Gavin, where he mentioned that he didn't feel this was a threading issue because fork has very little to do with threads. Ryan, please feel free to correct any of this if I explain things incorrectly.
Apparently Perl implemented a way to provide applications with a more POSIX-like interface under LinuxThreads, to help keep Perl apps portable to and from Linux (LinuxThreads). The way they accomplished this was to cache the PID and PPID at the start of a perl process. However, with NPTL, threads no longer have their own PIDs, so this implementation doesn't work correctly. There's a Perl module (Linux::PPID) which apparently works around this non-standard behavior and restores the normal POSIX behavior.

However, now that we include a POSIX-compliant threading model, and considering that this isn't RH specific (i.e. NPTL is upstream in 2.6), upstream Perl should be updated to remove the "workaround" for LinuxThreads. And obviously RH should incorporate this into the Perl we ship in RHEL. LLNL is currently hitting this issue in RHEL3, and it is believed this problem will also exist in RHEL4.

-----------------------------------------
Event posted 11-18-2004 03:43pm by braby1
-----------------------------------------
I don't see anything wrong with Dave's explanation, but thought I'd paraphrase it so that hopefully someone reading both Dave's entry and mine would be able to get a good solid understanding of this.

Without NPTL, Linux gave threads new PIDs. This meant that either someone writing a threaded app in Perl would have to know about this Linux-specific behavior, or Perl would have to hide the fact that Linux did not follow POSIX threads. To allow for Perl portability, it appears that the Perl maintainers attempted to hide the Linux threads behavior by caching the PID and PPID of Perl processes and passing those on to any Perl threads. Unfortunately, this workaround was not perfect, and it breaks examples like the one posted above. For these cases someone developed the Linux::PPID module, which exposes the "true" Linux PPIDs.
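To illustrate why a parent PID cached at startup goes stale, here is a minimal shell sketch of the same effect. It is an analogy, not Perl's implementation: the shell records $PPID once when it starts, much as the thread describes Perl caching the PPID, while the PPid: field in /proc/<pid>/status reflects the kernel's current view. The sketch assumes Linux with /proc mounted; the variable names are made up for the illustration.

```shell
#!/bin/sh
# Sketch: a parent PID captured once at startup goes stale after reparenting.
# Linux-only assumption: the live value is read from /proc/<pid>/status.

# Script run by the grandchild. $PPID was recorded when this shell started;
# the PPid: field in /proc is looked up only after the parent has exited.
grandchild='
    sleep 2
    live=$(sed -n "s/^PPid:[[:space:]]*//p" "/proc/$$/status")
    echo "cached=$PPID live=$live"
'

# The child starts the grandchild in the background, lingers long enough for
# the grandchild shell to start up, then exits and orphans it.
# The command substitution returns once the grandchild has printed its line.
result=$(sh -c 'sh -c "$1" & sleep 1' child "$grandchild")
echo "$result"
```

The cached value is the PID of the child that has already exited. The live value is the PID of whatever process adopted the orphan, typically 1, or a subreaper on some systems.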
Now that RHEL3 and the 2.6 kernel both have NPTL, it would seem that this behavior of caching PIDs and PPIDs to hide that the threads model is non-POSIX should be disabled on systems with NPTL. Looking through the make files, I think this can be done by simply not defining THREADS_HAVE_PIDS. Ideally, a somewhat intelligent method for determining this at build time could be found and put into the Perl build system.

Version-Release number of selected component (if applicable):
perl-5.8.0-88.9

How reproducible:
Every time

Steps to Reproduce:
1. Run the script provided in the description above.

Actual results:
The "grandchild" process doesn't show that its parent process is "1" once the "child" process ends.

Expected results:
The "grandchild" process should show that its parent process is "1" once the "child" process ends.

Additional info:
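One possible shape for the build-time check suggested above, as a rough sketch only. It assumes glibc, where `getconf GNU_LIBPTHREAD_VERSION` reports the threading library in use (a string starting with "NPTL" or "linuxthreads"). The helper name threads_pid_flag and the variable threadshavepids are hypothetical and not taken from Configure or hints/linux.sh. When the library cannot be identified, the sketch keeps the existing behavior and still defines THREADS_HAVE_PIDS.

```shell
#!/bin/sh
# Sketch: decide at build time whether -DTHREADS_HAVE_PIDS is needed.

# Prints the extra define for a given threading library description.
# Hypothetical helper for illustration only.
threads_pid_flag() {
    case "$1" in
        NPTL*) echo "" ;;                     # threads share one PID: no workaround
        *)     echo "-DTHREADS_HAVE_PIDS" ;;  # LinuxThreads or unknown: keep it
    esac
}

# Empty when getconf is missing or does not know the variable.
libpthread_version=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)
threadshavepids=$(threads_pid_flag "$libpthread_version")

ccflags="-D_REENTRANT -D_GNU_SOURCE $threadshavepids"
echo "threading library: ${libpthread_version:-unknown}"
echo "ccflags: $ccflags"
```

In hints/linux.sh the resulting value would take the place of the hard-coded -DTHREADS_HAVE_PIDS in the ccflags assignment. Whether the test should run on the build host or the target is left open here.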
This is a good analysis of the problem. I am hesitant to change it in RHEL3, but I will adjust the RHEL4 beta perl to properly undefine the THREADS_HAVE_PIDS setting, which should end the caching of PPIDs. The change may also appear in future U releases of RHEL3.
What did it break? Knowing what to look for might allow us to catch problems faster.
Thanks much for the additional info here. Chip had also added this to the IT, so I wanted to add it here for anybody else who may be interested: "Sure. Basically the variable used to cache the ppid, PL_ppid, is no longer present in libperl.so. But anything that links against libperl.so and expects that symbol to be present will fail to start with an undefined symbol error. We saw this happen with mod_perl, for instance, the night after I made this change, so it would definitely have an impact on any software that links against perl."
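A quick way to look for this kind of breakage before it surfaces at startup is to check whether a given libperl.so still exports PL_ppid. The following is a sketch under assumptions: it uses GNU `nm -D --defined-only`, the helper has_defined_symbol is made up for the illustration, and the library path comes from a hypothetical LIBPERL environment variable because the real location depends on the installation.

```shell
#!/bin/sh
# Sketch: check whether a shared library still exports a given symbol.

# Reads `nm -D --defined-only` output on stdin and succeeds if the symbol
# named in $1 is listed. Any "@VERSION" suffix on the name is ignored.
has_defined_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name); if (name == sym) found = 1 }
        END { exit !found }
    '
}

# LIBPERL is a hypothetical variable holding the path to libperl.so.
lib=${LIBPERL:-}
if [ -n "$lib" ] && [ -r "$lib" ]; then
    if nm -D --defined-only "$lib" | has_defined_symbol PL_ppid; then
        echo "PL_ppid is exported by $lib"
    else
        echo "PL_ppid is NOT exported by $lib"
    fi
else
    echo "set LIBPERL to the path of libperl.so to run the check"
fi
```

Running `nm -D --undefined-only` on a dependent module such as the mod_perl shared object, and passing the output through the same helper, would show whether that module expects the symbol.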