Bug 1436446 - getpid() in child process created using clone(CLONE_VM) returns parent's pid
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-03-27 23:43 UTC by Andrew Vagin
Modified: 2017-04-03 14:15 UTC (History)

Last Closed: 2017-04-03 14:15:17 UTC
Type: Bug

Description Andrew Vagin 2017-03-27 23:43:09 UTC
Description of problem:

When a child process is created with clone(CLONE_VM), getpid() in the child returns the parent's PID instead of the child's. The problem is present in fc25 (glibc-2.24-4.fc25.x86_64) and absent in fc24 (glibc-2.23.1-11.fc24.x86_64).

[avagin@laptop ~]$ cat test1.c 
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <signal.h>
#include <sys/wait.h>

int child(void *a)
{
	printf("1)child: pid=%ld\n", syscall(__NR_getpid));
	printf("2)child: pid=%d\n", getpid());
	if (getpid() != syscall(__NR_getpid))
		printf("FAIL\n");
	return 0;
}

int main(void)
{
	int stack_size = 2 * 1024 * 1024, status;
	char *stack;
	pid_t pid;

	stack = mmap(NULL, stack_size, PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (stack == MAP_FAILED) {
		perror("Can't allocate stack");
		exit(1);
	}

	setbuf(stdout, NULL);

	printf("parent: pid=%d\n", getpid());
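	/* The stack grows down on x86_64, so pass the top of the mapping. */
	pid = clone(child, stack + stack_size, CLONE_VM | CLONE_FILES | SIGCHLD, NULL);
	if (pid == -1) {
		perror("Can't clone");
		exit(1);
	}
	printf("parent: fork pid=%d\n", pid);
	waitpid(pid, &status, 0);
	return 0;
}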

[avagin@laptop ~]$ rpm -q glibc
glibc-2.24-4.fc25.x86_64
glibc-2.24-4.fc25.i686
[avagin@laptop ~]$ gcc -Wall test1.c 
[avagin@laptop ~]$ ./a.out 
parent: pid=8301
parent: fork pid=8302
1)child: pid=8302
2)child: pid=8301
FAIL

[root@fc24 ~]# rpm -q glibc
glibc-2.23.1-11.fc24.x86_64
glibc-2.23.1-11.fc24.i686
[root@fc24 ~]# gcc -Wall test1.c 
[root@fc24 ~]# ./a.out 
parent: pid=10453
parent: fork pid=10454
1)child: pid=10454
2)child: pid=10454


Version-Release number of selected component (if applicable):
glibc-2.24-4.fc25.x86_64

How reproducible:
100%


Steps to Reproduce:
1. Compile and run the program above
2. Compare the values that getpid() and syscall(__NR_getpid) return in the child process; they should be equal

Actual results:
getpid() called in the child returns the parent's PID

Expected results:
getpid() called in the child returns the child's PID

Comment 2 Andrew Vagin 2017-03-28 06:01:02 UTC
Here is the quote from the glibc bugzilla:
Adhemerval Zanella 2017-03-28 04:24:22 UTC
As you noted it was fixed by c579f48 (Remove cached PID/TID in clone) on master by removing the Linux getpid implementation altogether (and then use the auto-generation syscall).  I think for 2.24 the straightforward fix is just remove getpid Linux implementation.
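
To illustrate why a cached PID can go stale after clone(CLONE_VM), here is a toy model. It is not glibc's actual implementation: toy_cache, toy_getpid, toy_child and demo_stale_cache are hypothetical names, and the cache is a plain static variable rather than whatever glibc 2.24 used internally. The only point is that a child sharing the parent's memory also shares any value the parent cached.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy stand-in for a library-level PID cache (assumption, not glibc code). */
static pid_t toy_cache;
static volatile int child_saw_stale;

/* Return the cached PID, filling the cache on first use. */
static pid_t toy_getpid(void)
{
	if (toy_cache == 0)
		toy_cache = (pid_t)syscall(SYS_getpid);
	return toy_cache;
}

/* Runs in the child: compare the cached value with the kernel's answer. */
static int toy_child(void *arg)
{
	(void)arg;
	child_saw_stale = (toy_getpid() != (pid_t)syscall(SYS_getpid));
	return 0;
}

/* Returns 1 if the child observed a stale cached PID, -1 on error. */
int demo_stale_cache(void)
{
	size_t stack_size = 1024 * 1024;
	char *stack = malloc(stack_size);
	pid_t pid;

	if (stack == NULL)
		return -1;

	toy_getpid();	/* prime the cache in the parent */

	/* CLONE_VM: the child shares memory, including toy_cache. */
	pid = clone(toy_child, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
	if (pid == -1) {
		free(stack);
		return -1;
	}
	waitpid(pid, NULL, 0);
	free(stack);
	return child_saw_stale;
}
```

Because the parent fills toy_cache before calling clone, the child reads the parent's PID from shared memory and never asks the kernel. Removing the cache, as the quoted fix does for glibc's getpid, makes every call go to the kernel and removes this class of problem.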

Comment 3 Carlos O'Donell 2017-04-03 14:15:17 UTC
(In reply to Andrew Vagin from comment #2)
> Here is the quote from the glibc bugzilla:
> Adhemerval Zanella 2017-03-28 04:24:22 UTC
> As you noted it was fixed by c579f48 (Remove cached PID/TID in clone) on
> master by removing the Linux getpid implementation altogether (and then use
> the auto-generation syscall).  I think for 2.24 the straightforward fix is
> just remove getpid Linux implementation.

This is fixed in F26 and onward and will not be backported to Fedora 25 unless a specific application is being impacted by the functionality in question.

The workaround is to invoke the getpid system call directly with syscall(), e.g. syscall(__NR_getpid) as in the reproducer, which bypasses the PID cached by the core library.
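
A minimal sketch of that workaround, assuming a small helper is acceptable in the affected code. The name raw_getpid is hypothetical; syscall() and SYS_getpid are the standard Linux interfaces (SYS_getpid is the same number as the __NR_getpid used in the reproducer).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel for the PID on every call
 * instead of going through the library's getpid() wrapper.
 */
static pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```

Code that runs in a child created with clone(CLONE_VM) would call raw_getpid() wherever it currently calls getpid(). The cost is one system call per lookup, which is also what getpid() does once the cache is removed.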

Applications that call clone directly must handle all the coordination with the underlying core library themselves, because that library maintains the threading model behind interfaces such as pthread_create.

Please reopen this bug if the workaround is unsupportable for you for some reason, or if the fix in F26 is insufficient.

