Bug 114341 - semctl(blah, GETPID) is broken
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Jim Paradis
QA Contact: Brian Brock
URL: http://groups.google.com/groups?hl=en...
Reported: 2004-01-26 16:55 EST by Mikhail Kruk
Modified: 2013-08-05 21:03 EDT

Doc Type: Bug Fix
Last Closed: 2005-12-05 15:11:13 EST
Description Mikhail Kruk 2004-01-26 16:55:02 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6)
Gecko/20040113

Description of problem:
Here is the text of the report submitted by Anton Lavrentiev in 2002. 
(http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&c2coff=1&threadm=fa.hd5olgv.1e0qt21%40ifi.uio.no&rnum=3&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26c2coff%3D1%26q%3Dsemctl%2BGETPID%2Blinux%26btnG%3DGoogle%2BSearch)
The bug is still there.

Dear Linux Developers:

I reported this bug a while ago, but it seems that it is still there.

If a "wait for zero" operation on an IPC semaphore is going to block,
it clobbers the PID of the last process that operated on the
semaphore, reverting it to the PID of the last-but-one process that
performed an operation.

In particular, if the syscall is interrupted by a signal, restarted,
and interrupted again, the PID of the last process that operated on the
semaphore (obtainable via semctl(...GETPID...)) unconditionally becomes
zero.

The bug comes from the following fragment (look at the lines
marked with exclamation points):

ipc/sem.c, try_atomic_semop():
---------------------------------------------------------------------------
                 if (!sem_op && curr->semval) /*!!!!!!*/
                         goto would_block;

                 curr->sempid = (curr->sempid << 16) | pid; /*!!!!!!*/

                 ........

would_block:
         if (sop->sem_flg & IPC_NOWAIT)
                 result = -EAGAIN;
         else
                 result = 1;

undo:
         while (sop >= sops) {
                 curr = sma->sem_base + sop->sem_num;
                 curr->semval -= sop->sem_op;
                 curr->sempid >>= 16; /*!!!!!!*/
---------------------------------------------------------------------------

The simplest fix (that works!) is just to swap the "wait for zero"
condition and the PID backup, like this:

                 curr->sempid = (curr->sempid << 16) | pid;

                 if (!sem_op && curr->semval)
                         goto would_block;


Best regards,

Anton Lavrentiev
NCBI/NLM/NIH
Bethesda MD 20894

Version-Release number of selected component (if applicable):
2.4.9-e.30

How reproducible:
Always

Steps to Reproduce:
I have an application that waits on a semaphore with a SIGALRM timer
set to one second. It gets interrupted, waits again, and gets
interrupted again.
If I later try to get the PID of the last process that accessed the
given semaphore, I always get 0.

Actual Results:  a PID of 0 is returned

Expected Results:  get the real pid of the last app which accessed the
semaphore.

Additional info:
Comment 1 Need Real Name 2004-01-29 03:41:45 EST
mcorrigan@biscom.com
Comment 2 Austin France 2004-10-29 08:20:36 EDT
This bug also exists in earlier Red Hat releases and kernels (Red Hat
7.3 with kernel 2.4.18, for example) and is, I believe, fixed in kernel
2.4.22.

Sample code that demonstrates the bug:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* On Linux the caller must define union semun (see semctl(2)) */
union semun {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};

int main(void) {
	int semid, res;
	pid_t pid;
	struct sembuf semops[2];
	union semun arg;

	printf("parent: pid = %d\n", (int)getpid());

	/* create a semaphore set */
	semid = semget(0xDEADBEEF, 1, IPC_CREAT|0666);
	if (semid==-1) {
		perror("parent: semget");
		return 1;
	}

	printf("parent: semid = %d\n", semid);

	/* raise the semaphore */
	semops[0].sem_num = 0;			/* Semaphore 0 */
	semops[0].sem_op = 0;			/* Test 0 */
	semops[0].sem_flg = IPC_NOWAIT;		/* dont wait */
	semops[1].sem_num = 0;
	semops[1].sem_op = 1;			/* Increment */
	semops[1].sem_flg = 0;
	res = semop(semid, semops, 2);
	if (res==-1) {
		perror("parent: semop");
		return 1;
	}

	printf("parent: got semaphore\n");

	/* flush so buffered output is not duplicated in the child */
	fflush(stdout);

	/* Ok we have the semaphore... spawn a sub-process */
	pid = fork();
	if (pid == -1) {
		perror("parent: fork");
		return 1;
	}

	if (pid == 0) {
		printf("child: pid = %d\n", (int)getpid());

		/* get the semaphore set */
		semid = semget(0xDEADBEEF, 1, IPC_CREAT|0666);
		if (semid==-1) {
			perror("child: semget");
			return 1;
		}

		printf("child: semid = %d\n", semid);

		/* raise the semaphore */
		semops[0].sem_num = 0;			/* Semaphore 0 */
		semops[0].sem_op = 0;			/* Test 0 */
		semops[0].sem_flg = IPC_NOWAIT;		/* dont wait */
		semops[1].sem_num = 0;
		semops[1].sem_op = 1;			/* Increment */
		semops[1].sem_flg = 0;

		res = semop(semid, semops, 2);
		if (res==-1 && errno==EAGAIN) {
			printf("child: resource locked\n");

			/* Now, we should be able to query the semaphore and see */
			/* who has it */
			res = semctl(semid, 0, GETPID);
			if (res == -1) {
				perror("child: semctl");
				return 1;
			}

			printf("child: Pid %d last changed the semaphore\n", res);

			/* all ok so far, however if we now try and obtain the lock */
			/* again, we hit the bug... */
			semops[0].sem_num = 0;			/* Semaphore 0 */
			semops[0].sem_op = 0;			/* Test 0 */
			semops[0].sem_flg = IPC_NOWAIT;		/* dont wait */
			semops[1].sem_num = 0;
			semops[1].sem_op = 1;			/* Increment */
			semops[1].sem_flg = 0;

			res = semop(semid, semops, 2);
			if (res == -1 && errno==EAGAIN) {	/* We expect this */
				printf("child: resource still locked\n");
				res = semctl(semid, 0, GETPID);
				if (res == -1) {
					perror("child: semctl");
					return 1;
				}

				printf("child: ** THE BUG **\n");
				printf("child: Pid %d last changed the semaphore\n", res);
				printf("child: this should say\n");
				printf("child: Pid %d last changed the semaphore\n",
					(int)getppid());

				/* Give up */
				return 0;
			}

			/* drop through to check result */
		}

		/* this should fail with EAGAIN (which is ok) */
		if (res==-1) {
			perror("child: semop");
			return 1;
		}

		printf("child: got semaphore!! this should not have happened\n");
		return 1;
	}

	sleep(2);

	/* Release the semaphore */
	arg.val = 0;
	res = semctl(semid, 0, SETVAL, arg);
	if (res == -1) {
		perror("parent: semctl SETVAL");
		return 1;
	}

	res = semctl(semid, 0, IPC_RMID);
	if (res == -1) {
		perror("parent: semctl IPC_RMID");
		return 1;
	}

	return 0;
}

I found the bug in Red Hat 7.3 (kernel 2.4.18); the offending code is
here:

2.4.18 (not fixed): 
http://lxr.linux.no/source/ipc/sem.c?v=2.4.18#L300

2.4.20 (not fixed): 
http://lxr.linux.no/source/ipc/sem.c?v=2.4.20#L300

2.4.21 (not fixed): 
http://lxr.linux.no/source/ipc/sem.c?v=2.4.21#L300

2.4.22 (fixed):
http://lxr.linux.no/source/ipc/sem.c?v=2.4.22#L300
Comment 3 Jim Paradis 2005-12-05 15:11:13 EST
This issue is outside the scope of the current support status for RHEL2.1.  No
fix is planned.
