Bug 1662936

Summary: strace reports 'ptrace(SYSCALL): No such process' on multi-threaded testcase on RHEL-8
Product: Red Hat Enterprise Linux 8 Reporter: Edjunior Barbosa Machado <emachado>
Component: strace Assignee: Eugene Syromiatnikov <esyr>
strace sub component: system-version QA Contact: Edjunior Barbosa Machado <emachado>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: emachado, law, ohudlick, skozina
Version: 8.0   
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: strace-4.24-4.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-05 22:33:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1682600    
Bug Blocks: 1696304    
Attachments:
full strace log output on aarch64 (flags: none)

Description Edjunior Barbosa Machado 2019-01-02 13:12:54 UTC
Created attachment 1517950
full strace log output on aarch64

Description of problem:
The latest strace-4.24-3.el8 occasionally reports '<ptrace(SYSCALL):No such process>' on getuid() calls when following the testcase below with 'strace -f':

[root@qualcomm-amberwing-rep-10 ~]# cat many_looping_threads.c 
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

static int thd_no;

static void *sub_thd(void *c)
{
	fprintf(stderr, "sub-thread %d created\n", ++thd_no);
	for (;;)
		getuid();
	return NULL;
}

int main(int argc, char *argv[])
{
	int i;
	pthread_t *thd;
	int num_threads = 1;

	if (argv[1])
		num_threads = atoi(argv[1]);

	thd = malloc(num_threads * sizeof(thd[0]));
	fprintf(stderr, "test start, num_threads:%d...\n", num_threads);
	for (i = 0; i < num_threads; i++) {
		pthread_create(&thd[i], NULL, sub_thd, NULL);
		fprintf(stderr, "after pthread_create\n");
	}
	/* Exit. This kills all threads */
	return 0;
}
[root@qualcomm-amberwing-rep-10 ~]# gcc many_looping_threads.c -o many_looping_threads -lpthread
[root@qualcomm-amberwing-rep-10 ~]# strace -f -o strace-log.log ./many_looping_threads 20
test start, num_threads:20...
after pthread_create
sub-thread 1 created
sub-thread 2 created
after pthread_create
after pthread_create
sub-thread 3 created
after pthread_create
sub-thread 4 created
after pthread_create
sub-thread 5 created
after pthread_create
sub-thread 6 created
after pthread_create
sub-thread 7 created
after pthread_create
sub-thread 8 created
after pthread_create
sub-thread 9 created
after pthread_create
sub-thread 10 created
after pthread_create
sub-thread 11 created
after pthread_create
sub-thread 12 created
after pthread_create
sub-thread 13 created
after pthread_create
sub-thread 14 created
after pthread_create
sub-thread 15 created
after pthread_create
sub-thread 16 created
after pthread_create
sub-thread 17 created
after pthread_create
sub-thread 18 created
after pthread_create
sub-thread 19 created
after pthread_create
sub-thread 20 created
[root@qualcomm-amberwing-rep-10 ~]# grep 'No such process' strace-log.log -C3
22468 getuid( <unfinished ...>
22467 exit_group(0 <unfinished ...>
22487 <... write resumed> )             = 22
22486 getuid( <ptrace(SYSCALL):No such process>
22485 <... getuid resumed> )            = ? <unavailable>
22484 ????( <unfinished ...>
22483 <... getuid resumed> )            = 0

The same test also runs successfully at times, without any 'No such process' message.

Version-Release number of selected component (if applicable):
strace-4.24-3.el8
RHEL-8.0-20181220.1

How reproducible:
Occasionally

Comment 1 Eugene Syromiatnikov 2019-01-29 14:04:40 UTC
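I would say that this is not a bug: the ptrace(PTRACE_SYSCALL) error is genuinely possible when the tracee disappears between strace's execution of the syscall entry code and its attempt to resume the tracee [1]. In this testcase, the main thread's exit_group() (PID 22467 in the log excerpt) kills all threads, so a thread stopped at getuid() entry can vanish before strace restarts it.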

[1] https://gitlab.com/strace/strace/blob/v4.24/strace.c#L396

Comment 2 Eugene Syromiatnikov 2019-02-11 18:26:11 UTC
Upstream commits that tackle this issue:
https://gitlab.com/strace/strace/commit/fe64f96ac09bfc97b6554816a19ae1fe138f1cae "Make inline message on failed restart attempt more verbose"
https://gitlab.com/strace/strace/commit/e0632590bdc041ef937ecf0491d6cd1504dec36f "ptrace_restart: do not print diagnostics when ptrace returns ESRCH"
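The race described in Comments 1 and 2 can be illustrated with a minimal C sketch, assuming Linux and glibc. The names `restart_tracee`, `restart_result`, and `demo_vanished_tracee` are hypothetical and are not strace's code; the actual change is in the upstream commits linked above. The sketch treats ESRCH from ptrace(PTRACE_SYSCALL) as "the tracee is gone" and stays silent, in the spirit of the second commit's title, and it reproduces the race deterministically by killing a stopped tracee with SIGKILL (standing in for exit_group() from a sibling thread) before trying to resume it.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not part of strace). */
enum restart_result { RESTART_OK, RESTART_GONE, RESTART_ERROR };

/*
 * Resume a ptrace-stopped tracee until its next syscall stop, delivering
 * signal `sig` (0 for none). ESRCH means there is no stopped tracee left
 * to resume (e.g. another thread called exit_group()), so it is reported
 * as RESTART_GONE without printing anything. Other errors go to stderr.
 */
static enum restart_result restart_tracee(pid_t pid, int sig)
{
	if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *) (long) sig) == 0)
		return RESTART_OK;
	if (errno == ESRCH)
		return RESTART_GONE;
	fprintf(stderr, "ptrace(PTRACE_SYSCALL, %d): %s\n",
		(int) pid, strerror(errno));
	return RESTART_ERROR;
}

/*
 * Hypothetical demo: stop a traced child, kill it behind the tracer's
 * back, then try to resume it. Returns the restart_tracee() result, or -1
 * if tracing could not be set up (e.g. ptrace blocked by seccomp).
 */
static int demo_vanished_tracee(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: become a tracee, then stop so the parent sees a ptrace-stop. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
			_exit(1);
		raise(SIGSTOP);
		_exit(0);
	}

	/* Parent: wait for the child's ptrace-stop. */
	if (waitpid(pid, &status, 0) != pid) {
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return -1;
	}
	if (!WIFSTOPPED(status))
		return -1;	/* child exited (PTRACE_TRACEME failed) and was reaped */

	/* Stand-in for exit_group() in a sibling thread: the tracee dies while stopped. */
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);	/* reap it, so the PID no longer exists */

	/* Resuming now fails with ESRCH, which restart_tracee() treats as expected. */
	return restart_tracee(pid, 0);
}
```

Called from a `main` on Linux, `demo_vanished_tracee()` is expected to return `RESTART_GONE`; it returns -1 where ptrace is not permitted (for instance under a restrictive seccomp profile or Yama `ptrace_scope` 3). The design choice is that, for a multi-threaded tracee, a failed restart with ESRCH is an expected outcome rather than a tracer failure, so it does not warrant a diagnostic.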

Comment 6 errata-xmlrpc 2019-11-05 22:33:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3642