From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050721 CentOS/1.0.6-1.4.1.centos4 Firefox/1.0.6

Description of problem:
If a running job is waiting in nanosleep, attaching strace reports setup as the syscall being executed.

Version-Release number of selected component (if applicable):
strace-4.5.9-2.EL4

How reproducible:
Always

Steps to Reproduce:
1. perl -e 'sleep(100)' &
2. strace -p PID-OF-PERL-JOB

Actual Results:
    % perl -e 'sleep(100)' &
    [2] 32603
    % strace -p 32603
    Process 32603 attached - interrupt to quit
    setup( <unfinished ...>
    Process 32603 detached
    %

Expected Results:
strace should have reported nanosleep as the current syscall instead of setup.

Additional info:
This isn't specific to Perl. A C program calling sleep gives the same result; Perl is just easier for demonstration. Running the process directly under strace correctly reports:
    nanosleep({100, 0}, <unfinished ...>
I forgot to add that I have not tested a newer version of strace to see whether it fixes the problem.
    $ sleep 100 &
    [1] 12345
    $ strace -p $!
    Process 12345 attached - interrupt to quit
    nanosleep({100, 0}, <unfinished ...>
    Process 12345 detached
    $ strace -V
    strace -- version 4.5.13
RHEL4U2 has 4.5.13.
RHEL4U2 doesn't seem to fix it:
    # rpm -q strace
    strace-4.5.13-0.EL4.1
    # sleep 100 &
    [1] 32077
    # strace -p 32077
    Process 32077 attached - interrupt to quit
    setup( <unfinished ...>
    Process 32077 detached
The name "setup" is wrong, but strace is reporting what the kernel tells it: system call 0, not the nanosleep system call number. This happens when the kernel restarts the nanosleep system call after a signal (the signal is an unavoidable part of attaching strace). The kernel does not leave any information that strace can find about the original system call, only that the special "restart_syscall" call is being made. I've fixed strace upstream to print this info more clearly:
    restart_syscall(<... resuming interrupted call ...>) = 0
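To illustrate why the label came out wrong, here is a minimal sketch of the number-to-name lookup a tracer performs. The two tables are hypothetical, hand-written excerpts for i386, not strace's actual data: slot 0 carries the stale name "setup" in the old table and "restart_syscall" in the corrected one, while nanosleep is 162. The helper names are made up for this example.

```python
# Hypothetical excerpts of i386 syscall-name tables (illustration only,
# not strace's real tables). Slot 0 is the one that matters here.
OLD_NAMES = {0: "setup", 1: "exit", 162: "nanosleep"}
NEW_NAMES = {0: "restart_syscall", 1: "exit", 162: "nanosleep"}


def syscall_name(number, table):
    """Return the name a tracer would print for a raw syscall number."""
    return table.get(number, "syscall_%d" % number)


def describe(number, table):
    """Format the start of a trace line, similar to the output quoted in this report."""
    name = syscall_name(number, table)
    if name == "restart_syscall":
        # The original arguments are not visible, so say so instead of guessing.
        return "restart_syscall(<... resuming interrupted call ...>"
    return name + "("


if __name__ == "__main__":
    print(describe(0, OLD_NAMES))  # what the affected strace showed
    print(describe(0, NEW_NAMES))  # what a corrected table would show
```

The point of the sketch is that the number reported by the kernel is the same in both cases; only the name attached to slot 0 changes.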
Is this a kernel problem, in that the kernel isn't reporting the system call?
Looks like 2.4.x reports the syscall while 2.6.x doesn't.
This is in the kernel domain, yes. It is not clear that it can be called a "problem" or that a kernel change can be expected. At the low level, it is an accurate report of what the process is doing: by the time you look at it, it is no longer blocked in the nanosleep system call, but is in fact blocked in the restart_syscall system call. In 2.4, nanosleep does not back out and restart in the same way for signals, so the issue does not arise.
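The behaviour described above can be pictured with a toy model. This is only a sketch under stated assumptions, not kernel code: the Task class, its fields, and the "restarting" return value are invented for illustration. The only real values are the i386 syscall numbers 0 (restart_syscall) and 162 (nanosleep).

```python
RESTART_SYSCALL_NR = 0  # i386 number of restart_syscall
NANOSLEEP_NR = 162      # i386 number of nanosleep


class Task:
    """Toy stand-in for a process: a syscall-number register plus private restart state."""

    def __init__(self):
        self.syscall_nr = None         # what a tracer is able to read
        self.restart_remaining = None  # kept privately, invisible to the tracer


def nanosleep(task, seconds, signal_after=None):
    """Begin a sleep; if a signal arrives first, arrange a restart instead of finishing."""
    task.syscall_nr = NANOSLEEP_NR
    if signal_after is not None and signal_after < seconds:
        # 2.6-style behaviour in this model: remember the remaining time,
        # then re-enter through restart_syscall rather than nanosleep.
        task.restart_remaining = seconds - signal_after
        task.syscall_nr = RESTART_SYSCALL_NR
        return "restarting"  # hypothetical marker, not a real return code
    return 0


def restart_syscall(task):
    """Finish the interrupted sleep using the privately saved state."""
    task.restart_remaining = None
    return 0


def tracer_view(task):
    """A tracer sees only the syscall-number register."""
    return task.syscall_nr


if __name__ == "__main__":
    t = Task()
    nanosleep(t, 100, signal_after=1)  # attaching a tracer delivers a signal
    print(tracer_view(t))              # the tracer now sees call number 0
```

In this model the tracer has no way to recover that the original call was nanosleep, which matches the explanation that the report is accurate at the low level.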
This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 4.4 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 4.4 release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0418.html