Bug 131881 - clock_gettime() triggers audit kill from i386 binary on x86_64
Summary: clock_gettime() triggers audit kill from i386 binary on x86_64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ernie Petrides
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix 186960
Reported: 2004-09-06 10:35 UTC by Joe Orton
Modified: 2008-08-02 23:40 UTC (History)

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-20 13:15:51 UTC
Target Upstream Version:
Embargoed:


Attachments
I assume this patch takes care of what peter mentioned? (460 bytes, patch)
2005-09-07 16:01 UTC, Eric Paris
fix for correcting IA32_NR_syscalls value on x86_64 (304 bytes, patch)
2006-04-19 02:04 UTC, Ernie Petrides


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2006:0437 0 normal SHIPPED_LIVE Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8 2006-07-20 13:11:00 UTC

Description Joe Orton 2004-09-06 10:35:02 UTC
A cron job that calls clock_gettime(), which glibc does internally
in some places, is being killed by the kernel auditing code.  This
is either a regression in U3, or cron only started enabling
auditing in U3 and so this is the first time I have seen it.

Linux emcee.cambridge.redhat.com 2.4.21-20.EL #1 Wed Aug 18 20:48:55
EDT 2004 x86_64 x86_64 x86_64 GNU/Linux

$ cat clock.c
#include <time.h>
int main(int argc, char **argv)
{
    struct timespec ts;
    return clock_gettime(CLOCK_REALTIME, &ts);
}
$ gcc -m32 -o clock clock.c -lrt
$ su
Password:
$ ./clock
$ echo $?
0
$ /usr/sbin/aurun ./clock
Killed
$ echo $?
137
$ dmesg  | tail -1
audit_intercept: error 38, killing task

If run under strace and not under aurun:
clock_gettime(0, 0xfffe3f70)            = -1 ENOSYS (Function not implemented)

Comment 1 Peter Martuccelli 2004-09-08 21:03:45 UTC
entry.S compares against IA32_NR_syscalls before it enters the audit
code.  The audit x86_64 syscall code checks whether the syscall number
in the regs is >= IA32_NR_syscalls; this needs to be changed to > to
stop audit from killing the process.

Comment 2 Peter Martuccelli 2004-09-08 21:09:05 UTC
I do not believe this to be a kernel regression.

Comment 3 Iustin Pop 2004-09-27 06:47:17 UTC
I also hit this issue. In my case, it is a proprietary 32 bit
application (IBM Tivoli Storage Manager) which is killed with the same
error. The only fix is disabling the audit code, but in that case cron
gives a lot of errors.

Any timeline on a fix?

Comment 4 Iustin Pop 2004-12-30 09:09:36 UTC
Any news on this issue? U4 is out and still doesn't fix this problem.

Comment 6 Ernie Petrides 2005-07-22 01:57:07 UTC
Reassigning to Brian.

Comment 7 Chuck Berg 2005-07-28 18:27:16 UTC
I also hit this issue, with update 5. Also a proprietary 32 bit application,
Tibco Rendezvous. I get the same "audit_intercept: error 38, killing task" message.


Comment 10 Eric Paris 2005-09-07 16:02:00 UTC
Created attachment 118566 [details]
I assume this patch takes care of what peter mentioned?

Comment 11 Michael J. Saletnik 2005-09-19 22:35:56 UTC
I can reproduce the exact same symptoms at will by simply calling statfs64()
from a 32-bit program (compiled on RHEL 2.1 AS) run on 2.4.21-32.0.1.EL #1 Tue
May 17 17:53:25 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux. It works from the
command line, but under cron it's killed by the audit_intercept: error 38 code. 

I have my customer testing to see if the failure under cron can be reproduced at
the command line using 'aurun'.

Is there perhaps a more general problem with certain 32-bit syscalls (possibly
in a given high numerical range) interacting poorly with the audit system on
x86_64 systems?


Comment 12 Michael J. Saletnik 2005-09-20 15:36:04 UTC
Okay, following up on my own previous comment, here's what I've discovered in
the 2.4.21-32.0.1.EL sources:

arch/i386/kernel/entry.S defines all the syscalls. clock_gettime is 265, statfs64
is 268. There are a total of 270 syscalls.

drivers/audit/syscall-x86_64.c uses IA32_NR_syscalls to set nr_syscalls and then
performs the (code_raw >= nr_syscalls) check noted previously in this bug to decide
whether or not to kill the process.

The problem is that include/asm-x86_64/ia32_unistd.h has an unconditional
#define of IA32_NR_syscalls to be 265.

Due to this mismatch, a 32-bit process that calls any of syscalls 265 through 270
on a 64-bit kernel with auditing enabled will be killed.

That's the real problem here. Can someone please investigate a fix?
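
The effect of a stale limit can be modelled in user space. This is only a
minimal sketch, not the kernel's audit code: every demo_* name below is
hypothetical, and the only values taken from this bug are the limit of 265
and the syscall numbers 265 and 268. The table size is passed in as a
parameter rather than assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not kernel identifiers. */
enum {
    DEMO_STALE_LIMIT   = 265, /* IA32_NR_syscalls value quoted above */
    DEMO_CLOCK_GETTIME = 265, /* i386 syscall number quoted above */
    DEMO_STATFS64      = 268  /* i386 syscall number quoted above */
};

/* Same shape as the quoted test (code_raw >= nr_syscalls): returns true
 * when the number is treated as out of range. */
bool demo_out_of_range(unsigned int code_raw, unsigned int nr_syscalls)
{
    return code_raw >= nr_syscalls;
}

/* Counts the slots in [0, table_entries) that the limit rejects even
 * though the dispatch table has an entry for them. */
unsigned int demo_count_rejected(unsigned int nr_syscalls,
                                 unsigned int table_entries)
{
    unsigned int rejected = 0;
    for (unsigned int nr = 0; nr < table_entries; nr++) {
        if (demo_out_of_range(nr, nr_syscalls))
            rejected++;
    }
    return rejected;
}

/* Checks a reader can call from main() to confirm the behaviour. */
void demo_self_check(void)
{
    assert(!demo_out_of_range(264, DEMO_STALE_LIMIT));
    assert(demo_out_of_range(DEMO_CLOCK_GETTIME, DEMO_STALE_LIMIT));
    assert(demo_out_of_range(DEMO_STATFS64, DEMO_STALE_LIMIT));
}
```

In this model the comparison operator is not the issue: any limit smaller
than the real table size rejects every valid number at or above the limit,
and demo_count_rejected() returns 0 once the two values agree.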


Comment 14 Alberto 2005-10-04 15:52:29 UTC
I get the same problem with heartbeat + DRBD

Oct  4 17:30:05 c-229 heartbeat: ERROR: Return code 20 from
/etc/ha.d/resource.d/drbddisk
Oct  4 17:30:06 c-229 heartbeat: info: Retrying failed stop operation
[drbddisk::drbd_var_mqm]
Oct  4 17:30:06 c-229 heartbeat: info: Running /etc/ha.d/resource.d/drbddisk
drbd_var_mqm stop
Oct  4 17:30:06 c-229 kernel: audit_intercept: error 14, killing task


I can't switch resource groups. Is there a fix for this? Does U6 fix it?

Comment 20 Bob Johnson 2006-04-11 16:04:08 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 3.8 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 3.8 release.

Comment 24 Ernie Petrides 2006-04-19 00:55:10 UTC
The information in comment #1 and the associated patch in comment #10 are
bogus.  The problem has already been correctly diagnosed in comment #12
(except for the fact that the syscall table has 271 entries, not 270).

The correct fix is to make IA32_NR_syscalls in include/asm-x86_64/ia32_unistd.h
match the value for NR_syscalls in include/linux/sys.h, which is 271.  The
associated table "ia32_sys_call_table" in arch/x86_64/ia32/ia32entry.S already
contains 271 entries.
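
One way to keep a limit and its table from drifting apart again is a
compile-time check. The sketch below is a hypothetical user-space
illustration only: it is not the attached patch, all demo_* names are
invented, and the real ia32_sys_call_table is defined in assembly, so this
check could not be dropped into the kernel as written. It assumes a C11
compiler for static_assert.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical name; 271 is the NR_syscalls value cited above. */
#define DEMO_IA32_NR_SYSCALLS 271

typedef long (*demo_syscall_fn)(void);

/* Placeholder handler that reports "not implemented". */
long demo_ni_syscall(void)
{
    return -ENOSYS;
}

/* Hypothetical table: initializing index 270 sizes the array at 271
 * entries; all other slots are left as null pointers. */
demo_syscall_fn demo_table[] = {
    [270] = demo_ni_syscall,
};

/* Fails the build if the limit and the table length ever disagree. */
static_assert(sizeof(demo_table) / sizeof(demo_table[0]) == DEMO_IA32_NR_SYSCALLS,
              "syscall limit must match the table length");

/* Returns the number of entries in the table. */
size_t demo_table_len(void)
{
    return sizeof(demo_table) / sizeof(demo_table[0]);
}
```

Changing DEMO_IA32_NR_SYSCALLS back to 265 in this sketch makes the build
fail at the static_assert instead of leaving the error to show up at run time.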

I'll test a fix shortly and post it for consideration in U8.

Thank you Michael Saletnik for your analysis.


Comment 25 Ernie Petrides 2006-04-19 02:04:22 UTC
Created attachment 127961 [details]
fix for correcting IA32_NR_syscalls value on x86_64

The patch above has been verified to correct the problem
and has been posted for review (with less than two hours
to spare before the U8 patch posting deadline).

Comment 26 Ernie Petrides 2006-04-25 03:32:39 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.10.EL).


Comment 28 Joshua Giles 2006-05-30 14:36:46 UTC
A kernel has been released that contains a patch for this problem.  Please
verify if your problem goes away with the latest available kernel from the RHEL3
public beta channel at rhn.redhat.com.



Comment 29 Ernie Petrides 2006-05-30 20:25:09 UTC
Reverting to ON_QA.

Comment 30 Michael J. Saletnik 2006-06-28 19:19:06 UTC
Kernel 2.4.21-43.ELsmp x86_64 was tested (on an otherwise update 7 system) and indeed I have verified 
that the problem has gone away for our product with this kernel. We now run fine (as tested by using 
aurun) instead of being killed. Thank you for the fix!


Comment 32 Red Hat Bugzilla 2006-07-20 13:15:51 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html


