Bug 172574

Summary: LTC19767- Watch list fails at approx 4500 watches
Product: Red Hat Enterprise Linux 4 Reporter: tom.h.call
Component: auditAssignee: Steve Grubb <sgrubb>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0CC: andriusb, bugproxy, chavezt, dwmw2, khake, krisw, kweidner, linda.knippers, poelstra, tgraf
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHEA-2006-0033 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-03-07 16:55:57 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 168429    
Attachments:
Description Flags
strace output when auditctl doesn't display rwatches
none
strace output when auditctl does display watches
none
Patch addressing the watch listing problem none

Description tom.h.call 2005-11-07 14:01:32 UTC
Description of problem:
 I scripted a file lister that populates audit.rules with a watch for each file 
in a directory tree, I then restart auditd to load audit.rules.
 I check auditctl for the watches ( auditctl -l ), they are there and the audit 
events are getting recorded.
 This works fine until I reach a certain number of watches, approx. 4500, once 
I reach that number of watches the watches no long show up with auditctl -l and 
the audit events are not getting generated.
 So I changed the script to issue an 'auditctl -w /directory/file -k xxx -p wa' 
for each file, and checked $? after each. All the $? returns were 0 but I got 
the same results, no watches seen by auditctl after around 4500 watches.

Version-Release number of selected component (if applicable):
audit-1.0.3-6.EL4

How reproducible:
Every time

Steps to Reproduce:
1. Add 4500 ( or more ) watches ( -w /directory/file -k name -p wa ) to 
audit.rules 
2. Restart auditd
3. Check auditctl for watches ( auditctl -l )
  
Actual results:
 When I tried to see the watches with auditctl -l the result came back quickly, 
no hang-up, but it reported 'No rules' but nothing about watches, just blank. 
Also no audit event was generated for an access failure on the 
watched /directory/file.
 The machine didn't appear to be adversely affected. But when I did a 'service 
auditd stop' it would take an appreciable amount of time to return, and there 
was a lot of disk activity as though it was deleting the watches I had asked 
for.

Expected results:
 Auditctl should have listed the watches defined in audit.rules and audit 
events should have been generated for access failures on the defined watches.

Additional info:
I machine I am working on has 2GB of memory, its an HP xw8000 3GHZ Xeon.

Comment 1 Andrius Benokraitis 2005-11-07 21:30:30 UTC
Ken/Tim/Kris: Can you verify this is auditfs related?

Comment 2 Klaus Weidner 2005-11-08 16:08:13 UTC
Tom, can you please provide the version of the kernel you are using, and if it's
it's the 32bit or 64bit (EM64T) architecture?

Comment 3 David Woodhouse 2005-11-08 16:36:17 UTC
As asked on IRC -- does this also happen when the rules are added individually
with auditctl, or only when there's a huge audit.rules file to be loaded?

Comment 4 tom.h.call 2005-11-08 16:53:11 UTC
R#2 - Linux lionw145 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 
i686 i386 GNU/Linux : 32 bit

R#3 - See description of problem, I scripted to use 'auditctl -w ...' for each 
watch and I got the same results. I have not entered manually > 4500 individual 
auditctl's on a command line.

Comment 5 Klaus Weidner 2005-11-08 23:07:52 UTC
This appears to be fixed in later kernels in the audit development fork, can you
please try
ftp://ftp.uk.linux.org/pub/people/dwmw2/audit/kernel/kernel-2.6.9-11.EL.audit.89.i686.rpm
or another appropriate recent kernel from that yum repository?

Comment 6 Klaus Weidner 2005-11-08 23:10:27 UTC
... that should be
ftp://ftp.uk.linux.org/pub/people/dwmw2/audit/kernel/kernel-smp-2.6.9-11.EL.audit.89.i686.rpm
for the SMP kernel. Anyway, I heard that those kernels worked as expected with
90000 watched files.

Comment 7 Klaus Weidner 2005-11-09 00:10:54 UTC
re #5 and #6 - the latest I heard back from Tim is that he can't reproduce it on
2.6.9-22.EL after all, and based on a kernel source diff the audit related
changes seem to be minimal. So I can't confirm that the kernels from the audit
branch fix it, but it would still help if you could try those. Sorry about the
confusion.

Comment 8 tom.h.call 2005-11-09 13:31:02 UTC
R#7 - Do I understand that you want me load a kernel ( kernel-smp-2.6.9-
11.EL.audit.89.i686.rpm ) older than the kernel ( 2.6.9-22.ELsmp ) I currently 
have loaded? 

Comment 9 Steve Grubb 2005-11-09 13:41:46 UTC
That kernel is the kernel that is undergoing CAPP evaluation. The audit system
in that kernel is known to behave a certain way and is the basis for what is in
U2. We want to determine if your problem is a U2 only problem or a general problem.

Comment 10 tom.h.call 2005-11-09 15:27:39 UTC
R#9 - I have loaded the 89 kernel ( Linux lionw145 2.6.9-11.EL.audit.89smp #1 
SMP Tue Oct 18 09:32:07 EDT 2005 i686 i686 i386 GNU/Linux ) and rerun my test.

I thought it showed the same symthoms as before, that is the watches were shown 
and audits generated for 4000 watches but not 4500 watches.
But then I looked at some other things for the 4500 test, 
i.e. /var/log/audit/audit.log listed watches as inserted. And after stopping 
auditd, I took a look at dmesg. 
What surprised me was that dmesg had been completely overwritten with messages 
that the watches had been deleted!

So I decided to rerun the test, and surprise! again.
This time 4500 watches worked, so did 5000, 6000, 7000, and 8000. But it failed 
for 9000 watches (not 90000, I didn't try that).

Needless to say, I'm confused with these results. But maybe it will help you.

Comment 11 Klaus Weidner 2005-11-09 17:21:06 UTC
This is confusing - Tom, could you post the actual test case you're using
(preferably a self-contained shell script or similar) so that we're all looking
at the same thing? We haven't been able to reproduce this effect so far.

Also, could you try running "while true; do dmesg -c; sleep 1; done >>
dmesg.log" during the test to capture the dmesg output and post that as well? 

I'm still not quite clear on what exactly is happening here, for example it
would be normal for watches to be deleted if the directory containing the
watched files is deleted since the watches are a property of the containing
directory.

Comment 12 Klaus Weidner 2005-11-09 19:06:26 UTC
Ok, we're starting to be able to reproduce this... The effect seems to be
related to adding watches for files being watched already, and only a certain
CPU seems to fail at listing watches. Running strace auditctl -l shows that it's
getting an -EAGAIN error (resource temporarily unavailable) when trying to read
the netlink socket.

If you don't see watches, try running one or more CPU bound tasks in the
background (such as perl -e 'while(1){}' & ) and run auditctl -l again, and the
watches will get listed again.

This would seem to confirm that it's an SMP related locking issue. However, the
watches do still seem to be working even though they aren't being listed (not
exhaustively tested though), and adding new watches also still works.

Comment 13 tom.h.call 2005-11-09 19:52:48 UTC
R#12 - That's good! I did as you suggested and ran the perl background and the 
watches showed up in auditctl. And I verified that when the watches were not 
showing up in auditctl the audit events were getting generated for watches that 
should be there.
But, this is a result of the 89 kernel. On my production 2.6.9-22.ELsmp kernel 
the audit events were not getting generated when the watches didn't show up in 
auditctl.

R#11 - How much of this do you still want?

Comment 14 Klaus Weidner 2005-11-11 19:37:02 UTC
We're still looking into tracking down what's happening - a weird effect is that
the netlink_unicast() call in kernel/audit.c:audit_send_reply() gets a -111
(-ECONNREFUSED) return value back and ignores that. The most likely candidate in
net/netlink/af_netlink.c seems to be netlink_getsockbypid() which would point to
the wrong PID being used.

According to some added kernel printks the pid does seem to be correct, but so
far it's uncertain which of the pids (NETLINK_CREDS(skb)->pid or
NETLINK_CB(skb).pid) is the right one to use. Changing that didn't seem to have
any effect.

Re comment #13: Tom, I think we have enough information to reproduce the effect,
assuming we're seeing the same issue.

Comment 15 tom.h.call 2005-11-11 19:51:24 UTC
R#14 - Klaus, in case we're not looking at the same thing, here is a shell 
script I can use to create my problem:
#!/bin/sh -e
cnt=0
for file in `find $@`
do
  echo $file
 `auditctl -w $file -k test -p wa`
  if [ $? != 0 ] ; then
   echo "Return error aborting" 
   exit 
  fi
  cnt=`expr $cnt + 1`
  if [ $cnt == 5000 ] ; then
   echo $cnt 
   exit 
  fi
done 
exit 0

 I run it with a parameter for the directory where is want to create watches, 
eg. # script /usr/lib


Comment 16 Tim Chavez 2005-11-14 15:35:21 UTC
(In reply to comment #14)
> We're still looking into tracking down what's happening - a weird effect is that
> the netlink_unicast() call in kernel/audit.c:audit_send_reply() gets a -111
> (-ECONNREFUSED) return value back and ignores that. The most likely candidate in
> net/netlink/af_netlink.c seems to be netlink_getsockbypid() which would point to
> the wrong PID being used.

Actually what's happening is that by the time we get to netlink_getsockbypid()
and do the netlink_lookup() with our pid (the pidof auditctl), it is no longer
in the netlink table, so -ECONNREFUSED is returned.  What this means is that for
"some reason" the netlink socket is being prematurely closed... perhaps
'auditctl' is timing out?  I don't know that I buy this because it is all
happening in much less time than three seconds (which is roughly the timeout in
seconds according to Steve).  Also, I was playing around with printk's and did
see that the netlink_unicast() in netlink_ack(), which is called by
audit_receive_msg(), is returning a positive number indicating success.

I haven't actually yet looked at 'auditctl', but we need to figure out why pidof
auditctl is no longer in the netlink table when we do the lookup.

Loading the CPU seems to keep us (the socket) alive long enough to get the list.

> 
> According to some added kernel printks the pid does seem to be correct, but so
> far it's uncertain which of the pids (NETLINK_CREDS(skb)->pid or
> NETLINK_CB(skb).pid) is the right one to use. Changing that didn't seem to have
> any effect.

Steve says that due to how things are aligned they are equivical, but that
NETLINK_CB is the more correct/acceptable method of retrieval.. or something.

Comment 17 IBM Bug Proxy 2005-11-14 20:06:02 UTC
---- Additional Comments From rosalesa.com(prefers email via rosalesa.com)  2005-11-14 15:01 EDT -------
Tim,
Do you have any update on comment 2?
-Thanks. 

Comment 18 Andrius Benokraitis 2005-11-14 20:12:20 UTC
Wondering if this can be reproduced on any SMP machine as well...

Comment 19 Linda Knippers 2005-11-14 20:16:22 UTC
I've been able to reproduce the problem on a 2 CPU ia64 box.

Comment 20 IBM Bug Proxy 2005-11-14 22:31:12 UTC
---- Additional Comments From rosalesa.com(prefers email via rosalesa.com)  2005-11-14 17:26 EDT -------
Speaking with Tim and Mounir,
The problem seems to be occuring with the netlink socket communication between
kernel and user space:

Quoting Tim Chavez
>Actually what\'s happening is that by the time we get to netlink_getsockbypid()
>and do the netlink_lookup() with our pid (the pidof auditctl), it is no longer
>in the netlink table, so -ECONNREFUSED is returned.  What this means is that
>for \"some reason\" the netlink socket is being prematurely closed... perhaps
>\'auditctl\' is timing out?  I don\'t know that I buy this because it is all
>happening in much less time than three seconds (which is roughly the timeout in
>seconds according to Steve).  Also, I was playing around with printk\'s and did
>see that the netlink_unicast() in netlink_ack(), which is called by
>audit_receive_msg(), is returning a positive number indicating success.
>I haven\'t actually yet looked at \'auditctl\', but we need to figure out why
>pidof auditctl is no longer in the netlink table when we do the lookup.
>Loading the CPU seems to keep us (the socket) alive long enough to get the >list.

Tim is in the process of attaching the strace of the auditctl call.

-Thanks 

Comment 23 Linda Knippers 2005-11-16 00:01:05 UTC
I have a theory about the problem:

I have two sets of strace logs.  auditctl.nowatches is where it fails.
It fails after 30 EAGAIN returns from recvfrom.  auditctl.watches.trunc
is a truncated log of when it works.  Notice in this log that we get
25 EAGAIN failures from recvfrom.  (strace logs attached in this or
the next update)

I think what's happening is that the select returns when the ack is
there, we peek and see that there are 36 bytes, we get the 36 bytes
and we do a nonblocking recvfrom expecting more and there sometimes
isn't more, at least not yet.  There's no select and no peek between
the two recvfrom calls so I think we're lucky when the data really is
there.

I looked at get_reply(), which I believe is where the 30 recvfroms
that return EAGAIN happen, and it looks like if audit_get_reply returns
a negative value, which is does in this case, that we just loop through
30 tries on the recvfrom without a select or a delay between them, and
then bail out.

Seems like auditctl ought to do another select before looking for more
data from the socket, or at least delay before doing another recvfrom.

To test this theory, I modified get_reply() to sleep for a second if
audit_get_reply returned a negative value and it seems to work every 
time now.  This is rather brute force but I think demonstrates the 
problem.

Comment 24 Linda Knippers 2005-11-16 00:02:32 UTC
Created attachment 121102 [details]
strace output when auditctl doesn't display rwatches

Comment 25 Linda Knippers 2005-11-16 00:04:07 UTC
Created attachment 121103 [details]
strace output when auditctl does display watches


This file is truncated to only show the output of the first few watch
records.  Notice the EAGAIN's from recvfrom before we start getting
records.

Comment 26 Klaus Weidner 2005-11-16 00:40:03 UTC
Thanks Linda, this analysis appears to explain all the effects. auditctl's
get_reply() doesn't handle -EAGAIN, so auditctl exits before the data is ready,
and then the kernel gets the -ECONNREFUSED error when trying to send the data
since the auditctl process is gone. Adding extra load would cause auditctl to be
scheduled less often which can give the kernel enough time to start sending data
before it gives up.

(BTW, the comment in lib/netlink.c for audit_get_reply() is misleading, it
actually returns -errno on error, not -1)

Comment 27 Steve Grubb 2005-11-16 18:22:32 UTC
Created attachment 121141 [details]
Patch addressing the watch listing problem

A patch has been created that adds select before the recvfrom when receiving
file watch listings. It is being attached for review. auditctl is the only app
that is fixed by this patch. I am changing the timeout from 3 seconds to 4 in
some other places, too.

Comment 28 tom.h.call 2005-11-17 23:59:07 UTC
I ran the patched auditd against both the production and the 89 kernel. 
Got much better results when setting large numbers of watches. Auditctl lists 
them, and the audits were getting generated with some exceptions.
Those exceptions a hard to define because I couldn't repeat the scenario every 
time.
There seems to be a problem if a watch set is attempted on a file name that 
contains a space. Auditctl will refuse the watch, but for the watches that it 
does set, audit events do not get generated for a refused attempt to 'Echo' a 
string to the end of a watched file. Other failed attempts such 
as 'touch' 'rm' 'mv' etc. are detected.
Sorry for the lack of a complete test case, I'll try to spend more time testing 
next week.

Comment 29 Klaus Weidner 2005-11-18 00:17:21 UTC
re comment #28: I can't reproduce that issue, watches on path names containing
spaces work fine for me. The audit records contain hex encoded file and path
names as expected. Are you sure you are quoting things correctly in the shell?

This is what I tried:

    ## run as user 'root':
    touch /tmp/"spaced out"
    mkdir /tmp/spaced\ dir/
    date > /tmp/spaced\ dir/spaced\ file
    chmod 600 /tmp/spaced\ out /tmp/spaced\ dir/spaced\ file

    auditctl -a exit,possible -S open

    auditctl -w /tmp/"spaced out"
type=CONFIG_CHANGE msg=audit(1132273264.224:33): auid=4294967295 inserted watch

    auditctl -w /tmp/spaced\ dir/spaced\ file
type=CONFIG_CHANGE msg=audit(1132273266.948:34): auid=4294967295 inserted watch

    auditctl -l
AUDIT_LIST: exit,possible syscall=open
AUDIT_WATCH_LIST: dev=8:19, path=/tmp/spaced out, filterkey=, perms=, valid=0
AUDIT_WATCH_LIST: dev=8:19, path=/tmp/spaced dir/spaced file, filterkey=,
perms=, valid=0

    ## run as user "bin":
    echo test >> /tmp/spaced\ out

type=SYSCALL msg=audit(1132273045.670:27): arch=c0000032 syscall=1028 success=no
exit=-13 a0=6000000000043e20 a1=441 a2=1b6 a3=2 items=1 pid=3466 auid=4294967295
uid=1 gid=1 euid=1 suid=1 fsuid=1 egid=1 sgid=1 fsgid=1 comm="bash" exe="/bin/bash"
type=FS_WATCH msg=audit(1132273045.670:27): watch_inode=2851512
watch=737061636564206F7574 filterkey= perm=0 perm_mask=10

    echo test >> /tmp/spaced\ dir/spaced\ file

type=SYSCALL msg=audit(1132273045.670:28): arch=c0000032 syscall=1028 success=no
exit=-13 a0=6000000000043e20 a1=401 a2=1b6 a3=2 items=1 pid=3466 auid=4294967295
uid=1 gid=1 euid=1 suid=1 fsuid=1 egid=1 sgid=1 fsgid=1 comm="bash" exe="/bin/bash"
type=FS_WATCH msg=audit(1132273045.670:28): watch_inode=2851512
watch=737061636564206F7574 filterkey= perm=0 perm_mask=10

If you have a reproducable test case please add that.

Comment 30 IBM Bug Proxy 2005-11-21 20:57:56 UTC
---- Additional Comments From bsaibes.com  2005-11-21 15:57 EDT -------
re comment #20. 
If it turns out that there is a problem when setting a watch on a file name 
that contains a space, there should be a seperate bug to track the issue, 
correct? 

Comment 31 Klaus Weidner 2005-11-21 21:05:36 UTC
re Glen's comment #30 (RH bugzilla), I agree that the space issue should be a
new bug (assuming it's reproducable). The main issue regarding the watch listing
was a definite bug and the tracker contains a fix for it, so I think this one is
getting ready to be closed.

Comment 32 tom.h.call 2005-11-22 13:18:21 UTC
I wasn't very clear about how to instigate the "problem" with watches after a 
watch is attempted on a file with spaces. I'll do some testing then maybe start 
a new bug report.

And I understand you want to close this report. We just want to say Thanks! to 
you all for the hard work and attention to make Auditd better.

Comment 34 IBM Bug Proxy 2005-11-22 15:43:23 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|FIXEDAWAITINGTEST           |TESTED




------- Additional Comments From rosalesa.com(prefers email via rosalesa.com)  2005-11-22 10:41 EDT -------
Marking status of this bug as \"Tested\" per comment 24. 

Comment 37 Red Hat Bugzilla 2006-03-07 16:55:57 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2006-0033.html