Bug 172574
Summary: | LTC19767- Watch list fails at approx 4500 watches | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | tom.h.call | ||||||||
Component: | audit | Assignee: | Steve Grubb <sgrubb> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | medium | ||||||||||
Version: | 4.0 | CC: | andriusb, bugproxy, chavezt, dwmw2, khake, krisw, kweidner, linda.knippers, poelstra, tgraf | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | RHEA-2006-0033 | Doc Type: | Bug Fix | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2006-03-07 16:55:57 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 168429 | ||||||||||
Attachments: |
|
Description
tom.h.call
2005-11-07 14:01:32 UTC
Ken/Tim/Kris: Can you verify this is auditfs related? Tom, can you please provide the version of the kernel you are using, and if it's it's the 32bit or 64bit (EM64T) architecture? As asked on IRC -- does this also happen when the rules are added individually with auditctl, or only when there's a huge audit.rules file to be loaded? R#2 - Linux lionw145 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux : 32 bit R#3 - See description of problem, I scripted to use 'auditctl -w ...' for each watch and I got the same results. I have not entered manually > 4500 individual auditctl's on a command line. This appears to be fixed in later kernels in the audit development fork, can you please try ftp://ftp.uk.linux.org/pub/people/dwmw2/audit/kernel/kernel-2.6.9-11.EL.audit.89.i686.rpm or another appropriate recent kernel from that yum repository? ... that should be ftp://ftp.uk.linux.org/pub/people/dwmw2/audit/kernel/kernel-smp-2.6.9-11.EL.audit.89.i686.rpm for the SMP kernel. Anyway, I heard that those kernels worked as expected with 90000 watched files. re #5 and #6 - the latest I heard back from Tim is that he can't reproduce it on 2.6.9-22.EL after all, and based on a kernel source diff the audit related changes seem to be minimal. So I can't confirm that the kernels from the audit branch fix it, but it would still help if you could try those. Sorry about the confusion. R#7 - Do I understand that you want me load a kernel ( kernel-smp-2.6.9- 11.EL.audit.89.i686.rpm ) older than the kernel ( 2.6.9-22.ELsmp ) I currently have loaded? That kernel is the kernel that is undergoing CAPP evaluation. The audit system in that kernel is known to behave a certain way and is the basis for what is in U2. We want to determine if your problem is a U2 only problem or a general problem. R#9 - I have loaded the 89 kernel ( Linux lionw145 2.6.9-11.EL.audit.89smp #1 SMP Tue Oct 18 09:32:07 EDT 2005 i686 i686 i386 GNU/Linux ) and rerun my test. I thought it showed the same symthoms as before, that is the watches were shown and audits generated for 4000 watches but not 4500 watches. But then I looked at some other things for the 4500 test, i.e. /var/log/audit/audit.log listed watches as inserted. And after stopping auditd, I took a look at dmesg. What surprised me was that dmesg had been completely overwritten with messages that the watches had been deleted! So I decided to rerun the test, and surprise! again. This time 4500 watches worked, so did 5000, 6000, 7000, and 8000. But it failed for 9000 watches (not 90000, I didn't try that). Needless to say, I'm confused with these results. But maybe it will help you. This is confusing - Tom, could you post the actual test case you're using (preferably a self-contained shell script or similar) so that we're all looking at the same thing? We haven't been able to reproduce this effect so far. Also, could you try running "while true; do dmesg -c; sleep 1; done >> dmesg.log" during the test to capture the dmesg output and post that as well? I'm still not quite clear on what exactly is happening here, for example it would be normal for watches to be deleted if the directory containing the watched files is deleted since the watches are a property of the containing directory. Ok, we're starting to be able to reproduce this... The effect seems to be related to adding watches for files being watched already, and only a certain CPU seems to fail at listing watches. Running strace auditctl -l shows that it's getting an -EAGAIN error (resource temporarily unavailable) when trying to read the netlink socket. If you don't see watches, try running one or more CPU bound tasks in the background (such as perl -e 'while(1){}' & ) and run auditctl -l again, and the watches will get listed again. This would seem to confirm that it's an SMP related locking issue. However, the watches do still seem to be working even though they aren't being listed (not exhaustively tested though), and adding new watches also still works. R#12 - That's good! I did as you suggested and ran the perl background and the watches showed up in auditctl. And I verified that when the watches were not showing up in auditctl the audit events were getting generated for watches that should be there. But, this is a result of the 89 kernel. On my production 2.6.9-22.ELsmp kernel the audit events were not getting generated when the watches didn't show up in auditctl. R#11 - How much of this do you still want? We're still looking into tracking down what's happening - a weird effect is that the netlink_unicast() call in kernel/audit.c:audit_send_reply() gets a -111 (-ECONNREFUSED) return value back and ignores that. The most likely candidate in net/netlink/af_netlink.c seems to be netlink_getsockbypid() which would point to the wrong PID being used. According to some added kernel printks the pid does seem to be correct, but so far it's uncertain which of the pids (NETLINK_CREDS(skb)->pid or NETLINK_CB(skb).pid) is the right one to use. Changing that didn't seem to have any effect. Re comment #13: Tom, I think we have enough information to reproduce the effect, assuming we're seeing the same issue. R#14 - Klaus, in case we're not looking at the same thing, here is a shell script I can use to create my problem: #!/bin/sh -e cnt=0 for file in `find $@` do echo $file `auditctl -w $file -k test -p wa` if [ $? != 0 ] ; then echo "Return error aborting" exit fi cnt=`expr $cnt + 1` if [ $cnt == 5000 ] ; then echo $cnt exit fi done exit 0 I run it with a parameter for the directory where is want to create watches, eg. # script /usr/lib (In reply to comment #14) > We're still looking into tracking down what's happening - a weird effect is that > the netlink_unicast() call in kernel/audit.c:audit_send_reply() gets a -111 > (-ECONNREFUSED) return value back and ignores that. The most likely candidate in > net/netlink/af_netlink.c seems to be netlink_getsockbypid() which would point to > the wrong PID being used. Actually what's happening is that by the time we get to netlink_getsockbypid() and do the netlink_lookup() with our pid (the pidof auditctl), it is no longer in the netlink table, so -ECONNREFUSED is returned. What this means is that for "some reason" the netlink socket is being prematurely closed... perhaps 'auditctl' is timing out? I don't know that I buy this because it is all happening in much less time than three seconds (which is roughly the timeout in seconds according to Steve). Also, I was playing around with printk's and did see that the netlink_unicast() in netlink_ack(), which is called by audit_receive_msg(), is returning a positive number indicating success. I haven't actually yet looked at 'auditctl', but we need to figure out why pidof auditctl is no longer in the netlink table when we do the lookup. Loading the CPU seems to keep us (the socket) alive long enough to get the list. > > According to some added kernel printks the pid does seem to be correct, but so > far it's uncertain which of the pids (NETLINK_CREDS(skb)->pid or > NETLINK_CB(skb).pid) is the right one to use. Changing that didn't seem to have > any effect. Steve says that due to how things are aligned they are equivical, but that NETLINK_CB is the more correct/acceptable method of retrieval.. or something. ---- Additional Comments From rosalesa.com(prefers email via rosalesa.com) 2005-11-14 15:01 EDT ------- Tim, Do you have any update on comment 2? -Thanks. Wondering if this can be reproduced on any SMP machine as well... I've been able to reproduce the problem on a 2 CPU ia64 box. ---- Additional Comments From rosalesa.com(prefers email via rosalesa.com) 2005-11-14 17:26 EDT -------
Speaking with Tim and Mounir,
The problem seems to be occuring with the netlink socket communication between
kernel and user space:
Quoting Tim Chavez
>Actually what\'s happening is that by the time we get to netlink_getsockbypid()
>and do the netlink_lookup() with our pid (the pidof auditctl), it is no longer
>in the netlink table, so -ECONNREFUSED is returned. What this means is that
>for \"some reason\" the netlink socket is being prematurely closed... perhaps
>\'auditctl\' is timing out? I don\'t know that I buy this because it is all
>happening in much less time than three seconds (which is roughly the timeout in
>seconds according to Steve). Also, I was playing around with printk\'s and did
>see that the netlink_unicast() in netlink_ack(), which is called by
>audit_receive_msg(), is returning a positive number indicating success.
>I haven\'t actually yet looked at \'auditctl\', but we need to figure out why
>pidof auditctl is no longer in the netlink table when we do the lookup.
>Loading the CPU seems to keep us (the socket) alive long enough to get the >list.
Tim is in the process of attaching the strace of the auditctl call.
-Thanks
I have a theory about the problem: I have two sets of strace logs. auditctl.nowatches is where it fails. It fails after 30 EAGAIN returns from recvfrom. auditctl.watches.trunc is a truncated log of when it works. Notice in this log that we get 25 EAGAIN failures from recvfrom. (strace logs attached in this or the next update) I think what's happening is that the select returns when the ack is there, we peek and see that there are 36 bytes, we get the 36 bytes and we do a nonblocking recvfrom expecting more and there sometimes isn't more, at least not yet. There's no select and no peek between the two recvfrom calls so I think we're lucky when the data really is there. I looked at get_reply(), which I believe is where the 30 recvfroms that return EAGAIN happen, and it looks like if audit_get_reply returns a negative value, which is does in this case, that we just loop through 30 tries on the recvfrom without a select or a delay between them, and then bail out. Seems like auditctl ought to do another select before looking for more data from the socket, or at least delay before doing another recvfrom. To test this theory, I modified get_reply() to sleep for a second if audit_get_reply returned a negative value and it seems to work every time now. This is rather brute force but I think demonstrates the problem. Created attachment 121102 [details]
strace output when auditctl doesn't display rwatches
Created attachment 121103 [details]
strace output when auditctl does display watches
This file is truncated to only show the output of the first few watch
records. Notice the EAGAIN's from recvfrom before we start getting
records.
Thanks Linda, this analysis appears to explain all the effects. auditctl's get_reply() doesn't handle -EAGAIN, so auditctl exits before the data is ready, and then the kernel gets the -ECONNREFUSED error when trying to send the data since the auditctl process is gone. Adding extra load would cause auditctl to be scheduled less often which can give the kernel enough time to start sending data before it gives up. (BTW, the comment in lib/netlink.c for audit_get_reply() is misleading, it actually returns -errno on error, not -1) Created attachment 121141 [details]
Patch addressing the watch listing problem
A patch has been created that adds select before the recvfrom when receiving
file watch listings. It is being attached for review. auditctl is the only app
that is fixed by this patch. I am changing the timeout from 3 seconds to 4 in
some other places, too.
I ran the patched auditd against both the production and the 89 kernel. Got much better results when setting large numbers of watches. Auditctl lists them, and the audits were getting generated with some exceptions. Those exceptions a hard to define because I couldn't repeat the scenario every time. There seems to be a problem if a watch set is attempted on a file name that contains a space. Auditctl will refuse the watch, but for the watches that it does set, audit events do not get generated for a refused attempt to 'Echo' a string to the end of a watched file. Other failed attempts such as 'touch' 'rm' 'mv' etc. are detected. Sorry for the lack of a complete test case, I'll try to spend more time testing next week. re comment #28: I can't reproduce that issue, watches on path names containing spaces work fine for me. The audit records contain hex encoded file and path names as expected. Are you sure you are quoting things correctly in the shell? This is what I tried: ## run as user 'root': touch /tmp/"spaced out" mkdir /tmp/spaced\ dir/ date > /tmp/spaced\ dir/spaced\ file chmod 600 /tmp/spaced\ out /tmp/spaced\ dir/spaced\ file auditctl -a exit,possible -S open auditctl -w /tmp/"spaced out" type=CONFIG_CHANGE msg=audit(1132273264.224:33): auid=4294967295 inserted watch auditctl -w /tmp/spaced\ dir/spaced\ file type=CONFIG_CHANGE msg=audit(1132273266.948:34): auid=4294967295 inserted watch auditctl -l AUDIT_LIST: exit,possible syscall=open AUDIT_WATCH_LIST: dev=8:19, path=/tmp/spaced out, filterkey=, perms=, valid=0 AUDIT_WATCH_LIST: dev=8:19, path=/tmp/spaced dir/spaced file, filterkey=, perms=, valid=0 ## run as user "bin": echo test >> /tmp/spaced\ out type=SYSCALL msg=audit(1132273045.670:27): arch=c0000032 syscall=1028 success=no exit=-13 a0=6000000000043e20 a1=441 a2=1b6 a3=2 items=1 pid=3466 auid=4294967295 uid=1 gid=1 euid=1 suid=1 fsuid=1 egid=1 sgid=1 fsgid=1 comm="bash" exe="/bin/bash" type=FS_WATCH msg=audit(1132273045.670:27): watch_inode=2851512 watch=737061636564206F7574 filterkey= perm=0 perm_mask=10 echo test >> /tmp/spaced\ dir/spaced\ file type=SYSCALL msg=audit(1132273045.670:28): arch=c0000032 syscall=1028 success=no exit=-13 a0=6000000000043e20 a1=401 a2=1b6 a3=2 items=1 pid=3466 auid=4294967295 uid=1 gid=1 euid=1 suid=1 fsuid=1 egid=1 sgid=1 fsgid=1 comm="bash" exe="/bin/bash" type=FS_WATCH msg=audit(1132273045.670:28): watch_inode=2851512 watch=737061636564206F7574 filterkey= perm=0 perm_mask=10 If you have a reproducable test case please add that. ---- Additional Comments From bsaibes.com 2005-11-21 15:57 EDT ------- re comment #20. If it turns out that there is a problem when setting a watch on a file name that contains a space, there should be a seperate bug to track the issue, correct? re Glen's comment #30 (RH bugzilla), I agree that the space issue should be a new bug (assuming it's reproducable). The main issue regarding the watch listing was a definite bug and the tracker contains a fix for it, so I think this one is getting ready to be closed. I wasn't very clear about how to instigate the "problem" with watches after a watch is attempted on a file with spaces. I'll do some testing then maybe start a new bug report. And I understand you want to close this report. We just want to say Thanks! to you all for the hard work and attention to make Auditd better. changed: What |Removed |Added ---------------------------------------------------------------------------- Status|FIXEDAWAITINGTEST |TESTED ------- Additional Comments From rosalesa.com(prefers email via rosalesa.com) 2005-11-22 10:41 EDT ------- Marking status of this bug as \"Tested\" per comment 24. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2006-0033.html |