Bug 158591 - New audit.conf in Update 5 causes problems
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: laus
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Jason Vas Dias
QA Contact: Jay Turner
 
Reported: 2005-05-23 15:52 EDT by Chris Carr
Modified: 2015-01-07 19:10 EST

Doc Type: Bug Fix
Last Closed: 2005-05-23 16:49:13 EDT


Attachments
audit.conf file (3.26 KB, text/plain)
2005-07-29 04:42 EDT, Andrew P
audit.conf file (3.28 KB, text/plain)
2005-07-29 04:44 EDT, Andrew P

Description Chris Carr 2005-05-23 15:52:11 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3

Description of problem:
The new audit.conf file in Update 5 has the following line:

notify          = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20%";

The effect of this line is to cause auditd to suspend when the drive containing /var/log/audit.d/ has less than 20% available space. When auditd suspends, no auditable action can take place: logging in, accessing files, etc. become impossible. A power cycle is the only option. Users who had the previous RPM installed and who had less than 20% free space on their /var/log/audit.d partitions would find their systems unresponsive the next time auditd tries to rotate its log files.

Version-Release number of selected component (if applicable):
laus-0.1-70RHEL3.i386.rpm

How reproducible:
Always

Steps to Reproduce:
1. Install the RPM version immediately preceding laus-0.1-70RHEL3.i386.rpm.
2. Create a large file such that the partition where /var/log/audit.d resides has less than 20% space remaining.
3. Upgrade to laus-0.1-70RHEL3.i386.rpm.
4. Perform auditable actions until auditd tries to rotate its files.
5. Observe that further auditable actions (logging in, writing files, etc.) are no longer possible.
  

Actual Results:  Was forced to power-cycle my production server.

Expected Results:  New package should not change expected behavior.

Additional info:
Comment 1 Jason Vas Dias 2005-05-23 16:49:13 EDT
The audit.conf file contains USER CONFIGURABLE settings for the LAuS 
audit subsystem.

The new audit.conf defaults were part of the fix for bug #130071:
"LAus creates an ever-increasing (and never rotated/deleted) set of logfiles".

The fix was to give audbin, the program which rotates auditd's binary
log files, a -T "threshold" option and a -N "notify command" option 
(see man 1 audbin).

Without these options, the default behaviour of audbin was to simply save 
each rotated binary log file without checking disk usage, until the audit
partition (by default /var) was 100% full, and then fail to save the next
log; when it failed, auditd would then enter suspend mode. So you were left
with a 100% full /var partition, causing syslogd (and every process using
it) to hang, AND a suspended audit process.
  
Now, with the -T option and no -N option, audbin will check that the percentage
of free disk space on the audit log partition is at least equal to the
threshold; once the amount of free space falls below the threshold, audbin
will not save the rotated log file, and will return an error to auditd.

With the -T and -N options, audbin will execute the -N command for the oldest
saved audit log file when the amount of free space on the audit log partition
falls below the threshold, and then recheck the amount of free space, repeating
for the next oldest saved audit log until the free space is above the threshold.
So the -N command could be something like "mv %f /backup" or "rm -f %f", to
move or remove the oldest saved log file and increase the amount of free space
on the audit log partition. If running the -N command on every saved audit log
still does not raise the free space to the threshold, audbin will return an
error to auditd.
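A minimal Python sketch of the -T / -N loop described above, run against an in-memory partition model rather than a real filesystem. The names `Partition` and `rotate`, the block counts, and the rule that a successful -N command frees that file's blocks (true for 'rm -f %f', or for 'mv %f' to another partition) are assumptions for illustration, not audbin's actual code. It also assumes the check is made against the free space that would remain after saving the new log, as Comment 9 describes.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

# Hypothetical model: saved logs as (file name, size in blocks), oldest first.
SavedLogs = Deque[Tuple[str, int]]


class Partition:
    """In-memory stand-in for the partition holding /var/log/audit.d."""

    def __init__(self, total_blocks: int, used_blocks: int) -> None:
        self.total = total_blocks
        self.used = used_blocks

    def free_pct_if_added(self, blocks: int) -> float:
        """Percentage of blocks still free after adding `blocks` more."""
        return 100.0 * (self.total - self.used - blocks) / self.total


def rotate(part: Partition, saved: SavedLogs, new_name: str, new_blocks: int,
           threshold_pct: float,
           notify: Optional[Callable[[str], bool]] = None) -> int:
    """Save one rotated log; return 0 on success, 1 for an error to auditd."""
    while part.free_pct_if_added(new_blocks) < threshold_pct:
        # Without -N, or once every saved log has been processed, give up.
        if notify is None or not saved:
            return 1
        oldest_name, oldest_blocks = saved[0]
        # Run the -N command with %f replaced by the oldest saved file.
        if not notify(oldest_name):
            return 1
        # Assume a successful -N command (rm, or mv to another partition)
        # releases that file's blocks; then loop back and recheck.
        saved.popleft()
        part.used -= oldest_blocks
    part.used += new_blocks
    saved.append((new_name, new_blocks))
    return 0


part = Partition(total_blocks=1000, used_blocks=790)
logs: SavedLogs = deque([("save.0", 50), ("save.1", 50)])
print(rotate(part, logs, "save.2", 20, 20.0))                  # 1 (no -N)
print(rotate(part, logs, "save.2", 20, 20.0, lambda f: True))  # 0
print([name for name, _ in logs])                              # ['save.1', 'save.2']
```

Without a notify command the sketch returns 1 immediately, which is the case in which the default error action makes auditd suspend; with one, the oldest saved file is processed first and the free space is rechecked before the new log is saved.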
  
The default error action in the default /etc/audit/audit.conf is specified as:

   error {
          action {
                  type = suspend;
                 };
          ...

This is the action taken by auditd when audbin fails.

You can thus configure the audit system to:

 o discard / move old saved audit log files until free space is above the
   threshold by setting the audit notify command,
   i.e. replace this line in the default audit.conf:
       notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20%";
   with
       notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'rm -f %f'";
  
and / or :

 o tell auditd to ignore audbin errors, or to just emit a log message when
   they occur, by simply removing the "action { type = suspend; }" clause
   from audit.conf.

or :

 o disable the audit subsystem ("chkconfig --del audit") if you do not
   require the audit log data; performance of the whole system will increase.

Since auditd is NOT enabled by default, it is presumed that if it is enabled,
auditing is important to you and you do not want to lose audit messages.

Hence, the default action taken when audbin fails must be to enter suspend mode,
as any other action could result in loss of audit data.

No default audbin -N command is specified because the partition on which the
audit log files reside is user configurable, so no default partition to move
audit log files can be determined in advance, and a default -N command to 
remove saved audit log files cannot be specified, as that could result in 
loss of audit data.
 
I'm sorry the new audit.conf file resulted in system downtime for you;
but if the audit log partition had been filled up by saved audit logs,
you would also have had downtime, and might not have been informed of the
cause, since syslog logging would also have been disabled.

The audit.conf file shipped by default provides default settings that will
not result in audit data loss when the audit partition fills up. There are
several ways as described above to configure audit not to suspend or to
clean up old saved audit logs and reclaim free space automatically. 

So this problem is not a LAuS audit system bug - LAuS is behaving
correctly as configured by the user, and there is no other reasonable
default configuration to supply that would not result in either an
auditd suspend or loss of audit data.

Comment 2 Simon Matter 2005-05-25 07:46:00 EDT
Unfortunately, everyone upgrading from U4 and earlier still runs into the problem
above because
1) laus was installed by default
2) auditd was started by default
3) audit.conf was, and still is, broken considering 1) and 2)

The new audit.conf is still broken in my eyes. In my case /var filled up to
the 20% free-space threshold very quickly. The effect was that logins were denied,
even on the console! How do you fix this without root access or rebooting?

from /var/log/messages:

May 25 11:06:41 mailhub audbin[15254]: clearing binary audit log /var/log/audit.d/bin.0
May 25 11:30:59 mailhub audbin[20556]: saving binary audit log /var/log/audit.d/bin.1
May 25 11:30:59 mailhub audbin[20556]: threshold 20.00 exceeded for filesystem /var/log/audit.d/. - free blocks down to 19.99%
May 25 11:30:59 mailhub auditd[2139]: Notify command /usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% exited with status 1
May 25 11:30:59 mailhub auditd[2139]: output error
May 25 11:30:59 mailhub auditd[2139]: output error
May 25 11:30:59 mailhub auditd[2139]: output error; suspending execution
May 25 12:57:10 mailhub sshd(pam_unix)[28616]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=sup.mydomain.com  user=root
Comment 3 Jason Vas Dias 2005-05-25 09:44:54 EDT
As I stated in Comment #1, you need to configure the audit system to suit
your requirements and site policies. 
 - If you do not require auditing, disable it:
   # chkconfig --del audit
or
 - If you do not mind losing audit data, remove the suspend action 
   from audit.conf
or
 - If you want to keep saved audit logs, have them moved to another
   partition: change the "notify" line in audit.conf to:
      notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'mv %f /backup'";
   where /backup is on a different partition from the audit logs
 - If you want to discard saved audit logs, remove them:
   change the "notify" line in audit.conf to:
      notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'rm -f %f'";

These decisions are yours to make.

The default audit configuration should not and will not make them for you.

 

Comment 4 Simon Matter 2005-05-25 11:17:39 EDT
Thanks for your explanation. The problem I and others have is that with U5, a
default install puts laus on the system but does not enable it. Those
running or installing older releases end up with auditd enabled, even after running
up2date. I verified that auditd is enabled by default on all installs prior to U5.
I understand that disabling auditd as a postinstall step when upgrading the laus
package would also be bad.
Comment 5 Chris Carr 2005-05-26 11:00:57 EDT
"These decisions are yours to make.

"The default audit configuration should not and will not make them for you."

Before upgrading from U4 to U5, I had 10% disk space remaining, which is about
26GB in my case. (That can hold a lot of audit files.) I knew this quite well
and I was comfortable with it. The new audit.conf *decided* that -- because I
had less than 20% (52GB) remaining -- I should no longer be permitted to log in
to my production server even as root.

Any change from the status quo is a decision, and that decision should be made
by the administrator of the hardware.

It's unconscionable to expect admins to read every single changelog for every rpm
before upgrading. Yes, admins should make the decisions about how daemons behave
on their system. I for one had made a decision not to change the original
audit.conf file that was installed prior to U5. Maybe that was unwise, but it
was my decision. This upgrade over-rode my decision and caused significant down
time of my production server.

"...if the audit log partition had been filled up by saved audit logs, 
you would also have had downtime, and may not have been informed of the
cause since syslog logging would also have been disabled ."

No. I monitor my disk usage daily, and would not have reached the point of
filling my hard drive. The information in the system log was certainly helpful
-- after I was forced to power cycle my production server.
Comment 6 Jason Vas Dias 2005-05-26 12:24:04 EDT
You can also specify the -T <threshold> audbin argument as a number of
disk blocks, e.g. -T 1000 would mean that audbin would not attempt to
create a save.%u file, and would return an error to auditd, when there were
1000 or fewer free blocks; see man audbin(1).
There are now many ways to create an audit log disk usage threshold /
rotation policy, which was not the case with the previous audit
configuration: there was previously no threshold for the amount of space
occupied by saved binary audit logs.
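The two forms of -T (a percentage of total blocks, or an absolute count of free blocks) can be sketched in Python. The function names are hypothetical; the comparison rules are taken from this thread (percent form: exceeded once free space falls below the threshold, per Comment 1; block form: exceeded at that many free blocks or fewer, per Comment 6). Counting `f_bavail` rather than `f_bfree` from `os.statvfs` is an assumption about what audbin measures.

```python
import os
from typing import Tuple


def parse_threshold(spec: str) -> Tuple[float, bool]:
    """Split a -T value into (number, is_percent), e.g. "20%" -> (20.0, True)."""
    spec = spec.strip()
    if spec.endswith("%"):
        return float(spec[:-1]), True
    return float(int(spec)), False


def threshold_exceeded(free_blocks: int, total_blocks: int, spec: str) -> bool:
    value, is_percent = parse_threshold(spec)
    if is_percent:
        # "-T 20%": exceeded once free space is below 20% of all blocks.
        return 100.0 * free_blocks / total_blocks < value
    # "-T 1000": exceeded at 1000 free blocks or fewer.
    return free_blocks <= value


def threshold_exceeded_for(path: str, spec: str) -> bool:
    """Apply the check to the filesystem that holds `path` (Unix only)."""
    st = os.statvfs(path)
    # Assumption: count f_bavail (blocks available to unprivileged users).
    return threshold_exceeded(st.f_bavail, st.f_blocks, spec)
```

For example, threshold_exceeded_for("/var", "20%") and threshold_exceeded_for("/var", "50000") apply the same partition to the two styles of threshold; the block form does not change its meaning when the partition is resized, while the percent form does.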


Comment 7 Chris Carr 2005-05-26 13:20:39 EDT
I appreciate that the new audit.conf is probably a better default for new
installations. I also appreciate all the terrific choices there are for admins
who explore the wonders of audit.conf, audbin, and their man pages. Indeed, I
find your points very educational, and I'm sure my server will benefit greatly
from your tutelage. But all of your responses are missing the central point:
systems with an installed pre-U5 audit.conf and less than 20% free space on the
/var/log/audit partition become unresponsive after upgrading to U5 where they
had been working correctly prior to the upgrade.

All these marvelous configuration options you're positing might well have
already been considered by an admin of a pre-U5 box, and he might have made a
conscious and deliberate decision to keep the installed defaults. This
hypothetical admin may have been unwise to do so, but as you say, it's his decision.

Whether an admin of a pre-U5 box has this knowledge, or whether, like me, he
didn't even know laus was installed, if his audit log partition has less than 20%
free space, the upgrade to U5 is going to take his server down, and he's not
going to have any warning or opportunity to check his audit.conf file before it happens.
Comment 8 solomon 2005-07-05 17:44:24 EDT
The following notify line does not work properly:

notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'rm -f %f'";

For example, while my partition is below the threshold, the following save files
are created:
save.0
save.1
save.2

Now the first time the threshold is reached, files are rotated properly:
bin.x  => save.0
save.0 => save.1
save.1 => save.2
save.2 => /dev/null

At the next rotation, if the threshold is still exceeded, only save.0 is rotated.
This leaves save.1 and save.2 hanging around for eternity:
bin.x => save.0
save.0 => /dev/null
save.1
save.2
Comment 9 Jason Vas Dias 2005-07-11 10:20:17 EDT
Actually, the command is working properly: save.%u files are NOT
left "hanging around for eternity":
When audbin gets the arguments "-T 20% -N 'rm -f %f'", then it follows
this algorithm:
 While the free disk space on the partition to which the save.%u file is
 to be saved is less than 20% of the total disk space, perform the -N command, 
 replacing %f in the command with the name of the oldest save.%u file found.
So in this example, when the bin.x file is rotated the first time, it is found
that saving it would make the free space go below 20%, so save.0 is processed 
with the -N command (removed), leaving save.1 and save.2 unchanged; this
happens to free up enough space to raise free space above the threshold;
if not, further save.%u files would be processed until free space rises above
the threshold. The %u in "save.%u" always begins at 0, so once enough free 
space has been freed up, the bin.x file is saved to save.0, again leaving 
save.1 and save.2 unchanged. The next time around, if saving bin.x would make
free space fall below the threshold, save.1 would be chosen as the file to
rotate, since it would be the oldest save.%u file.
You should see messages from audbin in /var/log/messages each time a save.%u
file is processed, of the form:
 audbin: threshold 20.00% exceeded for filesystem /var - free blocks down to 19.90%: running notify command: rm -f /var/log/audit/save.0
The save.%u files may not always be created in numeric order, but you can see
their creation order with 'ls -ltr /var/log/audit/save.*'.
NOTE: the -T 20% threshold parameter and -N '...%f' command are meant to be
configurable, so you could say "-T 50000" to make the free space threshold
50000 disk blocks, or "-T 2%" to make it 2% of total disk blocks, and you can
make the "-N '...%f'" command something like "mv %f /backup/`date +'%Y-%m-%d'`".
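The numbering behaviour described above can be checked with a small simulation of solomon's example from Comment 8. This is a hedged sketch: the dictionary of creation times, the helper names, and the assumption that removing one old file always gets free space back above the threshold are illustrative only; audbin's real selection logic may differ in detail.

```python
from typing import Dict, List, Tuple

# Hypothetical model: saved file name -> creation time (a simple counter).
Saved = Dict[str, int]


def next_save_name(saved: Saved) -> str:
    """Lowest unused save.%u name; per Comment 9, %u always starts at 0."""
    u = 0
    while f"save.{u}" in saved:
        u += 1
    return f"save.{u}"


def oldest(saved: Saved) -> str:
    """Oldest file by creation time, i.e. the first entry `ls -ltr` lists."""
    return min(saved, key=lambda name: saved[name])


def rotate_once(saved: Saved, now: int, over_threshold: bool) -> Tuple[str, str]:
    """One rotation; assumes removing a single old file frees enough space."""
    victim = ""
    if over_threshold:
        victim = oldest(saved)
        del saved[victim]  # stands in for -N 'rm -f %f'
    new_name = next_save_name(saved)
    saved[new_name] = now
    return victim, new_name


# Starting point from Comment 8: save.0, save.1, save.2 created in that order.
saved: Saved = {"save.0": 0, "save.1": 1, "save.2": 2}
history: List[Tuple[str, str]] = [rotate_once(saved, t, True) for t in (3, 4, 5)]
for victim, new_name in history:
    print(f"removed {victim}, saved bin.x as {new_name}")
```

Each of the original save.0, save.1 and save.2 is processed in turn, so none is left behind. After the first rotation save.0 is the newest file, which is why the numeric suffix is not a reliable guide to age.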

  

Comment 10 Andrew P 2005-07-28 06:05:49 EDT
Additional info, somewhat off-topic:

I encountered this same "DoS" caused by LAuS audit.
I was unable to log in to the server via ssh or any other means to shut down the
audit service.
Shutting down the audit service "resumes" the login capabilities of the server.

This is what I did to access the server, shut down audit, make some space
in /var, and allow fellow admins to log in:
Access the production server using "Webmin", and then use the command shell
CGI to perform these tasks.

I have noticed that Webmin can be trusted as the "last man standing", giving
access to a server when all other doors are blocked,
be it a full "/", a full "/var", or a suspended "audit".

My advice would be to have an https Webmin running on all servers, providing
that "last resort" way to access a broken server.



Comment 11 Andrew P 2005-07-28 08:53:48 EDT
Now, back on topic:

I have recovered access to the server, and modified audit.conf as
proposed.
But as soon as I start the audit service, it goes back into "suspend" mode.

I tried:
- adding "rm -f %f"
- commenting out the action { suspend }
- removing the threshold "-T ..."

I also set the threshold to a higher value, hoping it would trigger the rm
action.

How do you recover from a suspend situation?
# service audit stop
"cleaning /var"
# service audit start 
does not work 

Comment 12 Jason Vas Dias 2005-07-28 10:24:41 EDT
In reply to comment #11:

Please supply some further information:

- What version of laus are you running (it should be laus-0.1-70RHEL3)?

- What does your audit.conf look like (please attach it to this bug report) -
  The 'notify' setting should look like:
   notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T X% -N 'rm -f %f'";
  (Note the quotes around "-N 'rm -f %f'", without which the command will not
   work).

- After you commented out the "suspend" action / removed the "-T" threshold,
  did you run 'service audit restart'? Changes to audit.conf only take
  effect after a restart.

- After you cleaned the /var partition, what does 'df -k /var' report?
  What does the -T threshold setting in audit.conf say ?

- If the machine is still locking up without the "suspend" action in audit.conf,
  it is unlikely to be audit that causes the problem.

Please 'grep audit /var/log/messages' and append the output from around the
time the machine locked up to this bug report.
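Why the inner quotes matter can be shown with Python's `shlex`, assuming auditd splits the notify string with shell-like word splitting before running it (an assumption; the thread only says the command will not work without the quotes). The helper `notify_value` is hypothetical.

```python
import shlex
from typing import Optional


def notify_value(notify_cmdline: str) -> Optional[str]:
    """Return the single word that follows -N after shell-style splitting."""
    args = shlex.split(notify_cmdline)
    if "-N" not in args:
        return None
    i = args.index("-N")
    return args[i + 1] if i + 1 < len(args) else None


quoted = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'rm -f %f'"
unquoted = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N rm -f %f"

print(notify_value(quoted))    # rm -f %f
print(notify_value(unquoted))  # rm  (-f and %f become stray arguments)

# Expand %f for one saved file, quoting the name in case it has odd characters.
template = notify_value(quoted) or ""
print(template.replace("%f", shlex.quote("/var/log/audit.d/save.0")))  # rm -f /var/log/audit.d/save.0
```

With the quotes, the whole 'rm -f %f' template reaches audbin as the value of -N; without them, -N receives only "rm".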
   
Comment 13 Andrew P 2005-07-29 04:42:19 EDT
Created attachment 117264
audit.conf file

with the suspend action commented out, and the threshold removed
Comment 14 Andrew P 2005-07-29 04:44:31 EDT
Created attachment 117265
audit.conf file

with the "rm log files" action
Comment 15 Andrew P 2005-07-29 04:58:30 EDT
version : laus-0.1-70RHEL3

see attachments for audit.conf 
- without "suspend action", and then also without threshold
- with "rm" on threshold 

I did use 'service audit stop' and 'service audit start' to apply the
changes.

For now, /var and the threshold are:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vg00/hd9var       1032088    420328    559332  43% /var
Threshold: 20% (60% has the same effect)
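The df figures above can be turned into free-space percentages and compared with the -T setting. This is a rough sketch: df's "Available" column excludes blocks reserved for root, while Size minus Used includes them, and which of the two audbin counts is an assumption here, so both are shown.

```python
def pct(part: int, whole: int) -> float:
    """Percentage helper, rounded to two decimals like audbin's log messages."""
    return round(100.0 * part / whole, 2)


# Figures copied from the `df -k /var` output above (1K blocks).
size_kb, used_kb, avail_kb = 1032088, 420328, 559332

avail_pct = pct(avail_kb, size_kb)          # space usable by non-root users
free_pct = pct(size_kb - used_kb, size_kb)  # all unused blocks, incl. reserved

for threshold in (20.0, 60.0):
    # Whichever column audbin counts, compare it to the -T percentage.
    print(f"-T {threshold:g}%: available below threshold: {avail_pct < threshold}, "
          f"free below threshold: {free_pct < threshold}")
```

By either measure /var is roughly 54% to 59% free after the cleanup, so at these figures a 20% threshold would not be exceeded, while a 60% threshold would be.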

I am pretty sure that "audit" is the culprit here. Here is why:
I start audit.
I start an ssh session and provide my username and password: the screen stays
frozen (I cannot access the server).
I stop audit.
Immediately, the frozen ssh login session resumes and gives a prompt.



Here is the output of /var/log/messages:
Jul 28 05:00:00 frdatlxpih audbin[28603]: saving binary audit log /var/log/audit.d/bin.2
Jul 28 05:00:00 frdatlxpih audbin[28603]: threshold 20.00 exceeded for filesystem /var/log/audit.d/. - free blocks down to 5.39%
Jul 28 05:00:00 frdatlxpih auditd[2587]: Notify command /usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% exited with status 1
Jul 28 05:00:00 frdatlxpih auditd[2587]: output error
Jul 28 05:00:00 frdatlxpih auditd[2587]: output error
Jul 28 05:00:00 frdatlxpih auditd[2587]: output error; suspending execution
Jul 28 11:01:34 frdatlxpih audit: auditd -TERM succeeded
Jul 28 11:04:19 frdatlxpih audit: Starting auditd succeeded
Jul 28 11:04:19 frdatlxpih audit: auditd startup succeeded
Jul 28 11:12:21 frdatlxpih audit: auditd -TERM succeeded
Jul 28 13:46:32 frdatlxpih audit: Starting auditd succeeded
Jul 28 13:46:32 frdatlxpih audit: auditd startup succeeded
Jul 28 14:15:20 frdatlxpih audit: auditd -TERM succeeded
Jul 28 14:15:33 frdatlxpih audit: Starting auditd succeeded
Jul 28 14:15:33 frdatlxpih audit: auditd startup succeeded
Jul 28 14:27:06 frdatlxpih audit: auditd startup succeeded
Jul 28 14:29:23 frdatlxpih audit: auditd -TERM succeeded
Jul 28 14:31:38 frdatlxpih audit: Starting auditd succeeded
Jul 28 14:31:38 frdatlxpih audit: auditd startup succeeded
Jul 28 14:40:00 frdatlxpih audit: auditd -TERM succeeded
Jul 28 14:40:04 frdatlxpih audit: Starting auditd succeeded
Jul 28 14:40:04 frdatlxpih audit: auditd startup succeeded
Jul 28 14:41:43 frdatlxpih audit: auditd -TERM succeeded
Jul 28 14:42:51 frdatlxpih audit: Starting auditd succeeded
Jul 28 14:42:51 frdatlxpih audit: auditd startup succeeded
Jul 28 14:43:16 frdatlxpih audit: auditd -TERM succeeded
Jul 28 14:44:02 frdatlxpih audit: Stopping auditd failed
Jul 28 14:46:31 frdatlxpih audit: Starting auditd succeeded
Jul 28 14:46:31 frdatlxpih audit: auditd startup succeeded
Jul 28 14:46:41 frdatlxpih audit: auditd -TERM succeeded
Comment 16 Jason Vas Dias 2005-11-30 13:28:43 EST
The last log entries appended to this bug report show that the "-N 'rm -f %f'"
audbin notify option in audit.conf was NOT in use when the problem occurred:

Jul 28 05:00:00 frdatlxpih auditd[2587]: Notify command /usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% exited with status 1

If the "-N 'rm -f %f'" option had been in use, it would have appeared in the
notify command printed in the log message above when the command failed.

I have tested using the "-N 'rm -f %f'" audbin option, and when the disk usage 
threshold is exceeded, audbin has always removed the oldest save.N files until
the free space is above the threshold.

I have also tested commenting out the default audit.conf error action:
   error {
#          action {
#                  type = suspend;
#                 };
and this works as expected: if the disk usage threshold is exceeded, auditd
writes a syslog error message and does not enter suspend mode; further audit log
records are discarded.

We cannot specify a default audit.conf that could potentially delete / ignore /
lose audit data - suspend mode is the only option that avoids this possibility,
without further customization by administrators.

Our audit system achieved CAPP / EAL3 compliance because it was designed this
way; if the default configuration specified that audit data should be discarded
when disk space is exhausted without entering suspend mode, it is doubtful 
whether the system could be certified as compliant with these standards.

It was a mistake that audit was enabled (the auditd was started) by default
prior to U5; auditing should only be used on systems that actually require it,
and which have configured auditd to deal with rotated audit logs and disk 
space exhaustion to suit site requirements using the mechanisms described above.
We apologize for the problems this has caused. 
But the fix for this issue is very simple:
 o if you do not require audit data to be collected, disable audit:
   # chkconfig --del audit
 o if you do require audit data collection, enable the audit service
   and use the '-T' and '-N' audbin options (documented above and in man 
   audbin(8)) to implement a log rotation / backup / removal policy - then
   your system should never enter suspend mode.   
Sorry, but this remains "NOTABUG".
