429941 – [RHEL5 U2] Audit fails to shutdown properly

Bug 429941 - [RHEL5 U2] Audit fails to shutdown properly

Summary: [RHEL5 U2] Audit fails to shutdown properly

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Eric Paris
QA Contact:	Martin Jenner
Docs Contact:
URL:	http://rhts.lab.boston.redhat.com/tes...
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2008-01-23 21:42 UTC by Jeff Burke
Modified:	2009-01-20 20:22 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-01-20 20:22:19 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:0225	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update	2009-01-20 16:06:24 UTC

Description Jeff Burke 2008-01-23 21:42:25 UTC

Description of problem:
 While running the audit-test-2088 test suite the system will hang.

Version-Release number of selected component (if applicable):
 2.6.18-71.el5

How reproducible:
 Rare

Steps to Reproduce:
1. Install RHEL5-U1. Update with kernel 2.6.18-71.el5
2. Install the audit-test-2088 RHTS rpm
3. While running the test restart audit daemon
  
Actual results:

INIT: Sending processes the TERM signal
Shutting down Avahi daemon: [  OK  ]
Stopping HAL daemon: [  OK  ]
Stopping yum-updatesd: [  OK  ]
Stopping atd: [  OK  ]
Stopping cups: [  OK  ]
Shutting down xfs: [  OK  ]
Shutting down console mouse services: [  OK  ]
Stopping sshd: [  OK  ]
Shutting down sm-client: [  OK  ]
Shutting down sendmail: [  OK  ]
Shutting down vsftpd: [  OK  ]
Stopping acpi daemon: [  OK  ]
Stopping crond: [  OK  ]
Stopping autofs:  [  OK  ]
Stopping automount: [  OK  ]
Shutting down ntpd: [  OK  ]
Killing mdmonitor: [  OK  ]
Stopping system message bus: [  OK  ]
Stopping RPC idmapd: [  OK  ]
Stopping NFS statd: [  OK  ]
Stopping mcstransd: [  OK  ]
Stopping portmap: [  OK  ]
Shutting down restorecond: [  OK  ]
Stopping auditd: [FAILED]

Expected results:
 System should reboot properly.

Additional info:
Here is a link to the log with some Alt+SysRq data
http://rhts.lab.boston.redhat.com/testlogs/14060/49430/403056/1734497-test_log--kernel-security-audit-audit-test-2088-EXTERNALWATCHDOG
http://rhts.lab.boston.redhat.com/testlogs/14060/49430/403056/TESTOUT.log

Note: After several days the above logs will be zipped to save on space. You may
need to add a .gz to the end of the urls.

Comment 1 Eric Paris 2008-01-24 19:55:44 UTC

Here's my first thought...  notice kauditd and auditd are both hung up on a
waitqueue inside the netlink code.  We either get on this waitqueue because the
netlink socket state is no longer 'NETLINK_CONNECTED' or because we ran out of
room to add the skb to the socket.  see netlink_attachskb()

I'm going to assume it is the 'too much stuff in the socket buffer to take
another skb' problem.  the auditd daemon is what is supposed to be draining that
socket but it is blocked trying to send a netlink message back to itself...

Would it be possible to instead of having audit_send_reply() directly call into
the netlink_unicast() to instead put the skbs on the queue for kauditd to send
later?  This way auditd could be freed up to drain the socket and kauditd could
start making progress again.

audit_send_reply() is only sent in response to a AUDIT_GET, AUDIT_SIGNAL_INFO,
or AUDIT_TTY_GET, request.  I assume this isn't a problem since there is
currently no gaurentee that the next skb auditd will pop will actually be this
reply, so making it even more likely it won't be this reply doesn't seem like it
would be a problem.

sgrubb am I correct in my belief that auditd will be just fine if these replies
come after some number of other messages which have already been placed on the
queue to be send by kauditd?

Comment 2 Eric Paris 2008-01-24 20:12:45 UTC

Basic summary of what I think happened:

1) auditd got busy doing something, not draining the socket
2) we generated enough audit messages to fill the socket, kauditd had to sleep
3) auditd sent a AUDIT_GET, AUDIT_SIGNAL_INFO, or AUDIT_TTY_GET request
4) auditd went to sleep waiting for its own reply

If we can put the reply in step 4 on the audit_skb_queue I think we wouldn't be
deadlocking.  If we can't, I think we need to fix userspace so we can   :)

Comment 3 Eric Paris 2008-01-24 20:21:43 UTC

Or we could do what audit_send_list() does and just run in a whole new kthread

Comment 4 Eric Paris 2008-01-24 20:23:14 UTC

sgrubb suggested a second netlink socket where we could have sync communication,
that could be used for this and for the audit_send_list() stuff.

(admittedly I still like my first answer)

Comment 5 Steve Grubb 2008-01-24 20:51:21 UTC

In reply to comment #1, auditd might be ok. We'd have to test it to see. During
shutdown, auditd would be trying to get signal info. Its not doing anything with
tty and it probably is not doing anything with a get command. I'd be 99% sure
its trying to query signal info. 

In the case of hup, usr1, & 2, all its looking for is an ACK that the command
was received. The actual data can arrive later. In the case of sigterm, the
audit daemon breaks out of its loop reading events and sends a set auditd pid 0
to stop the flow of events. Maybe its not getting an ack on setting pid 0. If we
set auditd_pid 0, the whole queue can be flushed (maybe to syslog) since we
don't care anymore.

It might be worthwhile patching a kernel to collect additional data.

Comment 6 Eric Paris 2008-04-02 15:41:45 UTC

https://www.redhat.com/archives/linux-audit/2008-March/msg00131.html

Comment 7 RHEL Program Management 2008-04-04 18:39:54 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Don Zickus 2008-07-23 18:54:39 UTC

in kernel-2.6.18-99.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 12 errata-xmlrpc 2009-01-20 20:22:19 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

Note You need to log in before you can comment on or make changes to this bug.