Bug 429941
Summary: | [RHEL5 U2] Audit fails to shutdown properly | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeff Burke <jburke> |
Component: | kernel | Assignee: | Eric Paris <eparis> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 5.2 | CC: | aviro, duck, dzickus, eparis, mgahagan, sgrubb |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://rhts.lab.boston.redhat.com/testlogs/14060/49430/403056/1734497-test_log--kernel-security-audit-audit-test-2088-EXTERNALWATCHDOG | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-01-20 20:22:19 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jeff Burke
2008-01-23 21:42:25 UTC
Here's my first thought... notice kauditd and auditd are both hung up on a waitqueue inside the netlink code. We either get on this waitqueue because the netlink socket state is no longer 'NETLINK_CONNECTED' or because we ran out of room to add the skb to the socket. see netlink_attachskb() I'm going to assume it is the 'too much stuff in the socket buffer to take another skb' problem. the auditd daemon is what is supposed to be draining that socket but it is blocked trying to send a netlink message back to itself... Would it be possible to instead of having audit_send_reply() directly call into the netlink_unicast() to instead put the skbs on the queue for kauditd to send later? This way auditd could be freed up to drain the socket and kauditd could start making progress again. audit_send_reply() is only sent in response to a AUDIT_GET, AUDIT_SIGNAL_INFO, or AUDIT_TTY_GET, request. I assume this isn't a problem since there is currently no gaurentee that the next skb auditd will pop will actually be this reply, so making it even more likely it won't be this reply doesn't seem like it would be a problem. sgrubb am I correct in my belief that auditd will be just fine if these replies come after some number of other messages which have already been placed on the queue to be send by kauditd? Basic summary of what I think happened: 1) auditd got busy doing something, not draining the socket 2) we generated enough audit messages to fill the socket, kauditd had to sleep 3) auditd sent a AUDIT_GET, AUDIT_SIGNAL_INFO, or AUDIT_TTY_GET request 4) auditd went to sleep waiting for its own reply If we can put the reply in step 4 on the audit_skb_queue I think we wouldn't be deadlocking. If we can't, I think we need to fix userspace so we can :) Or we could do what audit_send_list() does and just run in a whole new kthread sgrubb suggested a second netlink socket where we could have sync communication, that could be used for this and for the audit_send_list() stuff. (admittedly I still like my first answer) In reply to comment #1, auditd might be ok. We'd have to test it to see. During shutdown, auditd would be trying to get signal info. Its not doing anything with tty and it probably is not doing anything with a get command. I'd be 99% sure its trying to query signal info. In the case of hup, usr1, & 2, all its looking for is an ACK that the command was received. The actual data can arrive later. In the case of sigterm, the audit daemon breaks out of its loop reading events and sends a set auditd pid 0 to stop the flow of events. Maybe its not getting an ack on setting pid 0. If we set auditd_pid 0, the whole queue can be flushed (maybe to syslog) since we don't care anymore. It might be worthwhile patching a kernel to collect additional data. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. in kernel-2.6.18-99.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html |