Bug 526819

Summary: system crashes in audit_update_watch()
Product: Red Hat Enterprise Linux 5 Reporter: Bill Braswell <bbraswel>
Component: kernelAssignee: Alexander Viro <aviro>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.2CC: cward, dhoward, dzickus, emcnabb, eparis, hjia, jpirko, jwilleford, sgrubb, tao
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-30 06:50:18 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 533861    
Attachments:
Description Flags
minimal backport of mainline fix none

Description Bill Braswell 2009-10-01 21:01:01 UTC
The customers system has crashed several times while running usermod.  Below is the area of code it is crashing in.

 971 static void audit_update_watch(struct audit_parent *parent,

...

 999  list_for_each_entry_safe(r, nextr, &owatch->rules, rlist) {
1000
1001     oentry = container_of(r, struct audit_entry, rule);
1002     list_del(&oentry->rule.rlist);
1003     list_del_rcu(&oentry->list);
1004
1005     nentry = audit_dupe_rule(&oentry->rule, nwatch);
1006     if (unlikely(IS_ERR(nentry)))
1007         audit_panic("error updating watch, removing");
1008     else {
1009         int h = audit_hash_ino((u32)ino);
1010         list_add(&nentry->rule.rlist, &nwatch->rules);
1011         list_add_rcu(&nentry->list, &audit_inode_hash[h]);
1012     }
1013
1014     call_rcu(&oentry->rcu, audit_free_rule_rcu);
1015  }
1016
1017  if (audit_enabled) {
1018     struct audit_buffer *ab;
1019     ab = audit_log_start(NULL, GFP_KERNEL,
1020     AUDIT_CONFIG_CHANGE);
1021     audit_log_format(ab,
1022        "op=updated rules specifying path=");
1023        audit_log_untrustedstring(ab, owatch->path);
1024     audit_log_format(ab, " with dev=%u ino=%lu\n",
1025        dev, ino);
1026     audit_log_format(ab, " list=%d res=1", r->listnr);
1027     audit_log_end(ab);
1028  }

line 999 “r” is populated from owatch->rules.

Line 1001, r is used to calculate the address of oentry.

Line 1002, oentry is dereferenced. It is a valid address or the system would have crashed at that point. Since oentry is valid, then conversely, “r” is also valid.

Between lines 1002 and 1014, oentry is used a few other times and is still valid.

Line 1015 the “for” loop started on line 999 ends.

Line 1026, the system crashes because “r” is now an invalid address. Also, oentry is no longer valid. They have been unmapped.

It looks as if something else has been manipulating the audit rules for this inode while audit_update_watch() is working with it.

https://bugzilla.redhat.com/show_bug.cgi?id=453042 may possibly be related since comment 4 mentions an item being removed from the watchlist when it is not suppose to be.  But with the number of changes to this area of code, it is hard to say for sure.


megatron.gsslab.rdu.redhat.com:/cores/20090810203804/work/

     KERNEL: /cores/20090810203804/work/vmlinux
   DUMPFILE: /cores/20090810203804/work/2009-0803-0402.05-ssnladen.1.core
       CPUS: 8
       DATE: Mon Aug  3 04:02:05 2009
     UPTIME: 150 days, 11:47:42
LOAD AVERAGE: 0.24, 0.05, 0.02
      TASKS: 211
   NODENAME: ssnladen.ces.census.gov
    RELEASE: 2.6.18-92.1.10.el5xen
    VERSION: #1 SMP Wed Jul 23 04:11:52 EDT 2008
    MACHINE: x86_64  (2800 Mhz)
     MEMORY: 58.6 GB
      PANIC: "Oops: 0000 [1] SMP " (check log for details)
        PID: 27986
    COMMAND: "usermod"
       TASK: ffff880e70dc80c0  [THREAD_INFO: ffff880e7ef74000]
        CPU: 3
      STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 27986  TASK: ffff880e70dc80c0  CPU: 3   COMMAND: "usermod"
#0 [ffff880e7ef75a10] xen_panic_event at ffffffff802705a3
#1 [ffff880e7ef75a40] notifier_call_chain at ffffffff80267749
#2 [ffff880e7ef75a60] panic at ffffffff8028ce39
#3 [ffff880e7ef75b50] oops_end at ffffffff802650e0
#4 [ffff880e7ef75b60] do_page_fault at ffffffff8026765f
#5 [ffff880e7ef75c50] error_exit at ffffffff80260823
   [exception RIP: audit_update_watch+618]
   RIP: ffffffff802ab7d5  RSP: ffff880e7ef75d08  RFLAGS: 00010296
   RAX: ffff880e6db3d87c  RBX: ffff880ea5929ac0  RCX: 0000000000000007
   RDX: ffff880e6db3d85c  RSI: 0000000000000020  RDI: ffff880e6db3d87b
   RBP: ffff880e72008fb0   R8: 00000000ffffffff   R9: 0000000000000020
   R10: 0000000000000000  R11: 0000000000000000  R12: ffff880e75ed9dc0
   R13: ffff880e720090c0  R14: ffffffff80685aa0  R15: ffff880e72008fb0
   ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
#6 [ffff880e7ef75d00] audit_update_watch at ffffffff802ab7d5
#7 [ffff880e7ef75d60] inotify_inode_queue_event at ffffffff8020d77c
#8 [ffff880e7ef75dc0] vfs_rename at ffffffff8022b1f4
#9 [ffff880e7ef75e10] sys_renameat at ffffffff80237e16
#10 [ffff880e7ef75f80] tracesys at ffffffff802602f1 (via system_call)
   RIP: 0000003edb65f5b7  RSP: 00007fff4101e658  RFLAGS: 00000287
   RAX: ffffffffffffffda  RBX: ffffffff802602f1  RCX: ffffffffffffffff
   RDX: 0000000000000000  RSI: 0000000000611a00  RDI: 00007fff4101f790
   RBP: 0000000000000001   R8: 0000000000000000   R9: 00000000ffffffff
   R10: 5a5969655a6f426a  R11: 0000000000000287  R12: 00007fff4101f790
   R13: 0000000000611a00  R14: 0000000000000000  R15: 0000000000611a00
   ORIG_RAX: 0000000000000052  CS: 0033  SS: e02b

Comment 2 Alexander Viro 2009-10-02 13:46:37 UTC
Created attachment 363484 [details]
minimal backport of mainline fix

Comment 3 Alexander Viro 2009-10-02 13:50:40 UTC
That's not a race, it's a straight free-then-use bug fixed in mainline by eparis; yes, we need it in 5.4, the minimal backport in the attachment above.  Original mainline commit is

commit e85188f424c8eec7f311deed9a70bec57aeed741
Author: Eric Paris <eparis>
Date:   Thu Jun 11 14:31:34 2009 -0400

    Audit: dereferencing krule as if it were an audit_watch
    
    audit_update_watch() runs all of the rules for a given watch and duplicates
    them, attaches a new watch to them, and then when it finishes that process
    and has called free on all of the old rules (ok maybe still inside the rcu
    grace period) it proceeds to use the last element from list_for_each_entry_safe()
    as if it were a krule rather than being the audit_watch which was anchoring
    the list to output a message about audit rules changing.
    
    This patch unfies the audit message from two different places into a helper
    function and calls it from the correct location in audit_update_rules().  We
    will now get an audit message about the config changing for each rule (with
    each rules filterkey) rather than the previous garbage.

Comment 4 RHEL Program Management 2009-10-05 16:21:18 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Alexander Viro 2009-10-05 19:45:28 UTC
Patch posted

Comment 7 Issue Tracker 2009-10-14 16:13:41 UTC
Event posted on 2009-10-14 12:13 EDT by jwilleford

Still looking for this for 5.4.z.  What is the update on this issue?


This event sent from IssueTracker by jwilleford 
 issue 325421

Comment 15 Don Zickus 2009-10-21 19:13:37 UTC
in kernel-2.6.18-170.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 29 Chris Ward 2010-02-11 10:18:56 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 32 errata-xmlrpc 2010-03-30 06:50:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 33 Red Hat Bugzilla 2023-09-14 01:18:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days