Description of problem: Jan 17 11:13:11 lxwstest sshd(pam_unix)[3998]: session opened for user clarkp by (uid=0) Jan 17 11:14:43 lxwstest kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004 Jan 17 11:14:43 lxwstest kernel: printing eip: Jan 17 11:14:43 lxwstest kernel: c013e454 Jan 17 11:14:43 lxwstest kernel: *pde = 1e164001 Jan 17 11:14:43 lxwstest kernel: Oops: 0000 [#1] Jan 17 11:14:43 lxwstest kernel: SMP Jan 17 11:14:43 lxwstest kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod button battery ac md5 ipv6 uhci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore 3c59x mii floppy ext3 jbd Jan 17 11:14:43 lxwstest kernel: CPU: 0 Jan 17 11:14:43 lxwstest kernel: EIP: 0060:[<c013e454>] Not tainted VLI Jan 17 11:14:43 lxwstest kernel: EFLAGS: 00010202 (2.6.9-42.0.3.ELsmp) Jan 17 11:14:43 lxwstest kernel: EIP is at audit_data_get+0x83/0xdd Jan 17 11:14:43 lxwstest kernel: eax: 00000000 ebx: d8057de8 ecx: 00000012 edx: dfe214dc Jan 17 11:14:43 lxwstest kernel: esi: 00000000 edi: 00000001 ebp: d755804c esp: de291e7c Jan 17 11:14:43 lxwstest kernel: ds: 007b es: 007b ss: 0068 Jan 17 11:14:43 lxwstest kernel: Process useradd (pid: 4036, threadinfo=de291000 task=d74ba230) Jan 17 11:14:43 lxwstest kernel: Stack: d75580c0 d91bc38c d75580c7 c013e7d7 d91bc38c dee02b20 00000000 df8e44a4 Jan 17 11:14:43 lxwstest kernel: d755804c df8e4474 df8e447c c017037d e08a09e0 df8e4474 00000000 df7d5de8 Jan 17 11:14:43 lxwstest kernel: c01694e7 c16debc0 d755804c c16debc0 00000000 c16debc0 c16debc0 d755804c Jan 17 11:14:43 lxwstest kernel: Call Trace: Jan 17 11:14:43 lxwstest kernel: [<c013e7d7>] audit_update_watch+0xbc/0x1b0 Jan 17 11:14:43 lxwstest kernel: [<c017037d>] d_move+0x188/0x1ce Jan 17 11:14:43 lxwstest kernel: [<c01694e7>] vfs_rename_other+0xd8/0x125 Jan 17 11:14:43 lxwstest kernel: [<c016981f>] vfs_rename+0x2eb/0x338 Jan 17 11:14:43 lxwstest kernel: [<c01699b9>] sys_rename+0x14d/0x1e0 Jan 17 11:14:43 lxwstest kernel: [<c010a142>] do_syscall_trace+0xc0/0xca Jan 17 11:14:43 lxwstest kernel: [<c02d47cb>] syscall_call+0x7/0xb Jan 17 11:14:43 lxwstest kernel: Code: 58 08 73 0c 8b 02 8d 50 04 8b 40 04 85 c0 75 ef 8b 02 85 c0 74 0b 39 58 08 75 06 ff 00 89 c6 eb 54 85 ff 74 50 8b 35 f8 ad 43 c0 <8b> 46 04 ff 0d f4 ad 43 c0 a3 f8 ad 43 c0 c7 46 10 00 00 00 00 Jan 17 11:14:43 lxwstest kernel: <0>Fatal exception: panic in 5 seconds Jan 17 11:51:12 lxwstest syslogd 1.4.1: restart. Jan 17 11:51:12 lxwstest syslog: syslogd startup succeeded Jan 17 11:51:12 lxwstest kernel: klogd 1.4.1, log source = /proc/kmsg started. Jan 17 11:51:12 lxwstest kernel: Linux version 2.6.9-42.0.3.ELsmp (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Mon Sep 25 17:28:02 EDT 2006 Version-Release number of selected component (if applicable): 2.6.9-42.0.3.EL How reproducible: Steps to Reproduce: 1. auditctl -w /etc/shadow 2. useradd userb Actual results: Machine freezes Expected results: user to be added Additional info: RH support - ticket #1241786
if (ret) { ret->count++; } else if (allocate) { ret = audit_data_pool; audit_data_pool = ret->next_hash; <----------- Crash Here audit_pool_size--; dprintk("allocate from pool. %d left\n", audit_pool_size);
Created attachment 145943 [details] potential fix for problem This is an untested proposed patch for this problem.
I don't like that patch -- isn't the whole point of the pool that we're preallocating data structures so that we can use them later, in a context where we cannot allocate? If we don't have enough in the pool then there's a real underlying problem which needs to be fixed.
Created attachment 146837 [details] syslog kern.debug output with dprintk on, working on serial access
QE ack for RHEL4.5.
Created attachment 147495 [details] Potential fix This message was sent to the upstream audit list. I am still hoping for some comments.... ********************** Since the upstream filesystem auditing is different than RHEL4 this problem (I believe) is RHEL4 specific. Lets assume I add the rule auditctl -w /tmp/watched_file then I run touch /tmp/unwatched_file mv -f /tmp/unwatech_file /tmp/watched_file For the purposes of this discussion lets call the inode for the original /tmp/watched_file i1 and the indoe for the original /tmp/unwatched_file i2. The RHEL4 kernel panics. The reason is because it wants to allocate a audit_inode_data to i2 but there is no audit_inode_data to allocate. Thus we get the panic as seen in RH BZ 223129. When the watch was originally inserted the kernel mallocs 2 audit_inode_data structs. One for the inode to be watched and one for the parent of the that inode. In our case the panic happens because those 2 structs are allocated for i1 and for the inode associated with /tmp. From looking at the code I saw that in d_move we were dropping the audit_inode_data associated with i2 (inside that function i2 is pointed at by "dentry") at the beginning. But we were not dropping the watch on i1. It appears that the watch was intended to have been dropped on i1 later in i_put. At the end of d_move we add the watch for i2. It was at this point that things blew up (as i_put had not yet released the data structure.) My first attempt at fixing this simply was to change d_move such that it would drop the audit_inode_data for both i1 and i2 and then add it back to i2 once i2 was the inode which needed the watch. This however simply moved the panic into i_put when it wanted to remove the audit_inode_data from the original i1. The reason it panics there is because when trying to fetch the audit_inode_data to free it, it was actually trying to allocate a new struct so it had something to free! My solution simply passes a new flag to audit_data_get which states if the intent of requesting the audit_inode_data is to use it or to remove it. If the intent of getting the audit_inode_data was to simply remove that struct from the inode there is no need to allocate a new one just to have it removed. This patch appears to fix the panic and to correctly watch the correct inodes at the correct times. I'd love comment as there may be other solutions to either of the problems. Maybe we like the removal of the watch on i1 in d_move but we don't like the new flags. In that case we could just 'mess' around with i1 at the end of d_move such that it wouldn't watch the watch in i_put and wouldn't ever get into audit_data_get to cause the secondary panic. Maybe there is something I'm missing altogether. Anyway comments please before I put this into a RHEL4 kernel would be appreciated. -Eric
committed in stream U5 build 47. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |FIXEDAWAITINGTEST Public Release| |expected in 42.47 or later Resolution| |FIX_BY_IBM Target Milestone|--- |RHEL4_U5 ------- Additional Comments From sglass.com 2007-02-13 10:51 EDT ------- This will be in the RHEL 4.5 beta.
committed in stream E5 build 42.0.10.
This issue affects _only_ the Red Hat Enterprise Linux 4 kernel. Other kernel versions or upstream kernels are _not_ known be affected.
verified on an x86_64 xen guest and an ia64 system.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2007-0085.html
committed in stream U5 build 49. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/