Description of problem: System regularly panics with this oops: vmcore is available on megatron:/cores/20110121044439 KERNEL: usr/lib/debug/lib/modules/2.6.18-194.17.1.el5/vmlinux DUMPFILE: 00399921_vmcore_238 CPUS: 16 DATE: Mon Jan 17 09:33:17 2011 UPTIME: 00:12:27 LOAD AVERAGE: 0.12, 0.07, 0.05 TASKS: 289 NODENAME: gnx10199 RELEASE: 2.6.18-194.17.1.el5 VERSION: #1 SMP Mon Sep 20 07:12:06 EDT 2010 MACHINE: x86_64 (2931 Mhz) MEMORY: 63.1 GB PANIC: "" general protection fault: 0000 [1] SMP last sysfs file: /class/misc/autofs/dev CPU 5 Modules linked in: nfs fscache nfs_acl autofs4 hidp l2cap bluetooth lockd sunrpc xt_tcpudp iptable_filter ip_tables x_tables dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev shpchp ide_cd sg cdrom bnx2 pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Pid: 5743, comm: rpm Not tainted 2.6.18-194.17.1.el5 #1 RIP: 0010:[<ffffffff8012e13f>] [<ffffffff8012e13f>] selinux_bprm_post_apply_creds+0xad/0x440 RSP: 0018:ffff81105c9c7cb8 EFLAGS: 00010286 RAX: 0001001000000000 RBX: ffff811073287000 RCX: ffff81105ab517a0 RDX: ffffffff8005dde9 RSI: ffff81106fd548c0 RDI: ffffffff803f1300 RBP: 0000003a2021bb90 R08: 0000000000000003 R09: 0000003a2021cae8 R10: 0000000000000873 R11: ffffffff8012e092 R12: ffff81106fd548c0 R13: ffff811078883000 R14: ffff81107e692000 R15: ffff811061dcb2e0 FS: 0000000000000000(0000) GS:ffff810139dc04c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003a2021cae8 CR3: 0000001059531000 CR4: 00000000000006e0 Process rpm (pid: 5743, threadinfo ffff81105c9c6000, task ffff81105ab517a0) Stack: ffff811078883000 0000003a2021cae8 0000003a2021bb90 ffff81105f5d7cc0 ffff81107a62c6b8 0000000000000873 0000000000000000 ffff81105c9c7c08 0000000000000008 00000000000000a3 0000000000000000 0000000000000000 Call Trace: [<ffffffff8012faa1>] selinux_bprm_apply_creds+0x19/0xc2 [<ffffffff80019381>] load_elf_binary+0x1052/0x1835 [<ffffffff800e66b9>] get_arg_page+0x3c/0x95 [<ffffffff80017e20>] copy_strings+0x1ec/0x1fd [<ffffffff8003f8eb>] search_binary_handler+0xbb/0x26d [<ffffffff8003ee7d>] do_execve+0x16a/0x1f7 [<ffffffff80054f30>] sys_execve+0x36/0x4c [<ffffffff8005d4d3>] stub_execve+0x67/0xb0 Code: 48 8b 70 10 65 48 8b 3c 25 00 00 00 00 31 c9 ba 06 00 00 00 RIP [<ffffffff8012e13f>] selinux_bprm_post_apply_creds+0xad/0x440 RSP <ffff81105c9c7cb8> Version-Release number of selected component (if applicable): kernel-2.6.18-194.17.1.el5 kernel-2.6.18-238.1.1.el5 How reproducible: Regularly Steps to Reproduce: Unknown but clearly selinux is enabled.
Let's disassemble selinux_bprm_post_apply_creds() and see where we panicked: crash> dis -r ffffffff8012e13f 0xffffffff8012e092 <selinux_bprm_post_apply_creds>: push %r15 0xffffffff8012e094 <selinux_bprm_post_apply_creds+2>: push %r14 0xffffffff8012e096 <selinux_bprm_post_apply_creds+4>: push %r13 0xffffffff8012e098 <selinux_bprm_post_apply_creds+6>: push %r12 0xffffffff8012e09a <selinux_bprm_post_apply_creds+8>: push %rbp 0xffffffff8012e09b <selinux_bprm_post_apply_creds+9>: push %rbx 0xffffffff8012e09c <selinux_bprm_post_apply_creds+10>: sub $0x98,%rsp 0xffffffff8012e0a3 <selinux_bprm_post_apply_creds+17>: mov %gs:0x0,%rax 0xffffffff8012e0ac <selinux_bprm_post_apply_creds+26>: mov 0x620(%rax),%r15 0xffffffff8012e0b3 <selinux_bprm_post_apply_creds+33>: mov 0x1b8(%rdi),%rax 0xffffffff8012e0ba <selinux_bprm_post_apply_creds+40>: cmpb $0x0,0xd(%rax) 0xffffffff8012e0be <selinux_bprm_post_apply_creds+44>: je 0xffffffff8012e0d8 <selinux_bprm_post_apply_creds+70> 0xffffffff8012e0c0 <selinux_bprm_post_apply_creds+46>: mov %gs:0x0,%rsi 0xffffffff8012e0c9 <selinux_bprm_post_apply_creds+55>: mov $0x9,%edi 0xffffffff8012e0ce <selinux_bprm_post_apply_creds+60>: callq 0xffffffff80099f9d <force_sig_specific> 0xffffffff8012e0d3 <selinux_bprm_post_apply_creds+65>: jmpq 0xffffffff8012e4c0 <selinux_bprm_post_apply_creds+1070> 0xffffffff8012e0d8 <selinux_bprm_post_apply_creds+70>: mov 0x8(%r15),%eax 0xffffffff8012e0dc <selinux_bprm_post_apply_creds+74>: cmp 0xc(%r15),%eax 0xffffffff8012e0e0 <selinux_bprm_post_apply_creds+78>: je 0xffffffff8012e4c0 <selinux_bprm_post_apply_creds+1070> 0xffffffff8012e0e6 <selinux_bprm_post_apply_creds+84>: mov %gs:0x0,%rax 0xffffffff8012e0ef <selinux_bprm_post_apply_creds+93>: mov 0x598(%rax),%rax 0xffffffff8012e0f6 <selinux_bprm_post_apply_creds+100>: mov $0xffffffff803328c0,%rdi 0xffffffff8012e0fd <selinux_bprm_post_apply_creds+107>: mov %rax,0x18(%rsp) 0xffffffff8012e102 <selinux_bprm_post_apply_creds+112>: callq 0xffffffff80063af8 <mutex_lock> 0xffffffff8012e107 <selinux_bprm_post_apply_creds+117>: mov %gs:0x0,%rax 0xffffffff8012e110 <selinux_bprm_post_apply_creds+126>: mov 0x5a8(%rax),%rax 0xffffffff8012e117 <selinux_bprm_post_apply_creds+133>: mov 0xe0(%rax),%rbx 0xffffffff8012e11e <selinux_bprm_post_apply_creds+140>: test %rbx,%rbx 0xffffffff8012e121 <selinux_bprm_post_apply_creds+143>: je 0xffffffff8012e19b <selinux_bprm_post_apply_creds+265> 0xffffffff8012e123 <selinux_bprm_post_apply_creds+145>: mov $0xffffffff803f1300,%rdi 0xffffffff8012e12a <selinux_bprm_post_apply_creds+152>: callq 0xffffffff80064a85 <_spin_lock> 0xffffffff8012e12f <selinux_bprm_post_apply_creds+157>: mov 0x290(%rbx),%rax 0xffffffff8012e136 <selinux_bprm_post_apply_creds+164>: test %rax,%rax 0xffffffff8012e139 <selinux_bprm_post_apply_creds+167>: je 0xffffffff8012e191 <selinux_bprm_post_apply_creds+255> 0xffffffff8012e13b <selinux_bprm_post_apply_creds+169>: mov 0x10(%rax),%rax 0xffffffff8012e13f <selinux_bprm_post_apply_creds+173>: mov 0x10(%rax),%rsi %rax holds a bad pointer - it's 0x0001001000000000 which is clearly not a valid pointer. The assembly code above actually corresponds to this function which has been inlined: static inline void flush_unauthorized_files(struct files_struct * files) { struct avc_audit_data ad; struct file *file, *devnull = NULL; struct tty_struct *tty; struct fdtable *fdt; long j = -1; mutex_lock(&tty_mutex); tty = current->signal->tty; if (tty) { file_list_lock(); file = list_entry(tty->tty_files.next, typeof(*file), f_u.fu_list); if (file) { /* Revalidate access to controlling tty. Use inode_has_perm on the tty inode directly rather than using file_has_perm, as this particular open file may belong to another process and we are only interested in the inode-based check here. */ struct inode *inode = file->f_dentry->d_inode; ... The last two instructions were dereferencing f_dentry and then d_inode but f_dentry had the bad value. The 'tty' object is still in %rbx and we add 0x290 to get the address of the tty_files field: crash> px 0xffff811073287000+0x290 $3 = 0xffff811073287290 And then dereference it to get it's value: crash> rd -64 0xffff811073287290 ffff811073287290: ffff811073287290 .r(s.... Well that's interesting - it points to itself. This indicates the list is empty and we should not be trying to process the list entry. So what happens is the list head is interpreted as a file pointer and we get garbage: crash> file 0xffff811073287290 struct file { f_u = { fu_list = { next = 0xffff811073287290, prev = 0xffff811073287290 }, fu_rcuhead = { next = 0xffff811073287290, func = 0xffff811073287290 } }, f_dentry = 0x1001000000000, <---- bad value that causes the panic. f_vfsmnt = 0x10006d4d8, f_op = 0x0, f_count = { counter = 351151128 }, f_flags = 0, f_mode = 0, f_pos = 0, f_owner = { lock = { raw_lock = { lock = 0 } }, pid = 0, uid = 1535176704, euid = 4294934800, security = 0x0, signum = 0 }, f_uid = 0, f_gid = 0, f_ra = { start = 0, size = 0, flags = 0, cache_hit = 0, prev_page = 0, ahead_start = 0, ahead_size = 0, ra_pages = 0, mmap_hit = 0, mmap_miss = 0 }, f_version = 0, f_security = 0x0, private_data = 0x0, f_ep_links = { next = 0x0, prev = 0x0 }, f_ep_lock = { raw_lock = { slock = 0 } }, f_mapping = 0x0 } I think the quick fix here is to add a list_empty(tty->tty_files) check inside the file_list_lock(). The real question is why we've lost the controlling tty.
Upstream commit to fix this issue: commit 37dd0bd04a3240d2922786d501e2f12cec858fbf Author: Eric Paris <eparis> Date: Fri Oct 31 17:40:00 2008 -0400 SELinux: properly handle empty tty_files list SELinux has wrongly (since 2004) had an incorrect test for an empty tty->tty_files list. With an empty list selinux would be pointing to part of the tty struct itself and would then proceed to dereference that value and again dereference that result. An F10 change to plymouth on a ppc64 system is actually currently triggering this bug. This patch uses list_empty() to handle empty lists rather than looking at a meaningless location. [note, this fixes the oops reported in https://bugzilla.redhat.com/show_bug.cgi?id=469079] Signed-off-by: Eric Paris <eparis> Signed-off-by: James Morris <jmorris> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 3e3fde7..f85597a 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2126,14 +2126,16 @@ static inline void flush_unauthorized_files(struct files_struct *files) tty = get_current_tty(); if (tty) { file_list_lock(); - file = list_entry(tty->tty_files.next, typeof(*file), f_u.fu_list); - if (file) { + if (!list_empty(&tty->tty_files)) { + struct inode *inode; + /* Revalidate access to controlling tty. Use inode_has_perm on the tty inode directly rather than using file_has_perm, as this particular open file may belong to another process and we are only interested in the inode-based check here. */ - struct inode *inode = file->f_path.dentry->d_inode; + file = list_first_entry(&tty->tty_files, struct file, f_u.fu_list); + inode = file->f_path.dentry->d_inode; if (inode_has_perm(current, inode, FILE__READ | FILE__WRITE, NULL)) { drop_tty = 1;
Created attachment 476312 [details] Patch to check for empty tty_files list
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Can anyone describe how to reproduce this?
in kernel-2.6.18-245.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html