Bug 469079 - IBM Power5 systems require selinux=0 to boot after install
Summary: IBM Power5 systems require selinux=0 to boot after install
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Eric Paris
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F10Blocker, F10FinalBlocker
Reported: 2008-10-29 19:20 UTC by James Laska
Modified: 2013-09-02 06:28 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-11-06 16:58:01 UTC
Type: ---
Embargoed:


Attachments
objdump -d of selinux-bprm-post-apply-creds (18.42 KB, text/plain)
2008-10-30 18:11 UTC, Eric Paris

Description James Laska 2008-10-29 19:20:18 UTC
Description of problem:

Unable to boot F10 on an IBM JS21 system.  The system stops during boot and displays no activity.  There is no VGA on this system, and sysrq-t output can be observed at: http://fpaste.org/paste/8257

Version-Release number of selected component (if applicable):


How reproducible:
Every time on ibm-js21-03.test.redhat.com

Steps to Reproduce:
1. Install F10 (no encrypted devices)
2. Prior to reboot, remove rhgb and quiet from yaboot.conf
3. Reboot into installed system
  
Actual results:

System starts to boot, but stops after probing disks


Expected results:

Boots into expected runlevel

Comment 1 James Laska 2008-10-29 19:57:27 UTC
Recreated on ibm-505-lp1, a virtualized Power5 guest.  This output includes booting with "plymouth:debug" enabled.

http://fpaste.org/paste/8272

Comment 2 James Laska 2008-10-29 23:29:04 UTC
At the suggestion of Ray Strode, I booted with "plymouth:nolog plymouth:debug".  This now shows the system in a kernel panic (full boot log available at http://fpaste.org/paste/8280):

Unable to handle kernel paging request for data at address 0xfffb70b7
Faulting instruction address: 0xc0000000001fee54
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in: ipr
NIP: c0000000001fee54 LR: c0000000001fee30 CTR: c0000000001fed7c
REGS: c0000000f4053600 TRAP: 0300   Not tainted  (2.6.27.4-58.fc10.ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24002488  XER: 20000001
DAR: 00000000fffb70b7, DSISR: 0000000040000000
TASK = c0000000f404c000[1] 'init' THREAD: c0000000f4050000 CPU: 0
GPR00: c0000000001fee30 c0000000f4053880 c0000000008f6b00 c0000000f404c000 
GPR04: 0000000000000058 0000000000000006 0000000000000000 c0000000f40538c0 
GPR08: 0000000000000000 00000000fffb70a7 0000000000d43000 0000000000000000 
GPR12: 0000000024002482 c00000000092f400 c0000000ed427b90 0000000000000001 
GPR16: c0000000edc383b4 c0000000ed467b00 c0000000ed467c00 00000000f7fd6cb4 
GPR20: 00000000f7fbe000 c0000000ed467800 0000000000000000 c0000000f4054000 
GPR24: c0000000f4803160 00000000f7fbe000 0000000000000000 00000000f7ffe648 
GPR28: 000000000003f4b0 0000000000000001 c000000000897918 c0000000ee760800 
NIP [c0000000001fee54] .selinux_bprm_post_apply_creds+0xd8/0x554
LR [c0000000001fee30] .selinux_bprm_post_apply_creds+0xb4/0x554
Call Trace:
[c0000000f4053880] [c0000000001fee30] .selinux_bprm_post_apply_creds+0xb4/0x554 (unreliable)
[c0000000f40539d0] [c0000000001f0948] .security_bprm_post_apply_creds+0x38/0x50
[c0000000f4053a50] [c000000000142e54] .compute_creds+0xf8/0x114
[c0000000f4053ae0] [c00000000018f74c] .load_elf_binary+0xf10/0x1690
[c0000000f4053c20] [c000000000142b28] .search_binary_handler+0x124/0x358
[c0000000f4053ce0] [c000000000181a0c] .compat_do_execve+0x180/0x24c
[c0000000f4053d90] [c000000000015668] .compat_sys_execve+0x74/0xb0
[c0000000f4053e30] [c000000000008770] syscall_exit+0x0/0x40
Instruction dump:
4182006c e87e8278 4836257d 60000000 e93f01e0 2fa90000 419e0028 e86d01b0 
e9290018 38a00006 38c00000 3ba00001 <e8890010> 4bffa689 2fa30000 409e0008 
---[ end trace f0a5452ca0e0233e ]---

Comment 3 James Laska 2008-10-29 23:38:47 UTC
As the previous kernel panic suggests, there must be something wrong in selinux land ... booting with "selinux=0" works around the issue.

Comment 4 Stephen Smalley 2008-10-30 16:56:49 UTC
Can you disassemble the instruction dump?
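
As a rough illustration of what disassembling the marked word of the instruction dump yields, the sketch below decodes <e8890010> by hand as a PowerPC DS-form instruction and checks the resulting address against the DAR value in the oops. The struct and function names (ds_form, decode_ds_form, effective_address, check_oops_values) are made up for this example; only the instruction word, GPR09 and DAR are taken from the oops above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a PowerPC DS-form instruction (the encoding used by ld/ldu/lwa). */
struct ds_form {
    unsigned opcode;  /* primary opcode, top 6 bits (58 for ld/ldu/lwa) */
    unsigned rt;      /* target register number */
    unsigned ra;      /* base register number */
    int32_t  disp;    /* sign-extended byte displacement */
    unsigned xo;      /* low 2 bits: 0 = ld, 1 = ldu, 2 = lwa */
};

static struct ds_form decode_ds_form(uint32_t insn)
{
    struct ds_form d;
    int32_t disp = (int32_t)(insn & 0xfffcu);  /* DS field with the XO bits cleared */

    if (disp & 0x8000)
        disp -= 0x10000;                       /* sign-extend from 16 bits */

    d.opcode = insn >> 26;
    d.rt = (insn >> 21) & 0x1f;
    d.ra = (insn >> 16) & 0x1f;
    d.disp = disp;
    d.xo = insn & 0x3;
    return d;
}

/* Address the load touches, given the value held in the base register. */
static uint64_t effective_address(struct ds_form d, uint64_t base)
{
    return base + (uint64_t)(int64_t)d.disp;
}

/* Decode the marked word from the oops and compare the result against DAR. */
static void check_oops_values(void)
{
    struct ds_form d = decode_ds_form(0xe8890010u);  /* <e8890010> */
    uint64_t gpr09 = 0x00000000fffb70a7ULL;          /* GPR09 from the oops */
    uint64_t dar   = 0x00000000fffb70b7ULL;          /* DAR from the oops */

    printf("opcode %u xo %u: rt=r%u, disp=%d, ra=r%u\n",
           d.opcode, d.xo, d.rt, (int)d.disp, d.ra);
    assert(effective_address(d, gpr09) == dar);
}
```

Under these assumptions the word decodes as ld r4,16(r9), and GPR09 + 16 equals the reported DAR, so the fault is a load through whatever bogus pointer is sitting in r9.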

Comment 5 Eric Paris 2008-10-30 18:03:30 UTC
Quick first pokes:

[root@ibm-505-lp1 2.6.27.4-58.fc10.ppc64]# addr2line --exe=vmlinux --inline 0xc0000000001fee54
/usr/src/debug/kernel-2.6.27/linux-2.6.27.ppc64/security/selinux/hooks.c:2135
/usr/src/debug/kernel-2.6.27/linux-2.6.27.ppc64/security/selinux/hooks.c:2281

inside flush_unauthorized_files()

2125  if (tty) {
2126      file_list_lock();
2127      file = list_entry(tty->tty_files.next, typeof(*file), f_u.fu_list);
2128      if (file) {
2129              /* Revalidate access to controlling tty.
2130                 Use inode_has_perm on the tty inode directly rather
2131                 than using file_has_perm, as this particular open
2132                 file may belong to another process and we are only
2133                 interested in the inode-based check here. */
2134              struct inode *inode = file->f_path.dentry->d_inode;
2135              if (inode_has_perm(current, inode,
2136                                 FILE__READ | FILE__WRITE, NULL)) {
2137                      drop_tty = 1;
2138              }
2139      }
2140      file_list_unlock();
2141  }

Comment 6 Eric Paris 2008-10-30 18:11:18 UTC
Created attachment 321970 [details]
objdump -d of selinux-bprm-post-apply-creds

Comment 7 Eric Paris 2008-10-30 18:58:12 UTC
Obviously I really need to do some more looking, but can tty->tty_files be empty?  Can list_entry really return a NULL value?  I thought list_entry basically just pointed backwards in memory from .next by some offset...

It seems that what we really meant was:

if (!list_empty(&tty->tty_files))
   file = list_first_entry(&tty->tty_files, struct file, f_u.fu_list);

Have we just always had a non-empty tty_files list, and on this platform we have an empty one?  Or has 'that place in tty where file points' after the whole list_entry thing just been 0 on every platform that has mattered so far?

I could be way off, but first look, this doesn't seem right...
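
To make the question about list_entry concrete, here is a minimal userspace sketch. It re-implements just enough of the kernel's circular list (list_head, list_entry, list_empty) to show the behaviour; it is not the kernel code itself. demo_tty, demo_file and the first_file_* helpers are hypothetical stand-ins for tty_struct, struct file and the lookup in flush_unauthorized_files().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical stand-ins for struct file and tty_struct. */
struct demo_file { int id; struct list_head fu_list; };
struct demo_tty  { int index; struct list_head tty_files; };

/* Mirrors the original lookup: never NULL, even when the list is empty,
 * because an empty head's .next points back at the head itself. */
static struct demo_file *first_file_unchecked(struct demo_tty *tty)
{
    return list_entry(tty->tty_files.next, struct demo_file, fu_list);
}

/* Mirrors the proposed lookup: test for emptiness before converting. */
static struct demo_file *first_file_checked(struct demo_tty *tty)
{
    if (list_empty(&tty->tty_files))
        return NULL;
    return list_entry(tty->tty_files.next, struct demo_file, fu_list);
}

static void check_empty_list_behaviour(void)
{
    struct demo_tty tty = { 0, { NULL, NULL } };
    struct demo_file f = { 1, { NULL, NULL } };

    init_list_head(&tty.tty_files);
    assert(first_file_unchecked(&tty) != NULL);  /* bogus pointer; do not dereference */
    assert(first_file_checked(&tty) == NULL);

    list_add(&f.fu_list, &tty.tty_files);
    assert(first_file_unchecked(&tty) == &f);
    assert(first_file_checked(&tty) == &f);
}
```

On an empty list the unchecked helper hands back a non-NULL pointer computed from the list head itself, so a following `if (file)` test cannot catch the empty case; only the list_empty() check can.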

Comment 8 Stephen Smalley 2008-10-30 20:48:59 UTC
Looks like you're right.
It has been that way since the tty revalidation was merged in 2004.

Comment 9 Eric Paris 2008-10-30 21:24:24 UTC
I managed to boot a kernel with the patch I described above, but I didn't see the printk I expected in the list_empty() case.  I'll keep looking to make sure this was the real problem...

Comment 10 Eric Paris 2008-10-30 22:05:24 UTC
SUCCESS!

type=1404 audit(1225402059.983:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1225402060.230:3): policy loaded auid=4294967295 ses=4294967295
[main.c]                                    on_newroot:new root mounted at "/sysroot", switching to it
[./plugin.c]                                on_boot_output:writing 'Switching to new root and running init.
' to all windows (41 bytes)
inside flush_unauthorized_files with tty->tty_files empty
		Welcome to Fedora 
		Press 'I' to enter interactive startup.


No idea why tty->tty_files is empty (maybe we should figure that out?), but it is, and with the patch I was able to boot without a problem.

Comment 11 Stephen Smalley 2008-10-31 13:49:32 UTC
It seems like a legal case to me for tty_files to be empty, and a (longstanding) bug in SELinux that we didn't handle it correctly in the first place.
Can you trigger it by closing all references to a given tty and then exec'ing a domain-changing program?  Although I suppose the caller might hold a reference and thus it is difficult to force it to occur with an actual revoke-style operation.

Comment 12 Eric Paris 2008-11-03 14:53:17 UTC
Checked a fix in to the devel branch.  Upstream: 37dd0bd04a3240d2922786d501e2f12cec858fbf

Comment 13 Tom "spot" Callaway 2008-11-06 16:58:01 UTC
Fixed in 2.6.27.4-76+

