Bug 121731

Summary: (SELINUX)oops in selinux_socket_sock
Product: [Fedora] Fedora
Reporter: Dan Christian <dac>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Version: 2
CC: jmorris, pfrields, sdsmall
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-12-07 06:24:30 UTC
Attachments:
Description                                              Flags
text of stack trace (from serial console)                none
Stack trace from oops in 2.6.3                           none
Another oops (same top level, but different call trace)  none
Another stack trace                                      none
yet another stack trace                                  none
Call trace from 3.6.5-1.327smp                           none

Description Dan Christian 2004-04-26 20:46:44 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2) Gecko/20040301

Description of problem:
The kernel will oops in selinux_socket_sock after a few hours of load.

I'll attach a stack trace in a minute.

I've also reproduced this under 2.6.3-1.110smp (took over 24 hours).

The hardware is a dual Xeon.  Multiple (identical) boxes act the same way.


Version-Release number of selected component (if applicable):
2.6.5-1.315smp

How reproducible:
Always

Steps to Reproduce:
1.  Boot system
2.  Run lots of mail through sendmail (with a milter filter)
3.  Wait for the oops (2-30 hours)
    

Additional info:

[support@proofpoint support]$ /sbin/lsmod
Module                  Size  Used by
ipv6                  224128  14
tg3                    71044  0
ipt_REJECT              8704  1
ipt_state               5376  1
ip_conntrack           29612  1 ipt_state
iptable_filter          6016  1
ip_tables              18176  3 ipt_REJECT,ipt_state,iptable_filter
p4_clockmod             8068  0
microcode              10528  0
button                  8472  0
battery                10892  0
asus_acpi              12440  0
ac                      7436  0
ext3                   95912  3
jbd                    58008  1 ext3
megaraid               37576  4
sd_mod                 20224  5
scsi_mod               99912  2 megaraid,sd_mod

Comment 1 Dan Christian 2004-04-26 20:49:01 UTC
Created attachment 99697
text of stack trace (from serial console)

I've reproduced this multiple times under 2.6.3.  The stack traces are very
similar.

Comment 2 Dan Christian 2004-04-26 21:15:12 UTC
Created attachment 99702
Stack trace from oops in 2.6.3

Comment 3 Dan Christian 2004-04-26 22:23:27 UTC
Created attachment 99707
Another oops (same top level, but different call trace)

Comment 4 Dan Christian 2004-04-27 16:50:24 UTC
Created attachment 99717
Another stack trace

Comment 5 Dan Christian 2004-04-27 16:52:53 UTC
Created attachment 99718
yet another stack trace

I'll stop posting stack traces unless someone asks for one.
It certainly seems to be repeatable (given a few hours).

Comment 6 James Morris 2004-04-28 13:50:44 UTC
Could you please post a trace using kernel-smp-2.6.5-1.327.i686.rpm?


Comment 7 Stephen Smalley 2004-04-28 13:55:23 UTC
Looks similar to the problem reported earlier by akpm.
Unless I misread the code, it looks like isec is null at the point of
the dereferencing of isec->sclass after the sel_netif_lookup, which
implies that the socket inode was freed in the midst of sock_rcv_skb.
Suggestions:
- Add an explicit null test for isec and return with a warning.
- Take the sk_callback_lock around the use of the socket inode?
- Eliminate the need for accessing the socket inode by applying the
patch I sent a while back to allow use of sk security field for INET
sockets.
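
For illustration only, the first suggestion (an explicit null test that
returns with a warning) could look roughly like the userspace sketch below.
Every mock_* and MOCK_* name is hypothetical and stands in for the kernel
types; this is not the real SELinux hook and not the fix that was eventually
shipped. Note that a null test alone only avoids the immediate oops: without
a lock or a reference held on the inode, the pointer could still be freed
between the test and the use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures (not the real layout). */
enum mock_sclass { MOCK_TCP_SOCKET = 1, MOCK_UDP_SOCKET = 2 };

struct mock_isec  { enum mock_sclass sclass; };
struct mock_inode { struct mock_isec *i_security; };

#define MOCK_PERM_NONE      0
#define MOCK_PERM_TCP_RECV  1
#define MOCK_PERM_UDP_RECV  2
#define MOCK_ERR_NO_ISEC  (-1)

/*
 * Pick the receive permission for a socket inode.
 * Returns MOCK_ERR_NO_ISEC (with a warning on stderr) instead of
 * dereferencing a missing inode or a missing security blob.
 */
int mock_recv_perm(const struct mock_inode *inode)
{
    const struct mock_isec *isec;

    if (inode == NULL || inode->i_security == NULL) {
        fprintf(stderr, "warning: socket inode has no security blob\n");
        return MOCK_ERR_NO_ISEC;
    }
    isec = inode->i_security;

    switch (isec->sclass) {
    case MOCK_UDP_SOCKET:
        return MOCK_PERM_UDP_RECV;
    case MOCK_TCP_SOCKET:
        return MOCK_PERM_TCP_RECV;
    default:
        return MOCK_PERM_NONE;
    }
}

/* Small usage example: one live inode, one whose blob is already gone. */
void mock_demo(void)
{
    struct mock_isec udp = { MOCK_UDP_SOCKET };
    struct mock_inode live = { &udp };
    struct mock_inode gone = { NULL };

    assert(mock_recv_perm(&live) == MOCK_PERM_UDP_RECV);
    assert(mock_recv_perm(&gone) == MOCK_ERR_NO_ISEC);
}
```

The second and third suggestions address the underlying race rather than the
symptom, so the guard above would be a stopgap at best.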

Comment 8 Dan Christian 2004-04-28 16:00:54 UTC
I have yet to see an oops on 326 or 327.  All of them have the vdso=0
boot argument.

I have 4 machines running them now.  It can take days for this to show;
I'll post an oops when I get one.




Comment 9 Dan Christian 2004-04-28 23:13:50 UTC
Created attachment 99760
Call trace from 3.6.5-1.327smp

Comment 10 James Morris 2004-04-29 01:14:34 UTC
Thanks for that, it confirms that the oops is happening at the first
dereference of the inode security field.

        isec = inode->i_security;  <--- here

        switch (isec->sclass) {
        case SECCLASS_UDP_SOCKET:
                netif_perm = NETIF__UDP_RECV;


Comment 11 James Morris 2004-05-20 15:56:50 UTC
A fix has been included in the latest Fedora kernel.  Please let us
know if this works.

Comment 13 Dave Jones 2004-12-07 06:24:30 UTC
6 months with no comment - closing.