Bug 454494

Summary: oops in free_uid when using smbd
Product: Red Hat Enterprise Linux 5
Reporter: Guy Streeter <streeter>
Component: kernel
Assignee: Anton Arapov <anton>
Status: CLOSED DUPLICATE
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: urgent
Version: 5.1
CC: arozansk, dzickus, nobody, tao
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-07-21 15:34:26 UTC
Attachments:
  proposed patch (flags: none)

Description Guy Streeter 2008-07-08 19:26:26 UTC
+++ This bug was initially created as a clone of Bug #444653 +++

Description of problem:
When smbd is under heavy load, the RHEL4.5 kernel (and likely all current RHEL
kernels, RHEL5 included) will eventually hit a race that causes free_uid to hit
a NULL pointer.

Version-Release number of selected component (if applicable):
2.6.9-55.0.12.ELsmp

How reproducible:
Seen in production environments that make heavy use of smbd; the issue has
occurred 4 times in the past 2 weeks.

Steps to Reproduce:
1. Run a RHEL4.5 kernel.
2. Put heavy load on Samba with many users.
3. Eventually you'll lose this race.
  
Actual results:
Unable to handle kernel paging request at 0000000000100108 RIP: 
<ffffffff801411f7>{free_uid+45}
...
Process smbd (pid: 2227, threadinfo 0000010038b72000, task 0000010087814030)
Stack: 0000000000000000 0000000000000002 0000010237e066e8 ffffffff801419c9 
       0000000000000000 0000010038b73e78 0000010087814030 0000010087814708 
       0000010038b73f58 ffffffff80141a7e 
Call Trace:<ffffffff801419c9>{__dequeue_signal+347}
           <ffffffff80141a7e>{dequeue_signal+58} 
           <ffffffff801435ca>{get_signal_to_deliver+338}
           <ffffffff8010f6fb>{do_signal+131} 
           <ffffffff8030c8f6>{thread_return+88}
           <ffffffff801102f3>{sysret_signal+28} 
           <ffffffff801105df>{ptregscall_common+103} 

Code: 48 89 50 08 48 89 02 48 c7 41 08 00 02 20 00 48 8b 7b 38 48 
RIP <ffffffff801411f7>{free_uid+45} RSP <0000010038b73d98>
CR2: 0000000000100108
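
One reading of the oops above, offered as an interpretation rather than a
confirmed diagnosis: the faulting address 0x100108 equals the kernel's
LIST_POISON1 value (0x00100100) plus 8, the offset of the prev member of a
struct list_head on x86_64. The Code bytes also contain the immediate
00 02 20 00, which is LIST_POISON2 (0x00200200) in little-endian order. Both
would be consistent with list_del() running on an entry that had already been
removed from its list. The sketch below only checks that arithmetic; the struct
name list_head64 and both helper functions are illustrative and are not kernel
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poison values the kernel's list_del() stores into a removed entry. */
#define LIST_POISON1 UINT64_C(0x00100100)
#define LIST_POISON2 UINT64_C(0x00200200)

/* Illustrative model of struct list_head as laid out on x86_64. */
struct list_head64 {
    uint64_t next;
    uint64_t prev;
};

/*
 * Address touched by the "next->prev = prev" store of list_del() when
 * entry->next already holds LIST_POISON1, i.e. the entry was deleted
 * once before.
 */
static uint64_t second_list_del_fault_addr(void)
{
    return LIST_POISON1 + offsetof(struct list_head64, prev);
}

/* Decode a little-endian 32-bit immediate from raw "Code:" bytes. */
static uint32_t le32(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Calling second_list_del_fault_addr() gives the CR2 value from the report, and
le32() applied to the bytes 00 02 20 00 gives LIST_POISON2.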

Expected results:
No NULL pointer.

Additional info:
Linus Torvalds fixed the issue upstream in 2.6.19-rc4:
http://lkml.org/lkml/2006/11/4/45
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=45c18b0

RHEL5 also doesn't have this fix and it should.
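
The linked upstream commit is not reproduced here. As a rough user-space
analogue only, and assuming the race is of the common kind in which one thread
replaces a pointer to a refcounted object while another thread takes a
reference through that pointer, the usual remedy is to serialize the swap with
the lock readers hold and to drop the old reference only after that lock is
released. All names below (struct user, struct task, task_get_user,
task_switch_user) are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted per-user structure. */
struct user {
    atomic_int refcount;
    int uid;
};

/* Hypothetical stand-in for a task; lock guards the user pointer. */
struct task {
    pthread_mutex_t lock;
    struct user *user;
};

static struct user *user_new(int uid)
{
    struct user *u = malloc(sizeof(*u));
    assert(u != NULL);
    atomic_init(&u->refcount, 1);
    u->uid = uid;
    return u;
}

/* Drop one reference; returns 1 if this call freed the object. */
static int user_put(struct user *u)
{
    if (atomic_fetch_sub(&u->refcount, 1) == 1) {
        free(u);
        return 1;
    }
    return 0;
}

static void task_init(struct task *t, struct user *u)
{
    pthread_mutex_init(&t->lock, NULL);
    t->user = u;
}

/* Reader: take a reference while the lock pins the pointer. */
static struct user *task_get_user(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    struct user *u = t->user;
    atomic_fetch_add(&u->refcount, 1);
    pthread_mutex_unlock(&t->lock);
    return u;
}

/*
 * Writer: swap the pointer under the lock, then drop the old reference
 * after unlocking. A reader either sees the new pointer or already owns
 * its own reference on the old object, so the old object cannot be freed
 * underneath it. Returns 1 if the old user was freed by this call.
 */
static int task_switch_user(struct task *t, struct user *new_user)
{
    pthread_mutex_lock(&t->lock);
    struct user *old = t->user;
    t->user = new_user;
    pthread_mutex_unlock(&t->lock);
    return user_put(old);
}
```

In this sketch a reader that called task_get_user() before the switch keeps a
valid object until its own user_put(), which is the property the race in this
report violates.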

-- Additional comment from snitzer on 2008-04-29 15:31 EST --
I mistakenly said "NULL pointer" in a couple of places where I should've said "Oops".

Comment 2 RHEL Program Management 2008-07-09 20:41:06 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Anton Arapov 2008-07-09 21:00:07 UTC
Created attachment 311421 [details]
proposed patch

Comment 7 Vince Worthington 2008-07-21 15:34:26 UTC
Per Don Zickus, this is a dupe of 441762, which is committed as of -98.EL. I
will go ahead and mark this as a dupe.

--vince

*** This bug has been marked as a duplicate of 441762 ***