Bug 444653

Summary: oops in free_uid when using smbd
Product: Red Hat Enterprise Linux 4 Reporter: Mike Snitzer <snitzer>
Component: kernel Assignee: Michal Schmidt <mschmidt>
Status: CLOSED DUPLICATE QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.5.z CC: jlayton, jwest, peterm, tao, vgoyal
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-09-22 17:19:45 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 391511, 461297    

Description Mike Snitzer 2008-04-29 19:26:23 UTC
Description of problem:
When smbd is under heavy load, the RHEL4.5 kernel (and likely all current RHEL
kernels, RHEL5 included) will eventually hit a race that causes free_uid to hit
a NULL pointer.

Version-Release number of selected component (if applicable):
2.6.9-55.0.12.ELsmp

How reproducible:
Seen in production environments that make heavy use of smbd; the issue has hit
4 times in the past 2 weeks.

Steps to Reproduce:
1. run RHEL4.5 kernel
2. put heavy load on samba with many users
3. eventually you'll lose this race
  
Actual results:
Unable to handle kernel paging request at 0000000000100108 RIP: 
<ffffffff801411f7>{free_uid+45}
...
Process smbd (pid: 2227, threadinfo 0000010038b72000, task 0000010087814030)
Stack: 0000000000000000 0000000000000002 0000010237e066e8 ffffffff801419c9 
       0000000000000000 0000010038b73e78 0000010087814030 0000010087814708 
       0000010038b73f58 ffffffff80141a7e 
Call Trace:<ffffffff801419c9>{__dequeue_signal+347}
           <ffffffff80141a7e>{dequeue_signal+58} 
           <ffffffff801435ca>{get_signal_to_deliver+338}
           <ffffffff8010f6fb>{do_signal+131} 
           <ffffffff8030c8f6>{thread_return+88}
           <ffffffff801102f3>{sysret_signal+28} 
           <ffffffff801105df>{ptregscall_common+103} 

Code: 48 89 50 08 48 89 02 48 c7 41 08 00 02 20 00 48 8b 7b 38 48 
RIP <ffffffff801411f7>{free_uid+45} RSP <0000010038b73d98>
CR2: 0000000000100108
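
A note on reading the fault address, offered as a hedged sketch rather than a confirmed diagnosis: CR2 is 0x100108, which matches LIST_POISON1 (0x00100100, the value list_del() writes into entry->next) plus 8, the offset of ->prev in a struct list_head on x86_64. That would be consistent with free_uid running list_del() on a uidhash entry that had already been unlinked, i.e. a double free of the user_struct. The helper name explain_fault and the offset table below are illustrative assumptions; the poison constants are the ones from include/linux/list.h in 2.6-era kernels.

```python
# Poison values that list_del() stores into a removed entry
# (include/linux/list.h, 2.6-era kernels).
LIST_POISON1 = 0x00100100  # stored in entry->next
LIST_POISON2 = 0x00200200  # stored in entry->prev

# Assumption: x86_64, so struct list_head is two 8-byte pointers.
OFFSET_NEXT = 0
OFFSET_PREV = 8


def explain_fault(cr2):
    """Return a description if cr2 looks like a poisoned list pointer
    dereferenced at the ->next or ->prev offset, else None.
    (Hypothetical helper, for illustration only.)"""
    poisons = (("LIST_POISON1", LIST_POISON1), ("LIST_POISON2", LIST_POISON2))
    fields = (("next", OFFSET_NEXT), ("prev", OFFSET_PREV))
    for name, poison in poisons:
        for field, offset in fields:
            if cr2 == poison + offset:
                return f"{name} + offset of ->{field}"
    return None


if __name__ == "__main__":
    # CR2 value taken from the oops above.
    print(explain_fault(0x0000000000100108))
```

For the CR2 value in this report the helper prints "LIST_POISON1 + offset of ->prev", which fits the first instruction in the Code: line (48 89 50 08, a store to offset 8 of the pointer in %rax).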

Expected results:
No NULL pointer.

Additional info:
Linus Torvalds fixed the issue upstream in 2.6.19-rc4:
http://lkml.org/lkml/2006/11/4/45
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=45c18b0

RHEL5 also doesn't have this fix and it should.

Comment 1 Mike Snitzer 2008-04-29 19:31:48 UTC
I mistakenly said "NULL pointer" in a couple of places where I should've said "Oops".

Comment 2 Issue Tracker 2008-07-15 16:41:59 UTC
(just triaging)

99% sure this is the same problem reported in IT 173279 / BZ 441282.  A
hotfix kernel was released for this issue last week and the 4.7 kernel
will also have the fix.

Hotfix # is 2756.

--vince


This event sent from IssueTracker by vincew 
 issue 191745

Comment 5 RHEL Program Management 2008-09-03 12:56:13 UTC
Updating PM score.

Comment 6 Peter Martuccelli 2008-09-22 17:19:45 UTC

*** This bug has been marked as a duplicate of bug 441282 ***