223134 – write_ldt() race leads to BUG_ON() panic in __kmap_atomic()

Summary: write_ldt() race leads to BUG_ON() panic in __kmap_atomic()

Keywords:
Status:	CLOSED DUPLICATE of bug 223851
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.4
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Jerome Marchand
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-01-17 22:45 UTC by Kurtis D. Rader
Modified:	2007-11-17 01:14 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-02-13 15:47:17 UTC
Target Upstream Version:
Embargoed:

Description Kurtis D. Rader 2007-01-17 22:45:21 UTC

Description of problem:

kernel BUG at include/asm/atomic_kmap.h:60!

Version-Release number of selected component (if applicable):

RHEL4.4 kernel 2.6.9-42.0.3.ELsmp i686

How reproducible:

Four panics in two months

Steps to Reproduce:

This is a very subtle, difficult-to-hit race. I have dumps from the two most
recent failures (the dumps from the first two are no longer available). In both
cases two new, independent tasks were created within one millisecond of each
other. One task has just gone to sleep in io_schedule(). The other is on-proc,
executing a modify_ldt() syscall. All other CPUs are idle.

The crux of the problem is that when one CPU executes

    smp_call_function(flush_ldt, NULL, 1, 1);

in alloc_ldt(), there is no guarantee that any of the other CPUs is still
running another thread of the same process. More to the point, any of the
other CPUs could be in the middle of a write_ldt() -> alloc_ldt() ->
load_LDT_nolock() -> __kunmap_atomic_type() sequence for a different mm
context. The resulting recursive load_LDT() call (via the flush_ldt() IPI
handler) can trigger the assertion in __kunmap_atomic_type(). Disabling
preemption does not help, because the IPI can still be delivered.
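
The re-entry described above can be sketched as a small userspace model. This is only an illustration, not kernel code: struct cpu_model and the model_* functions are hypothetical names, the "slot" flag stands in for the per-CPU atomic kmap slot, and the counter marks the point where the real kernel would trip its assertion. Interrupts are assumed to be enabled throughout, which is all that disabling preemption guarantees.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct cpu_model {
    bool kmap_slot_busy;  /* stands in for the per-CPU atomic kmap slot */
    bool ipi_pending;     /* a flush_ldt() IPI sent by another CPU */
    int  nested_loads;    /* times the slot was already held on entry */
};

static void model_load_ldt(struct cpu_model *cpu);

/* Interrupts are enabled in this model, so a pending IPI runs at once. */
static void model_irq_window(struct cpu_model *cpu)
{
    if (cpu->ipi_pending) {
        cpu->ipi_pending = false;
        model_load_ldt(cpu);      /* the IPI handler reloads the LDT */
    }
}

static void model_load_ldt(struct cpu_model *cpu)
{
    if (cpu->kmap_slot_busy) {
        cpu->nested_loads++;      /* the real kernel would BUG here */
        return;
    }
    cpu->kmap_slot_busy = true;   /* map the LDT pages */
    model_irq_window(cpu);        /* IPI may arrive while the slot is held */
    cpu->kmap_slot_busy = false;  /* unmap */
}

/* Runs one LDT load and returns how many nested loads were observed. */
static int model_run(bool racing_ipi)
{
    struct cpu_model cpu = { false, racing_ipi, 0 };

    model_load_ldt(&cpu);
    assert(!cpu.kmap_slot_busy);
    return cpu.nested_loads;
}
```

Calling model_run(false) models an undisturbed load, and model_run(true) models a load that is interrupted by the IPI while the slot is held. Only the second case records a nested load.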

Additional info:

This is the same as RHbug#160539 (which was against RHEL3).

It appears to me that the preempt_disable()/preempt_enable() calls in
alloc_ldt() bounding the "if(reload){" block should be replaced with
local_irq_disable()/local_irq_enable() calls. That would both inhibit
preemption and prevent the recursive load_LDT() call.
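
A minimal sketch of why masking interrupts would close the window, again as a userspace model rather than kernel code. struct fix_model and the fix_* functions are hypothetical names; fix_irq_disable() and fix_irq_enable() only imitate the effect of local_irq_disable() and local_irq_enable() on a pending IPI. The model covers the local reload only and says nothing about whether smp_call_function() itself may be called with interrupts masked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct fix_model {
    bool irqs_enabled;
    bool slot_busy;      /* stands in for the per-CPU atomic kmap slot */
    bool ipi_pending;    /* a flush_ldt() IPI sent by another CPU */
    int  nested_loads;   /* loads that found the slot already held */
    int  ipis_handled;   /* IPIs that were eventually serviced */
};

static void fix_load_ldt(struct fix_model *m);

/* A pending IPI is taken only while interrupts are enabled. */
static void fix_deliver_ipi(struct fix_model *m)
{
    if (m->ipi_pending && m->irqs_enabled) {
        m->ipi_pending = false;
        m->ipis_handled++;
        fix_load_ldt(m);          /* the IPI handler reloads the LDT */
    }
}

static void fix_irq_disable(struct fix_model *m)
{
    m->irqs_enabled = false;
}

static void fix_irq_enable(struct fix_model *m)
{
    m->irqs_enabled = true;
    fix_deliver_ipi(m);           /* a held-off IPI is taken now */
}

static void fix_load_ldt(struct fix_model *m)
{
    if (m->slot_busy) {
        m->nested_loads++;        /* the real kernel would BUG here */
        return;
    }
    m->slot_busy = true;
    fix_deliver_ipi(m);           /* window in which an IPI could arrive */
    m->slot_busy = false;
}

/* One reload with a racing IPI, with or without interrupts masked. */
static struct fix_model fix_reload(bool mask_irqs)
{
    struct fix_model m = { true, false, true, 0, 0 };

    if (mask_irqs)
        fix_irq_disable(&m);
    fix_load_ldt(&m);
    if (mask_irqs)
        fix_irq_enable(&m);
    assert(!m.slot_busy);
    return m;
}
```

With mask_irqs set, the IPI is still serviced, but only after the slot has been released, so no nested load is recorded. Without it, the model re-enters while the slot is held.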

Comment 1 Jerome Marchand 2007-02-13 15:47:17 UTC


*** This bug has been marked as a duplicate of 223851 ***
