Bug 160539
| Summary: | [RHEL3] hidden bomb of kmap_atomic/kunmap_atomic bug? | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Issue Tracker <tao> |
| Component: | kernel | Assignee: | Dave Anderson <anderson> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | dkl, hirosuke.shinriki, petrides, tao |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHSA-2006-0144 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2006-03-15 16:05:43 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 168424 | | |
Description
Issue Tracker, 2005-06-15 17:45:03 UTC
I don't think adding spin_lock_irqsave/restore in kmap_atomic() is going to fly.
It's worked that way since the beginning of time -- I'm sure as hell not going
to propose a spinlock be put in there! ;-)
Anyway, this is the first time I've ever looked at this code, so help me get up
to speed. The problem as described is in alloc_ldt() when reload is set:
```c
static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
{
	int oldsize, newsize, i;

	if (mincount <= pc->size)
		return 0;
	/*
	 * LDT got larger - reallocate if necessary.
	 */
	oldsize = pc->size;
	mincount = (mincount+511)&(~511);
	newsize = mincount*LDT_ENTRY_SIZE;
	for (i = 0; i < newsize; i += PAGE_SIZE) {
		int nr = i/PAGE_SIZE;
		BUG_ON(i >= 64*1024);
		if (!pc->ldt_pages[nr]) {
			pc->ldt_pages[nr] = alloc_page(GFP_HIGHUSER);
			if (!pc->ldt_pages[nr])
				return -ENOMEM;
			clear_highpage(pc->ldt_pages[nr]);
		}
	}
	pc->size = mincount;
	if (reload) {
		load_LDT(pc);
#ifdef CONFIG_SMP
		if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
			smp_call_function(flush_ldt, 0, 1, 1);
#endif
	}
	return 0;
}
```
where:

1. alloc_ldt() called load_LDT() on cpu A.
2. alloc_ldt() called load_LDT() on cpu B, which returned, and then cpu B issued the flush_ldt() IPI:
```c
static void flush_ldt(void *mm)
{
	if (current->active_mm)
		load_LDT(&current->active_mm->context);
}
```
3. The IPI was received on cpu A while it was looping in alloc_ldt(), which
caused it to essentially re-enter load_LDT().
Therefore, we could have a situation in load_LDT() where CPU A called
__kunmap_atomic_type() to clear a kmap pte, but got interrupted by the
flush_ldt() IPI before it did the subsequent __kmap_atomic(). The "re-entrant"
call to load_LDT() set up all the ptes, and upon completion, CPU A returned
back to the interrupted load_LDT() instance to do the __kmap_atomic() -- which
then sees the non-zero status of the kmap pte that it thought it just cleared,
causing an oops on "line 56 of the header":
```c
/*
 * load one particular LDT into the current CPU
 */
void load_LDT(mm_context_t *pc)
{
	struct page **pages = pc->ldt_pages;
	int count = pc->size;
	int nr_pages, i;

	if (!count) {
		pages = &default_ldt_page;
		count = 5;
	}
	nr_pages = (count*LDT_ENTRY_SIZE + PAGE_SIZE-1) / PAGE_SIZE;

	for (i = 0; i < nr_pages; i++) {
		__kunmap_atomic_type(KM_LDT_PAGE0 - i);
		__kmap_atomic(pages[i], KM_LDT_PAGE0 - i);
	}
	set_ldt_desc(smp_processor_id(),
		     (void *)__kmap_atomic_vaddr(KM_LDT_PAGE0), count);
	load_LDT_desc();
}
```
Since the only caller of alloc_ldt() with reload set is write_ldt(), and write_ldt() is only called from sys_modify_ldt(), that call to load_LDT() in alloc_ldt() can only happen in process context. I'm wondering whether it would suffice to simply surround the call to load_LDT() with local_irq_disable() and local_irq_enable():
```diff
 	if (reload) {
+		local_irq_disable();
 		load_LDT(pc);
+		local_irq_enable();
 #ifdef CONFIG_SMP
 		if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
 			smp_call_function(flush_ldt, 0, 1, 1);
 #endif
 	}
```
After unsuccessfully attempting to reproduce this problem with my own
hand-rolled multithreaded program -- with each thread doing nothing but
modify_ldt system calls (writes) -- I realized that the scenario that I
described, which presumably is the same scenario described by the reporter
below, is impossible:
```
cpu 1
-----
sys_modify_ldt
  write_ldt
    alloc_ldt
      smp_call_function(flush_ldt, 0, 1, 1)

cpu 2
-----
sys_modify_ldt
  write_ldt
    alloc_ldt
      load_LDT
        __kunmap_atomic_type(KM_LDT_PAGE0 - i)
          IPI: flush_ldt                  <---- unexpected interrupt
            load_LDT
              __kunmap_atomic_type(KM_LDT_PAGE0 - i);
              __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);  <--- the entry in question
                                                               is in use after this point
        __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);        <---- oops !!
```
Two processes with the same mm_struct cannot be operating simultaneously
in load_LDT(). The crucial section which makes its own call to load_LDT(),
and then issues the IPI, is in alloc_ldt() here:
```c
	if (reload) {
		load_LDT(pc);
#ifdef CONFIG_SMP
		if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
			smp_call_function(flush_ldt, 0, 1, 1);
#endif
	}
```
However, alloc_ldt() can only be entered by one caller at a time from write_ldt(), because it's gated by the shared mm->context.sem semaphore:

```c
	down(&mm->context.sem);
	if (ldt_info.entry_number >= mm->context.size) {
		error = alloc_ldt(&current->mm->context,
				  ldt_info.entry_number+1, 1);
		if (error < 0)
			goto out_unlock;
	}
```
The only other load_LDT() caller is switch_mm(), but that would be modifying
the kmap entries on another cpu.
So -- any other ideas?
I have one -- I was presuming the other cpu sending the IPI was
a thread of the same process. The true test would be to have
dissimilar processes issuing the IPIs.
The issue tracker (for some reason it didn't get forwarded to this bugzilla)
indicates:
> using save_flags/cli/restore_flags should be enough, since every cpu has
> its own local km_type.
Since by definition we have to be process context while executing alloc_ldt(),
why wouldn't local_irq_disable() and local_irq_enable() suffice?
As far as reproducing this goes, regardless of whether I:

1. continuously fork new processes that do the modify_ldt() calls, or
2. have the same set of processes spin doing modify_ldt() calls, or
3. run a combination of the two test procedures above simultaneously,

I can never get the IPI to be issued from alloc_ldt():

```c
	if (reload) {
		load_LDT(pc);
#ifdef CONFIG_SMP
		if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
			smp_call_function(flush_ldt, 0, 1, 1);
#endif
	}
```
It would seem that if test scenario 2 (spinning processes) is taking place,
at least some of the test processes would find themselves switching cpus.
I can get up to 6000 load_LDT calls per second (kmap'ing all 16 ldt entries),
but can't get the IPI to take place, which is essential for the oops.
I suppose I could hack up alloc_ldt() to always do the smp_call_function(),
but I would really like to be able to reproduce this legitimately.
Couple of data points...
I haven't been able to reproduce this problem with a "real" kernel
as of yet. I've got a 4-cpu box with one single-threaded process
continuously doing modify_ldt write system calls, and one 3-threaded
process with each thread doing the same. I can see with top that
the threaded processes are quite often running simultaneously on at
least 2 or 3 cpus, while the single-threaded process is running on a
different cpu. Recall that in order for the flush_ldt IPI to even
be issued, the issuer must be a multi-threaded process with one of
its other threads running on another cpu:
```c
	if (reload) {
		load_LDT(pc);
#ifdef CONFIG_SMP
		if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
			smp_call_function(flush_ldt, 0, 1, 1);
#endif
	}
```
That being said, I went ahead on a 2-cpu box and kludged up a kernel
that does the smp_call_function() above every time. By doing that,
I was able to get a "kernel BUG in header file at line 56" oops,
as seen in the 3 oops in the supplied IT file:
https://enterprise.redhat.com/issue-tracker/download/8/850/1118463956-kmap_atomic_oopses
FWIW, here are the backtraces, where the "ldt" process on cpu 0
issued the smp_call_function() to another "ldt" process on cpu 1,
which caught it in between the __kunmap_atomic_type() and
__kmap_atomic() calls in load_LDT():
```
crash> bt -a
PID: 6130   TASK: d4436000   CPU: 0   COMMAND: "ldt"
 #0 [d4437ed0] smp_call_function_interrupt at c011cbcf
 #1 [d4437ed8] call_call_function_interrupt at c03feca8
    EAX: 00000000  EBX: 00000001  ECX: 00000000  EDX: 00000001  EBP: 00002000
    DS:  0068      ESI: d4436000  ES:  0068      EDI: 00000001
    CS:  0060      EIP: c011caa4  ERR: fffffffb  EFLAGS: 00000297
 #2 [d4437f0c] smp_call_function at c011caa4
 #3 [d4437f48] alloc_ldt at c0111a96
 #4 [d4437f78] write_ldt at c011242b
 #5 [d4437fac] sys_modify_ldt at c01124a3
 #6 [d4437fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffe6e0  EDX: 00000010
    DS:  002b      ESI: bfffe7d4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffe6b4  EBP: bfffe708
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000296

PID: 6129   TASK: d4464000   CPU: 1   COMMAND: "ldt"
 #0 [d4465dd0] netconsole_netdump at e2dd0703
 #1 [d4465de4] try_crashdump at c0129053
 #2 [d4465df4] die at c010c6a2
 #3 [d4465e08] do_invalid_op at c010c8b2
 #4 [d4465ea8] error_code (via invalid_op) at c03fe1c0
    EAX: 00000025  EBX: fffaa000  ECX: 00000001  EDX: c038c018  EBP: c0009d50
    DS:  0068      ESI: 13df6363  ES:  0068      EDI: 00000363
    CS:  0060      EIP: c0129007  ERR: ffffffff  EFLAGS: 00010286
 #5 [d4465ee4] __out_of_line_bug at c0129007
 #6 [d4465ef0] load_LDT at c011274f
 #7 [d4465f48] alloc_ldt at c0111a37
 #8 [d4465f78] write_ldt at c011242b
 #9 [d4465fac] sys_modify_ldt at c01124a3
#10 [d4465fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffed00  EDX: 00000010
    DS:  002b      ESI: bfffedf4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffecd4  EBP: bfffed28
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000292
crash>
```
So it's a "proof-of-concept" reproducer: had the issuer been multi-threaded with a cohort thread running on another cpu, the oops would have occurred.
I'll run the same test with the local_irq_disable/enable wrappers.
Can this fix wait for RHEL3-U7, or should a (belated) attempt to propose it for RHEL3-U6 be made? Putting in NEEDINFO.

Btw, U6 is now closed.

Patch posted to rhkernel-list: http://post-office.corp.redhat.com/archives/rhkernel-list/2005-August/msg00565.html

A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.2.EL).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html