Red Hat Bugzilla – Bug 160539
[RHEL3] hidden bomb of kmap_atomic/kunmap_atomic bug?
Last modified: 2008-10-15 13:24:04 EDT
Escalated to Bugzilla from IssueTracker
I don't think adding spin_lock_irqsave/restore in kmap_atomic() is going to fly. It's worked that way since the beginning of time -- I'm sure as hell not going to propose a spinlock be put in there! ;-) Anyway, this is the first time I've ever looked at this code, so help me get up to speed.

The problem as described is in alloc_ldt() when reload is set:

static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
{
        int oldsize, newsize, i;

        if (mincount <= pc->size)
                return 0;
        /*
         * LDT got larger - reallocate if necessary.
         */
        oldsize = pc->size;
        mincount = (mincount+511)&(~511);
        newsize = mincount*LDT_ENTRY_SIZE;
        for (i = 0; i < newsize; i += PAGE_SIZE) {
                int nr = i/PAGE_SIZE;
                BUG_ON(i >= 64*1024);
                if (!pc->ldt_pages[nr]) {
                        pc->ldt_pages[nr] = alloc_page(GFP_HIGHUSER);
                        if (!pc->ldt_pages[nr])
                                return -ENOMEM;
                        clear_highpage(pc->ldt_pages[nr]);
                }
        }
        pc->size = mincount;
        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }
        return 0;
}

where:

1. alloc_ldt() called load_LDT() on cpu A.

2. alloc_ldt() called load_LDT() on cpu B, which returned, and then cpu B issued the flush_ldt() IPI:

static void flush_ldt(void *mm)
{
        if (current->active_mm)
                load_LDT(&current->active_mm->context);
}

3. The IPI was received on cpu A while it was looping in alloc_ldt(), which caused it to essentially re-enter load_LDT().

Therefore, we could have a situation in load_LDT() where cpu A called __kunmap_atomic_type() to clear a kmap pte, but got interrupted by the flush_ldt() IPI before it did the subsequent __kmap_atomic().
The "re-entrant" call to load_LDT() set up all the ptes, and upon completion, cpu A returned back to the interrupted load_LDT() instance to do the __kmap_atomic() -- which then sees the non-zero status of the kmap pte that it thought it had just cleared, causing an oops on "line 56 of the header":

/*
 * load one particular LDT into the current CPU
 */
void load_LDT(mm_context_t *pc)
{
        struct page **pages = pc->ldt_pages;
        int count = pc->size;
        int nr_pages, i;

        if (!count) {
                pages = &default_ldt_page;
                count = 5;
        }

        nr_pages = (count*LDT_ENTRY_SIZE + PAGE_SIZE-1) / PAGE_SIZE;

        for (i = 0; i < nr_pages; i++) {
                __kunmap_atomic_type(KM_LDT_PAGE0 - i);
                __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);
        }

        set_ldt_desc(smp_processor_id(),
                     (void *)__kmap_atomic_vaddr(KM_LDT_PAGE0), count);
        load_LDT_desc();
}

Since the only caller of alloc_ldt() with reload set is write_ldt(), and write_ldt() is only called from sys_modify_ldt(), that call to load_LDT() in alloc_ldt() can only be made in process context. I'm wondering whether it would suffice to simply surround the calls to load_LDT() with local_irq_disable() and local_irq_enable():

        if (reload) {
+               local_irq_disable();
                load_LDT(pc);
+               local_irq_enable();
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }
After unsuccessfully attempting to reproduce this problem with my own hand-rolled multithreaded program -- with each thread doing nothing but modify_ldt system calls (writes) -- I realized that the scenario I described, which presumably is the same scenario described by the reporter below, is impossible:

cpu 1
-----
sys_modify_ldt
  write_ldt
    alloc_ldt
      smp_call_function(flush_ldt, 0, 1, 1)

cpu 2
-----
sys_modify_ldt
  write_ldt
    alloc_ldt
      load_LDT
        __kunmap_atomic_type(KM_LDT_PAGE0 - i)
          IPI: flush_ldt                        <------- unexpected interrupt
            load_LDT
              __kunmap_atomic_type(KM_LDT_PAGE0 - i);
              __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);
                <--- the entry in question is in use after this point
        __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);      <---- oops !!

Two processes with the same mm_struct cannot be operating simultaneously in load_LDT(). The crucial section, which makes its own call to load_LDT() and then issues the IPI, is in alloc_ldt() here:

        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }

However, alloc_ldt() can only be called by one owner at a time from write_ldt(), because it's gated by the shared mm->context.sem semaphore:

        down(&mm->context.sem);
        if (ldt_info.entry_number >= mm->context.size) {
                error = alloc_ldt(&current->mm->context,
                                  ldt_info.entry_number+1, 1);
                if (error < 0)
                        goto out_unlock;
        }

The only other load_LDT() caller is switch_mm(), but that would be modifying the kmap entries on another cpu. So -- any other ideas?
I have one -- I was presuming the other cpu sending the IPI was a thread of the same process. The true test would be to have dissimilar processes issuing the IPIs.

The issue tracker (for some reason it didn't get forwarded to this bugzilla) indicates:

> using save_flags/cli/restore_flags should be enough, since every cpu has
> its own local km_type.

Since by definition we have to be in process context while executing alloc_ldt(), why wouldn't local_irq_disable() and local_irq_enable() suffice?
As far as reproducing this goes, regardless of whether I:

1. continuously fork new processes that do the modify_ldt() calls, or
2. have the same set of processes spin doing modify_ldt() calls, or
3. run a combination of the two test procedures above simultaneously,

I can never get the IPI to be issued from alloc_ldt():

        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }

It would seem that if test scenario 2 (spinning processes) is taking place, at least some of the test processes would find themselves switching cpus. I can get up to 6000 load_LDT() calls per second (kmap'ing all 16 LDT pages), but can't get the IPI to take place, which is essential for the oops. I suppose I could hack up alloc_ldt() to always do the smp_call_function(), but I would really like to be able to reproduce this legitimately.
Couple of data points... I haven't been able to reproduce this problem with a "real" kernel as of yet. I've got a 4-cpu box with one single-threaded process continuously doing modify_ldt write system calls, and one 3-threaded process with each thread doing the same. I can see with top that the threaded process's threads are quite often running simultaneously on at least 2 or 3 cpus, while the single-threaded process is running on a different cpu. Recall that in order for the flush_ldt IPI to even be issued, the issuer must be a multi-threaded process with one of its other threads running on another cpu:

        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }

That being said, I went ahead on a 2-cpu box and kludged up a kernel that does the smp_call_function() above every time. By doing that, I was able to get a "kernel BUG in header file at line 56" oops, as seen in the three oopses in the supplied IT file:

https://enterprise.redhat.com/issue-tracker/download/8/850/1118463956-kmap_atomic_oopses

FWIW, here are the backtraces, where the "ldt" process on cpu 0 issued the smp_call_function() to another "ldt" process on cpu 1, which caught it in between the __kunmap_atomic_type() and __kmap_atomic() calls in load_LDT():

crash> bt -a
PID: 6130  TASK: d4436000  CPU: 0   COMMAND: "ldt"
 #0 [d4437ed0] smp_call_function_interrupt at c011cbcf
 #1 [d4437ed8] call_call_function_interrupt at c03feca8
    EAX: 00000000  EBX: 00000001  ECX: 00000000  EDX: 00000001  EBP: 00002000
    DS:  0068      ESI: d4436000  ES:  0068      EDI: 00000001
    CS:  0060      EIP: c011caa4  ERR: fffffffb  EFLAGS: 00000297
 #2 [d4437f0c] smp_call_function at c011caa4
 #3 [d4437f48] alloc_ldt at c0111a96
 #4 [d4437f78] write_ldt at c011242b
 #5 [d4437fac] sys_modify_ldt at c01124a3
 #6 [d4437fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffe6e0  EDX: 00000010
    DS:  002b      ESI: bfffe7d4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffe6b4  EBP: bfffe708
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000296

PID: 6129  TASK: d4464000  CPU: 1   COMMAND: "ldt"
 #0 [d4465dd0] netconsole_netdump at e2dd0703
 #1 [d4465de4] try_crashdump at c0129053
 #2 [d4465df4] die at c010c6a2
 #3 [d4465e08] do_invalid_op at c010c8b2
 #4 [d4465ea8] error_code (via invalid_op) at c03fe1c0
    EAX: 00000025  EBX: fffaa000  ECX: 00000001  EDX: c038c018  EBP: c0009d50
    DS:  0068      ESI: 13df6363  ES:  0068      EDI: 00000363
    CS:  0060      EIP: c0129007  ERR: ffffffff  EFLAGS: 00010286
 #5 [d4465ee4] __out_of_line_bug at c0129007
 #6 [d4465ef0] load_LDT at c011274f
 #7 [d4465f48] alloc_ldt at c0111a37
 #8 [d4465f78] write_ldt at c011242b
 #9 [d4465fac] sys_modify_ldt at c01124a3
#10 [d4465fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffed00  EDX: 00000010
    DS:  002b      ESI: bfffedf4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffecd4  EBP: bfffed28
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000292
crash>

So it's a "proof-of-concept" reproducer, since if the issuer were to have been multi-threaded with a cohort thread running on another cpu, the oops would have occurred. I'll run the same test with the local_irq_disable/enable wrappers.
Can this fix wait for RHEL3-U7, or should a (belated) attempt to propose it for RHEL3-U6 be made?
Putting in NEEDINFO. Btw, U6 is now closed.
Patch posted to rhkernel-list: http://post-office.corp.redhat.com/archives/rhkernel-list/2005-August/msg00565.html
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.2.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html