Red Hat Bugzilla – Bug 160539
[RHEL3] hidden bomb of kmap_atomic/kunmap_atomic bug?
Last modified: 2008-10-15 13:24:04 EDT
Escalated to Bugzilla from IssueTracker
I don't think adding spin_lock_irqsave/restore in kmap_atomic() is going to fly. It's worked that way since the beginning of time -- I'm sure as hell not going to propose a spinlock be put in there! ;-) Anyway, this is the first time I've ever looked at this code, so help me get up to speed.

The problem as described is in alloc_ldt() when reload is set:

static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
{
        int oldsize, newsize, i;

        if (mincount <= pc->size)
                return 0;
        /*
         * LDT got larger - reallocate if necessary.
         */
        oldsize = pc->size;
        mincount = (mincount+511)&(~511);
        newsize = mincount*LDT_ENTRY_SIZE;
        for (i = 0; i < newsize; i += PAGE_SIZE) {
                int nr = i/PAGE_SIZE;
                BUG_ON(i >= 64*1024);
                if (!pc->ldt_pages[nr]) {
                        pc->ldt_pages[nr] = alloc_page(GFP_HIGHUSER);
                        if (!pc->ldt_pages[nr])
                                return -ENOMEM;
                        clear_highpage(pc->ldt_pages[nr]);
                }
        }
        pc->size = mincount;
        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }
        return 0;
}

where:

1. alloc_ldt() called load_LDT() on cpu A.

2. alloc_ldt() called load_LDT() on cpu B, which returned, and then cpu B issued the flush_ldt() IPI:

static void flush_ldt(void *mm)
{
        if (current->active_mm)
                load_LDT(&current->active_mm->context);
}

3. The IPI was received on cpu A while it was looping in alloc_ldt(), which caused it to essentially re-enter load_LDT().

Therefore, we could have a situation in load_LDT() where cpu A called __kunmap_atomic_type() to clear a kmap pte, but got interrupted by the flush_ldt() IPI before it did the subsequent __kmap_atomic().
The "re-entrant" call to load_LDT() set up all the ptes, and upon completion, cpu A returned back to the interrupted load_LDT() instance to do the __kmap_atomic() -- which then sees the non-zero status of the kmap pte that it thought it had just cleared, causing an oops on "line 56 of the header":

/*
 * load one particular LDT into the current CPU
 */
void load_LDT(mm_context_t *pc)
{
        struct page **pages = pc->ldt_pages;
        int count = pc->size;
        int nr_pages, i;

        if (!count) {
                pages = &default_ldt_page;
                count = 5;
        }

        nr_pages = (count*LDT_ENTRY_SIZE + PAGE_SIZE-1) / PAGE_SIZE;

        for (i = 0; i < nr_pages; i++) {
                __kunmap_atomic_type(KM_LDT_PAGE0 - i);
                __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);
        }

        set_ldt_desc(smp_processor_id(),
                     (void *)__kmap_atomic_vaddr(KM_LDT_PAGE0), count);
        load_LDT_desc();
}

Since the only caller of alloc_ldt() with reload set is write_ldt(), and write_ldt() is only called from sys_modify_ldt(), that call to load_LDT() in alloc_ldt() can only be made in process context. I'm wondering whether it would suffice to simply surround the calls to load_LDT() with local_irq_disable() and local_irq_enable():

        if (reload) {
+               local_irq_disable();
                load_LDT(pc);
+               local_irq_enable();
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }
After unsuccessfully attempting to reproduce this problem with my own hand-rolled multithreaded program -- with each thread doing nothing but modify_ldt system calls (writes) -- I realized that the scenario I described, which presumably is the same scenario described by the reporter below, is impossible:

cpu 1
-----
sys_modify_ldt
  write_ldt
    alloc_ldt
      smp_call_function(flush_ldt, 0, 1, 1)

cpu 2
-----
sys_modify_ldt
  write_ldt
    alloc_ldt
      load_LDT
        __kunmap_atomic_type(KM_LDT_PAGE0 - i)
          IPI: flush_ldt                        <------- unexpected interrupt
            load_LDT
              __kunmap_atomic_type(KM_LDT_PAGE0 - i);
              __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);
                <--- the entry in question is in use after this point
        __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);      <---- oops !!

Two processes with the same mm_struct cannot be operating simultaneously in load_LDT(). The crucial section, which makes its own call to load_LDT() and then issues the IPI, is in alloc_ldt() here:

        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }

However, alloc_ldt() can only be called by one owner at a time from write_ldt(), because it's gated by the shared mm->context.sem semaphore:

        down(&mm->context.sem);
        if (ldt_info.entry_number >= mm->context.size) {
                error = alloc_ldt(&current->mm->context,
                                  ldt_info.entry_number+1, 1);
                if (error < 0)
                        goto out_unlock;
        }

The only other load_LDT() caller is switch_mm(), but that would be modifying the kmap entries on another cpu. So -- any other ideas?
I have one -- I was presuming the other cpu sending the IPI was a thread of the same process. The true test would be to have dissimilar processes issuing the IPIs.

The issue tracker (for some reason it didn't get forwarded to this bugzilla) indicates:

> using save_flags/cli/restore_flags should be enough, since every cpu has
> its own local km_type.

Since by definition we have to be in process context while executing alloc_ldt(), why wouldn't local_irq_disable() and local_irq_enable() suffice?
As far as reproducing this goes, regardless of whether I:

1. continuously fork new processes that do the modify_ldt() calls, or
2. have the same set of processes spin doing modify_ldt() calls, or
3. run a combination of the two test procedures above simultaneously,

I can never get the IPI to be issued from alloc_ldt():

        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }

It would seem that if test scenario 2 (spinning processes) is taking place, at least some of the test processes would find themselves switching cpus. I can get up to 6000 load_LDT() calls per second (kmap'ing all 16 LDT pages), but can't get the IPI to take place, which is essential for the oops. I suppose I could hack up alloc_ldt() to always do the smp_call_function(), but I would really like to be able to reproduce this legitimately.
Couple of data points... I haven't been able to reproduce this problem with a "real" kernel as of yet. I've got a 4-cpu box with one single-threaded process continuously doing modify_ldt write system calls, and one 3-threaded process with each thread doing the same. I can see with top that the threaded process's threads are quite often running simultaneously on at least 2 or 3 cpus, while the single-threaded process is running on a different cpu. Recall that in order for the flush_ldt IPI to even be issued, the issuer must be a multi-threaded process with one of its other threads running on another cpu:

        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1 << smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }

That being said, I went ahead on a 2-cpu box and kludged up a kernel that does the smp_call_function() above every time. By doing that, I was able to get a "kernel BUG in header file at line 56" oops, as seen in the three oopses in the supplied IT file:

https://enterprise.redhat.com/issue-tracker/download/8/850/1118463956-kmap_atomic_oopses

FWIW, here are the backtraces, where the "ldt" process on cpu 0 issued the smp_call_function() to another "ldt" process on cpu 1, which caught it in between the __kunmap_atomic_type() and __kmap_atomic() calls in load_LDT():

crash> bt -a
PID: 6130  TASK: d4436000  CPU: 0   COMMAND: "ldt"
 #0 [d4437ed0] smp_call_function_interrupt at c011cbcf
 #1 [d4437ed8] call_call_function_interrupt at c03feca8
    EAX: 00000000  EBX: 00000001  ECX: 00000000  EDX: 00000001  EBP: 00002000
    DS:  0068      ESI: d4436000  ES:  0068      EDI: 00000001
    CS:  0060      EIP: c011caa4  ERR: fffffffb  EFLAGS: 00000297
 #2 [d4437f0c] smp_call_function at c011caa4
 #3 [d4437f48] alloc_ldt at c0111a96
 #4 [d4437f78] write_ldt at c011242b
 #5 [d4437fac] sys_modify_ldt at c01124a3
 #6 [d4437fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffe6e0  EDX: 00000010
    DS:  002b      ESI: bfffe7d4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffe6b4  EBP: bfffe708
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000296

PID: 6129  TASK: d4464000  CPU: 1   COMMAND: "ldt"
 #0 [d4465dd0] netconsole_netdump at e2dd0703
 #1 [d4465de4] try_crashdump at c0129053
 #2 [d4465df4] die at c010c6a2
 #3 [d4465e08] do_invalid_op at c010c8b2
 #4 [d4465ea8] error_code (via invalid_op) at c03fe1c0
    EAX: 00000025  EBX: fffaa000  ECX: 00000001  EDX: c038c018  EBP: c0009d50
    DS:  0068      ESI: 13df6363  ES:  0068      EDI: 00000363
    CS:  0060      EIP: c0129007  ERR: ffffffff  EFLAGS: 00010286
 #5 [d4465ee4] __out_of_line_bug at c0129007
 #6 [d4465ef0] load_LDT at c011274f
 #7 [d4465f48] alloc_ldt at c0111a37
 #8 [d4465f78] write_ldt at c011242b
 #9 [d4465fac] sys_modify_ldt at c01124a3
#10 [d4465fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffed00  EDX: 00000010
    DS:  002b      ESI: bfffedf4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffecd4  EBP: bfffed28
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000292
crash>

So it's a "proof-of-concept" reproducer, since if the issuer were to have been multi-threaded with a cohort thread running on another cpu, the oops would have occurred. I'll run the same test with the local_irq_disable/enable wrappers.
Can this fix wait for RHEL3-U7, or should a (belated) attempt to propose it for RHEL3-U6 be made?
Putting in NEEDINFO. Btw, U6 is now closed.
Patch posted to rhkernel-list: http://post-office.corp.redhat.com/archives/rhkernel-list/2005-August/msg00565.html
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.2.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html