Bug 450094

Summary: Patch for bug 360281 "Odd behaviour in mmap" introduces regression
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.7
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: urgent
Target Milestone: rc
Keywords: ZStream
Whiteboard: GSSApproved
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Reporter: Vitaly Mayatskikh <vmayatsk>
Assignee: Vitaly Mayatskikh <vmayatsk>
QA Contact: Martin Jenner <mjenner>
CC: ahecox, jplans, qcai, sprabhu, tao
Last Closed: 2008-07-24 19:30:06 UTC
Bug Blocks: 450759, 450760
Attachments: proposed patch

Description Vitaly Mayatskikh 2008-06-05 11:08:01 UTC
A process hang waiting on a semaphore was found in the latest RHEL4U6 async kernels
(the upcoming RHEL-4.7 is also affected). The hung processes appear to be waiting
for mm->mmap_sem, e.g.:

Jun  3 16:09:05 atlddm19 kernel: ps            D ffffffff8030ee5c     0 22869 22825                     (NOTLB)
Jun  3 16:09:05 atlddm19 kernel: 00000102feb8fdd8 0000000000000006 00000102feb8fd98 0006000000000002
Jun  3 16:09:05 atlddm19 kernel:        00000102feb8fda8 ffffffff801d4610 0000000100000001 0000000200000048
Jun  3 16:09:05 atlddm19 kernel:        00000103f42d0030 0000000000d41df1
Jun  3 16:09:05 atlddm19 kernel: Call Trace:<ffffffff801d4610>{avc_has_perm+70} <ffffffff803104be>{__down_read+134}
Jun  3 16:09:05 atlddm19 kernel:        <ffffffff8013fe93>{access_process_vm+90} <ffffffff801ae429>{proc_pid_cmdline+99}
Jun  3 16:09:05 atlddm19 kernel:        <ffffffff801ae9bb>{proc_info_read+85} <ffffffff8017b0d0>{vfs_read+207}
Jun  3 16:09:05 atlddm19 kernel:        <ffffffff8017b32c>{sys_read+69} <ffffffff8011026a>{system_call+126}

The customer reports that this was not seen with the earlier
2.6.9-67.0.7 kernel. On closer look at the show-cpus sysrq output that was
sent, I see the following process holding the mmap_sem:

<ffffffff8023c8b4>{showacpu+45}
<ffffffff8011c6f2>{smp_call_function_interrupt+64}
<ffffffff80110b69>{call_function_interrupt+133}
<ffffffff8011700c>{arch_get_unmapped_area_topdown+0}
<ffffffff8016d4d4>{find_vma_prev+26}
<ffffffff8011710e>{arch_get_unmapped_area_topdown+258}
<ffffffff8016e358>{do_mmap_pgoff+333}
<ffffffff803103ce>{__down_write+52}
<ffffffff801284b9>{sys32_mmap2+252}
<ffffffff801265bb>{sysenter_do_call+27}

Takahiro Yasui found a difference between generic and arch-specific
implementations of arch_get_unmapped_area_topdown():

http://post-office.corp.redhat.com/archives/rhkernel-list/2008-March/msg01248.html

Comment 8 Linda Wang 2008-06-06 13:33:34 UTC
Vitaly, can you please post the patch for review?

Comment 9 Vitaly Mayatskikh 2008-06-06 13:45:25 UTC
Now I'm not sure whether this is a new bug. I don't know which condition causes this
loop and have no fix for it at the moment.

Comment 10 Vitaly Mayatskikh 2008-06-07 00:13:22 UTC
Ok, it is a loop in the arch_get_unmapped_area_topdown().

        do {
                /*
                 * Lookup failure means no vma is above this address,
                 * else if new region fits below vma->vm_start,
                 * return with success:
                 */
                vma = find_vma(mm, addr);
                if (!vma || addr+len <= vma->vm_start)
                        /* remember the address as a hint for next time */
                        return (mm->free_area_cache = addr);

                /* remember the largest hole we saw so far */
                if (addr + mm->cached_hole_size < vma->vm_start)
                        mm->cached_hole_size = vma->vm_start - addr;

                /* try just below the current vma->vm_start */
                addr = vma->vm_start-len;
        } while (len <= vma->vm_start);

The condition in the "while" statement is absolutely correct. However,
find_vma_prev() never produces a lookup failure!

/* Same as find_vma, but also return a pointer to the previous VMA in *pprev. */
struct vm_area_struct *
find_vma_prev(struct mm_struct *mm, unsigned long addr,
                        struct vm_area_struct **pprev)
{
        struct vm_area_struct *vma = NULL, *prev = NULL;
        struct rb_node * rb_node;
        if (!mm)
                goto out;
        
        /* Guard against addr being lower than the first VMA */
        vma = mm->mmap;
                                
        /* Go through the RB tree quickly. */
        rb_node = mm->mm_rb.rb_node;
        
        while (rb_node) {
                struct vm_area_struct *vma_tmp;
                vma_tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);
         
                if (addr < vma_tmp->vm_end) {
                        rb_node = rb_node->rb_left;
                } else {
                        prev = vma_tmp;
                        if (!prev->vm_next || (addr < prev->vm_next->vm_end))
                                break;
                        rb_node = rb_node->rb_right;
                }
        }

out:
        *pprev = prev;
        return prev ? prev->vm_next : vma;
}


So, when there is no vma below the given address, find_vma_prev() just returns
the first vma. I don't understand the comment "Guard against addr being lower
than the first VMA" or what it is trying to guard against, but every caller of
find_vma_prev() checks the return value for NULL.

Comment 12 Vitaly Mayatskikh 2008-06-07 11:47:15 UTC
Created attachment 308606 [details]
proposed patch

Comment 19 Vivek Goyal 2008-06-12 18:32:21 UTC
Committed in 73.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 22 errata-xmlrpc 2008-07-24 19:30:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html