Process hang waiting of semaphore was found in latest RHEL4U6 async kernels (upcoming RHEL-4.7 is also affected). The hung processes appear to be waiting for the mm->mmap_sem. eg: Jun 3 16:09:05 atlddm19 kernel: ps D ffffffff8030ee5c 0 22869 22825 (NOTLB) Jun 3 16:09:05 atlddm19 kernel: 00000102feb8fdd8 0000000000000006 00000102feb8fd98 0006000000000002 Jun 3 16:09:05 atlddm19 kernel: 00000102feb8fda8 ffffffff801d4610 0000000100000001 0000000200000048 Jun 3 16:09:05 atlddm19 kernel: 00000103f42d0030 0000000000d41df1 Jun 3 16:09:05 atlddm19 kernel: Call Trace:<ffffffff801d4610>{avc_has_perm+70} <ffffffff803104be>{__down_read+134} Jun 3 16:09:05 atlddm19 kernel: <ffffffff8013fe93>{access_process_vm+90} <ffffffff801ae429>{proc_pid_cmdline+99} Jun 3 16:09:05 atlddm19 kernel: <ffffffff801ae9bb>{proc_info_read+85} <ffffffff8017b0d0>{vfs_read+207} Jun 3 16:09:05 atlddm19 kernel: <ffffffff8017b32c>{sys_read+69} <ffffffff8011026a>{system_call+126} The customer reports that this was not seen on the earlier 2.6.9-67.0.7 kernel. On closer look at the show cpu sysrq which was sent, I see the following process which is holding the mmap_sem <ffffffff8023c8b4>{showacpu+45} <ffffffff8011c6f2>{smp_call_function_interrupt+64} <ffffffff80110b69>{call_function_interrupt+133} <ffffffff8011700c>{arch_get_unmapped_area_topdown+0} <ffffffff8016d4d4>{find_vma_prev+26} <ffffffff8011710e>{arch_get_unmapped_area_topdown+258} <ffffffff8016e358>{do_mmap_pgoff+333} <ffffffff803103ce>{__down_write+52} <ffffffff801284b9>{sys32_mmap2+252} <ffffffff801265bb>{sysenter_do_call+27} Takahiro Yasui found a difference between generic and arch-specific implementations of arch_get_unmapped_area_topdown(): http://post-office.corp.redhat.com/archives/rhkernel-list/2008-March/msg01248.html
Vitaly, can you please post the patch for review?
Now I'm not sure if this not a new bug. I don't know which condition causes this loop and have no fix for it at the moment.
Ok, it is a loop in the arch_get_unmapped_area_topdown(). do { /* * Lookup failure means no vma is above this address, * else if new region fits below vma->vm_start, * return with success: */ vma = find_vma(mm, addr); if (!vma || addr+len <= vma->vm_start) /* remember the address as a hint for next time */ return (mm->free_area_cache = addr); /* remember the largest hole we saw so far */ if (addr + mm->cached_hole_size < vma->vm_start) mm->cached_hole_size = vma->vm_start - addr; /* try just below the current vma->vm_start */ addr = vma->vm_start-len; } while (len <= vma->vm_start); The condition in "while" statement is absolutely correct. However, find_vma_prev() does not produce lookup failure! /* Same as find_vma, but also return a pointer to the previous VMA in *pprev. */ struct vm_area_struct * find_vma_prev(struct mm_struct *mm, unsigned long addr, struct vm_area_struct **pprev) { struct vm_area_struct *vma = NULL, *prev = NULL; struct rb_node * rb_node; if (!mm) goto out; /* Guard against addr being lower than the first VMA */ vma = mm->mmap; /* Go through the RB tree quickly. */ rb_node = mm->mm_rb.rb_node; while (rb_node) { struct vm_area_struct *vma_tmp; vma_tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb); if (addr < vma_tmp->vm_end) { rb_node = rb_node->rb_left; } else { prev = vma_tmp; if (!prev->vm_next || (addr < prev->vm_next->vm_end)) break; rb_node = rb_node->rb_right; } } out: *pprev = prev; return prev ? prev->vm_next : vma; } So, in case of there's no vma below given address, find_vma_prev() just returns the first vma. I don't understand comment "Guard against addr being lower than the first VMA" and what it tries to guard, but each user of find_prev_vma() checks return value for NULL.
Created attachment 308606 [details] proposed patch
Committed in 73.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0665.html