Bug 105909 - O(1) scheduler deadlock
Summary: O(1) scheduler deadlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: ia64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-09-29 13:13 UTC by Jun'ichi NOMURA
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-12-20 20:54:44 UTC
Target Upstream Version:
Embargoed:


Attachments
O(1) scheduler patch for ia64 (590 bytes, patch)
2003-09-29 13:16 UTC, Jun'ichi NOMURA


Links
System ID: Red Hat Product Errata RHBA-2004:550
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4
Last Updated: 2004-12-20 05:00:00 UTC

Description Jun'ichi NOMURA 2003-09-29 13:13:21 UTC
Description of problem:

An SMP system may freeze, typically under frequent fork/exit activity.

This is a known problem that was already fixed in 2.5.32.
A patch will be attached.

This happens because the kernel acquires the runqueue lock and
tasklist_lock in two different orders.
One CPU may try to take the runqueue lock while holding tasklist_lock,
while another tries to take tasklist_lock while holding the runqueue lock.


Version-Release number of selected component (if applicable):

2.4.21-3.EL


How reproducible:

difficult to reproduce.

Steps to Reproduce:
1. Run the following program on a machine with many CPUs.
- - - - - - - - - - - - - - - - - - - - 
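#include <stdlib.h>     /* system(), exit() */
#include <sys/wait.h>   /* wait() */
#include <unistd.h>     /* fork() */

#define NPROC 128

int
main(int argc, char** argv)
{
        int i, status;

        /* Every process continues this loop after fork(), so the process
           count keeps doubling until fork() starts failing. */
        for(i=1; i<NPROC; i++) {
                fork();
        }
        /* Each process then forks and reaps short-lived children forever. */
        while (1) {
                if (fork()==0) {
                       system("exit");
                       exit(0);
                }
                else
                       wait(&status);
        }
}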
- - - - - - - - - - - - - - - - - - - - 

Actual results:


Expected results:


Additional info:

The IA-64 Linux kernel may call wrap_mmu_context() during context_switch()
to find an unused context number.
wrap_mmu_context() holds tasklist_lock while it searches the task list.

On the other hand, some exit-related functions take tasklist_lock
to remove a task from the list and then try to take a runqueue lock to wake
up the parent.

Because context_switch() is called with the runqueue lock held, there are two
different orders in which tasklist_lock and the runqueue lock are acquired.
This can cause a deadlock.

Example:
CPU#0:
schedule()
   -> spin_lock_irq(&rq->lock)
   -> context_switch()
      -> wrap_mmu_context()
         -> read_lock(&tasklist_lock)

CPU#1:
sys_wait4()
   -> write_lock(&tasklist_lock)
   -> do_notify_parent()
      -> wake_up_parent()
         -> try_to_wake_up()
            -> spin_lock_irq(&parent_rq->lock)
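
The two call chains above can be checked mechanically. The following is a
userspace sketch, not kernel code: the enum names and the helpers
record_order(), reset_orders(), and example_has_inversion() are hypothetical
and exist only for this illustration. It records which lock was taken while
another was held and flags a direct (ABBA) inversion; it does not detect
longer cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers, used only in this sketch. */
enum lock_id { RQ_LOCK, TASKLIST_LOCK, NLOCKS };

/* order[a][b] is true once some path has taken b while holding a. */
static bool order[NLOCKS][NLOCKS];

/* Forget all recorded orderings. */
static void reset_orders(void)
{
    memset(order, 0, sizeof(order));
}

/*
 * Record "took `inner` while holding `outer`".
 * Returns false if the opposite order was recorded earlier.
 */
static bool record_order(enum lock_id outer, enum lock_id inner)
{
    order[outer][inner] = true;
    return !order[inner][outer];
}

/* Replay both call chains; returns true if they conflict. */
static bool example_has_inversion(void)
{
    bool ok;

    reset_orders();
    /* CPU#0: schedule() -> context_switch() -> wrap_mmu_context() */
    ok = record_order(RQ_LOCK, TASKLIST_LOCK);
    /* CPU#1: sys_wait4() -> ... -> try_to_wake_up() */
    ok = record_order(TASKLIST_LOCK, RQ_LOCK) && ok;
    return !ok;
}
```

In this model, example_has_inversion() reports a conflict because both
orders are recorded. If the context switch no longer runs with the runqueue
lock held, only the CPU#1 ordering remains and no inversion is recorded.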

The problem and its fix were discussed on the linux-kernel list in July 2002.
http://marc.theaimsgroup.com/?l=linux-kernel&m=102629373819157&w=2

Comment 1 Jun'ichi NOMURA 2003-09-29 13:16:10 UTC
Created attachment 94811
O(1) scheduler patch for ia64

The cause of the problem is that context_switch is done with the runqueue
lock held. The lock is held only to keep the running task from being stolen
by other CPUs.

The O(1) scheduler already introduced switch_lock in task_struct for this purpose.

The patch uses switch_lock to keep tasks from being stolen during a context
switch.

In detail, the patch implements arch-specific macros for ia64:
   - prepare_arch_switch()
   - finish_arch_switch()
   - task_running()

task_running() formerly just compared runqueue->curr with the task.
With this patch, on IA-64, task_running() checks whether switch_lock is locked
as well as runqueue->curr. On other architectures, the behaviour is unchanged.

prepare_arch_switch() and finish_arch_switch() are also redefined for IA-64.
By default, prepare_arch_switch() does nothing.
On IA-64, prepare_arch_switch() is changed to lock switch_lock and unlock
runqueue->lock.
finish_arch_switch() unlocks runqueue->lock on architectures other than IA-64.
On IA-64, finish_arch_switch() is changed to unlock switch_lock.
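
The interplay of the three macros can be walked through with a small
single-threaded model. This is only a sketch under stated assumptions, not
the attached patch: locks are plain flags so their state can be inspected,
interrupt handling is left out, and the helper names (lock_take,
lock_release, model_switch) are hypothetical. It also assumes that
prepare_arch_switch() takes the incoming task's switch_lock and that
finish_arch_switch() releases the outgoing task's, which the description
above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Model lock: a flag instead of a real spinlock (assumption). */
struct model_lock { bool held; };

struct task { struct model_lock switch_lock; };

struct runqueue {
    struct model_lock lock;
    struct task *curr;
};

static void lock_take(struct model_lock *l)    { assert(!l->held); l->held = true; }
static void lock_release(struct model_lock *l) { assert(l->held);  l->held = false; }

/* A task counts as running if it is rq->curr or its switch_lock is held. */
static bool task_running(const struct runqueue *rq, const struct task *p)
{
    return rq->curr == p || p->switch_lock.held;
}

/* Before the switch: pin the incoming task, then drop the runqueue lock. */
static void prepare_arch_switch(struct runqueue *rq, struct task *next)
{
    lock_take(&next->switch_lock);
    lock_release(&rq->lock);
}

/* After the switch: the outgoing task is off the CPU, so unpin it. */
static void finish_arch_switch(struct runqueue *rq, struct task *prev)
{
    (void)rq;
    lock_release(&prev->switch_lock);
}

/* Walk one switch from prev to next; true if every step behaved as expected. */
static bool model_switch(void)
{
    struct task prev = { { true } };    /* pinned when prev was switched in */
    struct task next = { { false } };
    struct runqueue rq = { { false }, &prev };

    lock_take(&rq.lock);                /* schedule() takes rq->lock */
    rq.curr = &next;
    prepare_arch_switch(&rq, &next);

    /* context_switch() would run here, without rq->lock held. */
    bool lock_dropped = !rq.lock.held;
    bool prev_still_pinned = task_running(&rq, &prev);

    finish_arch_switch(&rq, &prev);
    return lock_dropped && prev_still_pinned &&
           !task_running(&rq, &prev) && task_running(&rq, &next);
}
```

In this model the runqueue lock is already released while the switch is in
progress, so taking tasklist_lock at that point no longer nests inside it.
The outgoing task still counts as running until finish_arch_switch()
releases its switch_lock, which is what keeps other CPUs from stealing it.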

Comment 2 Jun'ichi NOMURA 2004-09-06 07:05:29 UTC
Though this bug was closed as CURRENTRELEASE,
2.4.21-20.EL (RHEL3 U3) does not include the patch above.


Comment 4 Ernie Petrides 2004-09-20 06:50:57 UTC
The patch in comment #1 has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.8.EL).


Comment 5 John Flanagan 2004-12-20 20:54:44 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html


