105909 – O(1) scheduler deadlock

Summary: O(1) scheduler deadlock

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	ia64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-09-29 13:13 UTC by Jun'ichi NOMURA
Modified:	2007-11-30 22:06 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-12-20 20:54:44 UTC
Target Upstream Version:
Embargoed:

Attachments
O(1) scheduler patch for ia64 (590 bytes, patch) 2003-09-29 13:16 UTC, Jun'ichi NOMURA	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:550	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4	2004-12-20 05:00:00 UTC

Description Jun'ichi NOMURA 2003-09-29 13:13:21 UTC

Description of problem:

An SMP system may freeze, typically under frequent fork/exit activity.

This is a known problem that was already fixed in 2.5.32.
A patch will be attached.

This happens because the kernel acquires the runqueue lock and
tasklist_lock in two different orders.
Thus one CPU may try to take the runqueue lock while holding tasklist_lock,
while another CPU tries to take tasklist_lock while holding the runqueue lock.
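
Two inconsistent acquisition orders like these can be caught mechanically with a lock-order graph, the idea behind the kernel's lockdep validator: record an edge "A before B" whenever B is taken while A is held, and flag any new edge that would close a cycle. The following is a minimal user-space sketch; the lock identifiers and the record_order() helper are hypothetical and are not part of the kernel or of the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock identifiers for this sketch only. */
enum lock_id { RQ_LOCK, TASKLIST_LOCK, NLOCKS };

static const char *lock_name[NLOCKS] = { "rq->lock", "tasklist_lock" };

/* edge[a][b] is true once some code path took lock b while holding a. */
static bool edge[NLOCKS][NLOCKS];

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
        if (from == to)
                return true;
        seen[from] = true;
        for (int n = 0; n < NLOCKS; n++)
                if (edge[from][n] && !seen[n] && reaches(n, to, seen))
                        return true;
        return false;
}

/* Record "acquire `next` while holding `held`". If the opposite order is
   already known, the new edge would close a cycle (a potential ABBA
   deadlock), so report it and return false without recording it. */
static bool record_order(enum lock_id held, enum lock_id next)
{
        bool seen[NLOCKS] = { false };

        assert(held < NLOCKS && next < NLOCKS);
        if (reaches(next, held, seen)) {
                printf("lock order inversion: %s -> %s\n",
                       lock_name[held], lock_name[next]);
                return false;
        }
        edge[held][next] = true;
        return true;
}
```

Recording rq->lock -> tasklist_lock first (the schedule() path) succeeds; recording tasklist_lock -> rq->lock afterwards (the exit path) returns false and prints the inversion, even though no single run has to actually deadlock.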


Version-Release number of selected component (if applicable):

2.4.21-3.EL


How reproducible:

Difficult to reproduce.

Steps to Reproduce:
1. Run the following program on a machine with many CPUs.
- - - - - - - - - - - - - - - - - - - - 
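#include <stdlib.h>     /* exit(), system() */
#include <sys/wait.h>   /* wait() */
#include <unistd.h>     /* fork() */

#define NPROC 128

int
main(void)
{
        int i, status;

        /* Start NPROC copies of the loop below: the parent forks
           NPROC-1 children, and each child leaves this loop at once,
           so NPROC looping processes are started instead of the
           process count doubling with every fork(). */
        for (i = 1; i < NPROC; i++) {
                if (fork() == 0)
                        break;
        }

        /* Each copy keeps forking a child that runs a shell and
           exits, then reaps it, stressing the fork/exit/wait paths. */
        while (1) {
                if (fork() == 0) {
                        system("exit");
                        exit(0);
                }
                else
                        wait(&status);
        }
}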
- - - - - - - - - - - - - - - - - - - - 

Additional info:

The IA-64 Linux kernel may call wrap_mmu_context() during context_switch()
to find an unused context number.
wrap_mmu_context() holds tasklist_lock while it searches the task list.
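
To show why this search needs a stable view of every task, here is a simplified, hypothetical model of such a context-number wrap (assumed here, not copied from the kernel): starting at next, skip numbers some task already uses, shrink limit down to the nearest used number above next, and rescan if the range becomes empty. The in_use array stands in for walking the task list; struct ctx_range, wrap_contexts(), first and max_ctx are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for the free range of context numbers. */
struct ctx_range {
        unsigned long next;   /* next context number to hand out */
        unsigned long limit;  /* first number at or above next still in use */
};

/* Find a range [next, limit) that no task uses. `in_use` stands in for
   the task list; in the kernel this scan is what requires tasklist_lock.
   Assumes at least one number in [first, max_ctx] is free, otherwise
   the rescan would never terminate. */
static void wrap_contexts(struct ctx_range *r, const unsigned long *in_use,
                          size_t ntasks, unsigned long first,
                          unsigned long max_ctx)
{
        assert(first <= max_ctx);
        if (r->next > max_ctx)
                r->next = first;
        r->limit = max_ctx + 1;
repeat:
        for (size_t i = 0; i < ntasks; i++) {
                unsigned long c = in_use[i];

                if (c == r->next && ++r->next >= r->limit) {
                        /* Range became empty: reset the limit and rescan. */
                        if (r->next > max_ctx)
                                r->next = first;
                        r->limit = max_ctx + 1;
                        goto repeat;
                }
                if (c > r->next && c < r->limit)
                        r->limit = c;   /* a used number caps the range */
        }
}
```

With contexts {13, 11, 10, 20} in use, first = 10 and max_ctx = 30, the free range found is [12, 13). Because the answer depends on every task's context number, a task list that changed mid-scan could yield a range that is not actually free.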

On the other hand, some exit-related functions take tasklist_lock
to remove a task from the list and then try to take a runqueue lock
to wake up the parent.

Because context_switch() is called with the runqueue lock held, the kernel
acquires tasklist_lock and the runqueue lock in two different orders.
This can cause a deadlock.

Example:
CPU#0:
schedule()
   -> spin_lock_irq(&rq->lock)
   -> context_switch()
      -> wrap_mmu_context()
         -> read_lock(&tasklist_lock)

CPU#1:
sys_wait4()
   -> write_lock(&tasklist_lock)
   -> do_notify_parent()
      -> wake_up_parent()
         -> try_to_wake_up()
            -> spin_lock_irq(&parent_rq->lock)
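
The trace above deadlocks when parent_rq is the runqueue CPU#0 has locked. At that moment each CPU spins on a lock the other holds: CPU#0's read_lock() cannot succeed while CPU#1 holds tasklist_lock for writing, and CPU#1 cannot take the runqueue lock held by CPU#0. The sketch below models that instant as a wait-for graph and checks it for a cycle; struct machine and the lock indices are hypothetical, and the rwlock is simplified to a single write owner.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS   2
#define NO_CPU  (-1)
#define NO_LOCK (-1)

/* Hypothetical lock indices: CPU#0's and CPU#1's runqueue locks,
   and tasklist_lock (modeled only as write-held or free). */
enum { RQ0_LOCK, RQ1_LOCK, TASKLIST_LOCK, NLOCKS };

struct machine {
        int owner[NLOCKS];       /* CPU holding each lock, or NO_CPU */
        int spinning_on[NCPUS];  /* lock each CPU is waiting for, or NO_LOCK */
};

/* The CPU that `cpu` is waiting on, or NO_CPU if it is not blocked. */
static int blocked_by(const struct machine *m, int cpu)
{
        int l = m->spinning_on[cpu];
        return l == NO_LOCK ? NO_CPU : m->owner[l];
}

/* Follow the wait-for chain from `start`; getting back to `start`
   means every CPU on the cycle spins forever. */
static bool in_deadlock(const struct machine *m, int start)
{
        int cpu = start;

        assert(start >= 0 && start < NCPUS);
        for (int steps = 0; steps < NCPUS; steps++) {
                cpu = blocked_by(m, cpu);
                if (cpu == NO_CPU)
                        return false;
                if (cpu == start)
                        return true;
        }
        return false;
}
```

Setting owner[RQ0_LOCK] = 0, owner[TASKLIST_LOCK] = 1, and having CPU#0 spin on TASKLIST_LOCK while CPU#1 spins on RQ0_LOCK reports a deadlock for both CPUs; if the parent's runqueue is instead the free RQ1_LOCK, CPU#1 proceeds and no cycle exists.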

The problem and the fix were discussed on the linux-kernel mailing list in July 2002.
http://marc.theaimsgroup.com/?l=linux-kernel&m=102629373819157&w=2

Comment 1 Jun'ichi NOMURA 2003-09-29 13:16:10 UTC

Created attachment 94811
O(1) scheduler patch for ia64

The cause of the problem is that context_switch() is performed with the
runqueue lock held. The lock is held there only to prevent the running
task from being stolen by other CPUs.

The O(1) scheduler already introduced switch_lock in task_struct for this purpose.

The patch uses switch_lock to prevent tasks from being stolen during a
context switch.

In detail, the patch implements arch-specific macros for ia64:
   - prepare_arch_switch()
   - finish_arch_switch()
   - task_running()

task_running() formerly just compared runqueue->curr with the task.
With this patch, on IA-64, task_running() also checks whether the task's
switch_lock is held, in addition to comparing runqueue->curr.
On other architectures, the behaviour is unchanged.

prepare_arch_switch() and finish_arch_switch() are also redefined for IA-64.
By default, prepare_arch_switch() does nothing.
On IA-64, prepare_arch_switch() is changed to lock the next task's
switch_lock and then unlock runqueue->lock.
On architectures other than IA-64, finish_arch_switch() unlocks
runqueue->lock.
On IA-64, finish_arch_switch() is changed to unlock the previous task's
switch_lock instead.
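
The behaviour of the three IA-64 macros described above can be modeled in user space as follows. This is a sketch, not the patch: spinlock_t, struct task and struct runqueue are hypothetical stand-ins built on C11 atomics, interrupt disabling is omitted, and the model assumes a task's switch_lock stays held from the moment it is picked as next until the switch away from it has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel spinlock. */
typedef struct { atomic_int held; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
        int expected = 0;
        while (!atomic_compare_exchange_weak(&l->held, &expected, 1))
                expected = 0;   /* the failed CAS stored the current value */
}
static void spin_unlock(spinlock_t *l) { atomic_store(&l->held, 0); }
static bool spin_is_locked(spinlock_t *l) { return atomic_load(&l->held); }

/* Only the fields the macros touch. */
struct task { spinlock_t switch_lock; };
struct runqueue { spinlock_t lock; struct task *curr; };

/* IA-64 variant: pin `next` via its switch_lock, then drop rq->lock
   before the context switch itself. */
static void prepare_arch_switch(struct runqueue *rq, struct task *next)
{
        spin_lock(&next->switch_lock);
        spin_unlock(&rq->lock);
}

/* Runs after the switch: release the previous task's switch_lock. */
static void finish_arch_switch(struct runqueue *rq, struct task *prev)
{
        (void)rq;
        assert(spin_is_locked(&prev->switch_lock));
        spin_unlock(&prev->switch_lock);
}

/* A task counts as running while it is rq->curr or while its
   switch_lock is held, so it is not migrated mid-switch. */
static bool task_running(struct runqueue *rq, struct task *p)
{
        return rq->curr == p || spin_is_locked(&p->switch_lock);
}
```

For a switch from task A to task B, task_running() still reports A as running after prepare_arch_switch() has released rq->lock and rq->curr already points to B, so A cannot be stolen while its context is being switched; after finish_arch_switch() it no longer does. Because the runqueue lock is no longer held across the switch, wrap_mmu_context() can take tasklist_lock without nesting it inside rq->lock.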

Comment 2 Jun'ichi NOMURA 2004-09-06 07:05:29 UTC

Though this bug was closed as CURRENTRELEASE,
2.4.21-20.EL (RHEL3 U3) does not include the patch above.

Comment 4 Ernie Petrides 2004-09-20 06:50:57 UTC

The patch in comment #1 has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.8.EL).

Comment 5 John Flanagan 2004-12-20 20:54:44 UTC

An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html
