Bug 105910 - Oops/Freeze by System V semaphore
Oops/Freeze by System V semaphore
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel (Show other bugs)
3.0
All Linux
medium Severity high
: ---
: ---
Assigned To: Ernie Petrides
Brian Brock
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2003-09-29 09:28 EDT by Jun'ichi NOMURA
Modified: 2007-11-30 17:06 EST (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-10-18 21:35:30 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Fix ipc_lock() (1.66 KB, patch)
2003-09-29 09:33 EDT, Jun'ichi NOMURA
no flags Details | Diff
test program to reproduce the semop race (1.21 KB, patch)
2003-09-29 09:35 EDT, Jun'ichi NOMURA
no flags Details | Diff
Fix ipc race (better performance) (2.49 KB, patch)
2003-10-03 06:48 EDT, Jun'ichi NOMURA
no flags Details | Diff
version of IPC locking fix committed to RHEL 3 U1 (3.51 KB, patch)
2003-11-26 19:24 EST, Ernie Petrides
no flags Details | Diff

  None (edit)
Description Jun'ichi NOMURA 2003-09-29 09:28:54 EDT
Description of problem:

When using sys_semtimedop on SMP system, process may stall or oops may occur.

The cause of the problem is that ipc_lock() does not protect the ipc_id
properly.
As a result, the one can remove the ipc_id to which the others refer.

The problem can be fixed by modifying ipc_lock().

Although the problem was observed for sys_semtimedop, the problem may
occur for other SystemV IPC functions too.

Version-Release number of selected component (if applicable):
2.4.21-3.EL

How reproducible:
Everytime after typical program.

Steps to Reproduce:
1. Run the test program (will be attached)
2. Repeat step 1.
3.
    
Actual results:
Oops will occur on BUG() on ipc/sem.c:964,
or the process will never return from sys_semtimedop() and keep running.

Expected results:
System should not freeze.

Additional info:

ipc_lock() is implemented as following:

     struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
     {
             struct kern_ipc_perm* out;
             int lid = id % SEQ_MULTIPLIER;
             if(lid >= ids->size)
                     return NULL;

             br_read_lock(BR_SEMAPHORE_LOCK);
             out = ids->entries[lid].p;
             if (!out) {
                     br_read_unlock(BR_SEMAPHORE_LOCK);
                     return NULL;
             }
             >>>>>>> window here <<<<<<<
             spin_lock(&out->lock);
             return out;
     }

'ids->entries[lid].p' is non-NULL when the id exists.

There is a time window as shown above.
In the window, 'ids->entries[lid].p' could be changed to NULL by ipc_rmid()
on the other cpu. In such a case, the running process will try to obtain
the lock for non-existing id.

The other problem is that setting NULL to 'ids->entries[lid].p' does not
mean the completion of removing-id procedure.
So one may see 'ids->entries[lid].p' being NULL while the other status
indicates the id still exists.

[Example1: CPU#1 will spin forever]

   CPU#0                                       CPU#1
   ---------------------------------------------------------------------
   ipc_lock()                                  ipc_lock()
     spin_lock(entry->lock)
     <obtaining lock>
                                                 spin_lock(entry->lock)
                                                 <waiting for the lock>
   ipc_rmid()
     <set the pointer to the entry as NULL>
     <don't unlock the lock>
                                                 STALL..


[Example2: CPU#1 will Oops by BUG() in sys_semtimedop]

   CPU#0                                       CPU#1
   ---------------------------------------------------------------------
                                               sys_semtimedop()
   ipc_lock()
       spin_lock(entry->lock)
       <obtaining lock>
   ipc_rmid()
     <set the pointer to the entry as NULL>
                                                 ipc_lock()
     ... the id is going to be removed ...         <does nothing>
                                                 <find inconsistency>
                                                 BUG()

The problem can be fixed by locking before reading the pointer.

     struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
     {
             struct kern_ipc_perm* out;
             int lid = id % SEQ_MULTIPLIER;
             if(lid >= ids->size)
                     return NULL;

             br_read_lock(BR_SEMAPHORE_LOCK);
             // HERE: lock the pointer to the entry
             spin_lock(&(ids->entries[lid].lock)); 
             out = ids->entries[lid].p;
             if (!out) {
                     spin_unlock(&(ids->entries[lid].lock));
                     br_read_unlock(BR_SEMAPHORE_LOCK);
                     return NULL;
             }
             return out;
     }
Comment 1 Jun'ichi NOMURA 2003-09-29 09:33:18 EDT
Created attachment 94812 [details]
Fix ipc_lock()

The patch moves lock variable from struct kern_ipc_perm to struct ipc_id.
The lock is held before reading the pointer to kern_ipc_perm.

The stock 2.4 kernel does not have this problem because it has big lock
over whole subsystem of SystemV semaphore.
The problem is introduced by semaphore scaling patch which split the
lock for each semaphore id.
Comment 2 Jun'ichi NOMURA 2003-09-29 09:35:46 EDT
Created attachment 94813 [details]
test program to reproduce the semop race

Running the attached program on SMP machine could reproduce the problem.

Example of execution:
  # cc test.c
  # while true; do ./a.out 256; done
Comment 3 Jun'ichi NOMURA 2003-10-03 06:48:54 EDT
Created attachment 94909 [details]
Fix ipc race (better performance)

The previous patch (#94812) had cache bouncing problem by packing
spinlocks in the array.
The attached patch instead utilize write lock of BR_SEMAPHORE_LOCK
for ipc_rmid.
Comment 4 Ernie Petrides 2003-11-20 23:41:01 EST
Thank you very much for the reproducer program.  We've made
fixes to the IPC locking in RHEL 3 U1, and with your sem-crash
test, I've just now verified that the fixes resolve the problem.

The fixes were committed on 23-Oct-2003 in the (internal-only)
build of kernel version 2.4.21-4.6.EL.
Comment 5 Ernie Petrides 2003-11-26 19:24:59 EST
Created attachment 96221 [details]
version of IPC locking fix committed to RHEL 3 U1

Here is a copy of the patch that was committed to RHEL 3 U1
to fix the IPC locking races.

Note You need to log in before you can comment on or make changes to this bug.