Bug 105910

Summary: Oops/Freeze by System V semaphore
Product: Red Hat Enterprise Linux 3 Reporter: Jun'ichi NOMURA <junichi.nomura>
Component: kernelAssignee: Ernie Petrides <petrides>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0CC: kenneth.w.chen, mingo
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-10-19 01:35:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Fix ipc_lock()
none
test program to reproduce the semop race
none
Fix ipc race (better performance)
none
version of IPC locking fix committed to RHEL 3 U1 none

Description Jun'ichi NOMURA 2003-09-29 13:28:54 UTC
Description of problem:

When using sys_semtimedop on SMP system, process may stall or oops may occur.

The cause of the problem is that ipc_lock() does not protect the ipc_id
properly.
As a result, the one can remove the ipc_id to which the others refer.

The problem can be fixed by modifying ipc_lock().

Although the problem was observed for sys_semtimedop, the problem may
occur for other SystemV IPC functions too.

Version-Release number of selected component (if applicable):
2.4.21-3.EL

How reproducible:
Everytime after typical program.

Steps to Reproduce:
1. Run the test program (will be attached)
2. Repeat step 1.
3.
    
Actual results:
Oops will occur on BUG() on ipc/sem.c:964,
or the process will never return from sys_semtimedop() and keep running.

Expected results:
System should not freeze.

Additional info:

ipc_lock() is implemented as following:

     struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
     {
             struct kern_ipc_perm* out;
             int lid = id % SEQ_MULTIPLIER;
             if(lid >= ids->size)
                     return NULL;

             br_read_lock(BR_SEMAPHORE_LOCK);
             out = ids->entries[lid].p;
             if (!out) {
                     br_read_unlock(BR_SEMAPHORE_LOCK);
                     return NULL;
             }
             >>>>>>> window here <<<<<<<
             spin_lock(&out->lock);
             return out;
     }

'ids->entries[lid].p' is non-NULL when the id exists.

There is a time window as shown above.
In the window, 'ids->entries[lid].p' could be changed to NULL by ipc_rmid()
on the other cpu. In such a case, the running process will try to obtain
the lock for non-existing id.

The other problem is that setting NULL to 'ids->entries[lid].p' does not
mean the completion of removing-id procedure.
So one may see 'ids->entries[lid].p' being NULL while the other status
indicates the id still exists.

[Example1: CPU#1 will spin forever]

   CPU#0                                       CPU#1
   ---------------------------------------------------------------------
   ipc_lock()                                  ipc_lock()
     spin_lock(entry->lock)
     <obtaining lock>
                                                 spin_lock(entry->lock)
                                                 <waiting for the lock>
   ipc_rmid()
     <set the pointer to the entry as NULL>
     <don't unlock the lock>
                                                 STALL..


[Example2: CPU#1 will Oops by BUG() in sys_semtimedop]

   CPU#0                                       CPU#1
   ---------------------------------------------------------------------
                                               sys_semtimedop()
   ipc_lock()
       spin_lock(entry->lock)
       <obtaining lock>
   ipc_rmid()
     <set the pointer to the entry as NULL>
                                                 ipc_lock()
     ... the id is going to be removed ...         <does nothing>
                                                 <find inconsistency>
                                                 BUG()

The problem can be fixed by locking before reading the pointer.

     struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
     {
             struct kern_ipc_perm* out;
             int lid = id % SEQ_MULTIPLIER;
             if(lid >= ids->size)
                     return NULL;

             br_read_lock(BR_SEMAPHORE_LOCK);
             // HERE: lock the pointer to the entry
             spin_lock(&(ids->entries[lid].lock)); 
             out = ids->entries[lid].p;
             if (!out) {
                     spin_unlock(&(ids->entries[lid].lock));
                     br_read_unlock(BR_SEMAPHORE_LOCK);
                     return NULL;
             }
             return out;
     }

Comment 1 Jun'ichi NOMURA 2003-09-29 13:33:18 UTC
Created attachment 94812 [details]
Fix ipc_lock()

The patch moves lock variable from struct kern_ipc_perm to struct ipc_id.
The lock is held before reading the pointer to kern_ipc_perm.

The stock 2.4 kernel does not have this problem because it has big lock
over whole subsystem of SystemV semaphore.
The problem is introduced by semaphore scaling patch which split the
lock for each semaphore id.

Comment 2 Jun'ichi NOMURA 2003-09-29 13:35:46 UTC
Created attachment 94813 [details]
test program to reproduce the semop race

Running the attached program on SMP machine could reproduce the problem.

Example of execution:
  # cc test.c
  # while true; do ./a.out 256; done

Comment 3 Jun'ichi NOMURA 2003-10-03 10:48:54 UTC
Created attachment 94909 [details]
Fix ipc race (better performance)

The previous patch (#94812) had cache bouncing problem by packing
spinlocks in the array.
The attached patch instead utilize write lock of BR_SEMAPHORE_LOCK
for ipc_rmid.

Comment 4 Ernie Petrides 2003-11-21 04:41:01 UTC
Thank you very much for the reproducer program.  We've made
fixes to the IPC locking in RHEL 3 U1, and with your sem-crash
test, I've just now verified that the fixes resolve the problem.

The fixes were committed on 23-Oct-2003 in the (internal-only)
build of kernel version 2.4.21-4.6.EL.


Comment 5 Ernie Petrides 2003-11-27 00:24:59 UTC
Created attachment 96221 [details]
version of IPC locking fix committed to RHEL 3 U1

Here is a copy of the patch that was committed to RHEL 3 U1
to fix the IPC locking races.