From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
From IT#53358: Hitachi reports that their customers can't do shmctl(.. SHM_LOCK ..). The call frequently fails with "Cannot allocate memory" (ENOMEM) regardless of the amount of free memory.

Using the test programs, we found the culprit: SHM operations are permitted between different processes based on "uid", but the counter used to track locked pages (current->mm->locked_vm) is per process.

The problem can be easily shown by:

1. Run the "shmget" test program to get 3 shared memory segments. The mm->locked_vm field in the process's task structure is incremented correctly to reflect the number of pages allocated for the shared memory segments created by this process.
2. Then run the "shmreget" test program to remove the above segments and subsequently allocate one more. This results in the following updates to the mm->locked_vm field:
   * The process starts with mm->locked_vm=0.
   * The process issues the IPC_RMID op three times. The subtractions make mm->locked_vm wrap around to a huge unsigned value (0xfffffff9 in my test run; negative if viewed as an int).
   * The process issues the SHM_LOCK call, which fails at line 768 (linux-2.4.21-23.EL/mm/shmem.c) because locked is a huge unsigned value (0xfffffff8; negative if viewed as an int):

     761  if (lock && !info->locked) {
     762          locked = inode->i_size >> PAGE_SHIFT;
     763          locked += mm->locked_vm;
     764          lock_limit = current->rlim[RLIMIT_MEMLOCK].rlim_cur;
     765          lock_limit >>= PAGE_SHIFT;
     766          if (locked > lock_limit && !capable(CAP_IPC_LOCK))
     767                  goto out_nomem;
     768          if (locked > num_physpages/10*9)
     769                  goto out_nomem;
     770          mm->locked_vm = locked;

We have a test kernel for the customer to work around this issue. For the real fix, I would suggest backporting the 2.6 implementation to 2.4, but that will break the KABI.
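To make the wrap-around concrete, here is a minimal userspace model of the quoted check. It is a sketch, not kernel code. It assumes a 32-bit counter as on i686, and it assumes, per the description above, that removing a locked segment subtracts its pages from the calling process's locked_vm. The names mm_model, shm_lock_check() and shm_unlock_model(), and the page counts used below, are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-process mm; uint32_t mimics a
 * 32-bit unsigned long on i686. Not kernel code. */
struct mm_model {
    uint32_t locked_vm;   /* pages this process believes it has locked */
};

/* Mirrors the quoted 2.4 logic (lines 761-770).
 * Returns 0 if the lock is allowed, -1 for the "goto out_nomem" path. */
static int shm_lock_check(struct mm_model *mm, uint32_t seg_pages,
                          uint32_t lock_limit_pages, uint32_t num_physpages,
                          int cap_ipc_lock)
{
    uint32_t locked = seg_pages;
    locked += mm->locked_vm;              /* wraps if locked_vm underflowed */
    if (locked > lock_limit_pages && !cap_ipc_lock)
        return -1;
    if (locked > num_physpages / 10 * 9)
        return -1;
    mm->locked_vm = locked;
    return 0;
}

/* Assumed unlock/removal path: subtracts the segment's pages from the
 * caller's counter, even if a different process charged them. */
static void shm_unlock_model(struct mm_model *mm, uint32_t seg_pages)
{
    mm->locked_vm -= seg_pages;           /* unsigned wrap-around below 0 */
}
```

For example, with three 2-page segments locked by one process and then removed by another, the remover's counter ends at 0xfffffffa, and a subsequent 1-page lock computes locked = 0xfffffffb, which exceeds num_physpages/10*9 for any realistic memory size and takes the ENOMEM path.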
Version-Release number of selected component (if applicable):
kernel-2.4.21-26.ELsmp

How reproducible:
Always

Steps to Reproduce:
(See the problem description; test programs will be uploaded.)

Additional info:
Created attachment 106967: test program

1. Untar the file (tar xvf shmtest.tar) and build the two executables (shmget and shmreget) with "make".
2. Run ./shmget to obtain three shared memory regions.
3. Run ./shmreget to remove the above regions and get a new one.

Step 3 fails with ENOMEM.
Created attachment 106970: work-around patch

This is the "better-than-before" patch that goes with the test kernel sent to the customer. I also considered changing locked_vm to a signed "int" instead of "unsigned int", but eventually gave up because the kernel performs so many shift (<< or >>) operations on it that getting it right is cumbersome. Still, the 2.6 implementation looks the most reasonable, apart from the KABI issue.
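For comparison, my understanding is that 2.6 charges locked SHM to the user rather than to the process (via user_shm_lock(), which accounts against the user_struct's locked_shm), and the segment remembers which user it charged so that unlocking credits the same counter. The following is a rough userspace model of that idea under those assumptions, not the 2.6 code itself; user_model, seg_model, seg_lock() and seg_unlock() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-uid counter, shared by every process of that uid. */
struct user_model {
    uint32_t locked_shm;          /* pages of locked SHM charged to this user */
};

/* Hypothetical segment that records whom it charged. */
struct seg_model {
    uint32_t pages;
    struct user_model *charged;   /* user charged at lock time, or NULL */
};

/* Returns 0 on success, -1 for the ENOMEM path. */
static int seg_lock(struct seg_model *seg, struct user_model *user,
                    uint32_t lock_limit_pages, int cap_ipc_lock)
{
    if (seg->charged)             /* already locked: nothing more to charge */
        return 0;
    if (user->locked_shm + seg->pages > lock_limit_pages && !cap_ipc_lock)
        return -1;
    user->locked_shm += seg->pages;
    seg->charged = user;          /* remember whom to credit on unlock */
    return 0;
}

/* Credits the user recorded at lock time, regardless of which
 * process performs the unlock or the IPC_RMID. */
static void seg_unlock(struct seg_model *seg)
{
    if (!seg->charged)
        return;
    seg->charged->locked_shm -= seg->pages;
    seg->charged = NULL;
}
```

With this bookkeeping the shmget/shmreget sequence balances: three segments locked by one process and removed by another leave the uid's counter back at 0, so the next SHM_LOCK is checked against a sane total. The catch, as noted above, is that adding such a per-user counter and a per-segment owner pointer changes kernel data structures, which is what breaks the KABI.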
Yes, kABI needs to be broken in order for this bug to be fixed. That's going to have to be a decision for PM to make.
This is a dup of bug 126411, but I'm leaving it open and putting it on the KABI-blocker list.
A fix for this problem was committed to the RHEL3 U6 patch pool on 4-May-2005 (in kernel version 2.4.21-32.3.EL) and to the RHEL3 E6 patch pool on 16-May-2005 (in kernel version 2.4.21-32.0.1.EL).

Following is the Errata System notice about the release fixing this:

"An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you."

*** This bug has been marked as a duplicate of 126411 ***