Bug 97233

Summary: malloc hangs using bigpages
Product: Red Hat Enterprise Linux 2.1 Reporter: rob lojek <rob>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 2.1 CC: rob
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
URL: http://www.redhat.com/whitepapers/rhel/OracleLinuxInstallTips.pdf
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-07-15 21:36:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description rob lojek 2003-06-11 18:15:29 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b)
Gecko/20030516 Mozilla Firebird/0.6

Description of problem:
On a machine (Compaq DL380 G3) with 4 GB of RAM, trying to malloc 2 GB of RAM with
bigpages enabled locks up at about 1.4 GB, though the allocation eventually
finishes after 50-70 seconds.

After disabling the bigpages feature, the operation completes successfully in about
6 seconds, the same time as on distributions without bigpages support (RH 7.x/8.0/9).

Oracle and Red Hat recommend using bigpages to improve Oracle performance in
these documents:

http://www.redhat.com/whitepapers/rhel/OracleLinuxInstallTips.pdf
http://otn.oracle.com/tech/linux/pdf/1_linuxVM_v2_accepted.pdf

Version-Release number of selected component (if applicable):
2.4.9-e.24 and all other kernels

How reproducible:
Always

Steps to Reproduce:
1. Install stock RHAS on a machine with 4 GB of RAM.
2. Upgrade to the e.24-enterprise kernel.
3. Add this line to rc.local:

## Bigpages -- check with 'cat /proc/meminfo'
echo 2 > /proc/sys/kernel/shm-use-bigpages      ## bigpages in shmfs

4. Add this line to /etc/lilo.conf for the kernel you're going to boot:

        append="bigpages=2100MB"

5. Run lilo ('lilo -v').
6. Reboot.
7. Compile the source under "Additional info" below (gcc -o /usr/local/bin/slurpmem slurpmem.c).
8. Run slurpmem like this: 'slurpmem 2000', which attempts to allocate 2 GB of
RAM in 1 MB malloc() calls.
9. The box freezes at about 1.5 GB and takes about 50 seconds to finish allocating.
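After step 6, one way to confirm that the lilo append line actually reached the kernel is to look for the bigpages= option in /proc/cmdline. A minimal sketch of such a check, assuming the value always carries the MB suffix used in step 4; parse_bigpages_mb is a hypothetical helper, not part of the kernel or any Red Hat tool:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a kernel or Red Hat API): return the size in MB
 * from a "bigpages=<N>MB" token in a kernel command line such as the
 * contents of /proc/cmdline, or -1 if no well-formed token is found.
 * Assumes the MB suffix used in the lilo.conf example; other suffixes
 * are treated as not found. */
long parse_bigpages_mb(const char *cmdline)
{
        const char *key = "bigpages=";
        size_t keylen = strlen(key);
        const char *p = cmdline;

        while ((p = strstr(p, key)) != NULL) {
                /* Accept the key only at the start of a word. */
                if (p == cmdline || p[-1] == ' ') {
                        char *end;
                        long mb = strtol(p + keylen, &end, 10);

                        if (end != p + keylen && mb >= 0 &&
                            strncmp(end, "MB", 2) == 0)
                                return mb;
                        return -1;
                }
                p += keylen;
        }
        return -1;
}
```

Read /proc/cmdline into a buffer and pass it in: a return value of 2100 means the reservation from step 4 is active, and -1 means the option is missing (for example, in the control case below).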

Control case:
1. Comment out the "append=" line in lilo.conf from step 4 above.
2. Run lilo and reboot.
3. Repeat step 8.


Actual Results:  severe lock-up after about 1.4 GB of allocation

Expected Results:  no lock-up

Additional info:

Here's the C program that we use to reproduce the problem:

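#include <stdio.h>
#include <stdlib.h>     /* atoi(), malloc(), exit() */
#include <string.h>     /* memset() */

#define MEG ( 1024 * 1024 )


int main(int argc, char **argv) {

        char    **stored;
        int     megs;
        int     i;

        /* argv[0] is the program name, so the size is in argv[1]. */
        if ( argc < 2 ) {
                printf("No argument specified.\n");
                exit(1);
        }

        megs = atoi(argv[1]);
        if ( megs <= 0 ) {
                printf("Invalid size: %s\n", argv[1]);
                exit(1);
        }

        printf("%d megabytes will be slurped\n", megs);

        /* One pointer per megabyte-sized chunk. */
        stored = (char **) malloc( sizeof( char* ) * megs );

        if ( stored == NULL ) {
                perror("malloc");
                exit(1);
        }

        printf("Megs zeroed: ");

        for ( i = 0; i < megs; i++ ) {

                stored[i] = (char *) malloc( MEG );

                if ( stored[i] == NULL ) {
                        perror("malloc");
                        exit(1);
                }

                /* Touch every byte so the pages are actually backed by memory. */
                memset( stored[i], 0, MEG );

                printf("%d ", i);
                fflush(stdout);         /* show progress immediately */
        }

        printf("\n\n");
        printf("All allocated!\n");
        printf("Waiting for <control-c> to finish\n");
        scanf("%d", &i);

        exit(0);
}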

Comment 1 Arjan van de Ven 2003-06-11 18:21:24 UTC
Memory you set aside for bigpages is not available for normal use, so you
removed more than half of your RAM. As a result, your code uses more RAM than
you have left, which leads to swapping, etc.
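The reasoning in this reply can be made concrete with the numbers from the report: 4 GB of RAM (taken here as 4096 MB), 2100 MB reserved by bigpages=2100MB, and a 2000 MB request from slurpmem. A minimal sketch of that budget, where the overhead figure for the kernel and other processes is an assumed input and both function names are hypothetical:

```c
/* Hypothetical helpers illustrating the reply above; all sizes in MB.
 * Memory reserved with bigpages= is taken out of the pool that ordinary
 * malloc() memory comes from. overhead_mb is an estimate of what the
 * kernel and other processes use, supplied by the caller, not measured. */
long normal_ram_left_mb(long total_mb, long bigpages_mb, long overhead_mb)
{
        return total_mb - bigpages_mb - overhead_mb;
}

/* Returns 1 if request_mb cannot fit in the remaining normal RAM, so the
 * kernel would have to fall back on swap to satisfy it, and 0 otherwise. */
int request_needs_swap(long total_mb, long bigpages_mb, long overhead_mb,
                       long request_mb)
{
        return request_mb > normal_ram_left_mb(total_mb, bigpages_mb,
                                               overhead_mb);
}
```

With 4096 MB total, 2100 MB of bigpages and even zero overhead, only 1996 MB of normal memory remains, which is already less than the 2000 MB slurpmem asks for. Any real overhead shrinks that figure further, which is consistent with the stall appearing well before the 2000 MB mark.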

Comment 2 rob lojek 2003-06-11 18:30:44 UTC
There's absolutely no swapping going on; I'll post the output of 'free' in a minute.

There's also nothing at all running on the machine. Obviously, you should be
able to allocate 2 GB of RAM on a machine that has 4 GB of RAM installed; the
OS is using 2 GB of RAM!
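Whether swapping is happening can be checked from the SwapTotal and SwapFree lines of /proc/meminfo, which are the figures 'free' reports in its Swap row. A minimal sketch that parses text in that format; the helper names are hypothetical and the parser assumes the "Field: value kB" line layout:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find "<field>:" at the start of a line in text
 * formatted like /proc/meminfo and return its value in kB, or -1 if the
 * field is missing. */
long meminfo_kb(const char *text, const char *field)
{
        size_t len = strlen(field);
        const char *line = text;

        while (line != NULL && *line != '\0') {
                long kb;

                if (strncmp(line, field, len) == 0 && line[len] == ':' &&
                    sscanf(line + len + 1, "%ld", &kb) == 1)
                        return kb;
                line = strchr(line, '\n');      /* advance to next line */
                if (line != NULL)
                        line++;
        }
        return -1;
}

/* Swap currently in use, in kB: SwapTotal minus SwapFree, or -1 if either
 * field is missing. */
long swap_used_kb(const char *text)
{
        long total = meminfo_kb(text, "SwapTotal");
        long free_kb = meminfo_kb(text, "SwapFree");

        if (total < 0 || free_kb < 0)
                return -1;
        return total - free_kb;
}
```

Sampling swap_used_kb() on the contents of /proc/meminfo before and during a slurpmem run shows whether swap use grows as the allocation approaches the point where it stalls.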

Here's a quote from this document on RH's site:
http://www.redhat.com/whitepapers/rhel/OracleLinuxInstallTips.pdf

"For a SGA of 4GB, bigpages of size 4100MB could be set and for SGA of 2GB,
bigpages of size 2100MB could be set."

We'd simply like to be able to follow this documentation and have malloc() not
hang during a trivial memory allocation. We're having a tough time running
Oracle with 'bigpages' as described in that document.

Comment 3 rob lojek 2003-06-11 18:31:35 UTC
Sorry, that should read "the OS _isn't_ using 2 GB of RAM".

Comment 4 rob lojek 2003-06-11 18:35:13 UTC
Sorry, I misread your original reply. Yes, there is indeed swapping going on,
which explains the delay at ~1.4 GB.