97233 – malloc hangs using bigpages

Summary: malloc hangs using bigpages

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:	http://www.redhat.com/whitepapers/rhe...
Whiteboard:
Depends On:
Blocks:

Reported:	2003-06-11 18:15 UTC by rob lojek
Modified:	2007-11-30 22:06 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-07-15 21:36:07 UTC
Target Upstream Version:
Embargoed:

Description rob lojek 2003-06-11 18:15:29 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b)
Gecko/20030516 Mozilla Firebird/0.6

Description of problem:
On a Compaq DL380 G3 with 4 GB of RAM and bigpages enabled, an attempt to
malloc() 2 GB of RAM locks up at about 1.4 GB, although it eventually finishes
after 50-70 seconds.

After disabling the bigpages feature, the same operation completes successfully
in about 6 seconds, the same time as on distributions without bigpages support
(RH 7.x/8.0/9).

Oracle and Red Hat recommend using bigpages to improve Oracle performance in
these documents:

http://www.redhat.com/whitepapers/rhel/OracleLinuxInstallTips.pdf
http://otn.oracle.com/tech/linux/pdf/1_linuxVM_v2_accepted.pdf

Version-Release number of selected component (if applicable):
2.4.9-e.24 and all other kernels

How reproducible:
Always

Steps to Reproduce:
1. Install stock RHAS on a machine with 4 GB of RAM.
2. Upgrade to the e.24-enterprise kernel.
3. Add these lines to rc.local:

## Bigpages -- check with 'cat /proc/meminfo'
echo 2 > /proc/sys/kernel/shm-use-bigpages      ## bigpages in shmfs

4. Add this line to /etc/lilo.conf for the kernel you're going to boot:

        append="bigpages=2100MB"

5. Run lilo ('lilo -v').
6. Reboot.
7. Compile the source listed under "Additional info" below
   (gcc -o /usr/local/bin/slurpmem slurpmem.c).
8. Run slurpmem as 'slurpmem 2000', which attempts to allocate 2 GB of RAM
   through malloc() (a C library call, not a system call).
9. The box freezes at about 1.5 GB, and the allocation takes about 50 seconds
   to complete.

Control case:
1. Comment out the "append=" line in lilo.conf from step 4 above.
2. Run lilo and reboot.
3. Repeat step 8.
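The comment in step 3 suggests confirming the reservation through /proc/meminfo after boot. A minimal C sketch of that check follows. The helper names meminfo_kb() and read_proc_meminfo() are hypothetical, and the field name BigPagesFree is an assumption about what the bigpages-enabled 2.4.9-e kernels report, so verify it against the real 'cat /proc/meminfo' output first.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the line "<key>: <value> kB" in meminfo-style
 * text and return <value>, or -1 if the key is missing or unparsable. */
long meminfo_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* match only when the key is the whole field name before ':' */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long kb;
            /* %ld skips the padding spaces after the colon */
            if (sscanf(line + keylen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical helper: read /proc/meminfo into buf as a NUL-terminated
 * string. Returns 0 on success, -1 on error. */
int read_proc_meminfo(char *buf, size_t len)
{
    FILE *f;
    size_t n;

    if (len == 0)
        return -1;
    f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, len - 1, f);
    fclose(f);
    buf[n] = '\0';
    return 0;
}
```

Reading /proc/meminfo into a few-kilobyte buffer and then calling meminfo_kb(buf, "BigPagesFree") and meminfo_kb(buf, "SwapFree") before and during a slurpmem run would show whether the pool was reserved and whether swap is being consumed; a result of -1 means the kernel does not report that field.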


Actual Results:  severe lock-up after about 1.4 GB of allocation

Expected Results:  no lock-up

Additional info:

Here's the C program that we use to reproduce the problem:

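#include <stdio.h>
#include <stdlib.h>     /* malloc, atoi, exit */
#include <string.h>     /* memset */
#include <unistd.h>     /* pause */

#define MEG ( 1024 * 1024 )


int main(int argc, char **argv) {

        char    **stored;
        int     megs;
        int     i;

        /* argv[0] is the program name, so the size is in argv[1] */
        if ( argc < 2 ) {

                printf("No argument specified.\n");
                exit(1);

        }

        megs = atoi(argv[1]);

        if ( megs <= 0 ) {
                printf("Argument must be a positive number of megabytes.\n");
                exit(1);
        }

        printf("%d megabytes will be slurped\n", megs);

        /* one pointer per 1 MB chunk */
        stored = (char **) malloc( sizeof( char* ) * megs );

        /* malloc() reports failure by returning NULL, not a negative value */
        if ( stored == NULL ) {

                perror("malloc");
                exit(1);
        }

        printf("Megs zeroed: ");

        for ( i = 0; i < megs; i++ ) {

                stored[i] = (char *) malloc( MEG );

                if ( stored[i] == NULL ) {
                        perror("malloc");
                        exit(1);
                }

                /* writing the whole chunk forces the kernel to back it
                   with physical pages */
                memset( stored[i], 0, MEG );

                printf("%d ", i);
                fflush(stdout);         /* show progress immediately */

        }

        printf("\n\n");
        printf("All allocated!\n");
        printf("Waiting for <control-c> to finish\n");
        pause();        /* sleep until a signal such as SIGINT arrives */

        exit(0);
}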

Comment 1 Arjan van de Ven 2003-06-11 18:21:24 UTC

Memory you set aside for bigpages is not available for normal use, so you
removed more than half of your RAM. As a result, your code uses more RAM than
you have left, which leads to swapping and so on.
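The arithmetic behind this reply can be sketched in C. This is a minimal illustration, assuming 4 GB is counted as 4096 MB and ignoring the memory the kernel and other processes need; the function names are hypothetical.

```c
/* Hypothetical helper: megabytes left for the kernel and ordinary processes
 * once a bigpages= pool is reserved at boot (pool memory cannot back malloc()). */
long mb_left_for_malloc(long total_ram_mb, long bigpages_mb)
{
    return total_ram_mb - bigpages_mb;
}

/* Hypothetical helper: returns 1 if request_mb is larger than the RAM
 * left outside the bigpages pool, i.e. it cannot all stay resident. */
int request_exceeds_ram(long total_ram_mb, long bigpages_mb, long request_mb)
{
    return request_mb > mb_left_for_malloc(total_ram_mb, bigpages_mb);
}
```

With 4096 MB installed and bigpages=2100MB, only 1996 MB remain for the kernel and every ordinary process, which is already less than the 2000 MB that 'slurpmem 2000' requests. Because the kernel and other processes also take part of that remainder, paging can begin well before the full 2000 MB is touched, which is consistent with the slowdown near 1.4 GB described in this report.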

Comment 2 rob lojek 2003-06-11 18:30:44 UTC

There's absolutely no swapping going on; I'll post a 'free' in a minute.

There's also nothing at all running on the machine. Obviously, you should be
able to allocate 2 GB of RAM on a machine that has 4 GB of RAM installed; the
OS is using 2 GB of RAM!

Here's a quote from this document on RH's site:
http://www.redhat.com/whitepapers/rhel/OracleLinuxInstallTips.pdf

"For a SGA of 4GB, bigpages of size 4100MB could be set and for SGA of 2GB,
bigpages of size 2100MB could be set."

We'd simply like to be able to follow this documentation and have malloc() not
hang during a trivial memory allocation. We're having a tough time running
Oracle with 'bigpages' as that document describes.

Comment 3 rob lojek 2003-06-11 18:31:35 UTC

Sorry, that should read "the OS _isn't_ using 2 GB of RAM".

Comment 4 rob lojek 2003-06-11 18:35:13 UTC

Sorry, I misread your original reply. Yes, there is indeed swapping going on,
which explains the delay at ~1.4 GB.
