From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2) Gecko/20040415

Description of problem:
Programs that mmap large files in small chunks (using an offset and length) in read/write mode cause a kernel panic.

Version-Release number of selected component (if applicable):
2.4.21-15.EL

How reproducible:
Always

Steps to Reproduce:
1. Create a large file:

   [ahoward@harp ahoward]$ dd if=/dev/zero of=1gb bs=8192 count=131072
   131072+0 records in
   131072+0 records out

2. Compile this program:

   #include <stdio.h>
   #include <stdlib.h>
   #include <sys/types.h>
   #include <sys/stat.h>
   #include <unistd.h>
   #include <sys/mman.h>
   #include <fcntl.h>

   #define TILE_SIZE 1048576   /* 1 MiB per mapping; a multiple of the page size */

   /*
    * ~ > gcc filemap_bug.c -o filemap_bug
    * ~ > filemap_bug big_file
    */
   int main (int argc, char **argv)
   {
     char *path;
     struct stat buf;
     off_t size, offset, length, tn;
     int fd;
     void *mem;
     unsigned char *start;
     unsigned char *byte;

     if (argc < 2) {
       fprintf (stderr, "usage: %s huge_input_file\n", argv[0]);
       return (EXIT_FAILURE);
     }
     path = argv[1];

     if (stat (path, &buf) != 0) {
       perror ("stat");
       return (EXIT_FAILURE);
     }
     size = buf.st_size;

     fd = open (path, O_RDWR);
     if (fd < 0) {
       perror ("open");
       return (EXIT_FAILURE);
     }

     /* map, dirty, sync and unmap the file one TILE_SIZE window at a time */
     for (offset = 0, tn = 0; offset < size; offset += TILE_SIZE, tn++) {
       length = size - offset;
       length = length > TILE_SIZE ? TILE_SIZE : length;
       fprintf (stdout, "<%s>[%lld,%lld] - tile_number <%lld>\n",
                path, (long long) offset, (long long) length, (long long) tn);

       mem = mmap (NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
       if (mem == MAP_FAILED) {
         perror ("mmap");
         return (EXIT_FAILURE);
       }
       start = (unsigned char *) mem;
       madvise (start, length, MADV_SEQUENTIAL);

       /* write every byte of the tile so that every page is dirtied */
       for (byte = start; byte - start < length; byte++) {
         *byte = 42;
       }

       msync (start, length, MS_SYNC);
       munmap (start, length);
     }

     close (fd);
     return (EXIT_SUCCESS);
   }

3. Run the program on the created file:

   ./a.out 1gb

4. Watch the kernel panic (for me, around tile 140).

Actual Results:
Kernel panic.

Expected Results:
Every byte of the input file == 42.

Additional info:
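To check the "Expected Results" line on a kernel that survives the run, a small helper along these lines could be used. It is a hypothetical sketch, not part of the original report: the name count_mismatches() and the approach of re-mapping the file read-only in the same 1 MiB tiles are my assumptions. It returns the number of bytes that differ from a given value, or -1 if the file cannot be opened, stat'ed or mapped.

```c
#define _XOPEN_SOURCE 700 /* expose POSIX/XSI declarations (fstat, mmap) under strict compiler modes */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TILE_SIZE 1048576 /* same tile size as the reproducer; a multiple of the page size */

/* Hypothetical helper: count bytes in `path` that are not `want`; -1 on error. */
long long count_mismatches(const char *path, unsigned char want)
{
    struct stat st;
    long long bad = 0;
    off_t offset;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    for (offset = 0; offset < st.st_size; offset += TILE_SIZE) {
        size_t length = (size_t)(st.st_size - offset);
        const unsigned char *p;
        size_t i;

        if (length > TILE_SIZE)
            length = TILE_SIZE; /* the last tile may be shorter */
        /* offset is always a multiple of TILE_SIZE, so it is page-aligned as mmap requires */
        p = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, offset);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (i = 0; i < length; i++)
            if (p[i] != want)
                bad++;
        munmap((void *)p, length);
    }
    close(fd);
    return bad;
}
```

After ./a.out 1gb completes, count_mismatches("1gb", 42) should return 0 if every dirtied page was written back to the file.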
*** Bug 124626 has been marked as a duplicate of this bug. ***
Ara, what exactly is the error message you get from the kernel? If it contains a null pointer dereference in page_referenced(), a patch for that was applied to the RHEL code base recently.
We see something like "filemap.c:2371 bad pmd c.............", and I __think__ we also saw a screenful of output that contained ... page_referenced() ..., but the console server was flaky at that time. We have done this 4 times and seen the 'bad pmd' error each time. Cheers.
OK, I can reproduce the problem locally, so I'll work on fixing it. Larry
Great! Please let me know if I can do anything from this end. We've got about 160 licensed enterprise boxes here that we use for processing HUGE files, so the lack of a working mmap is a real showstopper.
OK, I think it's fixed. Please try out this kernel and let me know how it goes: http://people.redhat.com/~lwoodman/.RHEL3/ Larry
Ara, any news on whether this kernel fixes your problems? Larry
Yes! Sorry I haven't gotten back to you; it's been a crazy week. The patch worked beautifully. All I've got left is to try it on an SMP machine; I will try to get to that today and get back to you. Thanks very much for the prompt (and correct!) patch. Any idea what the release schedule for these things normally is? Our sysadmins typically only run 'official' kernels... ;-( Cheers. -a
It will be included in RHEL3-U3 and that has a mid-August release date target. Larry
Larry's fix for this problem has just been committed to the RHEL3 U3 patch pool this evening (in kernel version 2.4.21-15.8.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-433.html