Bug 124624 - mmap use causes kernel panic
Summary: mmap use causes kernel panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Larry Woodman
QA Contact:
URL:
Whiteboard:
Duplicates: 124626
Depends On:
Blocks:
Reported: 2004-05-28 00:47 UTC by ara howard
Modified: 2007-11-30 22:07 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-02 04:31:42 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2004:433
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 3
Last Updated: 2004-09-02 04:00:00 UTC

Description ara howard 2004-05-28 00:47:05 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2) Gecko/20040415

Description of problem:
programs which mmap a large file in small chunks (using offset and
length) in rw mode cause a kernel panic

Version-Release number of selected component (if applicable):
2.4.21-15.EL

How reproducible:
Always

Steps to Reproduce:
1. create a large file
  
  [ahoward@harp ahoward]$ dd if=/dev/zero of=1gb bs=8192 count=131072
  131072+0 records in
  131072+0 records out

2. compile this program

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>


#define TILE_SIZE 1048576

/*
 *
 * ~ > gcc filemap_bug.c -o filemap_bug
 * ~ > filemap_bug big_file
 *
 */

int
main (int argc, char **argv)
{
  char *path;
  struct stat buf;
  off_t size, offset, length, tn;
  int fd;
  void *mem;
  unsigned char *start;
  unsigned char *byte;

  if (argc < 2)
    {
      fprintf (stderr, "usage: %s huge_input_file\n", argv[0]);
      return (EXIT_FAILURE);
    }

  path = argv[1];
  if (stat (path, &buf) < 0)
    {
      perror ("stat");
      return (EXIT_FAILURE);
    }
  size = buf.st_size;
  fd = open (path, O_RDWR);
  if (fd < 0)
    {
      perror ("open");
      return (EXIT_FAILURE);
    }

  /* map, overwrite, sync and unmap the file one TILE_SIZE chunk at a time */
  for (offset = 0, tn = 0; offset < size; offset += TILE_SIZE, tn++)
    {
      length = size - offset;
      length = length > TILE_SIZE ? TILE_SIZE : length;
      /* off_t is not necessarily int, so cast explicitly for printing */
      fprintf (stdout, "<%s>[%lld,%lld] - tile_number <%lld>\n", path,
               (long long) offset, (long long) length, (long long) tn);

      mem = mmap (NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED,
                  fd, offset);
      if (mem == MAP_FAILED)
        {
          perror ("mmap");
          close (fd);
          return (EXIT_FAILURE);
        }
      start = (unsigned char *) mem;
      madvise (start, length, MADV_SEQUENTIAL);
      /* dirty every byte of the current tile */
      for (byte = start; byte - start < length; byte++)
        {
          *byte = 42;
        }
      msync (start, length, MS_SYNC);
      munmap (start, length);
    }

  close (fd);
  return (EXIT_SUCCESS);
}


3. run the program on the created file

  ./filemap_bug 1gb

4. watch the kernel panic (for me around tile 140)


    

Actual Results:  kernel panic

Expected Results:  every byte of the input file == 42
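
The expected result above (every byte == 42) could be checked with something like the following. This is only a sketch and was not part of the original report: count_mismatches is a hypothetical helper name, and it deliberately uses plain buffered reads instead of mmap so that the check does not go through the code path that triggers the panic. It assumes the file is small enough for ordinary stdio access (the 1gb test file is).

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the original report): count the bytes in
 * the file at `path` that differ from `expected`.
 * Returns the number of mismatching bytes, or -1 if the file cannot be
 * opened or a read error occurs.
 */
long long
count_mismatches (const char *path, unsigned char expected)
{
  unsigned char buf[65536];
  long long bad = 0;
  size_t n, i;
  FILE *fp = fopen (path, "rb");

  if (fp == NULL)
    return -1;

  /* read the file in 64 KiB blocks and compare each byte */
  while ((n = fread (buf, 1, sizeof buf, fp)) > 0)
    for (i = 0; i < n; i++)
      if (buf[i] != expected)
        bad++;

  if (ferror (fp))
    bad = -1;
  fclose (fp);
  return bad;
}
```

Calling count_mismatches ("1gb", 42) after a successful run of the reproducer should return 0 if every tile was written back correctly.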

Additional info:

Comment 1 Rik van Riel 2004-05-28 01:42:43 UTC
*** Bug 124626 has been marked as a duplicate of this bug. ***

Comment 2 Rik van Riel 2004-05-28 01:44:50 UTC
Ara, what exactly is the error message you get from the kernel ?

If it contains a null pointer dereference in page_referenced(), a
patch for that got applied to the RHEL code base recently...

Comment 3 ara howard 2004-05-28 12:55:47 UTC
we see something like

  filemap.c:2371 bad pmd c.............

and i __think__ we also saw a screen full of stuff which contained

 ...
 page_referenced()
 ...

but the console server was flaky at that time...

we have done this 4 times and seen the 'bad pmd' error each time.

cheers.

Comment 4 Larry Woodman 2004-06-01 20:08:00 UTC
OK, I can reproduce the problem locally so I'll work on fixing it.

Larry


Comment 5 ara howard 2004-06-01 20:14:51 UTC
great! - please let me know if i can do anything from this end.  we've
got about 160 licensed enterprise boxes here that we use for processing
HUGE files so lack of a working mmap is a real show stopper.

Comment 6 Larry Woodman 2004-06-03 01:29:31 UTC
OK, I think it's fixed.  Please try out this kernel and let me know how
it goes:

http://people.redhat.com/~lwoodman/.RHEL3/


Larry


Comment 7 Larry Woodman 2004-06-08 14:55:22 UTC
Ara, any news on whether this kernel fixes your problems?

Larry


Comment 8 ara howard 2004-06-08 15:20:39 UTC
yes!  sorry i've not gotten back to you - crazy week.  the patch
worked beautifully.  all i've got left is to try it on an smp machine.
i will try to get to that today and get back to you.  thanks very
much for the prompt - and correct! - patch.  any idea what the release
schedule for these things normally is?  our sysads typically only run
'official' kernels... ;-(

cheers.

-a

Comment 9 Larry Woodman 2004-06-08 15:26:52 UTC
It will be included in RHEL3-U3 and that has a mid-August release
date target.

Larry


Comment 10 Ernie Petrides 2004-06-09 04:22:13 UTC
Larry's fix for this problem has just been committed to the RHEL3 U3
patch pool this evening (in kernel version 2.4.21-15.8.EL).


Comment 11 John Flanagan 2004-09-02 04:31:42 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html


