Bug 472558

Summary:	oops in mirror_map (dm-raid1.c)
Product:	Red Hat Enterprise Linux 5	Reporter:	Shane Bradley <sbradley>
Component:	kernel	Assignee:	Heinz Mauelshagen <heinzm>
Status:	CLOSED ERRATA	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	urgent	Docs Contact:
Priority:	medium
Version:	5.2	CC:	agk, bmr, christophe.varoqui, coughlan, dejohnso, dmair, dwysocha, edamato, egoggin, emcnabb, heinzm, jbrassow, junichi.nomura, kueda, lmb, mbroz, mgahagan, pep, prockai, rwheeler, sghosh, tao, tranlan
Target Milestone:	rc	Keywords:	ZStream
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-09-02 08:45:30 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	481120

Description Shane Bradley 2008-11-21 18:29:44 UTC

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/2008111217 Fedora/3.0.4-1.fc9 Firefox/3.0.4

Customer is getting random kernel panics with nfs. They have nfsd
setup as a clustered service. There is no pattern to this event.


Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
 [<ffffffff88261e37>] :dm_mirror:mirror_map+0x109/0x152
PGD 3af630067 PUD 3af631067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:04.0/0000:17:00.0/0000:18:08.0/0000:19:00.0/0000:1a:00.0/irq
CPU 8
Modules linked in: nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm configfs sunrpc bonding ib_iser libiscsi scsi_transport_isc
si ib_srp ib_sdp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_round_robin dm_multipath video sb
s backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sg shpchp pcspkr ide_cd serio_raw cdrom bnx2 dm_snapshot dm_zero dm_mir
ror dm_mod usb_storage qla2xxx scsi_transport_fc ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 11418, comm: nfsd Not tainted 2.6.18-92.1.1.el5 #1
RIP: 0010:[<ffffffff88261e37>]  [<ffffffff88261e37>] :dm_mirror:mirror_map+0x109/0x152
RSP: 0018:ffff81041c7217d0  EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff81016ed76d40
RDX: ffff81016ed76d48 RSI: 0000000000011200 RDI: ffff81042fff8100
RBP: ffff8102309ef840 R08: ffffffff88266740 R09: 0000000013788748
R10: 0000000122e46000 R11: 0000000000000024 R12: ffff81042c609b78
R13: 0000000000000000 R14: ffff8102541c0be0 R15: 000000000fd00014
FS:  00002ad83c66f6e0(0000) GS:ffff81010eb309c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000003ae6bf000 CR4: 00000000000006e0
Process nfsd (pid: 11418, threadinfo ffff81041c720000, task ffff81042945e7a0)
Stack:  0000000013788748 ffff8102309ef840 ffff8102541c0bd0 0000000000000400
 00000000000000b8 ffffffff882463f2 ffff810428190a00 ffff8102541c0bd0
 0000000000000000 ffff810428190a00 ffff8102541c0bd0 ffffffff88246f3c

--------------------------------------------------------------------------------

PID: 11418  TASK: ffff81042945e7a0  CPU: 8   COMMAND: "nfsd"
 #0 [ffff81041c721530] crash_kexec at ffffffff800aaa9a      
 #1 [ffff81041c7215f0] __die at ffffffff800650af            
 #2 [ffff81041c721630] do_page_fault at ffffffff80066aa1    
 #3 [ffff81041c721720] error_exit at ffffffff8005dde9       
    [exception RIP: mirror_map+265]                         
    RIP: ffffffff88261e37  RSP: ffff81041c7217d0  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: 0000000000000001  RCX: ffff81016ed76d40
    RDX: ffff81016ed76d48  RSI: 0000000000011200  RDI: ffff81042fff8100
    RBP: ffff8102309ef840   R8: ffffffff88266740   R9: 0000000013788748
    R10: 0000000122e46000  R11: 0000000000000024  R12: ffff81042c609b78
    R13: 0000000000000000  R14: ffff8102541c0be0  R15: 000000000fd00014
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                     
 #4 [ffff81041c7217f8] __map_bio at ffffffff882463f2                   
 #5 [ffff81041c721828] __split_bio at ffffffff88246f3c                 
 #6 [ffff81041c7218b8] dm_request at ffffffff88247938                  
 #7 [ffff81041c7218d8] generic_make_request at ffffffff8001bbc8        
 #8 [ffff81041c721958] submit_bio at ffffffff80033035                  
 #9 [ffff81041c721998] mpage_bio_submit at ffffffff800e948f            
#10 [ffff81041c7219a8] mpage_readpages at ffffffff80038be1             
#11 [ffff81041c721ae8] __do_page_cache_readahead at ffffffff800129b2   
#12 [ffff81041c721bd8] blockable_page_cache_readahead at ffffffff80032010
#13 [ffff81041c721c08] page_cache_readahead at ffffffff80013aad          
#14 [ffff81041c721c38] do_generic_mapping_read at ffffffff8000be7f       
#15 [ffff81041c721d28] generic_file_sendfile at ffffffff800bfdf5         
#16 [ffff81041c721d68] nfsd_vfs_read at ffffffff887809ee                 
#17 [ffff81041c721db8] nfsd_read at ffffffff88780f38                     
#18 [ffff81041c721e48] nfsd_dispatch at ffffffff8877d1db                 
#19 [ffff81041c721e78] svc_process at ffffffff885d04fb                   
#20 [ffff81041c721ee8] nfsd at ffffffff8877d746                          
#21 [ffff81041c721f48] kernel_thread at ffffffff8005dfb1 
--------------------------------------------------------------------------------

It appears that the process that caused the crash was nfsd.

PID: 11418  TASK: ffff81042945e7a0  CPU: 8   COMMAND: "nfsd"
#0 [ffff81041c721530] crash_kexec at ffffffff800aaa9a
#1 [ffff81041c7215f0] __die at ffffffff800650af
#2 [ffff81041c721630] do_page_fault at ffffffff80066aa1
#3 [ffff81041c721720] error_exit at ffffffff8005dde9
   [exception RIP: mirror_map+265]

RIP: ffffffff88261e37  RSP: ffff81041c7217d0  RFLAGS: 00010282
   RAX: 0000000000000000  RBX: 0000000000000001  RCX: ffff81016ed76d40
   RDX: ffff81016ed76d48  RSI: 0000000000011200  RDI: ffff81042fff8100
   RBP: ffff8102309ef840   R8: ffffffff88266740   R9: 0000000013788748
   R10: 0000000122e46000  R11: 0000000000000024  R12: ffff8104 2c609b78
   R13: 0000000000000000  R14: ffff8102541c0be0  R15: 000000000fd00014
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

crash> dis mirror_map
.
.
.
0xffffffff88261e2d <mirror_map+255>:    mov    0x10(%r12),%rax
0xffffffff88261e32 <mirror_map+260>:    mov    $0x1,%ebx
0xffffffff88261e37 <mirror_map+265>:    mov    0x18(%rax),%rax

Looks like the function died on a mov operation. The reason is that it
failed was Null(O value stands for NULL):  RAX: 0000000000000000

It tried to move the data in RAX to address with offset of +18. In
this case that would Null - 18.

This is memory address that either does not exist or is restricted
space that this process can not touch. Address in restricted space is
most likely since kexec was called.


Reproducible: Couldn't Reproduce

Steps to Reproduce:
This kernel panic just randomly occurs. 
Customer is running nfsd as a clustered service.
Actual Results:  
kernel panic occurs.

Expected Results:  
nfsd should not trigger a kernel panic.

A new Core is located:
megatron.gsslab.rdu.redhat.com
/cores/20081113035808/work

My notes are based on previous core, which is the same as the new core.Your corefile is ready for you
You may view it at megatron.gsslab.rdu.redhat.com
Login with kerberos name/password
$ cd /cores/20081028102941/work
/cores/20081028102941/work$ ./crash

Comment 2 Steve Whitehouse 2008-11-24 09:40:09 UTC

What filesystem is in use here? It looks like a problem in dm to me.

Comment 3 Bryn M. Reeves 2008-11-24 12:34:12 UTC

Right - mirror_map is part of dm-raid1.c - this is a device-mapper bug, not NFS.

Comment 4 Bryn M. Reeves 2008-11-24 12:35:44 UTC

This is a device-mapper kernel bug - the device-mapper component is for the user space bits (dmsetup, libdevmapper etc.). Re-assigning to kernel.

Comment 7 Issue Tracker 2008-11-25 23:07:43 UTC

Failing here in map_bio:

 912 static void map_bio(struct mirror *m, struct bio *bio)
 913 {
*914         bio->bi_bdev = m->dev->bdev;
 915         bio->bi_sector = map_sector(m, bio);
 916 }

The dev field of the mirror struct is NULL:

                                                               # %r12
holds a pointer to m
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm-raid1.c:
914
0xffffffff88261e2d <mirror_map+255>:    mov    0x10(%r12),%rax # load
m->dev into %rax
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm-raid1.c:
915
0xffffffff88261e32 <mirror_map+260>:    mov    $0x1,%ebx # load 0x1 into
%ebx
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm-raid1.c:
914
0xffffffff88261e37 <mirror_map+265>:    mov    0x18(%rax),%rax # load
dev->bdev into %rax

 Unable to handle kernel NULL pointer dereference at 0000000000000018
RIP:
  [<ffffffff88261e37>] :dm_mirror:mirror_map+0x109/0x152
 Pid: 11418, comm: nfsd Not tainted 2.6.18-92.1.1.el5 #1
 RIP: 0010:[<ffffffff88261e37>]  [<ffffffff88261e37>]
:dm_mirror:mirror_map+0x109/0x152
 RSP: 0018:ffff81041c7217d0  EFLAGS: 00010282
*RAX: 0000000000000000*RBX: 0000000000000001 RCX: ffff81016ed76d40
 RDX: ffff81016ed76d48 RSI: 0000000000011200 RDI: ffff81042fff8100
 RBP: ffff8102309ef840 R08: ffffffff88266740 R09: 0000000013788748
 R10: 0000000122e46000 R11: 0000000000000024 R12: ffff81042c609b78
 R13: 0000000000000000 R14: ffff8102541c0be0 R15: 000000000fd00014
 FS:  00002ad83c66f6e0(0000) GS:ffff81010eb309c0(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000018 CR3: 00000003ae6bf000 CR4: 00000000000006e0


This is called from mirror_map (it's a small helper function which gets
inlined):

1544 static int mirror_map(struct dm_target *ti, struct bio *bio,
1545                       union map_info *map_context)
1546 {
1547         int r, rw = bio_rw(bio);
1548         struct mirror *m;
1549         struct mirror_set *ms = ti->private;
1550         struct bio_map_info *bmi = NULL;
1551         struct dm_bio_details *bd = NULL;
1552 
[...]
1577                 m = choose_mirror(ms);
1578                 if (likely(m)) {
1579                         bmi = mempool_alloc(bio_map_info_pool,
GFP_NOIO);
1580 
1581                         if (likely(bmi)) {
1582                                 /* without this, a read is not
retryable */
1583                                 bd = &bmi->bmi_bd;
1584                                 dm_bio_record(bd, bio);
1585                                 map_context->ptr = bmi;
1586                                 bmi->bmi_m = m;
1587                         } else {
1588                                 /* we could fail now, but we can at
least  **
1589                                 ** give it a shot.  The bd is only
used to **
1590                                 ** retry in the event of a failure
anyway. **
1591                                 ** If we fail, we can fail the I/O
then.   */
1592                                 map_context->ptr = NULL;
1593                         }
1594 
1595                         map_bio(m, bio); <---

Looking at the struct mirror that contains the NULL dev field:

 R10: 0000000122e46000 R11: 0000000000000024 R12: ffff81042c609b78

crash> struct mirror ffff81042c609b78
struct mirror {
  error_count = {
    counter = 0
  },
  ms = 0x0,
  dev = 0x0,
  offset = 18446604440274649088
}

The offset appears to be a pointer to an in-use object in the size-16384
slab cache:

crash> printf "%p\n", 18446604440274649088
0xffff81010eb6c000

crash> kmem  ffff81010eb6c000
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS 
SSIZE
ffff81042fff74c0 size-16384             16384         26        26     26 
  16k
SLAB              MEMORY            TOTAL  ALLOCATED  FREE
ffff81010eb20ec0  ffff81010eb6c000      1          1     0
FREE / [ALLOCATED]
  [ffff81010eb6c000]

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffff810103b37fa0 10eb6c000                0        0  1 108100000000080



This event sent from IssueTracker by breeves 
 issue 213254

Comment 8 Issue Tracker 2008-11-25 23:44:05 UTC

I don't know what the object from the 16kb slab cache is. It mostly
contains what look like empty struct list_head objects (two pointers
adjacent to one another, both pointing at the first one):

ffff81010eb6c000:  0000000000000001 0000000000000000   ................
ffff81010eb6c010:  000000010c9551d9*ffff81010eb6c018*  .Q..............
ffff81010eb6c020: *ffff81010eb6c018*ffff81010eb6c028*  ........(.......
ffff81010eb6c030: *ffff81010eb6c028*ffff81010eb6c038*  (.......8.......
ffff81010eb6c040: *ffff81010eb6c038*ffff81010eb6c048*  8.......H.......
[...]

Apart from the first three words, the whole of the structure continues
like this.

Since the error_count field of the mirror was 0, it didn't fail the checks
in choose_mirror and was selected for use:

 813         /*
 814          * Perform MIN_READS on each working mirror then
 815          * advance to the next one.  start_mirror stores
 816          * the first we tried, so we know when we're done.
 817          */
 818         do {
 819                 if
(likely(!atomic_read(&ms->read_mirror->error_count)) &&
 820                     !atomic_dec_and_test(&ms->read_count))
 821                         goto use_mirror;

Need to find the struct mirror_set that this corresponds to.a


This event sent from IssueTracker by breeves 
 issue 213254

Comment 12 Bryn M. Reeves 2008-12-17 20:50:50 UTC

[ from bt -f of dm_request's stack frame ]

request_queue -> ffff810134301928
mapped_device -> ffff81042f931a00

crash> mapped_device.map 0xffff81042f931a00
  map = 0xffff81042610b000

crash> dm_table 0xffff81042610b000
struct dm_table {
  md = 0xffff81042f931a00, 
  holders = {
    counter = 2
  }, 
  depth = 1, 
  counts = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 
  index = {0xffffc20010c15000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  num_targets = 1, 
  num_allocated = 16, 
  highs = 0xffffc20010c15000, 
  targets = 0xffffc20010c15080, 
  mode = 3, 
  devices = {
    next = 0xffff810205725340, 
    prev = 0xffff81019ebf01c0
  }, 
  limits = {
    max_sectors = 1024, 
    max_phys_segments = 128, 
    max_hw_segments = 255, 
    hardsect_size = 512, 
    max_segment_size = 65536, 
    seg_boundary_mask = 4294967295, 
    no_cluster = 0 '\0'
  }, 
  event_fn = 0xffffffff88246605 <event_callback>, 
  event_context = 0xffff81042f931a00
}

crash> dm_target 0xffffc20010c15080
struct dm_target {
  table = 0xffff81042610b000, 
  type = 0xffff81010eb50480, 
  begin = 0, 
  len = 461373440, 
  split_io = 1024, 
  limits = {
    max_sectors = 1024, 
    max_phys_segments = 128, 
    max_hw_segments = 255, 
    hardsect_size = 512, 
    max_segment_size = 65536, 
    seg_boundary_mask = 4294967295, 
    no_cluster = 0 '\0'
  }, 
  private = 0xffff81034b384e00, 
  error = 0xffffffff8824ca79 "Unknown error"
}

crash> mirror_set 0xffff81034b384e00
struct mirror_set {
  ti = 0xffffc20010c15080,
  list = {
    next = 0x0,
    prev = 0x0
  },
  rh = {
    ms = 0xffff81034b384e00,
    region_size = 1024,
    region_shift = 10,
    log = 0xffff8102b3f3dfe0,
    hash_lock = {
      raw_lock = {
        lock = 16777216
      }
    },
    region_pool = 0xffff8101342f4440,
    mask = 4095,
    nr_buckets = 4096,
    buckets = 0xffffc20010c44000,
    region_lock = {
      raw_lock = {
        slock = 1
      }
    },
    recovery_in_flight = {
      counter = 0
    },
    recovery_count = {
      count = {
        counter = 1
      },
      sleepers = 0,
      wait = {
        lock = {
          raw_lock = {
            slock = 1
          }
        },
        task_list = {
          next = 0xffff81034b384e68,
          prev = 0xffff81034b384e68
        }
      }
    },
    clean_regions = {
      next = 0xffff81034b384e78,
      prev = 0xffff81034b384e78
    },
    quiesced_regions = {
      next = 0xffff81034b384e88,
      prev = 0xffff81034b384e88
    },
    recovered_regions = {
      next = 0xffff81034b384e98,
      prev = 0xffff81034b384e98
    },
    failed_recovered_regions = {
      next = 0xffff81034b384ea8,
      prev = 0xffff81034b384ea8
    }
  },
  kcopyd_client = 0xffff8101343490c0,
  lock = {
    raw_lock = {
      slock = 1
    }
  },
  reads = {
    head = 0x0,
    tail = 0x0
  },
  writes = {
    head = 0x0,
    tail = 0x0
  },
  failures = {
    head = 0x0,
    tail = 0x0
  },
  io_client = 0xffff8102b3f3d280,
  nr_regions = 450560,
  in_sync = 1,
  log_failure = 0,
  suspend = {
    counter = 0
  },
  default_mirror = 0xffff81034b384f98,
  nr_mirrors = 2,
  read_count = {
    counter = 124
  },
  read_mirror = 0xffff81034b384fb8,
  kmirrord_wq = 0xffff8102b3f3df00,
  kmirrord_work = {
    pending = 0,
    entry = {
      next = 0xffff81034b384f40,
      prev = 0xffff81034b384f40
    },
    func = 0xffffffff88261fff <do_work>,
    data = 0xffff81034b384e00,
    wq_data = 0xffff8101343481c0,
    timer = {
      entry = {
        next = 0x0,
        prev = 0x0
      },
      expires = 0,
      function = 0,
      data = 0,
      base = 0xffff81042fe2c000
    }
  },
  mirror = 0xffff81034b384f98
}

So how did choose_mirror return this:

crash> struct mirror ffff81042c609b78
struct mirror {
  error_count = {
    counter = 0
  }, 
  ms = 0x0, 
  dev = 0x0, 
  offset = 0
}

Comment 13 Bryn M. Reeves 2008-12-18 12:23:23 UTC

crash> struct mirror ffff81034b384f98
struct mirror {
  error_count = {
    counter = 0
  }, 
  ms = 0xffff81034b384e00, 
  dev = 0xffff810205725fc0, 
  offset = 0
}
crash> struct mirror ffff81034b384fb8
struct mirror {
  error_count = {
    counter = 0
  }, 
  ms = 0xffff81034b384e00, 
  dev = 0xffff810205725340, 
  offset = 0
}

Comment 29 Don Zickus 2009-01-27 16:02:18 UTC

in kernel-2.6.18-129.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 30 RHEL Program Management 2009-02-16 15:36:12 UTC

Updating PM score.

Comment 35 errata-xmlrpc 2009-09-02 08:45:30 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html