Bug 472558
Summary: | oops in mirror_map (dm-raid1.c) | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Shane Bradley <sbradley> |
Component: | kernel | Assignee: | Heinz Mauelshagen <heinzm> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | urgent | Docs Contact: | |
Priority: | medium | ||
Version: | 5.2 | CC: | agk, bmr, christophe.varoqui, coughlan, dejohnso, dmair, dwysocha, edamato, egoggin, emcnabb, heinzm, jbrassow, junichi.nomura, kueda, lmb, mbroz, mgahagan, pep, prockai, rwheeler, sghosh, tao, tranlan |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-09-02 08:45:30 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 481120 |
Description
Shane Bradley
2008-11-21 18:29:44 UTC
What filesystem is in use here? It looks like a problem in dm to me. Right - mirror_map is part of dm-raid1.c - this is a device-mapper bug, not NFS. This is a device-mapper kernel bug - the device-mapper component is for the user space bits (dmsetup, libdevmapper etc.). Re-assigning to kernel. Failing here in map_bio: 912 static void map_bio(struct mirror *m, struct bio *bio) 913 { *914 bio->bi_bdev = m->dev->bdev; 915 bio->bi_sector = map_sector(m, bio); 916 } The dev field of the mirror struct is NULL: # %r12 holds a pointer to m /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm-raid1.c: 914 0xffffffff88261e2d <mirror_map+255>: mov 0x10(%r12),%rax # load m->dev into %rax /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm-raid1.c: 915 0xffffffff88261e32 <mirror_map+260>: mov $0x1,%ebx # load 0x1 into %ebx /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm-raid1.c: 914 0xffffffff88261e37 <mirror_map+265>: mov 0x18(%rax),%rax # load dev->bdev into %rax Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: [<ffffffff88261e37>] :dm_mirror:mirror_map+0x109/0x152 Pid: 11418, comm: nfsd Not tainted 2.6.18-92.1.1.el5 #1 RIP: 0010:[<ffffffff88261e37>] [<ffffffff88261e37>] :dm_mirror:mirror_map+0x109/0x152 RSP: 0018:ffff81041c7217d0 EFLAGS: 00010282 *RAX: 0000000000000000*RBX: 0000000000000001 RCX: ffff81016ed76d40 RDX: ffff81016ed76d48 RSI: 0000000000011200 RDI: ffff81042fff8100 RBP: ffff8102309ef840 R08: ffffffff88266740 R09: 0000000013788748 R10: 0000000122e46000 R11: 0000000000000024 R12: ffff81042c609b78 R13: 0000000000000000 R14: ffff8102541c0be0 R15: 000000000fd00014 FS: 00002ad83c66f6e0(0000) GS:ffff81010eb309c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 00000003ae6bf000 CR4: 00000000000006e0 This is called from mirror_map (it's a small helper function which gets inlined): 1544 static int mirror_map(struct dm_target *ti, struct bio *bio, 1545 union map_info *map_context) 1546 { 1547 int r, rw = bio_rw(bio); 1548 struct mirror *m; 1549 struct mirror_set *ms = ti->private; 1550 struct bio_map_info *bmi = NULL; 1551 struct dm_bio_details *bd = NULL; 1552 [...] 1577 m = choose_mirror(ms); 1578 if (likely(m)) { 1579 bmi = mempool_alloc(bio_map_info_pool, GFP_NOIO); 1580 1581 if (likely(bmi)) { 1582 /* without this, a read is not retryable */ 1583 bd = &bmi->bmi_bd; 1584 dm_bio_record(bd, bio); 1585 map_context->ptr = bmi; 1586 bmi->bmi_m = m; 1587 } else { 1588 /* we could fail now, but we can at least ** 1589 ** give it a shot. The bd is only used to ** 1590 ** retry in the event of a failure anyway. ** 1591 ** If we fail, we can fail the I/O then. */ 1592 map_context->ptr = NULL; 1593 } 1594 1595 map_bio(m, bio); <--- Looking at the struct mirror that contains the NULL dev field: R10: 0000000122e46000 R11: 0000000000000024 R12: ffff81042c609b78 crash> struct mirror ffff81042c609b78 struct mirror { error_count = { counter = 0 }, ms = 0x0, dev = 0x0, offset = 18446604440274649088 } The offset appears to be a pointer to an in-use object in the size-16384 slab cache: crash> printf "%p\n", 18446604440274649088 0xffff81010eb6c000 crash> kmem ffff81010eb6c000 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff81042fff74c0 size-16384 16384 26 26 26 16k SLAB MEMORY TOTAL ALLOCATED FREE ffff81010eb20ec0 ffff81010eb6c000 1 1 0 FREE / [ALLOCATED] [ffff81010eb6c000] PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffff810103b37fa0 10eb6c000 0 0 1 108100000000080 This event sent from IssueTracker by breeves issue 213254 I don't know what the object from the 16kb slab cache is. It mostly contains what look like empty struct list_head objects (two pointers adjacent to one another, both pointing at the first one): ffff81010eb6c000: 0000000000000001 0000000000000000 ................ ffff81010eb6c010: 000000010c9551d9*ffff81010eb6c018* .Q.............. ffff81010eb6c020: *ffff81010eb6c018*ffff81010eb6c028* ........(....... ffff81010eb6c030: *ffff81010eb6c028*ffff81010eb6c038* (.......8....... ffff81010eb6c040: *ffff81010eb6c038*ffff81010eb6c048* 8.......H....... [...] Apart from the first three words, the whole of the structure continues like this. Since the error_count field of the mirror was 0, it didn't fail the checks in choose_mirror and was selected for use: 813 /* 814 * Perform MIN_READS on each working mirror then 815 * advance to the next one. start_mirror stores 816 * the first we tried, so we know when we're done. 817 */ 818 do { 819 if (likely(!atomic_read(&ms->read_mirror->error_count)) && 820 !atomic_dec_and_test(&ms->read_count)) 821 goto use_mirror; Need to find the struct mirror_set that this corresponds to.a This event sent from IssueTracker by breeves issue 213254 [ from bt -f of dm_request's stack frame ] request_queue -> ffff810134301928 mapped_device -> ffff81042f931a00 crash> mapped_device.map 0xffff81042f931a00 map = 0xffff81042610b000 crash> dm_table 0xffff81042610b000 struct dm_table { md = 0xffff81042f931a00, holders = { counter = 2 }, depth = 1, counts = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, index = {0xffffc20010c15000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, num_targets = 1, num_allocated = 16, highs = 0xffffc20010c15000, targets = 0xffffc20010c15080, mode = 3, devices = { next = 0xffff810205725340, prev = 0xffff81019ebf01c0 }, limits = { max_sectors = 1024, max_phys_segments = 128, max_hw_segments = 255, hardsect_size = 512, max_segment_size = 65536, seg_boundary_mask = 4294967295, no_cluster = 0 '\0' }, event_fn = 0xffffffff88246605 <event_callback>, event_context = 0xffff81042f931a00 } crash> dm_target 0xffffc20010c15080 struct dm_target { table = 0xffff81042610b000, type = 0xffff81010eb50480, begin = 0, len = 461373440, split_io = 1024, limits = { max_sectors = 1024, max_phys_segments = 128, max_hw_segments = 255, hardsect_size = 512, max_segment_size = 65536, seg_boundary_mask = 4294967295, no_cluster = 0 '\0' }, private = 0xffff81034b384e00, error = 0xffffffff8824ca79 "Unknown error" } crash> mirror_set 0xffff81034b384e00 struct mirror_set { ti = 0xffffc20010c15080, list = { next = 0x0, prev = 0x0 }, rh = { ms = 0xffff81034b384e00, region_size = 1024, region_shift = 10, log = 0xffff8102b3f3dfe0, hash_lock = { raw_lock = { lock = 16777216 } }, region_pool = 0xffff8101342f4440, mask = 4095, nr_buckets = 4096, buckets = 0xffffc20010c44000, region_lock = { raw_lock = { slock = 1 } }, recovery_in_flight = { counter = 0 }, recovery_count = { count = { counter = 1 }, sleepers = 0, wait = { lock = { raw_lock = { slock = 1 } }, task_list = { next = 0xffff81034b384e68, prev = 0xffff81034b384e68 } } }, clean_regions = { next = 0xffff81034b384e78, prev = 0xffff81034b384e78 }, quiesced_regions = { next = 0xffff81034b384e88, prev = 0xffff81034b384e88 }, recovered_regions = { next = 0xffff81034b384e98, prev = 0xffff81034b384e98 }, failed_recovered_regions = { next = 0xffff81034b384ea8, prev = 0xffff81034b384ea8 } }, kcopyd_client = 0xffff8101343490c0, lock = { raw_lock = { slock = 1 } }, reads = { head = 0x0, tail = 0x0 }, writes = { head = 0x0, tail = 0x0 }, failures = { head = 0x0, tail = 0x0 }, io_client = 0xffff8102b3f3d280, nr_regions = 450560, in_sync = 1, log_failure = 0, suspend = { counter = 0 }, default_mirror = 0xffff81034b384f98, nr_mirrors = 2, read_count = { counter = 124 }, read_mirror = 0xffff81034b384fb8, kmirrord_wq = 0xffff8102b3f3df00, kmirrord_work = { pending = 0, entry = { next = 0xffff81034b384f40, prev = 0xffff81034b384f40 }, func = 0xffffffff88261fff <do_work>, data = 0xffff81034b384e00, wq_data = 0xffff8101343481c0, timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, function = 0, data = 0, base = 0xffff81042fe2c000 } }, mirror = 0xffff81034b384f98 } So how did choose_mirror return this: crash> struct mirror ffff81042c609b78 struct mirror { error_count = { counter = 0 }, ms = 0x0, dev = 0x0, offset = 0 } crash> struct mirror ffff81034b384f98 struct mirror { error_count = { counter = 0 }, ms = 0xffff81034b384e00, dev = 0xffff810205725fc0, offset = 0 } crash> struct mirror ffff81034b384fb8 struct mirror { error_count = { counter = 0 }, ms = 0xffff81034b384e00, dev = 0xffff810205725340, offset = 0 } in kernel-2.6.18-129.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Updating PM score. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html |