Created attachment 383559 [details]
Patch for 2.6.18-182.el5 kernel

Description of problem:
dm-raid1: kernel panic when a bio on a recovery-failed region is released

Version-Release number of selected component (if applicable):
2.6.18-182.el5

How reproducible:
In the following steps, dmeventd is killed and suspend/resume are repeated
to reproduce this issue easily; however, the issue can also happen without
killing dmeventd.

Steps to Reproduce:
1. Create a two-way mirror
   # dmsetup ls
   vg00-lv00_mimage_1      (253, 2)
   vg00-lv00_mimage_0      (253, 1)
   vg00-lv00_mlog          (253, 0)
   vg00-lv00               (253, 3)
2. Disable one of the legs
   # echo offline > /sys/block/<dev>/device/state
3. Kill dmeventd
   # ps -ef | grep dmeventd
   root      3378     1  0 11:49 ?        00:00:00 [dmeventd]
   # kill -9 3378
4. Write I/O to region #0
   # dd if=/dev/zero of=/dev/vg00/lv00 bs=4096 count=1
5. Repeat suspend/resume
   # dmsetup suspend --noflush vg00-lv00
   # dmsetup resume vg00-lv00
   ...repeat suspend/resume...
6. Kernel panic happens

Actual results:
The kernel panics with the following oops message.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000020
...
 printing eip:
f8d8c207
*pde = cc3c7067
Oops: 0002 [#1]
SMP
...
CPU:    1
EIP:    0060:[<f8d8c207>]    Tainted: GF     VLI
EFLAGS: 00010046   (2.6.18-182.el5.dm #1)
EIP is at mirror_end_io+0x55/0x1fe [dm_mirror]
eax: 00000216   ebx: 00000000   ecx: 00000000   edx: 00000216
esi: f7b81230   edi: f7b8120c   ebp: f7b81200   esp: eb0c2d0c
ds: 007b   es: 007b   ss: 0068
Process dmsetup (pid: 13525, ti=eb0c2000 task=f5a6baa0 task.ti=eb0c2000)
...
Call Trace:
 [<f88c4564>] clone_endio+0x63/0xc4 [dm_mod]
 [<f8d8c1b2>] mirror_end_io+0x0/0x1fe [dm_mirror]
 [<f88c4501>] clone_endio+0x0/0xc4 [dm_mod]
 [<c047ab89>] bio_endio+0x50/0x55
 [<f8d8a4bd>] mirror_presuspend+0xe8/0xf4 [dm_mirror]
 [<c045d211>] __alloc_pages+0x69/0x2cf
 [<f88c59e4>] suspend_targets+0x2a/0x37 [dm_mod]
 [<f88c54cd>] dm_suspend+0x70/0x263 [dm_mod]
 [<c041f775>] default_wake_function+0x0/0xc
 [<f88c7e9a>] dev_suspend+0x50/0x152 [dm_mod]
 [<f88c879f>] ctl_ioctl+0x1f3/0x238 [dm_mod]
 [<f88c7e4a>] dev_suspend+0x0/0x152 [dm_mod]
 [<c0485edc>] do_ioctl+0x47/0x5d
 [<c0486445>] vfs_ioctl+0x47b/0x4d3
 [<c0466c75>] unmap_region+0xe1/0xf0
 [<c061e8fe>] do_page_fault+0x23a/0x52d
 [<c061e968>] do_page_fault+0x2a4/0x52d
 [<c04864e5>] sys_ioctl+0x48/0x5f
 [<c0404f17>] syscall_call+0x7/0xb

Expected results:
No kernel panic happens.

Additional info:
This issue was reported on dm-devel:
https://www.redhat.com/archives/dm-devel/2010-January/msg00047.html
The kernel panic happened at 0xf8d8c207.

crash> dis mirror_end_io
...
0xf8d8c1ee <mirror_end_io+0x3c>:        call   0xf8d8a000 <__rh_lookup>
0xf8d8c1f3 <mirror_end_io+0x41>:        mov    %eax,%ebx
0xf8d8c1f5 <mirror_end_io+0x43>:        lock incl 0x1c(%ebp)
0xf8d8c1f9 <mirror_end_io+0x47>:        lea    0x30(%ebp),%esi
0xf8d8c1fc <mirror_end_io+0x4a>:        mov    %esi,%eax
0xf8d8c1fe <mirror_end_io+0x4c>:        call   0xc061d7e8 <_spin_lock_irqsave>
0xf8d8c203 <mirror_end_io+0x51>:        mov    %eax,0x8(%esp)
0xf8d8c207 <mirror_end_io+0x55>:        lock decl 0x20(%ebx)    *** PANIC ***
0xf8d8c20b <mirror_end_io+0x59>:        sete   %al
0xf8d8c20e <mirror_end_io+0x5c>:        test   %al,%al

This means that the kernel panic happened at the following line.

static void rh_dec(struct region_hash *rh, region_t region)
{
...
	read_lock(&rh->hash_lock);
	reg = __rh_lookup(rh, region);
	read_unlock(&rh->hash_lock);

	spin_lock_irqsave(&rh->region_lock, flags);
	if (atomic_dec_and_test(&reg->pending)) {    *** PANIC ***

By printk debugging, I confirmed that __rh_lookup() returned NULL.
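In the oops above, %ebx holds the value just returned by __rh_lookup() and it
is 0, so the faulting address 00000020 is simply the offset of the pending
counter inside the (already freed) struct region. Purely for illustration --
this is not the attached patch, and it would only hide the real lifetime
problem analysed in the later comments -- a defensive check in rh_dec()
would look roughly like this:

	read_lock(&rh->hash_lock);
	reg = __rh_lookup(rh, region);
	read_unlock(&rh->hash_lock);

	/*
	 * Illustration only, not the attached patch: reg is NULL here
	 * because the region struct was freed while this bio was still
	 * pending on it.  Bailing out avoids the oops but leaves the
	 * underlying lifetime bug unfixed.
	 */
	if (unlikely(!reg)) {
		DMERR("rh_dec: region %llu not found",
		      (unsigned long long) region);
		return;
	}

	spin_lock_irqsave(&rh->region_lock, flags);
	if (atomic_dec_and_test(&reg->pending)) {
		...
	}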
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I think the reason is this:

When recovery fails, we mark the region as RH_NOSYNC and add it to the
failed_recovered_regions list (dm-raid1.c:rh_recovery_end).

RH_NOSYNC allows further writes to be processed, and they increment the
region->pending count (see do_writes:
	...
	case RH_NOSYNC:
		this_list = &nosync;
	...
	rh_inc_pending(&ms->rh, &nosync);
).

Regions on the failed_recovered_regions list are then unconditionally freed,
regardless of any pending count. See dm-raid1.c:rh_update_states:

	list_splice(&rh->failed_recovered_regions, &failed_recovered);
	...
	list_for_each_entry_safe (reg, next, &failed_recovered, list) {
		complete_resync_work(reg, 0);
		mempool_free(reg, rh->region_pool);
	}

--- here the region is freed without checking whether there are pending
I/Os on it.

Note that rh_update_states also frees "clean" and "recovered" regions
unconditionally, but there should be no I/Os on them. On "clean" regions you
can't have I/O by definition ("clean" regions are regions without I/O; I/O
turns a "clean" region into "dirty"). On "recovered" regions you can't have
I/O because they are in the RH_RECOVERING state and do_writes doesn't pass
I/Os to them.
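As an illustration of the constraint being violated (a sketch only, not the
attached patch, and with the locking around failed_recovered_regions
simplified), rh_update_states() would have to leave regions with a non-zero
pending count alone instead of freeing them:

	/*
	 * Sketch only, not the attached patch: a region that failed
	 * recovery may still have writes in flight (reg->pending > 0,
	 * raised by rh_inc_pending() in the do_writes path), so it must
	 * not go back to the mempool yet.
	 */
	list_for_each_entry_safe (reg, next, &failed_recovered, list) {
		if (atomic_read(&reg->pending)) {
			/* Still referenced by in-flight bios: keep it for a later pass. */
			list_move(&reg->list, &rh->failed_recovered_regions);
			continue;
		}
		complete_resync_work(reg, 0);
		mempool_free(reg, rh->region_pool);
	}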
(In reply to comment #4)
> I think the reason is this:

Yes, it is the same as the reason I described in the patch header:

---
When the recovery process of a region fails, dm_rh_recovery_end() changes
the state of the region from DM_RH_RECOVERING to DM_RH_NOSYNC. When
recovery_complete() is executed between dm_rh_update_states() and
dm_writes() in do_mirror(), bios are processed while the region is in the
DM_RH_NOSYNC state. However, the region data is freed without checking its
pending count the next time dm_rh_update_states() is called. When the bios
are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec() returns NULL
even though a valid return value is expected.
---
(In reply to comment #5)

Sorry, there are some typos: in RHEL5 the function names are different.

<correction>
dm_writes()           -> do_writes()
dm_rh_update_states() -> rh_update_states()
dm_rh_dec()           -> rh_dec()
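To make the window described in comments #4 and #5 concrete, the sequence
(using the RHEL5 names above) is roughly the following; this is a schematic
timeline, not literal source:

/*
 * Schematic timeline of the race (sketch, not literal source):
 *
 *   kmirrord worker (do_mirror)          recovery completion
 *   ---------------------------          -------------------
 *   rh_update_states()
 *     (failed_recovered_regions empty)
 *                                        recovery_complete()
 *                                          -> rh_recovery_end()
 *                                             region: RH_RECOVERING -> RH_NOSYNC,
 *                                             queued on failed_recovered_regions
 *   do_writes()
 *     case RH_NOSYNC: write allowed,
 *     rh_inc_pending() raises reg->pending
 *
 *   --- next worker run ---
 *   rh_update_states()
 *     frees the region from failed_recovered_regions
 *     although reg->pending > 0
 *
 *   later, the write completes:
 *   mirror_end_io() -> rh_dec()
 *     __rh_lookup() returns NULL -> NULL pointer dereference
 */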
I verified this fix on 2.6.18-187.el5.
An advisory has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of
ERRATA. For more information on the solution and/or where to find the
updated files, please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html