Bug 555171
| Summary: | dm-raid1: kernel panic when bio on recovery failed region is released | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Takahiro Yasui <tyasui> | ||||
| Component: | kernel | Assignee: | Takahiro Yasui <tyasui> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 5.5 | CC: | agk, christophe.varoqui, coughlan, dwysocha, edamato, egoggin, heinzm, jbrassow, junichi.nomura, kueda, lmb, lwang, masaki.kimura.kz, mbroz, mpatocka, noboru.obata.ar, prockai, qcai, saguchi, takahiro.yasui.mp, tranlan | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 557934 (view as bug list) | Environment: | |||||
| Last Closed: | 2010-03-30 07:28:53 UTC | Type: | --- | ||||
| Bug Blocks: | 557934 | ||||||
Description
Takahiro Yasui
2010-01-13 21:07:37 UTC
A kernel panic occurred at 0xf8d8c207.
crash> dis mirror_end_io
...
0xf8d8c1ee <mirror_end_io+0x3c>: call 0xf8d8a000 <__rh_lookup>
0xf8d8c1f3 <mirror_end_io+0x41>: mov %eax,%ebx
0xf8d8c1f5 <mirror_end_io+0x43>: lock incl 0x1c(%ebp)
0xf8d8c1f9 <mirror_end_io+0x47>: lea 0x30(%ebp),%esi
0xf8d8c1fc <mirror_end_io+0x4a>: mov %esi,%eax
0xf8d8c1fe <mirror_end_io+0x4c>: call 0xc061d7e8 <_spin_lock_irqsave>
0xf8d8c203 <mirror_end_io+0x51>: mov %eax,0x8(%esp)
0xf8d8c207 <mirror_end_io+0x55>: lock decl 0x20(%ebx) *** PANIC ***
0xf8d8c20b <mirror_end_io+0x59>: sete %al
0xf8d8c20e <mirror_end_io+0x5c>: test %al,%al
This means that the kernel panic happened at the following line of rh_dec(), which is inlined into mirror_end_io().
static void rh_dec(struct region_hash *rh, region_t region)
{
...
read_lock(&rh->hash_lock);
reg = __rh_lookup(rh, region);
read_unlock(&rh->hash_lock);
spin_lock_irqsave(&rh->region_lock, flags);
if (atomic_dec_and_test(&reg->pending)) { *** PANIC ***
Using printk debugging, I confirmed that __rh_lookup() returned NULL, so atomic_dec_and_test() dereferenced a NULL reg pointer.
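As a cross-check of the disassembly above, the faulting instruction `lock decl 0x20(%ebx)` can be tied to `reg->pending` with a small layout model. This is only a sketch: the field order of `struct region` (rh, key, state, hash_list, list, pending) and a 64-bit `region_t` on i386 are assumptions not confirmed anywhere in this report, and `struct region32`, `struct list_head32` and `pending_offset()` are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width stand-in for an i386 struct list_head. */
struct list_head32 {
    uint32_t next;
    uint32_t prev;
};

/*
 * Hypothetical i386 model of struct region. The field order and the
 * 64-bit key are ASSUMPTIONS, not taken from this bug report.
 */
struct region32 {
    uint32_t rh;                  /* struct region_hash *rh (32-bit pointer) */
    uint32_t key_lo;              /* region_t key, low half  */
    uint32_t key_hi;              /* region_t key, high half */
    int32_t state;
    struct list_head32 hash_list;
    struct list_head32 list;
    int32_t pending;              /* atomic_t pending */
};

/* Under these assumptions, pending sits at the offset used by the faulting decl. */
static_assert(offsetof(struct region32, pending) == 0x20,
              "pending is expected at offset 0x20 in this model");

size_t pending_offset(void)
{
    return offsetof(struct region32, pending);
}
```

If the model matches the real layout, a NULL return from __rh_lookup() in %ebx makes the decrement touch address 0x20, which is consistent with the panic at mirror_end_io+0x55.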
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I think the reason is this:
When recovery fails, the region is marked RH_NOSYNC and added to the failed_recovered_regions list (dm-raid1.c:rh_recovery_end).
RH_NOSYNC allows further writes to be processed, and they increment the region->pending count (see do_writes ... case RH_NOSYNC: this_list = &nosync; ... rh_inc_pending(&ms->rh, &nosync);).
Regions on the failed_recovered_regions list are freed unconditionally, regardless of any pending count. See dm-raid1.c:rh_update_states:
list_splice(&rh->failed_recovered_regions, &failed_recovered);
...
list_for_each_entry_safe (reg, next, &failed_recovered, list) {
complete_resync_work(reg, 0);
mempool_free(reg, rh->region_pool);
}
Here the region is freed without checking whether there are pending I/Os on it.
Note that rh_update_states also frees "clean" and "recovered" regions unconditionally, but there should be no I/O on them. A "clean" region cannot have I/O by definition ("clean" regions are those without I/O; I/O turns a "clean" region into a "dirty" one). A "recovered" region cannot have I/O because it is in the RH_RECOVERING state and do_writes doesn't pass I/Os to such regions.
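The sequence described above can be replayed in a small userspace model. This is a sketch under stated assumptions, not the kernel code: the function names mirror the ones in this report, but the bodies, the single linked hash chain, the `on_failed_list` flag, the `check_pending` switch and `replay_sequence()` are all simplifications invented for the illustration. In particular, `check_pending = 1` only shows that the lookup succeeds when the region is kept alive; it is not claimed to be the fix shipped in the erratum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; states named after the report. */
enum rh_state { RH_CLEAN, RH_DIRTY, RH_NOSYNC, RH_RECOVERING };

struct region {
    unsigned long key;
    enum rh_state state;
    int pending;            /* stands in for atomic_t pending */
    int on_failed_list;     /* stands in for failed_recovered_regions membership */
    struct region *next;    /* one hash chain is enough for the model */
};

struct region_hash { struct region *regions; };

static struct region *rh_lookup(struct region_hash *rh, unsigned long key)
{
    for (struct region *r = rh->regions; r; r = r->next)
        if (r->key == key)
            return r;
    return NULL;
}

static struct region *rh_alloc(struct region_hash *rh, unsigned long key,
                               enum rh_state state)
{
    struct region *r = calloc(1, sizeof(*r));
    if (!r)
        return NULL;
    r->key = key;
    r->state = state;
    r->next = rh->regions;
    rh->regions = r;
    return r;
}

/* Recovery failed: the region becomes RH_NOSYNC and is queued for cleanup. */
static void rh_recovery_end_failed(struct region *reg)
{
    assert(reg != NULL);
    reg->state = RH_NOSYNC;
    reg->on_failed_list = 1;
}

/* Writes are held back only while RH_RECOVERING; RH_NOSYNC lets them through. */
static int rh_inc_pending(struct region *reg)
{
    if (reg->state == RH_RECOVERING)
        return 0;
    reg->pending++;
    return 1;
}

/* Frees queued regions; check_pending == 0 models the behaviour in the report. */
static int rh_update_states(struct region_hash *rh, int check_pending)
{
    int freed = 0;
    struct region **pp = &rh->regions;

    while (*pp) {
        struct region *r = *pp;
        if (r->on_failed_list && (!check_pending || r->pending == 0)) {
            *pp = r->next;      /* unlink from the hash, then free */
            free(r);
            freed++;
        } else {
            pp = &r->next;
        }
    }
    return freed;
}

/* Completion path: returns -1 where the real rh_dec() would dereference NULL. */
static int rh_dec(struct region_hash *rh, unsigned long key)
{
    struct region *reg = rh_lookup(rh, key);
    if (!reg)
        return -1;
    return --reg->pending;
}

/* Replays: failed recovery, write accepted, cleanup pass, write completion. */
int replay_sequence(int check_pending)
{
    struct region_hash rh = { NULL };
    struct region *reg = rh_alloc(&rh, 7, RH_RECOVERING);
    int result;

    if (!reg)
        return -2;
    rh_recovery_end_failed(reg);
    rh_inc_pending(reg);
    rh_update_states(&rh, check_pending);
    result = rh_dec(&rh, 7);
    rh_update_states(&rh, 1);   /* final cleanup once nothing is pending */
    return result;
}
```

In this model, `replay_sequence(0)` ends with the lookup failing on the completion path, which corresponds to the NULL returned by __rh_lookup() in the panic, while `replay_sequence(1)` finds the region and drops its pending count to zero.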
(In reply to comment #4)
> I think the reason is this:
Yes, it is the same as the reason I described in the patch header.
---
When the recovery process of a region fails, the dm_rh_recovery_end() function changes the state of the region from DM_RH_RECOVERING to DM_RH_NOSYNC. When recovery_complete() is executed between dm_rh_update_states() and dm_writes() in do_mirror(), bios are processed with the region state DM_RH_NOSYNC. However, the region data is freed without checking its pending count the next time dm_rh_update_states() is called. When the bios are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec() returns NULL even though a valid return value is expected.
---
(In reply to comment #5)
Sorry, there are some typos. In RHEL5, the function names are different.
<correction>
dm_writes() -> do_writes()
dm_rh_update_states() -> rh_update_states()
dm_rh_dec() -> rh_dec()
I verified this fix on 2.6.18-187.el5.
An advisory has been issued which should address the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0178.html