Bug 460845

Summary: Nested LVM can cause deadlock due to kcopyd
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.3
Reporter: Mikuláš Patočka <mpatocka>
Assignee: Mikuláš Patočka <mpatocka>
QA Contact: Martin Jenner <mjenner>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Hardware: All
OS: Linux
CC: agk, christophe.varoqui, dwysocha, edamato, egoggin, heinzm, jbrassow, junichi.nomura, kueda, lmb, mbroz, prockai, tranlan
Doc Type: Bug Fix
Last Closed: 2009-01-20 20:17:22 UTC
Attachments:
  patch 1/2
  patch 2/2

Description Mikuláš Patočka 2008-09-01 21:41:06 UTC
When using nested LVM (that is, building a higher LVM layer from physical volumes that are themselves logical volumes on a lower LVM layer), there is a deadlock possibility due to the shared kcopyd. Any dm target that uses kcopyd is susceptible to this (that is, snapshots and mirrors). When two such targets are stacked on top of each other, a deadlock can happen because they use the same kcopyd thread.
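
For reference, the sharing that causes the problem: in the 2.6.18-era kcopyd.c, all clients funnel their jobs through one global workqueue and one global work item. A simplified sketch of that structure (illustrative, reconstructed from the upstream source of that era, not the exact RHEL 5 code):

#include <linux/workqueue.h>

/* One queue and one work item shared by ALL kcopyd clients. */
static struct workqueue_struct *_kcopyd_wq;
static struct work_struct _kcopyd_work;

static void wake(void)
{
        /* Every client's kcopyd_copy() ends up here, so a job that
         * blocks inside do_work() stalls every other client too. */
        queue_work(_kcopyd_wq, &_kcopyd_work);
}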

This is one possible configuration:
(note that this configuration is very unusual, but Red Hat supports it, so the bug should be fixed --- a similar deadlock scenario exists if the user uses a snapshot instead of one of the mirrors)

Configuration:
--------------
A (dm-raid1)
B (dm-raid1)
C (any device)

B is part of device A.
C is part of device B.
There may be other devices in the mirrors, but they are not relevant to this
deadlock.

Deadlock scenario:
------------------
Both mirror devices A and B are running a recovery.

B's mempool "md->tio_pool" is empty. All the IO requests allocated from this
pool belong to the region that is being synchronized, so they are held on
ms->writes and ms->reads queues.
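
For context, this queuing happens in dm-raid1's map path: incoming bios are parked on the mirror set's lists and processed later by kmirrord. A sketch modeled on the 2.6.18-era queue_bio() (illustrative; locking and wakeup details may differ from the exact RHEL 5 source):

#include <linux/bio.h>
#include <linux/spinlock.h>

/* Sketch of dm-raid1's queue_bio(): bios that cannot be issued yet
 * are held on ms->reads / ms->writes until kmirrord processes them.
 * While their region is under recovery they sit here --- each one
 * still pinning the tio it allocated from md->tio_pool. */
static void queue_bio(struct mirror_set *ms, struct bio *bio, int rw)
{
        unsigned long flags;
        struct bio_list *bl = (rw == WRITE) ? &ms->writes : &ms->reads;

        spin_lock_irqsave(&ms->lock, flags);
        bio_list_add(bl, bio);
        spin_unlock_irqrestore(&ms->lock, flags);

        wake(ms);       /* kick this mirror's kmirrord */
}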

A makes a kcopyd request to B during A's recovery.
The stack trace of A's "kmirrord" thread is:
do_mirror
_do_mirror
do_recovery
recover
kcopyd_copy
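
The kcopyd_copy at the bottom of this trace only queues an asynchronous copy of the region; kcopyd signals completion later through a callback. A hedged sketch of the call, modeled on the 2.6.18-era drivers/md/dm-raid1.c (details may differ from the exact RHEL 5 source):

/* recover(): ask kcopyd to copy the region from the default mirror
 * to the out-of-sync legs; recovery_complete() runs when done. */
static void recovery_complete(int read_err, unsigned int write_err,
                              void *context);

static int recover(struct mirror_set *ms, struct region *reg)
{
        struct io_region from, to[KCOPYD_MAX_REGIONS];

        /* ... fill 'from' (default mirror) and 'to' (remaining legs) ... */

        return kcopyd_copy(ms->kcopyd_client, &from, ms->nr_mirrors - 1,
                           to, 0, recovery_complete, reg);
}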

kcopyd receives A's request and starts processing it:
do_work
process_jobs(&_io_jobs, run_io_job)
run_io_job
dm_io
async_io
dispatch_io
do_region
submit_bio
generic_make_request
... submitting the bio invokes B's request function
q->make_request_fn
dm_request (on device B)
__split_bio
__clone_and_map
alloc_tio
--- alloc_tio waits until space is freed in B's md->tio_pool
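
The wait in alloc_tio is standard mempool behavior: an allocation from an exhausted pool with a blocking gfp mask sleeps until another user frees an element back into the pool. A minimal sketch of that allocation (modeled on the 2.6.18-era drivers/md/dm.c; simplified):

#include <linux/mempool.h>

/* Sketch: dm's clone path takes a per-device struct target_io from
 * md->tio_pool.  GFP_NOIO permits sleeping, so with the pool empty
 * this blocks until an in-flight tio of the SAME device is freed ---
 * which is exactly what cannot happen in the scenario above. */
static struct target_io *alloc_tio(struct mapped_device *md)
{
        return mempool_alloc(md->tio_pool, GFP_NOIO);
}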

Meanwhile, device B is doing its own recovery work (sending requests to
device C). B's "kmirrord" thread has this stack trace:
do_mirror
_do_mirror
do_recovery
recover
kcopyd_copy --- however, kcopyd is blocked elsewhere, so it doesn't process the
request immediately

The deadlock:
-------------
All of B's requests are waiting for B's recovery of the region to complete.
B's recovery is waiting for kcopyd.
kcopyd is waiting (on behalf of A's request) for some request on B to finish and make room in B's md->tio_pool mempool.

A proposed fix:
---------------
Start a kcopyd thread for each target device (each time a target calls
kcopyd_client_create), so that the kcopyds for different devices are
independent. That way, the processing of requests submitted by one device can
no longer be delayed by requests pending for another device.
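
This is the shape of the fix that later landed upstream as well: give each kcopyd client its own single-threaded workqueue, created in kcopyd_client_create, so a stalled job from one client cannot block another client's jobs. A rough sketch (names follow the upstream per-client rework and are illustrative, not the exact RHEL 5 patch):

#include <linux/workqueue.h>
#include <linux/slab.h>

struct kcopyd_client {
        /* ... job lists, page pool, job mempool ... */
        struct workqueue_struct *kcopyd_wq;     /* one worker per client */
        struct work_struct kcopyd_work;
};

static void do_work(struct work_struct *work); /* processes this client's jobs */

int kcopyd_client_create(unsigned int nr_pages, struct kcopyd_client **result)
{
        struct kcopyd_client *kc = kmalloc(sizeof(*kc), GFP_KERNEL);

        if (!kc)
                return -ENOMEM;

        INIT_WORK(&kc->kcopyd_work, do_work);
        kc->kcopyd_wq = create_singlethread_workqueue("kcopyd");
        if (!kc->kcopyd_wq) {
                kfree(kc);
                return -ENOMEM;
        }

        /* ... allocate pages, register the client ... */
        *result = kc;
        return 0;
}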

Comment 1 RHEL Program Management 2008-09-01 21:42:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 2 Mikuláš Patočka 2008-09-01 21:44:46 UTC
Created attachment 315493 [details]
patch 1/2

First patch --- use per-client kcopyd thread.

Comment 3 Mikuláš Patočka 2008-09-01 21:45:47 UTC
Created attachment 315494 [details]
patch 2/2

Second patch --- use a per-client mempool.
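
In the same spirit as patch 1/2, this moves the kcopyd job mempool from a module-global pool into the client, so that job allocation for one device cannot be starved by another device draining a shared reserve. A sketch of the idea, extending the hypothetical kcopyd_client struct above with a mempool_t *job_pool member (the reserve size and helper name are assumed for illustration):

#include <linux/mempool.h>
#include <linux/slab.h>

#define MIN_JOBS 8      /* per-client job reserve; value assumed for illustration */

static struct kmem_cache *_job_cache;   /* the slab itself can stay shared */

/* Called from kcopyd_client_create(): each client gets its own
 * guaranteed reserve of jobs instead of competing for a global pool. */
static int client_alloc_job_pool(struct kcopyd_client *kc)
{
        kc->job_pool = mempool_create_slab_pool(MIN_JOBS, _job_cache);
        return kc->job_pool ? 0 : -ENOMEM;
}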

Comment 4 Don Zickus 2008-09-11 19:44:17 UTC
in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 8 errata-xmlrpc 2009-01-20 20:17:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html