Red Hat Bugzilla – Bug 460845
Nested LVM can cause deadlock due to kcopyd
Last modified: 2009-01-20 15:17:22 EST
When using nested LVM (that is, creating a higher LVM layer from physical volumes that are logical volumes on another, lower-layer LVM), a deadlock is possible due to the shared kcopyd. Any dm target using kcopyd is susceptible to this (that is, snapshots and mirrors). When two of these targets are stacked on top of each other, a deadlock can happen because they use the same kcopyd thread.
This is the possible configuration:
(note that this configuration is very unusual, but Red Hat supports it, so the bug should be fixed --- a similar deadlock scenario exists if the user uses a snapshot instead of one of the mirrors)
C (any device)
B is a part of the device A.
C is a part of the device B.
There may be other devices in the mirrors, but they are not relevant to this
Both mirror devices A and B are running a recovery.
B's mempool "md->tio_pool" is empty. All the IO requests allocated from this
pool belong to the region that is being synchronized, so they are held on
ms->writes and ms->reads queues.
A makes a kcopyd request to B during A's recovery.
Stacktrace of A's "kmirrord" thread is:
kcopyd receives A's request and starts processing it:
... submit BIO calls the B's request function
dm_request (on device B)
--- alloc_tio waits, until some space is made in B's md->tio_pool
Meanwhile, the device B is doing its own recovery work (sending requests on
device C). B's "kmirrord" thread has this stacktrace:
kcopyd_copy --- however kcopyd is blocked elsewhere, so it doesn't process the
All of B's requests are waiting for B's recovery of the region to complete.
B's recovery is waiting for kcopyd.
kcopyd is waiting (on behalf of A's request) until one of B's requests finishes and makes room in B's md->tio_pool mempool.
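The three waits above form a cycle. As a rough illustration only, the sketch below models them as a wait-for graph in Python and walks the edges until a node repeats. The node labels and the find_cycle helper are made up for this example; they are not kernel identifiers or device-mapper code.

```python
# Hypothetical wait-for graph: each key is waiting for its value.
# The labels paraphrase the description above; they are not kernel symbols.
WAITS_FOR = {
    "B's queued requests": "B's region recovery",
    "B's region recovery": "kcopyd thread",
    "kcopyd thread": "free slot in B's md->tio_pool",
    "free slot in B's md->tio_pool": "B's queued requests",
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle, or [] if none."""
    path = []    # nodes visited so far, in order
    index = {}   # node -> position in path
    node = start
    while node is not None:
        if node in index:
            # We came back to a node already on the path: that is the cycle.
            return path[index[node]:]
        index[node] = len(path)
        path.append(node)
        node = graph.get(node)  # None means this node waits for nothing
    return []


cycle = find_cycle(WAITS_FOR, "kcopyd thread")
print(" -> ".join(cycle + cycle[:1]))
```

Every participant appears in the cycle, so none of them can make progress unless one of the edges is removed.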
A proposed fix:
Start a kcopyd thread for each target device (each time a target calls
kcopyd_client_create), so that the kcopyds for different devices are
independent. That way, processing of requests submitted by device B cannot
be held up behind a request submitted by some other device.
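To illustrate why separate threads help, here is a small deterministic Python model (not kernel code). It assumes each kcopyd thread handles its queue strictly in order, and it collapses "A's I/O needs a tio_pool slot that is freed only after B's recovery copy finishes" into a single dependency. The job names, worker names and the run helper are hypothetical.

```python
from collections import deque


def run(queues):
    """Simulate workers that each process their own queue strictly in order.

    queues maps a worker name to a list of (job, blocker) pairs. A job can
    finish only when its blocker (another job name, or None) has finished.
    Returns (set of finished jobs, list of jobs that are stuck).
    """
    pending = {worker: deque(jobs) for worker, jobs in queues.items()}
    done = set()
    progressed = True
    while progressed:
        progressed = False
        for queue in pending.values():
            # A worker only ever looks at the head of its queue.
            if queue and (queue[0][1] is None or queue[0][1] in done):
                done.add(queue.popleft()[0])
                progressed = True
    stuck = [job for queue in pending.values() for job, _ in queue]
    return done, stuck


# One shared kcopyd: A's copy is at the head and blocks B's copy behind it.
shared_done, shared_stuck = run({
    "kcopyd": [("A_copy_to_B", "B_copy_to_C"), ("B_copy_to_C", None)],
})

# One kcopyd per client: B's copy runs on its own worker, then A's can finish.
split_done, split_stuck = run({
    "kcopyd_A": [("A_copy_to_B", "B_copy_to_C")],
    "kcopyd_B": [("B_copy_to_C", None)],
})

print("shared:", sorted(shared_done), shared_stuck)
print("per-client:", sorted(split_done), split_stuck)
```

In this model the shared worker finishes nothing, while the per-client layout finishes both jobs. It says nothing about mempool sizing, which the second patch addresses separately.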
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
Created attachment 315493 [details]
First patch --- use per-client kcopyd thread.
Created attachment 315494 [details]
Second patch --- use per-client mempool
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.