When using nested LVMs (that is, creating a higher LVM layer from physical volumes that are themselves logical volumes on a lower-layer LVM), there is a deadlock possibility due to the shared kcopyd. Any dm target using kcopyd is susceptible to this (that is, snapshots and mirrors). When two of these targets are stacked on top of each other, a deadlock can happen because they use the same kcopyd thread.

This is the possible configuration (note that this configuration is very unusual, but Red Hat supports it, so the bug should be fixed --- a similar deadlock scenario exists if the user uses a snapshot instead of one of the mirrors):

Configuration:
--------------

A (dm-raid1)
B (dm-raid1)
C (any device)

B is a part of device A. C is a part of device B. There may be other devices in the mirrors, but they are not relevant to this deadlock.

Deadlock scenario:
------------------

Both mirror devices A and B are running a recovery. B's mempool "md->tio_pool" is empty. All the IO requests allocated from this pool belong to the region that is being synchronized, so they are held on the ms->writes and ms->reads queues.

A makes a kcopyd request to B during A's recovery. The stacktrace of A's "kmirrord" thread is:

do_mirror
_do_mirror
do_recovery
recover
kcopyd_copy

kcopyd receives A's request and starts processing it:

do_work
process_jobs(&_io_jobs, run_io_job)
run_io_job
dm_io
async_io
dispatch_io
do_region
submit_bio
generic_make_request
...

Submitting the BIO calls B's request function q->make_request_fn:

dm_request (on device B)
__split_bio
__clone_and_map
alloc_tio --- alloc_tio waits until some space is made in B's md->tio_pool

Meanwhile, device B is doing its own recovery work (sending requests to device C).
B's "kmirrord" thread has this stacktrace:

do_mirror
_do_mirror
do_recovery
recover
kcopyd_copy --- however, kcopyd is blocked elsewhere, so it doesn't process the request immediately

The deadlock:
-------------

All B's requests are waiting for B's recovery of the region to complete. B's recovery is waiting for kcopyd. kcopyd is waiting (on behalf of A's request) until some B request finishes and makes room in B's md->tio_pool mempool.

A proposed fix:
---------------

Start a kcopyd thread for each target device (each time some target calls kcopyd_client_create), so that the kcopyds for different devices are independent. Then the processing of requests submitted by one device can never be delayed behind a request submitted by another device.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 315493 [details]
patch 1/2

First patch --- use a per-client kcopyd thread.
Created attachment 315494 [details]
patch 2/2

Second patch --- use a per-client mempool.
in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html