Created attachment 331076 [details] an upstream patch as a preview. I'll backport it to RHEL 4 if this bug is granted. Various crashes in pending_complete() / remove_exception() were reported. Now I found one upstream user who has a reproducible testcase for his crashes and I sent him some debug code and pinpointed the bug. The bug applies to RHEL 4/5 as well, so I'm creating this bug. The impact of the bug is serious: random crashes when snapshots with chunk size >= 64k are used. This bug is really hapenning in the wild (there are several reports of it). I'm suggesting to backport the bug to RHEL 4.8. See these. Some of these were mistakenly took as a consequences of other memory corruption bugs: http://bugzilla.kernel.org/show_bug.cgi?id=11636 https://bugzilla.redhat.com/show_bug.cgi?id=465825 https://www.redhat.com/archives/dm-devel/2008-October/msg00084.html https://www.redhat.com/archives/dm-devel/2008-November/msg00138.html https://www.redhat.com/archives/dm-devel/2009-February/msg00048.html Description of the bug: Under specific circumstances, kcopyd callback could be called from the thread that called dm_kcopyd_copy not from kcopyd workqueue. dm_kcopyd_copy -> split_job -> segment_complete -> job->fn() This code path is taken if thread calling dm_kcopyd_copy is delayed due to scheduling inside split_job/segment_complete and the subjobs complete before the loop in split_job completes. Snapshots depend on the fact that callbacks are called from the singlethreaded kcopyd workqueue and expect that there is no racing between individual callbacks. The racing between callbacks can lead to corruption of exception store and it can also cause that exception store callbacks are called twice for the same exception --- a likely reason for crashes inside pending_complete() / remove_exception(). When I reviewed kcopyd further, I found four total problems: 1. job->fn being called from the thread that submitted the job (see above). 2. job->fn(read_err, write_err, job->context); in segment_complete reports the error of the last subjob, not the union of all errors. 3. This comment is bogus (the jobs are not cancelable), plus the code is prone to deadlock because the callback may allocate another job from the same mempool. /* * To avoid a race we must keep the job around * until after the notify function has completed. * Otherwise the client may try and stop the job * after we've completed. */ job->fn(read_err, write_err, job->context); mempool_free(job, job->kc->job_pool); 4. Master jobs and subjobs are allocated from the same mempool. Completion and freeing of a master job depends on successful allocation of all subjobs -> deadlock. I made a patch for upstream that should fix problems 1, 2, 3. Problem 4 is more complicated to fix (and less likely to happen --- there are no reports of it), so I wouldn't patch it to RHEL4 now. I am suggesting to backport the patch to RHEL 4.8.
... In that "msg00084.html" reference I copied a wrong link. It should be https://www.redhat.com/archives/dm-devel/2008-October/msg00085.html
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Status update on this bug: The user still claims that there are some crashes, so there may be more bugs. I'm sending debug patches to him to try to pinpoint it. During source code review I already found a second race condition (which wouldn't cause crash, but could corrupt data). And there may be possibly even third bug (unknown yet). I'll send complete patches when the analysis will be finished. If you need the patches fast because of some deadline, contact me.
There were two bugs found so far (but the user's problem is still under analysis) --- I am here posting both patches. The explanation is in the patches.
Created attachment 332598 [details] A backport of first bugfix to RHEL 4 This fixes the first problem for RHEL 4. kcopyd may call the callback directly (not from its thread) and this may result in racing with another callback called from the thread.
Created attachment 332599 [details] Fix for the second bug Fix for the second problem: when allocating a new exception structure, dm-snapshot driver drops the lock. When it regains the lock, it checks if the structure was meanwhile allocated by another thread and placed into pending hash. But it doesn't check if the structure was allocated by another thread and already finished. This may lead to a situation where multiple exception records exists for the same chunk.
Committed in 82.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html