Red Hat Bugzilla – Bug 484319
Random crashing in dm snapshots because of a race condition
Last modified: 2009-05-18 15:21:37 EDT
Created attachment 331076 [details]
an upstream patch as a preview. I'll backport it to RHEL 4 if this bug is granted.
Various crashes in pending_complete() / remove_exception() were reported.
Now I found one upstream user who has a reproducible testcase for his crashes and I sent him some debug code and pinpointed the bug.
The bug applies to RHEL 4/5 as well, so I'm creating this bug.
The impact of the bug is serious: random crashes when snapshots with a chunk size >= 64k are used. The bug is really happening in the wild (there are several reports of it). I'm suggesting to backport the fix to RHEL 4.8.
See these reports. Some of them were mistakenly taken as consequences of other memory-corruption bugs:
Description of the bug:
Under specific circumstances, the kcopyd callback can be called from
the thread that called dm_kcopyd_copy, not from the kcopyd workqueue:
dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()
This code path is taken if the thread calling dm_kcopyd_copy is delayed due to
scheduling inside split_job/segment_complete and the subjobs complete before
the loop in split_job finishes.
Snapshots depend on the fact that callbacks are called from the single-threaded
kcopyd workqueue and expect that there is no racing between individual
callbacks. Racing between callbacks can corrupt the exception
store, and it can also cause exception store callbacks to be called twice
for the same exception --- a likely reason for the crashes inside
pending_complete() / remove_exception().
When I reviewed kcopyd further, I found four problems in total:
1. job->fn being called from the thread that submitted the job (see above).
2. job->fn(read_err, write_err, job->context); in segment_complete
reports the error of the last subjob, not the union of all errors.
3. This comment is bogus (the jobs are not cancelable), plus the code
is prone to deadlock because the callback may allocate another job from
the same mempool:
    /*
     * To avoid a race we must keep the job around
     * until after the notify function has completed.
     * Otherwise the client may try and stop the job
     * after we've completed.
     */
    job->fn(read_err, write_err, job->context);
4. Master jobs and subjobs are allocated from the same mempool.
Completion and freeing of a master job depend on the successful allocation of
all subjobs -> deadlock.
I made a patch for upstream that should fix problems 1, 2 and 3. Problem 4 is more complicated to fix (and less likely to happen --- there are no reports of it), so I wouldn't patch it in RHEL 4 now.
I am suggesting to backport the patch to RHEL 4.8.
... In that "msg00084.html" reference I copied a wrong link. It should be
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
Status update on this bug:
The user still claims that there are some crashes, so there may be more bugs. I'm sending him debug patches to try to pinpoint them.
During source code review I already found a second race condition (which wouldn't cause a crash, but could corrupt data). And there may possibly be even a third bug (unknown yet).
I'll send complete patches when the analysis is finished. If you need the patches fast because of some deadline, contact me.
Two bugs have been found so far (but the user's problem is still under analysis) --- I am posting both patches here. The explanation is in the patches.
Created attachment 332598 [details]
A backport of first bugfix to RHEL 4
This fixes the first problem for RHEL 4: kcopyd may call the callback directly
(not from its thread), and this may result in racing with another callback
called from the thread.
Created attachment 332599 [details]
Fix for the second bug
Fix for the second problem: when allocating a new exception structure, the
dm-snapshot driver drops the lock. When it regains the lock, it checks whether
the structure was meanwhile allocated by another thread and placed into the
pending hash. But it doesn't check whether the structure was allocated by
another thread and has already finished. This may lead to a situation where
multiple exception records exist for the same chunk.
Committed in 82.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.