Red Hat Bugzilla – Bug 484319
Random crashing in dm snapshots because of a race condition
Last modified: 2009-05-18 15:21:37 EDT
Created attachment 331076 [details]
an upstream patch as a preview. I'll backport it to RHEL 4 if this bug is granted.
Various crashes in pending_complete() / remove_exception() were reported.
Now I found one upstream user who has a reproducible testcase for his crashes and I sent him some debug code and pinpointed the bug.
The bug applies to RHEL 4/5 as well, so I'm creating this bug.
The impact of the bug is serious: random crashes when snapshots with a chunk size >= 64k are used. The bug is really happening in the wild (there are several reports of it). I'm suggesting to backport the fix to RHEL 4.8.
See these reports. Some of them were mistakenly taken as consequences of other memory-corruption bugs:
Description of the bug:
Under specific circumstances, the kcopyd callback can be called from
the thread that called dm_kcopyd_copy, not from the kcopyd workqueue:
dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()
This code path is taken if the thread calling dm_kcopyd_copy is delayed due to
scheduling inside split_job/segment_complete and the subjobs complete before
the loop in split_job finishes.
Snapshots depend on the fact that callbacks are called from the single-threaded
kcopyd workqueue and expect that there is no racing between individual
callbacks. Racing between callbacks can corrupt the exception
store, and it can also cause exception store callbacks to be called twice
for the same exception --- a likely reason for the crashes inside
pending_complete() / remove_exception().
When I reviewed kcopyd further, I found four problems in total:
1. job->fn being called from the thread that submitted the job (see above).
2. job->fn(read_err, write_err, job->context); in segment_complete
reports the error of the last subjob, not the union of all errors.
3. This comment is bogus (the jobs are not cancelable), plus the code
is prone to deadlock because the callback may allocate another job from
the same mempool:
    /*
     * To avoid a race we must keep the job around
     * until after the notify function has completed.
     * Otherwise the client may try and stop the job
     * after we've completed.
     */
    job->fn(read_err, write_err, job->context);
4. Master jobs and subjobs are allocated from the same mempool.
Completion and freeing of a master job depend on the successful allocation of
all subjobs -> deadlock.
I made a patch for upstream that should fix problems 1, 2 and 3. Problem 4 is more complicated to fix (and less likely to happen --- there are no reports of it), so I wouldn't patch it in RHEL 4 now.
I am suggesting to backport the patch to RHEL 4.8.
... In that "msg00084.html" reference I copied a wrong link. It should be
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
Status update on this bug:
The user still claims that there are some crashes, so there may be more bugs. I'm sending him debug patches to try to pinpoint them.
During source code review I already found a second race condition (which wouldn't cause a crash, but could corrupt data). And there may possibly be even a third bug (unknown yet).
I'll send complete patches when the analysis is finished. If you need the patches fast because of some deadline, contact me.
Two bugs have been found so far (but the user's problem is still under analysis) --- I am posting both patches here. The explanation is in the patches.
Created attachment 332598 [details]
A backport of first bugfix to RHEL 4
This fixes the first problem for RHEL 4: kcopyd may call the callback directly
(not from its thread), and this may result in racing with another callback
called from the thread.
Created attachment 332599 [details]
Fix for the second bug
Fix for the second problem: when allocating a new exception structure, the
dm-snapshot driver drops the lock. When it regains the lock, it checks whether
the structure was meanwhile allocated by another thread and placed into the
pending hash. But it doesn't check whether the structure was allocated by
another thread and has already finished. This may lead to a situation where
multiple exception records exist for the same chunk.
Committed in 82.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.