Bug 821454 - LVM2 issues invalid ioctl sequence that crashes kernel when snapshots of mounted raid volumes are taken
Summary: LVM2 issues invalid ioctl sequence that crashes kernel when snapshots of mounted raid volumes are taken
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 6.3
Assignee: Alasdair Kergon
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 818371
Blocks:
 
Reported: 2012-05-14 14:28 UTC by Alasdair Kergon
Modified: 2012-06-20 15:03 UTC
CC List: 13 users

Fixed In Version: lvm2-2.02.95-9.el6
Doc Type: Bug Fix
Doc Text:
RAID is a new feature in RHEL 6.3. No tech note needed.
Clone Of: 818371
Environment:
Last Closed: 2012-06-20 15:03:52 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Product Errata RHBA-2012:0962 (normal, SHIPPED_LIVE): lvm2 bug fix and enhancement update, last updated 2012-06-19 21:12:11 UTC

Description Alasdair Kergon 2012-05-14 14:28:36 UTC
When taking a snapshot of a mirror (either dm or md) LVM2 can cause two instances of the mirror table to be live simultaneously, which can lead to corruption or crashes.  The sequence of ioctls that LVM2 issues needs to be fixed to avoid this.
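For reference, the invariant being violated is the usual device-mapper rule that the live table of a device is suspended before any replacement table is resumed. A minimal dmsetup sketch of the safe sequence (illustration only, not the lvm2 code path; the device name and table line are taken from the dmsetup output later in this bug):

dmsetup load vg-lv --table "0 1024000 raid raid1 3 0 region_size 1024 2 253:3 253:4 253:5 253:6"   # stage the new table in the inactive slot
dmsetup suspend vg-lv                                                                              # quiesce the currently live table
dmsetup resume vg-lv                                                                               # swap: the old table is torn down, the new one goes live

At no point are two instances of the mapping live at once, which is the condition the crash depends on.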


+++ This bug was initially created as a clone of Bug #818371 +++

Description of problem:
SCENARIO - [write_to_snap_merge]
Create snaps of origin with fs data, verify data on snaps, change data on snaps, merge data back to origin, verify origin data
Making origin volume
** creating RAID1 origin **
Placing an ext filesystem on origin volume
mke2fs 1.41.12 (17-May-2010)
Mounting origin volume

Writing files to /mnt/origin
checkit starting with:
CREATE
Num files:          500
Random Seed:        8057
Verify XIOR Stream: /tmp/original.20616
Working dir:        /mnt/origin

Checking files on /mnt/origin
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/original.20616
Working dir:        /mnt/origin


Making 5 snapshots of the origin volume, mounting, and verifying original data
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G

[MACHINE LOCKS UP AND REQUIRES A REBOOT]
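For reference, the scenario above reduces to roughly the following commands (a sketch only; the origin size, filesystem options and mount point are assumptions, while the VG/LV names come from the log above):

lvcreate --type raid1 -m 1 -L 4G -n origin snapper        # RAID1 origin volume
mke2fs /dev/snapper/origin                                 # ext filesystem on the origin
mount /dev/snapper/origin /mnt/origin
# write and verify test files under /mnt/origin, then take the first snapshot:
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G     # machine locks up here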


Version-Release number of selected component (if applicable):
2.6.32-269.el6.x86_64
lvm2-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-libs-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-cluster-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
cmirror-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012


How reproducible:
Most of the time



--- Additional comment from jbrassow on 2012-05-08 09:36:38 EDT ---

I've added print statements to the CTR, DTR, SUSPEND, and RESUME operations.  They print the operation being performed and the memory address of the dm_target struct.  The print-outs seem to indicate that the ordering of the suspend/resume operations is wrong:
1) Table1 CTR
2) Table1 RESUME
3) Table2 CTR
4) Table2 RESUME
5) Snap CTR
6) Table1 SUSPEND
7) Table2 SUSPEND
8) Table2 RESUME
9) Snap RESUME
10) Table1 DTR

Table2 should probably not be resumed before Table1 is suspended.  Actual output below.


device-mapper: raid: RAID CTR - 0xffffc90014aa0040
device-mapper: raid: Superblocks created for new array
md/raid1:mdX: not clean -- starting background reconstruction
md/raid1:mdX: active with 2 out of 2 mirrors
Choosing daemon_sleep default (5 sec)
created bitmap (4 pages) for device mdX
device-mapper: raid: RAID RESUME, 0xffffc90014aa0040
mdX: bitmap file is out of date, doing full recovery
mdX: bitmap initialized from disk: read 1/1 pages, set 8192 of 8192 bits
md: resync of RAID array mdX
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
md: using 128k window, over a total of 4194304k.
md: mdX: resync done.
device-mapper: raid: RAID CTR - 0xffffc9001506b040
md/raid1:mdX: active with 2 out of 2 mirrors
created bitmap (4 pages) for device mdX
device-mapper: raid: RAID RESUME, 0xffffc9001506b040
mdX: bitmap initialized from disk: read 1/1 pages, set 1 of 8192 bits
device-mapper: snapshots: SNAP CTR
device-mapper: raid: RAID PRESUSPEND, 0xffffc90014aa0040
device-mapper: raid: RAID POSTSUSPEND, 0xffffc90014aa0040
device-mapper: raid: RAID PRESUSPEND, 0xffffc9001506b040
device-mapper: raid: RAID POSTSUSPEND, 0xffffc9001506b040
device-mapper: raid: RAID RESUME, 0xffffc9001506b040
device-mapper: snapshots: SNAP PRERESUME
device-mapper: snapshots: SNAP RESUME
device-mapper: raid: RAID DTR, 0xffffc90014aa0040
------------[ cut here ]------------
kernel BUG at kernel/timer.c:960!
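For completeness, one way to capture the same ordering while reproducing (this assumes a kernel instrumented with the print statements described at the top of this comment, plus the origin volume from the scenario):

dmesg -c > /dev/null                                       # clear the kernel ring buffer
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G
dmesg | grep -E 'RAID (CTR|DTR|PRESUSPEND|POSTSUSPEND|RESUME)|SNAP (CTR|PRERESUME|RESUME)'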

--- Additional comment from jbrassow on 2012-05-08 11:40:18 EDT ---

Created attachment 583023 [details]
output of 'lvcreate -s' command showing possible bad ordering of suspend/resume

Here is the state of the machine before the command was issued (and the command issued to get the output attached here).

[root@bp-01 ~]# lvcreate --type raid1 -m 1 -L 500M -n lv vg
  Logical volume "lv" created
[root@bp-01 ~]# dmsetup table | grep vg-
vg-lv: 0 1024000 raid raid1 3 0 region_size 1024 2 253:3 253:4 253:5 253:6
vg-lv_rmeta_1: 0 8192 linear 8:33 2048
vg-lv_rmeta_0: 0 8192 linear 8:17 2048
vg-lv_rimage_1: 0 1024000 linear 8:33 10240
vg-lv_rimage_0: 0 1024000 linear 8:17 10240
[root@bp-01 ~]# lvcreate -s vg/lv -L 500M -n snap -vvvvv >& output.txt


--- Additional comment from agk on 2012-05-14 10:11:28 EDT ---

Ordering problem apparently also reproducible using original mirror implementation.

--- Additional comment from agk on 2012-05-14 10:18:23 EDT ---

Two separate problems:

1. userspace sending invalid sequence of operations;
- this needs fixing for 6.3

2. kernel crash when it receives the invalid sequence.
- if it's easy to detect the situation and return an error, we should do that - but otherwise it doesn't matter as it makes no sense for userspace to issue that sequence of ioctls.

Comment 1 Alasdair Kergon 2012-05-14 22:07:45 UTC
So, there is a table resume on the -real device that assumes it's a linear mapping.  This is needed to set the correct size of the device before the snapshot table references it.

The strictest fix is to always load that table as an error device by that point to set the device size and then to load it with the correct table without resuming it (suppressed in the case of a linear table).

A less good alternative is to suppress the resume at that point, which would rely on the kernel not attempting to access the -real device in the ctr and skipping the device size validation.
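As an illustration only (not the actual lvm2 implementation; the device name and table line are placeholders based on the dmsetup output in the comment above), the "strictest fix" amounts to something like:

dmsetup create vg-lv-real --table "0 1024000 error"        # -real exists at the correct size but errors all I/O
dmsetup load vg-lv-real --table "0 1024000 raid raid1 3 0 region_size 1024 2 253:3 253:4 253:5 253:6"
# the correct table stays in the inactive slot; it only goes live when the
# whole snapshot stack is resumed, so a second live copy never exists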

Comment 2 Alasdair Kergon 2012-05-14 22:21:08 UTC
A reminder that all 'snapshot of mirror' cases need to be tested:

creation of first snapshot of a mirror

creation of a second snapshot of a mirror

removal of 2nd snapshot

removal of 1st snapshot

removal of mirror with 1 snapshot

removal of mirror with 2 snapshots

activation/deactivation of mirror with 1 snapshot

activation/deactivation of mirror with 2 snapshots

Comment 3 Alasdair Kergon 2012-05-14 22:30:46 UTC
each removal/creation case with the mirror(+snapshots) active/inactive

Comment 4 Alasdair Kergon 2012-05-14 22:54:44 UTC
All snapshot merge cases must also be tested with a mirror underneath.
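Consolidating comments 2-4, the matrix looks roughly like the following (a sketch; VG/LV names and sizes are placeholders, and each step should also be repeated with the mirror inactive and for the merge cases):

lvcreate --type raid1 -m 1 -L 500M -n lv vg                # mirror/raid1 origin
lvcreate -s vg/lv -L 100M -n snap1                         # first snapshot of the mirror
lvcreate -s vg/lv -L 100M -n snap2                         # second snapshot
vgchange -an vg && vgchange -ay vg                         # deactivate/activate with snapshots present
lvremove -f vg/snap2                                       # remove 2nd snapshot
lvremove -f vg/snap1                                       # remove 1st snapshot
lvremove -f vg/lv                                          # remove the mirror itself (also test with snapshots still attached)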

Comment 6 Alasdair Kergon 2012-05-15 21:08:24 UTC
I have a one-line patch (using the 'less good' method) that seems to fix the  problems reported here.

However, working through the other cases I listed has thrown up a related problem with the order of device removal upon the completion of a snapshot merge.

Comment 7 Alasdair Kergon 2012-05-15 21:37:00 UTC
Patch to fix the basic problem.

http://sourceware.org/cgi-bin/cvsweb.cgi/LVM2/libdm/libdm-deptree.c.diff?cvsroot=lvm2&r1=text&tr1=1.163&r2=text&tr2=1.166&f=u

I can't yet spot an easy fix for snapshot merging and it might need mentioning in the technical note instead.  (It might need to load an error table into the old -real device.  Or we might find a way to re-order the tree dependencies for the removal.  Or we might be able to suppress the resume prior to deletion safely.)
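The merge path referred to here can be exercised with lvconvert --merge; a sketch under the same placeholder names as above (if the origin or snapshot is in use, the merge is deferred until the next activation):

lvcreate -s vg/lv -L 100M -n merge1                        # snapshot of the raid1 origin
# modify data through the snapshot, then fold it back into the origin:
lvconvert --merge vg/merge1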

Comment 9 Corey Marthaler 2012-05-23 21:13:11 UTC
These test cases no longer fail with the latest kernel/lvm rpms. Marking verified.

2.6.32-274.el6.x86_64

lvm2-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-libs-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-cluster-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
cmirror-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012

Comment 10 Jonathan Earl Brassow 2012-05-30 00:41:58 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
RAID is a new feature in RHEL 6.3. No tech note needed.

Comment 12 errata-xmlrpc 2012-06-20 15:03:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html

