Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 818371

Summary:

kernel crashes when snapshots of mounted raid volumes are taken

Product:

Red Hat Enterprise Linux 6

Reporter:

Corey Marthaler <cmarthal>

Component:

kernel

Assignee:

Jonathan Earl Brassow <jbrassow>

Status:

CLOSED ERRATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

urgent

Docs Contact:

Priority:

urgent

Version:

6.3

CC:

agk, borgan, dwysocha, heinzm, jbrassow, mbroz, msnitzer, nperic, prajnoha, prockai, syeghiay, thornber, zkabelac

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

kernel-2.6.32-274.el6

Doc Type:

Bug Fix

Doc Text:

When creating a snapshot of a mounted RAID volume, a kernel panic could occur. This happened because a timer designed to wake up an I/O processing thread was not deactivated when the RAID device was replace by a snapshot origin. The timer then woke a thread that attempted to access memory that had already been freed, resulting in a kernel panic. With this update, this bug has been fixed and the kernel panic no longer occurs in this scenario.

Story Points:

---

Clone Of:

Clones:

821454 (view as bug list)

Environment:

Last Closed:

2012-06-20 14:00:23 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

821454

Attachments:

Description	Flags
output of 'lvcreate -s' command showing possible bad ordering of suspend/resume	none

Description Corey Marthaler 2012-05-02 21:33:24 UTC

Description of problem:
SCENARIO - [write_to_snap_merge]
Create snaps of origin with fs data, verify data on snaps, change data on snaps, merge data back to origin, verify origin data
Making origin volume
** creating RAID1 origin **
Placing an ext filesystem on origin volume
mke2fs 1.41.12 (17-May-2010)
Mounting origin volume

Writing files to /mnt/origin
checkit starting with:
CREATE
Num files:          500
Random Seed:        8057
Verify XIOR Stream: /tmp/original.20616
Working dir:        /mnt/origin

Checking files on /mnt/origin
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/original.20616
Working dir:        /mnt/origin


Making 5 snapshots of the origin volume, mounting, and verifying original data
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G

[MACHINE LOCKS UP AND REQUIRES A REBOOT]


Version-Release number of selected component (if applicable):
2.6.32-269.el6.x86_64
lvm2-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-libs-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-cluster-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
cmirror-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012


How reproducible:
Most of the time

Comment 1 Mike Snitzer 2012-05-02 21:47:10 UTC

(Moving to kernel component).

Do you have a crashdump or serial console log of the crash?

Comment 2 Corey Marthaler 2012-05-02 22:09:05 UTC

This happens even when the raid1 origin volume is fully in sync (in the original report it is not).

Anything on the console is lost right away due to the not stop spewing of information: 

[...]
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 21 on CPU 0.
[...]


May  2 17:00:29 taft-01 qarshd[2987]: Running cmdline: lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G
May  2 17:00:30 taft-01 kernel: md/raid1:mdX: active with 2 out of 2 mirrors
May  2 17:00:30 taft-01 kernel: created bitmap (4 pages) for device mdX
May  2 17:00:30 taft-01 kernel: mdX: bitmap initialized from disk: read 1/1 pages, set 1 of 8192 bits
May  2 17:00:30 taft-01 lvm[1241]: No longer monitoring RAID device snapper-origin for events.
May  2 17:00:33 taft-01 lvm[1241]: Monitoring snapshot snapper-merge1
May  2 17:00:33 taft-01 lvm[1241]: Monitoring RAID device snapper-origin-real for events.

[POWER CYCLE]

May  2 17:07:06 taft-01 kernel: imklog 5.8.7, log source = /proc/kmsg started.
May  2 17:07:06 taft-01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.7" x-pid="1523" x-info="http://www.rsyslog.com"] start
May  2 17:07:06 taft-01 kernel: Initializing cgroup subsys cpuset
May  2 17:07:06 taft-01 kernel: Initializing cgroup subsys cpu
May  2 17:07:06 taft-01 kernel: Linux version 2.6.32-269.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Tue May 1 19:16:31 EDT 2012
May  2 17:07:06 taft-01 kernel: Command line: ro root=/dev/mapper/vg_taft01-lv_root rd_LVM_LV=vg_taft01/lv_swap LANG=en_US.UTF-8 rd_NO_MD  KEYTABLE=us rd_NO_LUKS SYSFONT=latarcyrheb-sun16 crashkernel=auto console=ttyS0,115200n8 rd_LVM_LV=vg_taft01/lv_root rd_NO_DM
May  2 17:07:06 taft-01 kernel: KERNEL supported cpus:
May  2 17:07:06 taft-01 kernel:  Intel GenuineIntel
May  2 17:07:06 taft-01 kernel:  AMD AuthenticAMD
May  2 17:07:06 taft-01 kernel:  Centaur CentaurHauls
May  2 17:07:06 taft-01 kernel: BIOS-provided physical RAM map:
May  2 17:07:06 taft-01 kernel: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)

Comment 3 Nenad Peric 2012-05-03 14:44:39 UTC

I hit the same issue with the first run of the test:

============================================================
Iteration 1 of 5 started at Thu May  3 09:40:30 CDT 2012
============================================================
SCENARIO - [write_to_snap_merge]
Create snaps of origin with fs data, verify data on snaps, change data on snaps, merge data back to origin, verify origin data
Making origin volume
** creating RAID1 origin **
lvcreate --type raid1 -m 1 -L 4G snapper -n origin
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 12.55% )
   0/1 mirror(s) are fully synced: ( 25.58% )
   0/1 mirror(s) are fully synced: ( 30.01% )
   0/1 mirror(s) are fully synced: ( 37.88% )
   0/1 mirror(s) are fully synced: ( 41.58% )
   0/1 mirror(s) are fully synced: ( 45.91% )
   0/1 mirror(s) are fully synced: ( 50.22% )
   0/1 mirror(s) are fully synced: ( 54.76% )
   0/1 mirror(s) are fully synced: ( 59.28% )
   0/1 mirror(s) are fully synced: ( 64.02% )
   0/1 mirror(s) are fully synced: ( 68.26% )
   0/1 mirror(s) are fully synced: ( 72.38% )
   0/1 mirror(s) are fully synced: ( 76.45% )
   0/1 mirror(s) are fully synced: ( 81.12% )
   0/1 mirror(s) are fully synced: ( 85.18% )
   0/1 mirror(s) are fully synced: ( 89.33% )
   0/1 mirror(s) are fully synced: ( 93.61% )
   0/1 mirror(s) are fully synced: ( 99.08% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Placing an ext filesystem on origin volume
mke2fs 1.41.12 (17-May-2010)
Mounting origin volume

Writing files to /mnt/origin
checkit starting with:
CREATE
Num files:          500
Random Seed:        4019
Verify XIOR Stream: /tmp/original.22269
Working dir:        /mnt/origin

Checking files on /mnt/origin
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/original.22269
Working dir:        /mnt/origin


Making 5 snapshots of the origin volume, mounting, and verifying original data
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G
Could not connect to node01:5008, 110: Connection timed out
+++ Mounting and verifying snapshot merge1 data +++
Could not connect to node01:5008, 110: Connection timed out
Could not connect to node01:5008, 113: No route to host

/var/log/messages is generally silent:

May  3 09:43:59 node01 qarshd[4024]: Running cmdline: lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G
May  3 09:44:00 node01 kernel: md/raid1:mdX: active with 2 out of 2 mirrors
May  3 09:44:00 node01 kernel: created bitmap (4 pages) for device mdX
May  3 09:44:00 node01 kernel: mdX: bitmap initialized from disk: read 1/1 pages, set 122 of 8192 bits
May  3 09:44:00 node01 lvm[2614]: No longer monitoring RAID device snapper-origin for events.
May  3 09:44:00 node01 lvm[2614]: Monitoring snapshot snapper-merge1
May  3 09:44:00 node01 lvm[2614]: Monitoring RAID device snapper-origin-real for events.

*POOF*

May  3 09:45:37 node01 kernel: imklog 5.8.10, log source = /proc/kmsg started.
May  3 09:45:37 node01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1306" x-info="http://www.rsyslog.com"] start

Comment 4 Nenad Peric 2012-05-03 15:08:29 UTC

Here is the output from crash dump:

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-268.el6.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 1
        DATE: Thu May  3 09:44:01 2012
      UPTIME: 04:40:27
LOAD AVERAGE: 1.08, 0.60, 0.24
       TASKS: 184
    NODENAME: node01
     RELEASE: 2.6.32-268.el6.x86_64
     VERSION: #1 SMP Tue Apr 24 16:27:49 EDT 2012
     MACHINE: x86_64  (2659 Mhz)
      MEMORY: 1 GB
       PANIC: ""
         PID: 4082
     COMMAND: "lvm"
        TASK: ffff88003ba41540  [THREAD_INFO: ffff88003cc1a000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)



PID: 4082   TASK: ffff88003ba41540  CPU: 0   COMMAND: "lvm"
 #0 [ffff880002203be0] machine_kexec at ffffffff8103281b
 #1 [ffff880002203c40] crash_kexec at ffffffff810ba5a2
 #2 [ffff880002203d10] oops_end at ffffffff81500b80
 #3 [ffff880002203d40] die at ffffffff8100f26b
 #4 [ffff880002203d70] do_general_protection at ffffffff81500712
 #5 [ffff880002203da0] general_protection at ffffffff814ffee5
    [exception RIP: run_timer_softirq+404]
    RIP: ffffffff8107e804  RSP: ffff880002203e50  RFLAGS: 00010246
    RAX: ffff880002203e90  RBX: ffffffff81ea6100  RCX: 0000000000000000
    RDX: 0000000000000100  RSI: dead000000200200  RDI: 4d414e5f564c5f4d
    RBP: ffff880002203ed0   R8: ffff880037c472e8   R9: 0000000000000001
    R10: 0000000000000010  R11: 0000000000000000  R12: ffff880037c472b0
    R13: ffff880002203e90  R14: 4d414e5f564c5f4d  R15: 440072657070616e
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #6 [ffff880002203ed8] __do_softirq at ffffffff81073e31
 #7 [ffff880002203f48] call_softirq at ffffffff8100c24c
 #8 [ffff880002203f60] do_softirq at ffffffff8100de85
 #9 [ffff880002203f80] irq_exit at ffffffff81073c15
#10 [ffff880002203f90] smp_apic_timer_interrupt at ffffffff815054d0
#11 [ffff880002203fb0] apic_timer_interrupt at ffffffff8100bc13
--- <IRQ stack> ---
#12 [ffff88003cc1bf58] apic_timer_interrupt at ffffffff8100bc13
    RIP: 00007ff039b98880  RSP: 00007fff0f67e5f0  RFLAGS: 00000246
    RAX: 0000000000002d60  RBX: 0000000000000050  RCX: 0000000000000000
    RDX: 0000000001c001e0  RSI: 0000000000000000  RDI: 00007ff039eaee80
    RBP: ffffffff8100bc0e   R8: 00000000ffffffff   R9: 0000000000000000
    R10: 0000000000000000  R11: 000000000000000b  R12: 0000000000000016
    R13: 00007fff0f67ef58  R14: 0000000001beee80  R15: 00000000000008a0
    ORIG_RAX: ffffffffffffff10  CS: 0033  SS: 002b


and the log says:


md: mdX: resync done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:dm-3
 disk 1, wo:0, o:1, dev:dm-5
SELinux: initialized (dev dm-6, type ext2), uses xattr
md/raid1:mdX: active with 2 out of 2 mirrors
created bitmap (4 pages) for device mdX
mdX: bitmap initialized from disk: read 1/1 pages, set 122 of 8192 bits
general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/platform/host6/session5/target6:0:0/6:0:0:1/block/sda/dev
CPU 0 
Modules linked in: ext2 dm_snapshot dm_raid raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 4082, comm: lvm Not tainted 2.6.32-268.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff8107e804>]  [<ffffffff8107e804>] run_timer_softirq+0x194/0x340
RSP: 0000:ffff880002203e50  EFLAGS: 00010246
RAX: ffff880002203e90 RBX: ffffffff81ea6100 RCX: 0000000000000000
RDX: 0000000000000100 RSI: dead000000200200 RDI: 4d414e5f564c5f4d
RBP: ffff880002203ed0 R08: ffff880037c472e8 R09: 0000000000000001
R10: 0000000000000010 R11: 0000000000000000 R12: ffff880037c472b0
R13: ffff880002203e90 R14: 4d414e5f564c5f4d R15: 440072657070616e
FS:  00007ff03ab567a0(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7bcb5820a0 CR3: 000000003c0a7000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lvm (pid: 4082, threadinfo ffff88003cc1a000, task ffff88003ba41540)
Stack:
 00000000000c0c90 ffffffff81ea7d20 ffffffff81ea7920 ffffffff81ea7520
<d> ffff88003cc1bfd8 ffff88003cc1bfd8 0000010081038779 ffffffff81ea7120
<d> ffff880002203e90 ffff880002203e90 ffffffff8102b40d 0000000000000101
Call Trace:
 <IRQ> 
 [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff81073e31>] __do_softirq+0xc1/0x1e0
 [<ffffffff81096bc0>] ? hrtimer_interrupt+0x140/0x250
 [<ffffffff8100c24c>] call_softirq+0x1c/0x30
 [<ffffffff8100de85>] do_softirq+0x65/0xa0
 [<ffffffff81073c15>] irq_exit+0x85/0x90
 [<ffffffff815054d0>] smp_apic_timer_interrupt+0x70/0x9b
 [<ffffffff8100bc13>] apic_timer_interrupt+0x13/0x20
 <EOI> 
Code: c7 03 00 00 00 00 fb 66 0f 1f 44 00 00 48 8b 55 a8 8b 0d 98 84 b4 00 8b 92 44 e0 ff ff 85 c9 89 55 b4 0f 85 48 01 00 00 4c 89 f7 <41> ff d7 8b 15 bb 84 b4 00 85 d2 0f 85 0b 01 00 00 48 8b 45 a0 
RIP  [<ffffffff8107e804>] run_timer_softirq+0x194/0x340
 RSP <ffff880002203e50>

Comment 5 RHEL Program Management 2012-05-07 21:59:41 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Corey Marthaler 2012-05-07 22:16:55 UTC

Finally able to capture what was spewed to the console just before the reboot.

May  7 17:13:52 taft-01 qarshd[3083]: Running cmdline: lvs --noheadings -o
lv_a1
general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/block/dm-9/uevent
CPU 0 
Modules linked in: ext2 sunrpc p4_clockmod freq_table speedstep_lib ipv6
dm_sna]

Pid: 3083, comm: qarshd Not tainted 2.6.32-269.el6.x86_64 #1 Dell Computer
Corp1
RIP: 0010:[<ffffffff8107e804>]  [<ffffffff8107e804>]
run_timer_softirq+0x194/0x0
RSP: 0018:ffff880028203e50  EFLAGS: 00010246
RAX: ffff880028203e90 RBX: ffffffff81ea6100 RCX: 0000000000000000
RDX: 0000000000000100 RSI: dead000000200200 RDI: 67a99b330a493eb0
RBP: ffff880028203ed0 R08: ffff880218d952e8 R09: 0000000000000001
R10: 0000000000000010 R11: 0000000000000000 R12: ffff880218d952b0
R13: ffff880028203e90 R14: 67a99b330a493eb0 R15: ca60ff2f5e510a70
FS:  00007f52079d8700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001164568 CR3: 0000000217c5f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qarshd (pid: 3083, threadinfo ffff8801f83d0000, task ffff880215d66080)
Stack:
 ffffffff810a2110 ffffffff81ea7d20 ffffffff81ea7920 ffffffff81ea7520
<d> ffff8801f83d1fd8 ffff8801f83d1fd8 0000010081012bd9 ffffffff81ea7120
<d> ffff880028203e90 ffff880028203e90 ffffffff8102b40d 0000000000000141
Call Trace:
 <IRQ> 
 [<ffffffff810a2110>] ? tick_sched_timer+0x0/0xc0
 [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff81073e31>] __do_softirq+0xc1/0x1e0
 [<ffffffff81096bc0>] ? hrtimer_interrupt+0x140/0x250
 [<ffffffff8100c24c>] call_softirq+0x1c/0x30
 [Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-269.el6.x86_64 (mockbuild.bos.redhat.com)
(g2
Command line: ro root=/dev/mapper/vg_taft01-lv_root
rd_LVM_LV=vg_taft01/lv_swapK
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls

Comment 7 Jonathan Earl Brassow 2012-05-08 13:36:38 UTC

I've added print statements to CTR, DTR, SUSPEND, and RESUME operations.  They print out the operation being performed and the memory position of the dm_target struct.  The print-outs seem to indicate that the ordering of the suspend/resume operations is wrong:
1) Table1 CTR
2) Table1 RESUME
3) Table2 CTR
4) Table2 RESUME
5) Snap CTR
6) Table1 SUSPEND
7) Table2 SUSPEND
8) Table2 RESUME
9) Snap RESUME
10) Table1 DTR

Table2 should probably not be resumed before Table1 is suspended.  Actual output below.


device-mapper: raid: RAID CTR - 0xffffc90014aa0040
device-mapper: raid: Superblocks created for new array
md/raid1:mdX: not clean -- starting background reconstruction
md/raid1:mdX: active with 2 out of 2 mirrors
Choosing daemon_sleep default (5 sec)
created bitmap (4 pages) for device mdX
device-mapper: raid: RAID RESUME, 0xffffc90014aa0040
mdX: bitmap file is out of date, doing full recovery
mdX: bitmap initialized from disk: read 1/1 pages, set 8192 of 8192 bits
md: resync of RAID array mdX
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) f.
md: using 128k window, over a total of 4194304k.
md: mdX: resync done.
[device-mapper: raid: RAID CTR - 0xffffc9001506b040
md/raid1:mdX: active with 2 out of 2 mirrors
created bitmap (4 pages) for device mdX
device-mapper: raid: RAID RESUME, 0xffffc9001506b040
mdX: bitmap initialized from disk: read 1/1 pages, set 1 of 8192 bits
device-mapper: snapshots: SNAP CTR
device-mapper: raid: RAID PRESUSPEND, 0xffffc90014aa0040
device-mapper: raid: RAID POSTSUSPEND, 0xffffc90014aa0040
device-mapper: raid: RAID PRESUSPEND, 0xffffc9001506b040
device-mapper: raid: RAID POSTSUSPEND, 0xffffc9001506b040
device-mapper: raid: RAID RESUME, 0xffffc9001506b040
device-mapper: snapshots: SNAP PRERESUME
device-mapper: snapshots: SNAP RESUME
device-mapper: raid: RAID DTR, 0xffffc90014aa0040
------------[ cut here ]------------
kernel BUG at kernel/timer.c:960!

Comment 8 Jonathan Earl Brassow 2012-05-08 15:40:18 UTC

Created attachment 583023 [details]
output of 'lvcreate -s' command showing possible bad ordering of suspend/resume

Here is the state of the machine before the command was issued (and the command issued to get the output attached here).

[root@bp-01 ~]# lvcreate --type raid1 -m 1 -L 500M -n lv vg
  Logical volume "lv" created
[root@bp-01 ~]# dmsetup table | grep vg-
vg-lv: 0 1024000 raid raid1 3 0 region_size 1024 2 253:3 253:4 253:5 253:6
vg-lv_rmeta_1: 0 8192 linear 8:33 2048
vg-lv_rmeta_0: 0 8192 linear 8:17 2048
vg-lv_rimage_1: 0 1024000 linear 8:33 10240
vg-lv_rimage_0: 0 1024000 linear 8:17 10240
[root@bp-01 ~]# lvcreate -s vg/lv -L 500M -n snap -vvvvv >& output.txt
[

Comment 10 Alasdair Kergon 2012-05-14 14:11:28 UTC

Ordering problem apparently also reproducible using original mirror implementation.

Comment 11 Alasdair Kergon 2012-05-14 14:18:23 UTC

Two separate problems:

1. userspace sending invalid sequence of operations;
- this needs fixing for 6.3

2. kernel crash when it receives the invalid sequence.
- if it's easy to detect the situation and return an error, we should do that - but otherwise it doesn't matter as it makes no sense for userspace to issue that sequence of ioctls.

Comment 12 Alasdair Kergon 2012-05-14 14:30:42 UTC

Moved the LVM2 part of this problem to bug 821454.

Comment 16 RHEL Program Management 2012-05-16 14:20:33 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 17 Jonathan Earl Brassow 2012-05-16 15:40:33 UTC

I found the problem that is causing the kernel panic.  MD is setting a timer that occasionally kicks the personality's I/O processing thread.  In this case, the timer can go off after the mddev structure has been removed.  This causes a dereference to an area of memory that has been freed - kernel panic.

To fix the issue, I've submitted a patch to linux-raid that adds del_timer_sync to remove the timer before mddev_suspend finishes.  We don't want a timer going off after an mddev_suspend is called.  This is especially true with device-mapper, since it can call the destructor function immediately following a suspend.  This results in the removal (kfree) of the structures upon which the timer depends - resulting in a very ugly panic.  Therefore, we add a del_timer_sync to mddev_suspend to prevent this.

While this approach works, it seems that 'md_stop_writes' should handle this
properly.  It calls del_timer_sync already, but then allows writes to /finish/
after it is called.  Finishing writes call md_write_end, which resets the timer.
Why call del_timer_sync at all in 'md_stop_writes' then?  Given this line of reasoning, it may be more appropriate to have 'md_stop_writes' wait for all I/O to complete and then call 'del_timer_sync'.  Then we wouldn't have to remove the timer in 'mddev_suspend'.

Comment 19 Alasdair Kergon 2012-05-17 12:24:21 UTC

Upstream patch/thread is here:
https://www.redhat.com/archives/dm-devel/2012-May/msg00122.html

Comment 20 Jonathan Earl Brassow 2012-05-17 17:33:40 UTC

... and Neil's request for Linus to pull the patch into 3.4 is here:
https://lkml.org/lkml/2012/5/17/6

Comment 22 Jarod Wilson 2012-05-23 16:37:04 UTC

Patch(es) available on kernel-2.6.32-274.el6

Comment 24 Jarod Wilson 2012-05-23 17:09:17 UTC

Patch(es) available on kernel-2.6.32-274.el6

Comment 27 Jarod Wilson 2012-05-23 20:09:38 UTC

Patch(es) available on kernel-2.6.32-274.el6

Comment 29 Corey Marthaler 2012-05-23 21:12:27 UTC

These test cases no longer fail with the latest kernel/lvm rpms.

2.6.32-274.el6.x86_64

lvm2-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-libs-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-cluster-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
cmirror-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012

Comment 32 Nenad Peric 2012-06-08 12:58:48 UTC

Verified with 

lvm2-2.02.95-10.el6.x86_64
kernel-2.6.32-278.el6.x86_64

as well

Comment 33 Eva Kopalova 2012-06-19 06:39:38 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When creating a snapshot of a mounted RAID volume, a kernel panic could occur. This happened because a timer designed to wake up an I/O processing thread was not deactivated when the RAID device was replace by a snapshot origin. The timer then woke a thread that attempted to access memory that had already been freed, resulting in a kernel panic. With this update, this bug has been fixed and the kernel panic no longer occurs in this scenario.

Comment 35 errata-xmlrpc 2012-06-20 14:00:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html