Bug 735124

Summary: LVM --type raid1 create attempt panics system and leaves it unbootable
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: kernel
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA
QA Contact: Petr Beňas <pbenas>
Severity: urgent
Priority: urgent
Version: 6.2
CC: agk, dwysocha, heinzm, jbrassow, mbroz, pbenas, prajnoha, prockai, pstehlik, thornber, tru, zkabelac
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-2.6.32-198.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 14:28:00 UTC
Bug Blocks: 743047

Description Corey Marthaler 2011-09-01 15:04:02 UTC
Description of problem:
[root@taft-01 ~]# pvcreate /dev/sd[bcdefgh]1
  Writing physical volume data to disk "/dev/sdb1"
  Physical volume "/dev/sdb1" successfully created
  Writing physical volume data to disk "/dev/sdc1"
  Physical volume "/dev/sdc1" successfully created
  Writing physical volume data to disk "/dev/sdd1"
  Physical volume "/dev/sdd1" successfully created
  Writing physical volume data to disk "/dev/sde1"
  Physical volume "/dev/sde1" successfully created
  Writing physical volume data to disk "/dev/sdf1"
  Physical volume "/dev/sdf1" successfully created
  Writing physical volume data to disk "/dev/sdg1"
  Physical volume "/dev/sdg1" successfully created
  Writing physical volume data to disk "/dev/sdh1"
  Physical volume "/dev/sdh1" successfully created
[root@taft-01 ~]# vgcreate vg /dev/sd[bcdefgh]1
  Volume group "vg" successfully created
[root@taft-01 ~]# pvscan
  PV /dev/sdb1   VG vg          lvm2 [67.83 GiB / 67.83 GiB free]
  PV /dev/sdc1   VG vg          lvm2 [67.83 GiB / 67.83 GiB free]
  PV /dev/sdd1   VG vg          lvm2 [67.83 GiB / 67.83 GiB free]
  PV /dev/sde1   VG vg          lvm2 [67.83 GiB / 67.83 GiB free]
  PV /dev/sdf1   VG vg          lvm2 [67.83 GiB / 67.83 GiB free]
  PV /dev/sdg1   VG vg          lvm2 [67.83 GiB / 67.83 GiB free]
  PV /dev/sdh1   VG vg          lvm2 [67.83 GiB / 67.83 GiB free]
  PV /dev/sda2   VG vg_taft01   lvm2 [67.75 GiB / 0    free]
  Total: 8 [542.55 GiB] / in use: 8 [542.55 GiB] / in no VG: 0 [0   ]
[root@taft-01 ~]# lvcreate --type raid1 -m 3  -L 100M -n lv vg
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/

Message from syslogd@taft-01 at Aug 31 14:49:36 ...
 kernel:Stack:

Message from syslogd@taft-01 at Aug 31 14:49:36 ...
 kernel:Call Trace:

Message from syslogd@taft-01 at Aug 31 14:49:36 ...
 kernel:Code: 7d f8 c9 c3 0f 1f 44 00 00 48 8b 05 a1 6e bb 00 41 8d b5 f0 00 00 00 4c 89 e7 ff 90 e0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 20 de ba 00 4c 89 e7 e8 b8 fd 22 00 85 c0 74 ec eb 90 66 


[REBOOT]


Setting hostname taft-01:  [  OK  ]
Setting up Logical Volume Management: async_tx: api initialized (async)
xor: automatically using best checksumming function: generic_sse
   generic_sse:  4204.000 MB/sec
   xor: using function: generic_sse (4204.000 MB/sec)
   raid6: int64x1   1222 MB/s
   raid6: int64x2   1746 MB/s
   raid6: int64x4   1789 MB/s
   raid6: int64x8   1492 MB/s
   raid6: sse2x1    2039 MB/s
   raid6: sse2x2    3019 MB/s
   raid6: sse2x4    2890 MB/s
   raid6: using algorithm sse2x2 (3019 MB/s)
   md: raid6 personality registered for level 6
   md: raid5 personality registered for level 5
   md: raid4 personality registered for level 4
   md: raid1 personality registered for level 1
   bio: create slab <bio-1> at 1
   md/raid1:mdX: not clean -- starting background reconstruction
   md/raid1:mdX: active with 4 out of 4 mirrors
   created bitmap (1 pages) for device mdX
   TECH PREVIEW: dm-raid (a device-mapper/MD bridge) may not be fully supported.
   Please review provided documentation for limitations.
   mdX: bitmap initialized from disk: read 1/1 pages, set 196 of 200 bits
   BUG: unable to handle kernel 
   md: resync of RAID array mdX
   md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
   md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
   md: using 128k window, over a total of 102400k.
   paging request at 0000000100000028
   IP: [<ffffffff813e42de>] md_wakeup_thread+0xe/0x30
   PGD 215664067 PUD 0 
   Oops: 0002 [#1] SMP 
   last sysfs file: /sys/module/raid1/initstate
CPU 3 
Modules linked in: dm_raid(T) raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx e1000 microcode dcdb]

Pid: 1051, comm: lvm Tainted: G           ---------------- T 2.6.32-192.el6.x86_64 #1 Dell Computer Corporation PowerEdge 2850/0T7971
RIP: 0010:[<ffffffff813e42de>]  [<ffffffff813e42de>] md_wakeup_thread+0xe/0x30
RSP: 0018:ffff8802156dbc98  EFLAGS: 00010082
RAX: ffff8802169e1200 RBX: ffff880216bc3200 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 9efacf44457ee738
RBP: ffff8802156dbc98 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: ffff880216bc3420
R13: 0000000000000292 R14: ffff880216bc3328 R15: 0000000000000000
FS:  00007fb06e60c7a0(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd9e7bdda0 CR3: 0000000218d4a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lvm (pid: 1051, threadinfo ffff8802156da000, task ffff8802156b8b40)
Stack:
 ffff8802156dbce8 ffffffffa0396937 ffff8802156dbcd8 ffffffffa0002760
<0> ffffe8ffffc03540 ffff880216888408 ffff88021525ad00 ffff88021525ad28
<0> ffff8802156dbd18 0000000000000000 ffff8802156dbcf8 ffffffffa03b7dd5
Call Trace:
 [<ffffffffa0396937>] md_raid5_unplug_device+0x67/0x100 [raid456]
 [<ffffffffa0002760>] ? dm_unplug_all+0x50/0x70 [dm_mod]
 [<ffffffffa03b7dd5>] raid_unplug+0x15/0x20 [dm_raid]
 [<ffffffffa00041fe>] dm_table_unplug_all+0x8e/0x100 [dm_mod]
 [<ffffffff811af50f>] ? thaw_bdev+0x5f/0x130
 [<ffffffffa0002703>] dm_resume+0xe3/0xf0 [dm_mod]
 [<ffffffffa000894c>] dev_suspend+0x1bc/0x250 [dm_mod]
 [<ffffffffa00093b4>] ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffffa0008790>] ? dev_suspend+0x0/0x250 [dm_mod]
 [<ffffffffa0009483>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81188ed2>] vfs_ioctl+0x22/0xa0
 [<ffffffff81189074>] do_vfs_ioctl+0x84/0x580
 [<ffffffff811895f1>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
Code: 24 2c 02 00 00 01 00 00 00 66 ff 03 66 66 90 48 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 00 55 48 89 e5 0f 1f 44 00 00 48 85 ff 74 1a < 
RIP  [<ffffffff813e42de>] md_wakeup_thread+0xe/0x30
 RSP <ffff8802156dbc98>
---[ end trace c981a51a52f7c4ab ]---
Kernel panic - not syncing: Fatal exception
Pid: 1051, comm: lvm Tainted: G      D    ---------------- T 2.6.32-192.el6.x86_64 #1
Call Trace:
 [<ffffffff814eb56e>] ? panic+0x78/0x143
 [<ffffffff814ef704>] ? oops_end+0xe4/0x100
 [<ffffffff8100f22b>] ? die+0x5b/0x90
 [<ffffffff814ef272>] ? do_general_protection+0x152/0x160
 [<ffffffff814eea45>] ? general_protection+0x25/0x30
 [<ffffffff813e42de>] ? md_wakeup_thread+0xe/0x30
 [<ffffffffa0396937>] ? md_raid5_unplug_device+0x67/0x100 [raid456]
 [<ffffffffa0002760>] ? dm_unplug_all+0x50/0x70 [dm_mod]
 [<ffffffffa03b7dd5>] ? raid_unplug+0x15/0x20 [dm_raid]
 [<ffffffffa00041fe>] ? dm_table_unplug_all+0x8e/0x100 [dm_mod]
 [<ffffffff811af50f>] ? thaw_bdev+0x5f/0x130
 [<ffffffffa0002703>] ? dm_resume+0xe3/0xf0 [dm_mod]
 [<ffffffffa000894c>] ? dev_suspend+0x1bc/0x250 [dm_mod]
 [<ffffffffa00093b4>] ? ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffffa0008790>] ? dev_suspend+0x0/0x250 [dm_mod]
 [<ffffffffa0009483>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81188ed2>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff81189074>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff811895f1>] ? sys_ioctl+0x81/0xa0
 [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b
panic occurred, switching back to text console


Version-Release number of selected component (if applicable):
2.6.32-192.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011


How reproducible:
Every time
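
The session above condenses to the following reproducer sketch. The device names are specific to the taft-01 test machine and will differ elsewhere; by default the script only prints the commands (set RUN=1 to execute them), since on affected kernels the final lvcreate panics the host.

```shell
#!/bin/sh
# Condensed reproducer sketch for this bug. Device names are from the
# taft-01 machine in this report. Prints commands instead of running them
# unless RUN=1, because the lvcreate call panics affected kernels.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}
DEVS="/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1"
run pvcreate $DEVS
run vgcreate vg $DEVS
# Panics kernels up to 2.6.32-197.el6; fixed in kernel-2.6.32-198.el6:
run lvcreate --type raid1 -m 3 -L 100M -n lv vg
```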

Comment 1 Jonathan Earl Brassow 2011-09-01 16:52:22 UTC
[<ffffffffa0396937>] md_raid5_unplug_device+0x67/0x100 [raid456]

Upstream doesn't have this unplug interface... the problem is likely there.

Comment 2 Jonathan Earl Brassow 2011-09-01 16:53:44 UTC
Wait, you are creating a RAID1 device, but it is calling md_raid5_unplug_device... that is a problem.

Comment 3 Jonathan Earl Brassow 2011-09-01 17:01:39 UTC
The kernel is /missing/ the RAID1 unplug patch.

Comment 4 RHEL Product and Program Management 2011-09-01 17:19:48 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 5 Corey Marthaler 2011-09-01 21:41:48 UTC
No change in behavior with the scratch kernel (2.6.32-193.el6.bz735124.x86_64);
this bug still exists.

Comment 8 Corey Marthaler 2011-09-02 16:26:26 UTC
All the RAID test cases listed in bug 729712#c4 work with the latest kernel build.

[root@taft-01 ~]# uname -ar
Linux taft-01 2.6.32-193.el6.bz735124.1.x86_64 #1 SMP Thu Sep 1 23:34:44 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux


[root@taft-01 ~]# lvcreate --type raid1 -m 3  -L 100M -n lv vg
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid4  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid5  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid5_ls  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid5_la  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid5_rs  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid5_ra  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid6  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid6_zr  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid6_nr  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed

[root@taft-01 ~]# lvcreate --type raid6_nc  -i 4 -L 100M -n lv vg
  Using default stripesize 64.00 KiB
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Rounding size (25 extents) up to stripe boundary size (28 extents)
  Logical volume "lv" created
[root@taft-01 ~]# lvremove vg
Do you really want to remove active logical volume lv? [y/n]: y
  Logical volume "lv" successfully removed
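
The create/remove cycle above repeats verbatim for each segment type. As a sketch, the whole matrix can be driven by one loop (printed here rather than executed, assuming the same "vg" volume group from this report; lvremove -f skips the interactive prompt seen above):

```shell
#!/bin/sh
# Prints the lvcreate/lvremove matrix exercised in this comment, one
# create/remove pair per RAID segment type; a sketch, not run against a VG.
matrix() {
    for seg in raid1 raid4 raid5 raid5_ls raid5_la raid5_rs raid5_ra \
               raid6 raid6_zr raid6_nr raid6_nc; do
        case "$seg" in
            raid1) opts="-m 3" ;;  # 4-way mirror
            *)     opts="-i 4" ;;  # 4 stripes
        esac
        echo "lvcreate --type $seg $opts -L 100M -n lv vg"
        echo "lvremove -f vg/lv"
    done
}
matrix
```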

Comment 10 Corey Marthaler 2011-09-15 16:42:55 UTC
FYI: although everything passed with the test kernel in comment #8, this bug still exists in the latest official kernel/scratch RPMs.

2.6.32-195.el6.x86_64

lvm2-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-libs-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-cluster-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
udev-147-2.38.el6    BUILT: Fri Sep  9 16:25:50 CDT 2011
device-mapper-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
cmirror-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011

Comment 11 Jonathan Earl Brassow 2011-09-15 17:48:30 UTC
I was told it would be integrated into kernel-2.6.32-198.el6.  I don't see it in git yet, but it has been noticed and has approval.

Comment 12 Corey Marthaler 2011-09-16 20:15:44 UTC
This bug no longer exists in the latest kernel.

[root@taft-01 ~]# lvcreate --type raid1 -m 3  -L 100M -n lv vg
  WARNING:  RAID segment types are considered Tech Preview
  For more information on Tech Preview features, visit:
  https://access.redhat.com/support/offerings/techpreview/
  Logical volume "lv" created

[root@taft-01 ~]# lvs -a -o +devices
  LV            VG  Attr   LSize    Copy%  Devices
  lv            vg  mwi-a- 100.00m  100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0),lv_rimage_3(0)
  [lv_rimage_0] vg  -wi-ao 100.00m         /dev/sdb1(1)
  [lv_rimage_1] vg  -wi-ao 100.00m         /dev/sdc1(1)
  [lv_rimage_2] vg  -wi-ao 100.00m         /dev/sdd1(1)
  [lv_rimage_3] vg  -wi-ao 100.00m         /dev/sde1(1)
  [lv_rmeta_0]  vg  -wi-ao   4.00m         /dev/sdb1(0)
  [lv_rmeta_1]  vg  -wi-ao   4.00m         /dev/sdc1(0)
  [lv_rmeta_2]  vg  -wi-ao   4.00m         /dev/sdd1(0)
  [lv_rmeta_3]  vg  -wi-ao   4.00m         /dev/sde1(0)


2.6.32-198.el6.x86_64

lvm2-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-libs-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-cluster-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
udev-147-2.38.el6    BUILT: Fri Sep  9 16:25:50 CDT 2011
device-mapper-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
cmirror-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011

Comment 13 Kyle McMartin 2011-09-16 21:01:05 UTC
Patch(es) available on kernel-2.6.32-198.el6

Comment 17 Petr Beňas 2011-09-23 07:55:28 UTC
Reproduced in 2.6.32-197.el6.x86_64 and verified in 2.6.32-198.el6.x86_64.
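
Given the reproduced/verified boundary above, a quick (hypothetical) check for whether a running RHEL 6 kernel is at or past the fixed build, comparing only the numeric build component of the release string:

```shell
#!/bin/sh
# Sketch: report whether a 2.6.32-N.el6 kernel release string is at or past
# the fixed build (kernel-2.6.32-198.el6). The parsing is an assumption based
# on the version strings quoted in this report; scratch builds that carry a
# backported fix (e.g. 2.6.32-193.el6.bz735124) are not special-cased.
check() {
    build=$(printf '%s\n' "$1" | sed -n 's/^2\.6\.32-\([0-9][0-9]*\)\.el6.*/\1/p')
    if [ -n "$build" ] && [ "$build" -ge 198 ]; then
        echo "fixed"
    else
        echo "possibly affected"
    fi
}
check "$(uname -r)"
```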

Comment 18 errata-xmlrpc 2011-12-06 14:28:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html