Bug 867644 - kernel crashes when attempting to partially activate a corrupted raid volume
Status: CLOSED DUPLICATE of bug 871630
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Reported: 2012-10-17 19:00 EDT by Corey Marthaler
Modified: 2012-11-19 10:26 EST
CC List: 10 users

Doc Type: Bug Fix
Last Closed: 2012-11-19 10:26:39 EST
Type: Bug

Attachments: None
Description Corey Marthaler 2012-10-17 19:00:16 EDT
Description of problem:
# NOTE: the released raid_sanity doesn't support -t raid10 yet
./raid_sanity -o taft-04 -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.3  -i 1 -t raid10 -e partial_activation_replace_missing_segment

SCENARIO (raid10) - [partial_activation_replace_missing_segment]
Create a raid, corrupt an image, and then reactivate it partially with an error dm target

Recreating PVs/VG with smaller sizes
pvcreate --setphysicalvolumesize 200M /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1
vgcreate raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1

taft-04: lvcreate --type raid10 -i 2 -n partial_activation -L 188M raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2

Deactivating volume group
vgchange -an raid_sanity

taft-04: dd if=/dev/zero of=/dev/sdb1 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0100532 s, 104 MB/s
Verify there's an unknown device where the corrupt PV used to be

[root@taft-04 ~]# lvs -a -o +devices
  Couldn't find device with uuid Zrj3SX-ZnwG-wRgR-1M74-ZXkX-SEQG-eU0HUx.
  LV                            VG          Attr      LSize   Devices
  partial_activation            raid_sanity rwi---r-p 192.00m partial_activation_rimage_0(0),partial_activation_rimage_1(0),partial_activation_rimage_2(0),partial_activation_rimage_3(0)
  [partial_activation_rimage_0] raid_sanity Iwi---r-p  96.00m unknown device(1)
  [partial_activation_rimage_1] raid_sanity Iwi---r--  96.00m /dev/sdb2(1)
  [partial_activation_rimage_2] raid_sanity Iwi---r--  96.00m /dev/sdc1(1)
  [partial_activation_rimage_3] raid_sanity Iwi---r--  96.00m /dev/sdc2(1)
  [partial_activation_rmeta_0]  raid_sanity ewi---r-p   4.00m unknown device(0)
  [partial_activation_rmeta_1]  raid_sanity ewi---r--   4.00m /dev/sdb2(0)
  [partial_activation_rmeta_2]  raid_sanity ewi---r--   4.00m /dev/sdc1(0)
  [partial_activation_rmeta_3]  raid_sanity ewi---r--   4.00m /dev/sdc2(0)

[root@taft-04 ~]# vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Couldn't find device with uuid Zrj3SX-ZnwG-wRgR-1M74-ZXkX-SEQG-eU0HUx.



general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/block/dm-8/queue/scheduler
CPU 1 
Modules linked in: dm_raid raid10 raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sunrpc p4_clockmod freq_table speedstep_lib ipv6 e1000 microcode dcdbas serio_raw sg iTCO_wdt iTCO_vendor_support shpchp e752x_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt megaraid_mbox megaraid_mm sr_mod cdrom video output pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]

Pid: 3551, comm: vgchange Not tainted 2.6.32-332.el6.x86_64 #1 Dell Computer Corporation PowerEdge 2850/0T7971
RIP: 0010:[<ffffffffa04c3e8d>]  [<ffffffffa04c3e8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
RSP: 0018:ffff880218955c68  EFLAGS: 00010297
RAX: dead000000200200 RBX: ffff88021b532000 RCX: ffff88021b532438
RDX: dead000000100100 RSI: ffffffff81fc7440 RDI: ffff88021b532448
RBP: ffff880218955d08 R08: ffff88021b532448 R09: 00000000000001e2
R10: ffff880028401fc0 R11: 0000000000000000 R12: dead000000100100
R13: ffff88021b5328c8 R14: ffff88021b532028 R15: 0000000000000000
FS:  00007f221ea8d7a0(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8e715dc000 CR3: 0000000219bd9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process vgchange (pid: 3551, threadinfo ffff880218954000, task ffff88021822d500)
Stack:
 0000000000000480 0000000000000002 ffffffffa04c45c3 ffff88021b532010
<d> 0000000218955d34 0000000000030000 ffff88021b532028 0000000000000001
<d> ffffc90013307040 ffff88021b532438 ffffc90013301160 0000000000000400
Call Trace:
 [<ffffffffa0005f7f>] dm_table_add_target+0x13f/0x3b0 [dm_mod]
 [<ffffffffa00086f9>] table_load+0xc9/0x340 [dm_mod]
 [<ffffffffa0009984>] ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffffa0008630>] ? table_load+0x0/0x340 [dm_mod]
 [<ffffffffa0009a53>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81190522>] vfs_ioctl+0x22/0xa0
 [<ffffffff811906c4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff81190c41>] sys_ioctl+0x81/0xa0
 [<ffffffff810d8255>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: a8 4c 89 e7 41 83 ef 01 48 c7 41 08 00 00 00 00 49 c7 44 24 30 00 00 00 00 e8 70 02 dc e0 4d 8b 24 24 4d 39 f4 0f 84 38 01 00 00 <49> 83 7c 24 28 00 74 eb 49 8b 4c 24 38 49 c7 44 24 68 00 00 00 
RIP  [<ffffffffa04c3e8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
 RSP <ffff880218955c68>


Version-Release number of selected component (if applicable):
2.6.32-332.el6.x86_64

lvm2-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
lvm2-libs-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
lvm2-cluster-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-libs-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-event-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-event-libs-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
cmirror-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012


How reproducible:
Every time
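
For convenience, the commands above can be collected into a single reproducer sketch. The commands, device paths, and sizes are taken from this report; run it only on a disposable test host, since the final step is expected to panic the kernel on affected builds.

#!/bin/bash
# Reproducer sketch for the raid10 case described above (devices and sizes from this report).

pvcreate --setphysicalvolumesize 200M /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1
vgcreate raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1

lvcreate --type raid10 -i 2 -n partial_activation -L 188M raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2

# Deactivate the VG, then corrupt the first PV so LVM can no longer find it
vgchange -an raid_sanity
dd if=/dev/zero of=/dev/sdb1 bs=1M count=1

lvs -a -o +devices                   # rimage_0/rmeta_0 should now show "unknown device"
vgchange -ay --partial raid_sanity   # triggers the general protection fault in raid_ctr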
Comment 1 Nenad Peric 2012-10-22 11:10:32 EDT
The same thing happens with raid1 as well:

vgcreate raid_sanity /dev/sda1  /dev/sdb1 /dev/sdc1
lvcreate -n partial_activate -m 1 --type raid1 raid_sanity /dev/sda1 /dev/sdb1 -L 188M

After wiping one of the disks (first 1M):

[root@r6-node01:~]$ lvs -a -o +devices
  Couldn't find device with uuid 4JQnSf-liwF-iQfF-NQ10-xlMr-n06l-nHsZoK.
  LV                          VG          Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert Devices                                                  
  lv_root                     VolGroup    -wi-ao---   7.54g                                            /dev/vda2(0)                                             
  lv_swap                     VolGroup    -wi-ao---   1.97g                                            /dev/vda2(1930)                                          
  partial_activate            raid_sanity rwi---r-p 188.00m                                            partial_activate_rimage_0(0),partial_activate_rimage_1(0)
  [partial_activate_rimage_0] raid_sanity Iwi---r-p 188.00m                                            unknown device(1)                                        
  [partial_activate_rimage_1] raid_sanity Iwi---r-- 188.00m                                            /dev/sdb1(1)                                             
  [partial_activate_rmeta_0]  raid_sanity ewi---r-p   4.00m                                            unknown device(0)                                        
  [partial_activate_rmeta_1]  raid_sanity ewi---r--   4.00m                                            /dev/sdb1(0)     



[root@r6-node01:~]$ vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Couldn't find device with uuid 91p9RC-SIat-nfxr-rh6r-cNFw-tnjx-26BfFG.
device-mapper: raid: Failed to read superblock of device at position 0
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
CPU 0
Modules linked in: dm_raid raid10 raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sctp libcrc32c autofs4 sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 2151, comm: vgchange Not tainted 2.6.32-330.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffffa043ee8d>]  [<ffffffffa043ee8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
RSP: 0018:ffff88011b827c68  EFLAGS: 00010297
RAX: dead000000200200 RBX: ffff880119bdf800 RCX: ffff880119bdfc38
RDX: dead000000100100 RSI: ffffffff81fc8440 RDI: ffff880119bdfc48
RBP: ffff88011b827d08 R08: ffff880119bdfc48 R09: 000000000000003a
R10: 0000000000000000 R11: 0000000000000000 R12: dead000000100100
R13: ffff880119bdfdc8 R14: ffff880119bdf828 R15: 0000000000000000
FS:  00007f085176a7a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4529ec8000 CR3: 000000011a88d000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process vgchange (pid: 2151, threadinfo ffff88011b826000, task ffff88011bf5aaa0)
Stack:
 0000000000000180 0000000000000002 ffffffffa043f5c3 ffff880119bdf810
<d> 000000021b827d34 000000000005e000 ffff880119bdf828 0000000000000001
<d> ffffc9000211c040 ffff880119bdfc38 ffffc90002116160 0000000000000400
Call Trace:
 [<ffffffffa00061ff>] dm_table_add_target+0x13f/0x3b0 [dm_mod]
 [<ffffffffa00088e9>] table_load+0xc9/0x340 [dm_mod]
 [<ffffffffa0009884>] ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffff8105b3b3>] ? perf_event_task_sched_out+0x33/0x80
 [<ffffffffa0008820>] ? table_load+0x0/0x340 [dm_mod]
 [<ffffffffa0009953>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81196062>] vfs_ioctl+0x22/0xa0
 [<ffffffff81196204>] do_vfs_ioctl+0x84/0x580
 [<ffffffff81196781>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: a8 4c 89 e7 41 83 ef 01 48 c7 41 08 00 00 00 00 49 c7 44 24 30 00 00 00 00 e8 d0 f1 e4 e0 4d 8b 24 24 4d 39 f4 0f 84 38 01 00 00 <49> 83 7c 24 28 00 74 eb 49 8b 4c 24 38 49 c7 44 24 68 00 00 00
RIP  [<ffffffffa043ee8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
 RSP <ffff88011b827c68>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-330.el6.x86_64 (mockbuild@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Thu Oct 11 15:37:45 EDT 2012
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap console=ttyS0,115200 rd_LVM_LV=VolGroup/lv_root SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off  memmap=exactmap memmap=626K@4K memmap=261498K@262770K elfcorehdr=524268K memmap=4K$0K memmap=10K$630K memmap=64K$960K memmap=12K$3670004K memmap=272K$4194032K
KERNEL supported cpus:
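
Comment 1's raid1 variant can be condensed the same way. The vgcreate, lvcreate, and vgchange lines are taken from the comment; the explicit deactivation step and the choice of /dev/sda1 as the wiped disk are assumptions (the comment only says "wiping one of the disks (first 1M)", and rimage_0 is the leg that ends up missing).

#!/bin/bash
# Raid1 reproducer sketch based on comment 1.

vgcreate raid_sanity /dev/sda1 /dev/sdb1 /dev/sdc1
lvcreate -n partial_activate -m 1 --type raid1 raid_sanity /dev/sda1 /dev/sdb1 -L 188M

vgchange -an raid_sanity                     # assumed, mirroring the raid10 procedure in comment 0
dd if=/dev/zero of=/dev/sda1 bs=1M count=1   # assumed target: the leg later reported as "unknown device"

vgchange -ay --partial raid_sanity           # crashes in raid_ctr just as in comment 0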
Comment 2 Jonathan Earl Brassow 2012-11-19 10:26:39 EST
This looks exactly like bug 871630.  The fix for that was in kernel 2.6.32-339.el6.x86_64.  Marking this as a duplicate.

*** This bug has been marked as a duplicate of bug 871630 ***
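
For anyone hitting this on a running host, a quick check of whether the fixed kernel noted above is installed and booted (standard commands, nothing specific to this bug):

uname -r        # running kernel; per the comment above it needs to be 2.6.32-339.el6.x86_64 or later
rpm -q kernel   # kernel packages installed on the host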
