Bug 867644 - kernel crashes when attempting to partially activate a corrupted raid volume
Status: CLOSED DUPLICATE of bug 871630
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Reported: 2012-10-17 19:00 EDT by Corey Marthaler
Modified: 2012-11-19 10:26 EST
CC List: 10 users

Doc Type: Bug Fix
Last Closed: 2012-11-19 10:26:39 EST
Type: Bug

Attachments: None
Description Corey Marthaler 2012-10-17 19:00:16 EDT
Description of problem:
# NOTE: the released raid_sanity doesn't support -t raid10 yet
./raid_sanity -o taft-04 -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.3  -i 1 -t raid10 -e partial_activation_replace_missing_segment

SCENARIO (raid10) - [partial_activation_replace_missing_segment]
Create a raid, corrupt an image, and then reactivate it partially with an error dm target

Recreating PVs/VG with smaller sizes
pvcreate --setphysicalvolumesize 200M /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1
vgcreate raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1

taft-04: lvcreate --type raid10 -i 2 -n partial_activation -L 188M raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2

Deactivating volume group
vgchange -an raid_sanity

taft-04: dd if=/dev/zero of=/dev/sdb1 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0100532 s, 104 MB/s
Verify there's an unknown device where the corrupt PV used to be

[root@taft-04 ~]# lvs -a -o +devices
  Couldn't find device with uuid Zrj3SX-ZnwG-wRgR-1M74-ZXkX-SEQG-eU0HUx.
  LV                            VG          Attr      LSize   Devices
  partial_activation            raid_sanity rwi---r-p 192.00m partial_activation_rimage_0(0),partial_activation_rimage_1(0),partial_activation_rimage_2(0),partial_activation_rimage_3(0)
  [partial_activation_rimage_0] raid_sanity Iwi---r-p  96.00m unknown device(1)
  [partial_activation_rimage_1] raid_sanity Iwi---r--  96.00m /dev/sdb2(1)
  [partial_activation_rimage_2] raid_sanity Iwi---r--  96.00m /dev/sdc1(1)
  [partial_activation_rimage_3] raid_sanity Iwi---r--  96.00m /dev/sdc2(1)
  [partial_activation_rmeta_0]  raid_sanity ewi---r-p   4.00m unknown device(0)
  [partial_activation_rmeta_1]  raid_sanity ewi---r--   4.00m /dev/sdb2(0)
  [partial_activation_rmeta_2]  raid_sanity ewi---r--   4.00m /dev/sdc1(0)
  [partial_activation_rmeta_3]  raid_sanity ewi---r--   4.00m /dev/sdc2(0)

[root@taft-04 ~]# vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Couldn't find device with uuid Zrj3SX-ZnwG-wRgR-1M74-ZXkX-SEQG-eU0HUx.



general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/block/dm-8/queue/scheduler
CPU 1 
Modules linked in: dm_raid raid10 raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sunrpc p4_clockmod freq_table speedstep_lib ipv6 e1000 microcode dcdbas serio_raw sg iTCO_wdt iTCO_vendor_support shpchp e752x_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt megaraid_mbox megaraid_mm sr_mod cdrom video output pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]

Pid: 3551, comm: vgchange Not tainted 2.6.32-332.el6.x86_64 #1 Dell Computer Corporation PowerEdge 2850/0T7971
RIP: 0010:[<ffffffffa04c3e8d>]  [<ffffffffa04c3e8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
RSP: 0018:ffff880218955c68  EFLAGS: 00010297
RAX: dead000000200200 RBX: ffff88021b532000 RCX: ffff88021b532438
RDX: dead000000100100 RSI: ffffffff81fc7440 RDI: ffff88021b532448
RBP: ffff880218955d08 R08: ffff88021b532448 R09: 00000000000001e2
R10: ffff880028401fc0 R11: 0000000000000000 R12: dead000000100100
R13: ffff88021b5328c8 R14: ffff88021b532028 R15: 0000000000000000
FS:  00007f221ea8d7a0(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8e715dc000 CR3: 0000000219bd9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process vgchange (pid: 3551, threadinfo ffff880218954000, task ffff88021822d500)
Stack:
 0000000000000480 0000000000000002 ffffffffa04c45c3 ffff88021b532010
<d> 0000000218955d34 0000000000030000 ffff88021b532028 0000000000000001
<d> ffffc90013307040 ffff88021b532438 ffffc90013301160 0000000000000400
Call Trace:
 [<ffffffffa0005f7f>] dm_table_add_target+0x13f/0x3b0 [dm_mod]
 [<ffffffffa00086f9>] table_load+0xc9/0x340 [dm_mod]
 [<ffffffffa0009984>] ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffffa0008630>] ? table_load+0x0/0x340 [dm_mod]
 [<ffffffffa0009a53>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81190522>] vfs_ioctl+0x22/0xa0
 [<ffffffff811906c4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff81190c41>] sys_ioctl+0x81/0xa0
 [<ffffffff810d8255>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: a8 4c 89 e7 41 83 ef 01 48 c7 41 08 00 00 00 00 49 c7 44 24 30 00 00 00 00 e8 70 02 dc e0 4d 8b 24 24 4d 39 f4 0f 84 38 01 00 00 <49> 83 7c 24 28 00 74 eb 49 8b 4c 24 38 49 c7 44 24 68 00 00 00 
RIP  [<ffffffffa04c3e8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
 RSP <ffff880218955c68>


Version-Release number of selected component (if applicable):
2.6.32-332.el6.x86_64

lvm2-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
lvm2-libs-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
lvm2-cluster-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-libs-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-event-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-event-libs-1.02.77-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012
cmirror-2.02.98-2.el6    BUILT: Tue Oct 16 05:15:59 CDT 2012


How reproducible:
Every time
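
For convenience, the commands above can be collected into a single reproducer sketch. The commands, device paths, and sizes are taken from this report; run it only on a disposable test host, since the final step is expected to panic the kernel on affected builds.

#!/bin/bash
# Reproducer sketch for the raid10 case described above (devices and sizes from this report).

pvcreate --setphysicalvolumesize 200M /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1
vgcreate raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sde1

lvcreate --type raid10 -i 2 -n partial_activation -L 188M raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2

# Deactivate the VG, then corrupt the first PV so LVM can no longer find it
vgchange -an raid_sanity
dd if=/dev/zero of=/dev/sdb1 bs=1M count=1

lvs -a -o +devices                   # rimage_0/rmeta_0 should now show "unknown device"
vgchange -ay --partial raid_sanity   # triggers the general protection fault in raid_ctr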
Comment 1 Nenad Peric 2012-10-22 11:10:32 EDT
The same thing happens with raid1 as well:

vgcreate raid_sanity /dev/sda1  /dev/sdb1 /dev/sdc1
lvcreate -n partial_activate -m 1 --type raid1 raid_sanity /dev/sda1 /dev/sdb1 -L 188M

After wiping one of the disks (first 1M):

[root@r6-node01:~]$ lvs -a -o +devices
  Couldn't find device with uuid 4JQnSf-liwF-iQfF-NQ10-xlMr-n06l-nHsZoK.
  LV                          VG          Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert Devices                                                  
  lv_root                     VolGroup    -wi-ao---   7.54g                                            /dev/vda2(0)                                             
  lv_swap                     VolGroup    -wi-ao---   1.97g                                            /dev/vda2(1930)                                          
  partial_activate            raid_sanity rwi---r-p 188.00m                                            partial_activate_rimage_0(0),partial_activate_rimage_1(0)
  [partial_activate_rimage_0] raid_sanity Iwi---r-p 188.00m                                            unknown device(1)                                        
  [partial_activate_rimage_1] raid_sanity Iwi---r-- 188.00m                                            /dev/sdb1(1)                                             
  [partial_activate_rmeta_0]  raid_sanity ewi---r-p   4.00m                                            unknown device(0)                                        
  [partial_activate_rmeta_1]  raid_sanity ewi---r--   4.00m                                            /dev/sdb1(0)     



[root@r6-node01:~]$ vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Couldn't find device with uuid 91p9RC-SIat-nfxr-rh6r-cNFw-tnjx-26BfFG.
device-mapper: raid: Failed to read superblock of device at position 0
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
CPU 0
Modules linked in: dm_raid raid10 raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sctp libcrc32c autofs4 sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 2151, comm: vgchange Not tainted 2.6.32-330.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffffa043ee8d>]  [<ffffffffa043ee8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
RSP: 0018:ffff88011b827c68  EFLAGS: 00010297
RAX: dead000000200200 RBX: ffff880119bdf800 RCX: ffff880119bdfc38
RDX: dead000000100100 RSI: ffffffff81fc8440 RDI: ffff880119bdfc48
RBP: ffff88011b827d08 R08: ffff880119bdfc48 R09: 000000000000003a
R10: 0000000000000000 R11: 0000000000000000 R12: dead000000100100
R13: ffff880119bdfdc8 R14: ffff880119bdf828 R15: 0000000000000000
FS:  00007f085176a7a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4529ec8000 CR3: 000000011a88d000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process vgchange (pid: 2151, threadinfo ffff88011b826000, task ffff88011bf5aaa0)
Stack:
 0000000000000180 0000000000000002 ffffffffa043f5c3 ffff880119bdf810
<d> 000000021b827d34 000000000005e000 ffff880119bdf828 0000000000000001
<d> ffffc9000211c040 ffff880119bdfc38 ffffc90002116160 0000000000000400
Call Trace:
 [<ffffffffa00061ff>] dm_table_add_target+0x13f/0x3b0 [dm_mod]
 [<ffffffffa00088e9>] table_load+0xc9/0x340 [dm_mod]
 [<ffffffffa0009884>] ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffff8105b3b3>] ? perf_event_task_sched_out+0x33/0x80
 [<ffffffffa0008820>] ? table_load+0x0/0x340 [dm_mod]
 [<ffffffffa0009953>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81196062>] vfs_ioctl+0x22/0xa0
 [<ffffffff81196204>] do_vfs_ioctl+0x84/0x580
 [<ffffffff81196781>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: a8 4c 89 e7 41 83 ef 01 48 c7 41 08 00 00 00 00 49 c7 44 24 30 00 00 00 00 e8 d0 f1 e4 e0 4d 8b 24 24 4d 39 f4 0f 84 38 01 00 00 <49> 83 7c 24 28 00 74 eb 49 8b 4c 24 38 49 c7 44 24 68 00 00 00
RIP  [<ffffffffa043ee8d>] raid_ctr+0xdcd/0x1274 [dm_raid]
 RSP <ffff88011b827c68>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-330.el6.x86_64 (mockbuild@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Thu Oct 11 15:37:45 EDT 2012
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap console=ttyS0,115200 rd_LVM_LV=VolGroup/lv_root SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off  memmap=exactmap memmap=626K@4K memmap=261498K@262770K elfcorehdr=524268K memmap=4K$0K memmap=10K$630K memmap=64K$960K memmap=12K$3670004K memmap=272K$4194032K
KERNEL supported cpus:
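
Comment 1's raid1 variant can be condensed the same way. The vgcreate, lvcreate, and vgchange lines are taken from the comment; the explicit deactivation step and the choice of /dev/sda1 as the wiped disk are assumptions (the comment only says "wiping one of the disks (first 1M)", and rimage_0 is the leg that ends up missing).

#!/bin/bash
# Raid1 reproducer sketch based on comment 1.

vgcreate raid_sanity /dev/sda1 /dev/sdb1 /dev/sdc1
lvcreate -n partial_activate -m 1 --type raid1 raid_sanity /dev/sda1 /dev/sdb1 -L 188M

vgchange -an raid_sanity                     # assumed, mirroring the raid10 procedure in comment 0
dd if=/dev/zero of=/dev/sda1 bs=1M count=1   # assumed target: the leg later reported as "unknown device"

vgchange -ay --partial raid_sanity           # crashes in raid_ctr just as in comment 0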
Comment 2 Jonathan Earl Brassow 2012-11-19 10:26:39 EST
This looks exactly like bug 871630.  The fix for that was in kernel 2.6.32-339.el6.x86_64.  Marking this as a duplicate.

*** This bug has been marked as a duplicate of bug 871630 ***
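
For anyone hitting this on a running host, a quick check of whether the fixed kernel noted above is installed and booted (standard commands, nothing specific to this bug):

uname -r        # running kernel; per the comment above it needs to be 2.6.32-339.el6.x86_64 or later
rpm -q kernel   # kernel packages installed on the host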
