Description of problem:

Scenario kill_primary_synced_raid5_3legs: Kill primary leg of synced 3 leg raid5 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_primary_raid5_3legs_1
* sync:               1
* leg devices:        /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdg1
* failpv(s):          /dev/sdc1
* failnode(s):        taft-01
* raid fault policy:  warn
******************************************************

Creating raids(s) on taft-01...
taft-01: lvcreate --type raid5 -i 3 -n synced_primary_raid5_3legs_1 -L 500M black_bird /dev/sdc1:0-1000 /dev/sdd1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000

RAID Structure(s):
  LV                                      Attr     LSize   Copy%  Devices
  synced_primary_raid5_3legs_1            rwi-a-r- 504.00m        synced_primary_raid5_3legs_1_rimage_0(0),synced_primary_raid5_3legs_1_rimage_1(0),synced_primary_raid5_3legs_1_rimage_2(0),synced_primary_raid5_3legs_1_rimage_3(0)
  [synced_primary_raid5_3legs_1_rimage_0] Iwi-aor- 168.00m        /dev/sdc1(1)
  [synced_primary_raid5_3legs_1_rimage_1] Iwi-aor- 168.00m        /dev/sdd1(1)
  [synced_primary_raid5_3legs_1_rimage_2] Iwi-aor- 168.00m        /dev/sde1(1)
  [synced_primary_raid5_3legs_1_rimage_3] Iwi-aor- 168.00m        /dev/sdg1(1)
  [synced_primary_raid5_3legs_1_rmeta_0]  ewi-aor-   4.00m        /dev/sdc1(0)
  [synced_primary_raid5_3legs_1_rmeta_1]  ewi-aor-   4.00m        /dev/sdd1(0)
  [synced_primary_raid5_3legs_1_rmeta_2]  ewi-aor-   4.00m        /dev/sde1(0)
  [synced_primary_raid5_3legs_1_rmeta_3]  ewi-aor-   4.00m        /dev/sdg1(0)

PV=/dev/sdc1
        synced_primary_raid5_3legs_1_rimage_0: 2
        synced_primary_raid5_3legs_1_rmeta_0: 2

Continuing on without fully syncd raid1 mirror(s), currently at...
        ( 6.25% )

Disabling device sdc on taft-01
[DEADLOCK]

qarshd[3131]: Running cmdline: echo offline > /sys/block/sdc/device/state &
kernel: sd 3:0:0:2: rejecting I/O to offline device
kernel: sd 3:0:0:2: rejecting I/O to offline device
kernel: md/raid:mdX: Disk failure on dm-3, disabling device.
kernel: md/raid:mdX: Operation continuing on 3 devices.
kernel: md: mdX: resync done.
kernel: md: checkpointing resync of mdX.
lvm[1153]: Device #0 of raid5_ls array, black_bird-synced_primary_raid5_3legs_1, has failed.
qarshd[3134]: Running cmdline: pvs -a
kernel: INFO: task dmeventd:3108 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: dmeventd      D 0000000000000003     0  3108      1 0x00000080
kernel: ffff880218c37b18 0000000000000086 0000000000000000 ffffffffa000422e
kernel: ffff880218c37ae8 00000000bd278ab4 ffff880218c37b08 ffff880219021980
kernel: ffff880216ea3ab8 ffff880218c37fd8 000000000000f4e8 ffff880216ea3ab8
kernel: Call Trace:
kernel: [<ffffffffa000422e>] ? dm_table_unplug_all+0x8e/0x100 [dm_mod]
kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
kernel: [<ffffffff811b1a2e>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
kernel: [<ffffffff8125821d>] ? get_disk+0x7d/0xf0
kernel: [<ffffffff811b1f1e>] __blockdev_direct_IO+0x5e/0xd0
kernel: [<ffffffff811ae820>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff8126cd7a>] ? kobject_get+0x1a/0x30
kernel: [<ffffffff811af687>] blkdev_direct_IO+0x57/0x60
kernel: [<ffffffff811ae820>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff811128db>] generic_file_aio_read+0x6bb/0x700
kernel: [<ffffffff81213a31>] ? avc_has_perm+0x71/0x90
kernel: [<ffffffff8120d52f>] ? security_inode_permission+0x1f/0x30
kernel: [<ffffffff8117641a>] do_sync_read+0xfa/0x140
kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40

[root@taft-01 ~]# dmsetup status
black_bird-synced_primary_raid5_3legs_1_rimage_3: 0 344064 linear
black_bird-synced_primary_raid5_3legs_1_rimage_2: 0 344064 linear
black_bird-synced_primary_raid5_3legs_1_rimage_1: 0 344064 linear
black_bird-synced_primary_raid5_3legs_1_rimage_0: 0 344064 linear
black_bird-synced_primary_raid5_3legs_1: 0 1032192 raid raid5_ls 4 DAAA 150584/344064
black_bird-synced_primary_raid5_3legs_1_rmeta_3: 0 8192 linear
black_bird-synced_primary_raid5_3legs_1_rmeta_2: 0 8192 linear
black_bird-synced_primary_raid5_3legs_1_rmeta_1: 0 8192 linear
black_bird-synced_primary_raid5_3legs_1_rmeta_0: 0 8192 linear

[root@taft-01 ~]# dmsetup table
black_bird-synced_primary_raid5_3legs_1_rimage_3: 0 344064 linear 8:97 10240
black_bird-synced_primary_raid5_3legs_1_rimage_2: 0 344064 linear 8:65 10240
black_bird-synced_primary_raid5_3legs_1_rimage_1: 0 344064 linear 8:49 10240
black_bird-synced_primary_raid5_3legs_1_rimage_0: 0 344064 linear 8:33 10240
black_bird-synced_primary_raid5_3legs_1: 0 1032192 raid raid5_ls 3 128 region_size 1024 4 253:2 253:3 253:4 253:5 253:6 253:7 253:8 253:9
black_bird-synced_primary_raid5_3legs_1_rmeta_3: 0 8192 linear 8:97 2048
black_bird-synced_primary_raid5_3legs_1_rmeta_2: 0 8192 linear 8:65 2048
black_bird-synced_primary_raid5_3legs_1_rmeta_1: 0 8192 linear 8:49 2048
black_bird-synced_primary_raid5_3legs_1_rmeta_0: 0 8192 linear 8:33 2048

Version-Release number of selected component (if applicable):
2.6.32-220.el6.x86_64

lvm2-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
lvm2-libs-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
lvm2-cluster-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-libs-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-event-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-event-libs-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
cmirror-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012

How reproducible:
Everytime
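
For anyone triaging a similar hang, the raid line in the dmsetup status output above (raid5_ls 4 DAAA 150584/344064) already shows which leg failed and how far the resync got. Below is a minimal sketch of reading that line, assuming the status field order seen in this report (start, length, "raid", raid type, device count, health characters, sync ratio) and that "D" in the health string marks a failed device. The names RaidStatus and parse_raid_status are made up for illustration and are not part of LVM or device-mapper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RaidStatus:
    """Fields of one 'raid' target line from `dmsetup status` (hypothetical helper type)."""
    name: str
    raid_type: str
    num_devices: int
    health: str      # one character per device, e.g. "DAAA"
    synced: int      # numerator of the sync ratio
    total: int       # denominator of the sync ratio

    @property
    def failed_devices(self) -> List[int]:
        # Assumption: 'D' marks a dead/failed device in the health string.
        return [i for i, c in enumerate(self.health) if c == "D"]

    @property
    def sync_percent(self) -> float:
        return 100.0 * self.synced / self.total if self.total else 0.0


def parse_raid_status(line: str) -> Optional[RaidStatus]:
    """Parse one `dmsetup status` line; return None for non-raid targets."""
    name, _, rest = line.partition(":")
    fields = rest.split()
    # Expected: <start> <length> raid <raid_type> <#devices> <health> <synced>/<total>
    if len(fields) < 7 or fields[2] != "raid":
        return None
    synced, total = fields[6].split("/")
    return RaidStatus(
        name=name.strip(),
        raid_type=fields[3],
        num_devices=int(fields[4]),
        health=fields[5],
        synced=int(synced),
        total=int(total),
    )


if __name__ == "__main__":
    line = ("black_bird-synced_primary_raid5_3legs_1: "
            "0 1032192 raid raid5_ls 4 DAAA 150584/344064")
    status = parse_raid_status(line)
    print(status.failed_devices, round(status.sync_percent, 2))
```

Run against the line from this report, the sketch flags device index 0 as failed, which matches the "Device #0 of raid5_ls array ... has failed" message logged by lvm.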
This seems to be fixed by the latest version of the RHEL 6 kernel (2.6.32-236.el6). However, I did notice that the helpful message that RAID1 prints when a device is lost is not printed for the higher RAID levels. That is not a problem with the kernel or dmeventd, but with the lvconvert command run by dmeventd. Perhaps this is worth another bug?
Verified fixed in the latest kernel + scratch lvm builds.

2.6.32-236.el6.x86_64

lvm2-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
lvm2-libs-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
lvm2-cluster-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-libs-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-event-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-event-libs-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
cmirror-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
New Feature to 6.3. No documentation required.

Bug 732458 is the bug that requires a release note for the RAID features. Other documentation is found in the LVM manual.

Operational bugs need no documentation because they are being fixed before their initial release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0962.html