Description of problem:
This may be related to bug 789408.

Scenario kill_primary_synced_raid4_3legs: Kill primary leg of synced 3 leg raid4 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_primary_raid4_3legs_1
* sync:               1
* type:               raid4
* -m or -i value:     3
* leg devices:        /dev/sdg1 /dev/sdd1 /dev/sdh1 /dev/sdf1
* failpv(s):          /dev/sdg1
* failnode(s):        taft-01
* raid fault policy:  warn
******************************************************

Creating raids(s) on taft-01...
taft-01: lvcreate --type raid4 -i 3 -n synced_primary_raid4_3legs_1 -L 500M black_bird /dev/sdg1:0-1000 /dev/sdd1:0-1000 /dev/sdh1:0-1000 /dev/sdf1:0-1000

RAID Structure(s):
  LV                                      Attr     LSize   Devices
  synced_primary_raid4_3legs_1            rwi-a-r- 504.00m synced_primary_raid4_3legs_1_rimage_0(0),synced_primary_raid4_3legs_1_rimage_1(0),synced_primary_raid4_3legs_1_rimage_2(0),synced_primary_raid4_3legs_1_rimage_3(0)
  [synced_primary_raid4_3legs_1_rimage_0] Iwi-aor- 168.00m /dev/sdg1(1)
  [synced_primary_raid4_3legs_1_rimage_1] Iwi-aor- 168.00m /dev/sdd1(1)
  [synced_primary_raid4_3legs_1_rimage_2] Iwi-aor- 168.00m /dev/sdh1(1)
  [synced_primary_raid4_3legs_1_rimage_3] Iwi-aor- 168.00m /dev/sdf1(1)
  [synced_primary_raid4_3legs_1_rmeta_0]  ewi-aor-   4.00m /dev/sdg1(0)
  [synced_primary_raid4_3legs_1_rmeta_1]  ewi-aor-   4.00m /dev/sdd1(0)
  [synced_primary_raid4_3legs_1_rmeta_2]  ewi-aor-   4.00m /dev/sdh1(0)
  [synced_primary_raid4_3legs_1_rmeta_3]  ewi-aor-   4.00m /dev/sdf1(0)

PV=/dev/sdg1
        synced_primary_raid4_3legs_1_rimage_0: 2
        synced_primary_raid4_3legs_1_rmeta_0: 2

Disabling device sdg on taft-01

Attempting I/O to cause mirror down conversion(s) on taft-01
[DEADLOCK]

qarshd[5787]: Running cmdline: echo offline > /sys/block/sdg/device/state &
lvm[1256]: Device #0 of raid4 array, black_bird-synced_primary_raid4_3legs_1, has failed.
kernel: md/raid:mdX: Disk failure on dm-3, disabling device.
kernel: md/raid:mdX: Operation continuing on 3 devices.
kernel: md/raid:mdX: read error not correctable (sector 126760 on dm-3).
[...]
kernel: md/raid:mdX: read error not correctable (sector 126832 on dm-3).
kernel: md: mdX: resync done.
lvm[1256]: /dev/sdg1: read failed after 0 of 512 at 145669554176: Input/output error
[...]
lvm[1256]: /dev/sdg1: read failed after 0 of 2048 at 0: Input/output error
lvm[1256]: Couldn't find device with uuid 403agt-g0GQ-LPZ0-zcYq-3PTc-3R6A-efKAfT.
qarshd[5790]: Running cmdline: pvs -a
qarshd[5792]: Running cmdline: dd if=/dev/zero of=/dev/black_bird/synced_primary_raid4_3legs_1 count=1
kernel: INFO: task mdX_resync:5760 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: mdX_resync    D 0000000000000002     0  5760      2 0x00000080
kernel: ffff88021729dcd0 0000000000000046 0000000000000000 ffff880217fa5c00
kernel: ffff880217fa5e20 0000000000000286 ffff880217fa5d28 ffff8802175e1028
kernel: ffff880218c89038 ffff88021729dfd8 000000000000f4e8 ffff880218c89038
kernel: Call Trace:
kernel: [<ffffffff813eaf52>] md_do_sync+0xaf2/0xbe0
kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffff813eb2d6>] md_thread+0x116/0x150
kernel: [<ffffffff813eb1c0>] ? md_thread+0x0/0x150
kernel: [<ffffffff81090886>] kthread+0x96/0xa0
kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
kernel: INFO: task dd:5793 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: dd            D 0000000000000001     0  5793   5792 0x00000080
kernel: ffff880216a57bf8 0000000000000086 0000000000000000 0000000000000001
kernel: 0000000000008460 ffff880216a57c88 ffff880216a57d00 0000000000000286
kernel: ffff880216d89a78 ffff880216a57fd8 000000000000f4e8 ffff880216d89a78
kernel: Call Trace:
kernel: [<ffffffff81110b10>] ? sync_page+0x0/0x50
kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
kernel: [<ffffffff81110b4d>] sync_page+0x3d/0x50
kernel: [<ffffffff814edb9f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffff81110d03>] wait_on_page_bit+0x73/0x80
kernel: [<ffffffff81090c30>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffff811271a5>] ? pagevec_lookup_tag+0x25/0x40
kernel: [<ffffffff8111111b>] wait_on_page_writeback_range+0xfb/0x190
kernel: [<ffffffff81126324>] ? generic_writepages+0x24/0x30
kernel: [<ffffffff81126351>] ? do_writepages+0x21/0x40
kernel: [<ffffffff8111126b>] ? __filemap_fdatawrite_range+0x5b/0x60
kernel: [<ffffffff811111df>] filemap_fdatawait+0x2f/0x40
kernel: [<ffffffff811117c4>] filemap_write_and_wait+0x44/0x60
kernel: [<ffffffff811afa74>] __sync_blockdev+0x24/0x50
kernel: [<ffffffff811afab3>] sync_blockdev+0x13/0x20
kernel: [<ffffffff811afb68>] __blkdev_put+0xa8/0x190
kernel: [<ffffffff811afc60>] blkdev_put+0x10/0x20
kernel: [<ffffffff811afca3>] blkdev_close+0x33/0x60
kernel: [<ffffffff81177e85>] __fput+0xf5/0x210
kernel: [<ffffffff81177fc5>] fput+0x25/0x30
kernel: [<ffffffff81173a0d>] filp_close+0x5d/0x90
kernel: [<ffffffff81173ae5>] sys_close+0xa5/0x100
kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

[root@taft-01 ~]# dmsetup status
black_bird-synced_primary_raid4_3legs_1_rimage_1: 0 344064 linear
black_bird-synced_primary_raid4_3legs_1: 0 1032192 raid raid4 4 DAAA 107560/344064
black_bird-synced_primary_raid4_3legs_1_rimage_0: 0 344064 linear
black_bird-synced_primary_raid4_3legs_1_rmeta_3: 0 8192 linear
black_bird-synced_primary_raid4_3legs_1_rmeta_2: 0 8192 linear
black_bird-synced_primary_raid4_3legs_1_rmeta_1: 0 8192 linear
black_bird-synced_primary_raid4_3legs_1_rmeta_0: 0 8192 linear
black_bird-synced_primary_raid4_3legs_1_rimage_3: 0 344064 linear
black_bird-synced_primary_raid4_3legs_1_rimage_2: 0 344064 linear

[root@taft-01 ~]# dmsetup table
black_bird-synced_primary_raid4_3legs_1_rimage_1: 0 344064 linear 8:49 10240
black_bird-synced_primary_raid4_3legs_1: 0 1032192 raid raid4 3 128 region_size 1024 4 253:2 253:3 253:4 253:5 253:6 253:7 253:8 253:9
black_bird-synced_primary_raid4_3legs_1_rimage_0: 0 344064 linear 8:97 10240
black_bird-synced_primary_raid4_3legs_1_rmeta_3: 0 8192 linear 8:81 2048
black_bird-synced_primary_raid4_3legs_1_rmeta_2: 0 8192 linear 8:113 2048
black_bird-synced_primary_raid4_3legs_1_rmeta_1: 0 8192 linear 8:49 2048
black_bird-synced_primary_raid4_3legs_1_rmeta_0: 0 8192 linear 8:97 2048
black_bird-synced_primary_raid4_3legs_1_rimage_3: 0 344064 linear 8:81 10240
black_bird-synced_primary_raid4_3legs_1_rimage_2: 0 344064 linear 8:113 10240

Version:
2.6.32-220.el6.x86_64

lvm2-2.02.90-0.25.el6                       BUILT: Sat Jan 28 18:03:08 CST 2012
lvm2-libs-2.02.90-0.25.el6                  BUILT: Sat Jan 28 18:03:08 CST 2012
lvm2-cluster-2.02.90-0.25.el6               BUILT: Sat Jan 28 18:03:08 CST 2012
udev-147-2.40.el6                           BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.69-0.25.el6              BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-libs-1.02.69-0.25.el6         BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-event-1.02.69-0.25.el6        BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-event-libs-1.02.69-0.25.el6   BUILT: Sat Jan 28 18:03:08 CST 2012
cmirror-2.02.90-0.25.el6                    BUILT: Sat Jan 28 18:03:08 CST 2012
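
The raid line in the dmsetup status output above (raid4 4 DAAA 107560/344064) can be read mechanically. The following is a minimal Python sketch, assuming the dm-raid status layout of start, length, target name, RAID type, device count, per-device health characters, and sync ratio, where 'D' marks a failed device. The names RaidStatus and parse_raid_status are hypothetical helpers for illustration and are not part of LVM or device-mapper.

```python
from dataclasses import dataclass


@dataclass
class RaidStatus:
    name: str          # device-mapper device name
    raid_type: str     # e.g. "raid4"
    num_devices: int   # number of legs in the array
    health: str        # one character per leg, e.g. "DAAA"
    synced: int        # sectors already in sync
    total: int         # total sectors to sync

    @property
    def failed_indexes(self):
        # Assumption: 'D' in the health string marks a dead/failed leg.
        return [i for i, c in enumerate(self.health) if c == "D"]

    @property
    def in_sync(self):
        return self.synced == self.total


def parse_raid_status(line):
    """Parse one 'dmsetup status' line; return None for non-raid targets."""
    name, _, rest = line.partition(":")
    fields = rest.split()
    # Expected fields: start, length, "raid", type, #devs, health, sync ratio
    if len(fields) < 7 or fields[2] != "raid":
        return None
    synced, total = fields[6].split("/")
    return RaidStatus(name.strip(), fields[3], int(fields[4]),
                      fields[5], int(synced), int(total))


if __name__ == "__main__":
    status = parse_raid_status(
        "black_bird-synced_primary_raid4_3legs_1: "
        "0 1032192 raid raid4 4 DAAA 107560/344064"
    )
    print(status.failed_indexes, status.in_sync)
```

Applied to the line from this report, the sketch flags device index 0 as failed and reports the array as not fully synced (107560 of 344064), which is consistent with the blocked mdX_resync task shown in the hung-task output.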
This too is reproducible.
Fixed by the latest RHEL 6 kernel (2.6.32-236).
Verified fixed in the latest kernel + scratch lvm builds.

2.6.32-236.el6.x86_64

lvm2-2.02.92-0.40.el6                       BUILT: Thu Feb 16 18:12:38 CST 2012
lvm2-libs-2.02.92-0.40.el6                  BUILT: Thu Feb 16 18:12:38 CST 2012
lvm2-cluster-2.02.92-0.40.el6               BUILT: Thu Feb 16 18:12:38 CST 2012
udev-147-2.40.el6                           BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.71-0.40.el6              BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-libs-1.02.71-0.40.el6         BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-event-1.02.71-0.40.el6        BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-event-libs-1.02.71-0.40.el6   BUILT: Thu Feb 16 18:12:38 CST 2012
cmirror-2.02.92-0.40.el6                    BUILT: Thu Feb 16 18:12:38 CST 2012
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
New Feature to 6.3. No documentation required. Bug 732458 is the bug that requires a release note for the RAID features. Other documentation is found in the LVM manual. Operational bugs need no documentation because they are being fixed before their initial release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0962.html