Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 789408

Summary: RAID5 device failure causes dmeventd to block
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.3
CC: agk, djansa, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: lvm2-2.02.95-1.el6
Doc Type: Bug Fix
Doc Text:
New Feature to 6.3. No documentation required. Bug 732458 is the bug that requires a release note for the RAID features. Other documentation is found in the LVM manual. Operational bugs need no documentation because they are being fixed before their initial release.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 15:01:08 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Corey Marthaler 2012-02-10 17:33:46 UTC
Description of problem:
Scenario kill_primary_synced_raid5_3legs: Kill primary leg of synced 3 leg raid5 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_primary_raid5_3legs_1
* sync:               1
* leg devices:        /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdg1
* failpv(s):          /dev/sdc1
* failnode(s):        taft-01
* raid fault policy:   warn
******************************************************

Creating raids(s) on taft-01...
taft-01: lvcreate --type raid5 -i 3 -n synced_primary_raid5_3legs_1 -L 500M black_bird /dev/sdc1:0-1000 /dev/sdd1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000

RAID Structure(s):
 LV                                      Attr     LSize   Copy%  Devices
 synced_primary_raid5_3legs_1            rwi-a-r- 504.00m        synced_primary_raid5_3legs_1_rimage_0(0),synced_primary_raid5_3legs_1_rimage_1(0),synced_primary_raid5_3legs_1_rimage_2(0),synced_primary_raid5_3legs_1_rimage_3(0)
 [synced_primary_raid5_3legs_1_rimage_0] Iwi-aor- 168.00m        /dev/sdc1(1)
 [synced_primary_raid5_3legs_1_rimage_1] Iwi-aor- 168.00m        /dev/sdd1(1)
 [synced_primary_raid5_3legs_1_rimage_2] Iwi-aor- 168.00m        /dev/sde1(1)
 [synced_primary_raid5_3legs_1_rimage_3] Iwi-aor- 168.00m        /dev/sdg1(1)
 [synced_primary_raid5_3legs_1_rmeta_0]  ewi-aor-   4.00m        /dev/sdc1(0)
 [synced_primary_raid5_3legs_1_rmeta_1]  ewi-aor-   4.00m        /dev/sdd1(0)
 [synced_primary_raid5_3legs_1_rmeta_2]  ewi-aor-   4.00m        /dev/sde1(0)
 [synced_primary_raid5_3legs_1_rmeta_3]  ewi-aor-   4.00m        /dev/sdg1(0)

PV=/dev/sdc1
        synced_primary_raid5_3legs_1_rimage_0: 2
        synced_primary_raid5_3legs_1_rmeta_0: 2

Continuing on without fully syncd raid1 mirror(s), currently at...
        ( 6.25% )

Disabling device sdc on taft-01
[DEADLOCK]
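The failure injection above can be condensed into a small script. This is a sketch only: the PV names (/dev/sdc1 through /dev/sdg1), the VG name black_bird, and the LV size match this report's testbed and will differ on other systems. Because offlining a disk is destructive, RUN=echo keeps it a dry run that just prints the commands; clear RUN only on a scratch machine.

```shell
# Dry-run by default: every command is echoed instead of executed.
RUN=${RUN-echo}

# 1. Create a 3-legged (3 stripes plus rotating parity, four PVs) RAID5 LV.
$RUN lvcreate --type raid5 -i 3 -n synced_primary_raid5_3legs_1 -L 500M \
    black_bird /dev/sdc1:0-1000 /dev/sdd1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000

# 2. While the array is still resyncing, fail the primary leg via sysfs
#    (the report does this through qarshd; plain sh works the same way).
$RUN sh -c 'echo offline > /sys/block/sdc/device/state'

# 3. Any LVM command that scans devices now wedges behind dmeventd.
$RUN pvs -a
```

With RUN left at its default, the script only prints the three commands, which makes it safe to read alongside the log above.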




qarshd[3131]: Running cmdline: echo offline > /sys/block/sdc/device/state &
kernel: sd 3:0:0:2: rejecting I/O to offline device
kernel: sd 3:0:0:2: rejecting I/O to offline device
kernel: md/raid:mdX: Disk failure on dm-3, disabling device.
kernel: md/raid:mdX: Operation continuing on 3 devices.
kernel: md: mdX: resync done.
kernel: md: checkpointing resync of mdX.
lvm[1153]: Device #0 of raid5_ls array, black_bird-synced_primary_raid5_3legs_1, has failed.
qarshd[3134]: Running cmdline: pvs -a
kernel: INFO: task dmeventd:3108 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: dmeventd      D 0000000000000003     0  3108      1 0x00000080
kernel: ffff880218c37b18 0000000000000086 0000000000000000 ffffffffa000422e
kernel: ffff880218c37ae8 00000000bd278ab4 ffff880218c37b08 ffff880219021980
kernel: ffff880216ea3ab8 ffff880218c37fd8 000000000000f4e8 ffff880216ea3ab8
kernel: Call Trace:
kernel: [<ffffffffa000422e>] ? dm_table_unplug_all+0x8e/0x100 [dm_mod]
kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
kernel: [<ffffffff811b1a2e>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
kernel: [<ffffffff8125821d>] ? get_disk+0x7d/0xf0
kernel: [<ffffffff811b1f1e>] __blockdev_direct_IO+0x5e/0xd0
kernel: [<ffffffff811ae820>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff8126cd7a>] ? kobject_get+0x1a/0x30
kernel: [<ffffffff811af687>] blkdev_direct_IO+0x57/0x60
kernel: [<ffffffff811ae820>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff811128db>] generic_file_aio_read+0x6bb/0x700
kernel: [<ffffffff81213a31>] ? avc_has_perm+0x71/0x90
kernel: [<ffffffff8120d52f>] ? security_inode_permission+0x1f/0x30
kernel: [<ffffffff8117641a>] do_sync_read+0xfa/0x140
kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40


[root@taft-01 ~]# dmsetup status
black_bird-synced_primary_raid5_3legs_1_rimage_3: 0 344064 linear 
black_bird-synced_primary_raid5_3legs_1_rimage_2: 0 344064 linear 
black_bird-synced_primary_raid5_3legs_1_rimage_1: 0 344064 linear 
black_bird-synced_primary_raid5_3legs_1_rimage_0: 0 344064 linear 
black_bird-synced_primary_raid5_3legs_1: 0 1032192 raid raid5_ls 4 DAAA 150584/344064
black_bird-synced_primary_raid5_3legs_1_rmeta_3: 0 8192 linear 
black_bird-synced_primary_raid5_3legs_1_rmeta_2: 0 8192 linear 
black_bird-synced_primary_raid5_3legs_1_rmeta_1: 0 8192 linear 
black_bird-synced_primary_raid5_3legs_1_rmeta_0: 0 8192 linear 
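The raid target's status line encodes per-leg health and resync progress: in `raid raid5_ls 4 DAAA 150584/344064`, `4` is the device count, each character of `DAAA` describes one leg ('A' alive and in sync, 'a' alive but not yet in sync, 'D' dead/failed), and `150584/344064` is sectors synced over total. A small decoder, assuming this RHEL 6-era dm-raid status format (the status line is hard-coded here for illustration):

```shell
# Status line copied from the dmsetup output above.
status='black_bird-synced_primary_raid5_3legs_1: 0 1032192 raid raid5_ls 4 DAAA 150584/344064'

set -- $status
# After set --: $7 = per-leg health string, $8 = <synced>/<total> sectors.
echo "$7" | awk '{ for (i = 1; i <= length($1); i++)
    if (substr($1, i, 1) == "D") printf "leg %d failed\n", i - 1 }'
echo "$8" | awk -F/ '{ printf "resync: %.1f%%\n", 100 * $1 / $2 }'
```

For the line above this reports leg 0 (the killed primary, /dev/sdc1) as failed and resync stalled at about 43.8%, matching the "checkpointing resync" message in the kernel log.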


[root@taft-01 ~]# dmsetup table
black_bird-synced_primary_raid5_3legs_1_rimage_3: 0 344064 linear 8:97 10240
black_bird-synced_primary_raid5_3legs_1_rimage_2: 0 344064 linear 8:65 10240
black_bird-synced_primary_raid5_3legs_1_rimage_1: 0 344064 linear 8:49 10240
black_bird-synced_primary_raid5_3legs_1_rimage_0: 0 344064 linear 8:33 10240
black_bird-synced_primary_raid5_3legs_1: 0 1032192 raid raid5_ls 3 128 region_size 1024 4 253:2 253:3 253:4 253:5 253:6 253:7 253:8 253:9
black_bird-synced_primary_raid5_3legs_1_rmeta_3: 0 8192 linear 8:97 2048
black_bird-synced_primary_raid5_3legs_1_rmeta_2: 0 8192 linear 8:65 2048
black_bird-synced_primary_raid5_3legs_1_rmeta_1: 0 8192 linear 8:49 2048
black_bird-synced_primary_raid5_3legs_1_rmeta_0: 0 8192 linear 8:33 2048
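The corresponding table line follows the dm-raid target layout `raid <type> <#params> <params...> <#devs> <metadata_dev> <data_dev> ...`: here 3 params (chunk size 128 sectors, i.e. 64 KiB, plus `region_size 1024`) and four (metadata, image) device pairs given as major:minor numbers. A sketch that splits the line apart, assuming that layout and the meta-before-data pair ordering:

```shell
# Table line copied from the dmsetup output above.
table='black_bird-synced_primary_raid5_3legs_1: 0 1032192 raid raid5_ls 3 128 region_size 1024 4 253:2 253:3 253:4 253:5 253:6 253:7 253:8 253:9'

echo "$table" | awk '{
    nparams = $6                      # raid parameter count ("3")
    ndevs   = $(7 + nparams)          # device-pair count after the params
    printf "type=%s chunk=%s sectors devs=%d\n", $5, $7, ndevs
    for (i = 0; i < ndevs; i++)       # each pair: rmeta_N then rimage_N
        printf "pair %d: meta=%s data=%s\n", i,
               $(8 + nparams + 2*i), $(9 + nparams + 2*i)
}'
```

This makes it easy to map the 253:N minors back to the _rmeta_N/_rimage_N subvolumes listed in the status output.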


Version-Release number of selected component (if applicable):
2.6.32-220.el6.x86_64

lvm2-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
lvm2-libs-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
lvm2-cluster-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-libs-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-event-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
device-mapper-event-libs-1.02.69-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012
cmirror-2.02.90-0.25.el6    BUILT: Sat Jan 28 18:03:08 CST 2012


How reproducible:
Every time

Comment 1 Jonathan Earl Brassow 2012-02-20 19:29:13 UTC
Seems to be fixed by the latest version of the rhel6 kernel (2.6.32-236.el6).

However, I did notice that the helpful message that RAID1 prints when a device is lost is not printed for the higher RAID levels. This is not a problem with the kernel or dmeventd, but with the lvconvert command run by dmeventd. Perhaps this is worth a separate bug?

Comment 3 Corey Marthaler 2012-02-20 23:56:31 UTC
Verified fixed in the latest kernel + scratch lvm builds.

2.6.32-236.el6.x86_64

lvm2-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
lvm2-libs-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
lvm2-cluster-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-libs-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-event-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
device-mapper-event-libs-1.02.71-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012
cmirror-2.02.92-0.40.el6    BUILT: Thu Feb 16 18:12:38 CST 2012

Comment 6 Jonathan Earl Brassow 2012-04-23 18:28:56 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
New Feature to 6.3.  No documentation required.

Bug 732458 is the bug that requires a release note for the RAID features.  Other documentation is found in the LVM manual.

Operational bugs need no documentation because they are being fixed before their initial release.

Comment 8 errata-xmlrpc 2012-06-20 15:01:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html