Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1185441

Summary: known issues involving redundant log mirrors and multiple device failues
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Mirroring and RAID QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX Docs Contact:
Severity: low    
Priority: low CC: agk, heinzm, jbrassow, msnitzer, prajnoha, zkabelac
Version: 7.1Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-28 20:36:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Corey Marthaler 2015-01-23 18:15:41 UTC
Description of problem:
I'm opening this bug for the specific issue listed in https://bugzilla.redhat.com/show_bug.cgi?id=1085983#c9 so that I have something to refer to when seeing those failures still but having bug 1085983 marked verified. We may end up turning the redundant log + multi device failure test cases off all together eventually, but in the mean time we can look at this "it's a known issue" place holder whenever we see issues.


================================================================================
Iteration 2.6 started at Thu Jan 22 22:37:31 CST 2015
================================================================================
Scenario kill_both_logs_2_legs_2_logs: Kill both logs of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_both_logs_2legs_2logs_1 syncd_both_logs_2legs_2logs_2 syncd_both_logs_2legs_2logs_3
* sync:               1
* striped:            0
* leg devices:        /dev/sdf1 /dev/sda1
* log devices:        /dev/sdd1 /dev/sde1
* no MDA devices:     
* failpv(s):          /dev/sdd1 /dev/sde1
* failnode(s):        host-109.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   allocate
******************************************************

Creating mirror(s) on host-109.virt.lab.msp.redhat.com...
host-109.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_1 -L 500M helter_skelter /dev/sdf1:0-2400 /dev/sda1:0-2400 /dev/sdd1:0-150 /dev/sde1:0-150
host-109.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_2 -L 500M helter_skelter /dev/sdf1:0-2400 /dev/sda1:0-2400 /dev/sdd1:0-150 /dev/sde1:0-150
host-109.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_3 -L 500M helter_skelter /dev/sdf1:0-2400 /dev/sda1:0-2400 /dev/sdd1:0-150 /dev/sde1:0-150

Current mirror/raid device structure(s):
  LV                                            Attr       LSize   Cpy%Sync Devices
  syncd_both_logs_2legs_2logs_1                 mwi-a-m--- 500.00m 57.60    syncd_both_logs_2legs_2logs_1_mimage_0(0),syncd_both_logs_2legs_2logs_1_mimage_1(0)          
  [syncd_both_logs_2legs_2logs_1_mimage_0]      Iwi-aom--- 500.00m          /dev/sdf1(0)
  [syncd_both_logs_2legs_2logs_1_mimage_1]      Iwi-aom--- 500.00m          /dev/sda1(0)
  [syncd_both_logs_2legs_2logs_1_mlog]          mwi-aom---   4.00m 100.00   syncd_both_logs_2legs_2logs_1_mlog_mimage_0(0),syncd_both_logs_2legs_2logs_1_mlog_mimage_1(0)
  [syncd_both_logs_2legs_2logs_1_mlog_mimage_0] iwi-aom---   4.00m          /dev/sdd1(0)
  [syncd_both_logs_2legs_2logs_1_mlog_mimage_1] iwi-aom---   4.00m          /dev/sde1(0)
  syncd_both_logs_2legs_2logs_2                 mwi-a-m--- 500.00m 30.40    syncd_both_logs_2legs_2logs_2_mimage_0(0),syncd_both_logs_2legs_2logs_2_mimage_1(0)          
  [syncd_both_logs_2legs_2logs_2_mimage_0]      Iwi-aom--- 500.00m          /dev/sdf1(125)
  [syncd_both_logs_2legs_2logs_2_mimage_1]      Iwi-aom--- 500.00m          /dev/sda1(125)
  [syncd_both_logs_2legs_2logs_2_mlog]          mwi-aom---   4.00m 100.00   syncd_both_logs_2legs_2logs_2_mlog_mimage_0(0),syncd_both_logs_2legs_2logs_2_mlog_mimage_1(0)
  [syncd_both_logs_2legs_2logs_2_mlog_mimage_0] iwi-aom---   4.00m          /dev/sdd1(1)
  [syncd_both_logs_2legs_2logs_2_mlog_mimage_1] iwi-aom---   4.00m          /dev/sde1(1)
  syncd_both_logs_2legs_2logs_3                 mwi-aom--- 500.00m 1.60     syncd_both_logs_2legs_2logs_3_mimage_0(0),syncd_both_logs_2legs_2logs_3_mimage_1(0)          
  [syncd_both_logs_2legs_2logs_3_mimage_0]      Iwi-aom--- 500.00m          /dev/sdf1(250)
  [syncd_both_logs_2legs_2logs_3_mimage_1]      Iwi-aom--- 500.00m          /dev/sda1(250)
  [syncd_both_logs_2legs_2logs_3_mlog]          mwi-aom---   4.00m 100.00   syncd_both_logs_2legs_2logs_3_mlog_mimage_0(0),syncd_both_logs_2legs_2logs_3_mlog_mimage_1(0)
  [syncd_both_logs_2legs_2logs_3_mlog_mimage_0] iwi-aom---   4.00m          /dev/sdd1(2)
  [syncd_both_logs_2legs_2logs_3_mlog_mimage_1] iwi-aom---   4.00m          /dev/sde1(2)


Waiting until all mirror|raid volumes become fully syncd...
   3/3 mirror(s) are fully synced: ( 100.00% 100.00% 100.00% )

Creating ext on top of mirror(s) on host-109.virt.lab.msp.redhat.com...
mke2fs 1.42.9 (28-Dec-2013)
mke2fs 1.42.9 (28-Dec-2013)
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-109.virt.lab.msp.redhat.com...

PV=/dev/sde1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_1: 1.4
        syncd_both_logs_2legs_2logs_2_mlog_mimage_1: 1.4
        syncd_both_logs_2legs_2logs_3_mlog_mimage_1: 1.4
PV=/dev/sdd1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_0: 1.4
        syncd_both_logs_2legs_2logs_2_mlog_mimage_0: 1.4
        syncd_both_logs_2legs_2logs_3_mlog_mimage_0: 1.4
PV=/dev/sde1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_1: 1.4
        syncd_both_logs_2legs_2logs_2_mlog_mimage_1: 1.4
        syncd_both_logs_2legs_2logs_3_mlog_mimage_1: 1.4
PV=/dev/sdd1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_0: 1.4
        syncd_both_logs_2legs_2logs_2_mlog_mimage_0: 1.4
        syncd_both_logs_2legs_2logs_3_mlog_mimage_0: 1.4
Writing verification files (checkit) to mirror(s) on...
        ---- host-109.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-109.virt.lab.msp.redhat.com ----

Disabling device sdd on host-109.virt.lab.msp.redhat.com  /dev/sdd1: read failed after 0 of 4096 at 26838958080: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 26839048192: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error

Disabling device sde on host-109.virt.lab.msp.redhat.com  /dev/sde1: read failed after 0 of 4096 at 26838958080: Input/output error
  /dev/sde1: read failed after 0 of 4096 at 26839048192: Input/output error
  /dev/sde1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sde1: read failed after 0 of 4096 at 4096: Input/output error

[DEADLOCK]

Jan 22 22:38:35 host-109 qarshd[534]: Running cmdline: echo offline > /sys/block/sdd/device/state
Jan 22 22:38:35 host-109 qarshd[537]: Running cmdline: pvscan --cache /dev/sdd1
Jan 22 22:38:35 host-109 lvm[1117]: Primary mirror device 253:2 has failed (D).
Jan 22 22:38:35 host-109 lvm[1117]: Device failure in helter_skelter-syncd_both_logs_2legs_2logs_1_mlog.
Jan 22 22:38:35 host-109 lvm[1117]: WARNING: Device for PV TdwTRF-2jmZ-3lPn-dZvx-JHaG-w84E-8m3ESg not found or rejected by a filter.
Jan 22 22:38:35 host-109 lvm[1117]: WARNING: Device for PV TdwTRF-2jmZ-3lPn-dZvx-JHaG-w84E-8m3ESg already missing, skipping.
Jan 22 22:38:35 host-109 lvm[1117]: WARNING: Device for PV TdwTRF-2jmZ-3lPn-dZvx-JHaG-w84E-8m3ESg not found or rejected by a filter.
Jan 22 22:38:35 host-109 qarshd[551]: Running cmdline: pvscan --cache /dev/sde1
Jan 22 22:38:37 host-109 kernel: device-mapper: raid1: All sides of mirror have failed.
Jan 22 22:38:37 host-109 kernel: device-mapper: raid1: All sides of mirror have failed.
Jan 22 22:38:41 host-109 qarshd[568]: Running cmdline: pvs -a
Jan 22 22:38:43 host-109 lvm[1117]: /dev/sde1: read failed after 0 of 4096 at 4096: Input/output error
Jan 22 22:38:43 host-109 lvm[1117]: WARNING: Failed to write an MDA of VG helter_skelter.
Jan 22 22:38:43 host-109 lvm[1117]: WARNING: Failed to write an MDA of VG helter_skelter.
Jan 22 22:38:43 host-109 lvm[1117]: WARNING: Failed to write an MDA of VG helter_skelter.
Jan 22 22:38:51 host-109 kernel: device-mapper: raid1: log postsuspend failed
Jan 22 22:38:51 host-109 kernel: device-mapper: dirty region log: 253:4: Failed to read header on dirty region log device
Jan 22 22:38:51 host-109 kernel: device-mapper: dirty region log: 253:4: Failed to write header on dirty region log device
Jan 22 22:38:51 host-109 lvm[1117]: Monitoring mirror device helter_skelter-syncd_both_logs_2legs_2logs_1 for events.
Jan 22 22:38:51 host-109 kernel: device-mapper: raid1: log resume failed
Jan 22 22:38:51 host-109 lvm[1117]: Mirror log status: 1 of 2 images failed.
Jan 22 22:38:51 host-109 lvm[1117]: Trying to up-convert to 2 images, 2 logs.
Jan 22 22:38:51 host-109 lvm[1117]: WARNING: Failed to write an MDA of VG helter_skelter.
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: log postsuspend failed
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: Primary mirror (253:2) failed while out-of-sync: Reads may fail.
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: Read failure on mirror device 253:2.  Failing I/O.
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: Mirror read failed.
Jan 22 22:39:00 host-109 kernel: device-mapper: dirty region log: 253:4: Failed to read header on dirty region log device
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: Primary mirror (253:2) failed while out-of-sync: Reads may fail.
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Jan 22 22:39:00 host-109 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Jan 22 22:41:08 host-109 kernel: INFO: task dmeventd:32542 blocked for more than 120 seconds.
Jan 22 22:41:08 host-109 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 22 22:41:08 host-109 kernel: dmeventd        D ffff88003fc13680     0 32542      1 0x00000080
Jan 22 22:41:08 host-109 kernel: ffff88001dbc3a20 0000000000000082 ffff88003c2ecfa0 ffff88001dbc3fd8
Jan 22 22:41:08 host-109 kernel: ffff88001dbc3fd8 ffff88001dbc3fd8 ffff88003c2ecfa0 ffff88003c2ecfa0
Jan 22 22:41:08 host-109 kernel: ffff88003fc13f48 7fffffffffffffff ffff88003c2ecfa0 ffff880028687210
Jan 22 22:41:08 host-109 kernel: Call Trace:
Jan 22 22:41:08 host-109 kernel: [<ffffffff81609269>] schedule+0x29/0x70
Jan 22 22:41:08 host-109 kernel: [<ffffffff816071b9>] schedule_timeout+0x209/0x2d0
Jan 22 22:41:08 host-109 kernel: [<ffffffff812aa5c1>] ? submit_bio+0x71/0x150
Jan 22 22:41:08 host-109 kernel: [<ffffffff81052b0f>] ? kvm_clock_get_cycles+0x1f/0x30
Jan 22 22:41:08 host-109 kernel: [<ffffffff810c8c9c>] ? ktime_get_ts+0x4c/0xf0
Jan 22 22:41:08 host-109 kernel: [<ffffffff81608a2b>] io_schedule_timeout+0x9b/0xf0
Jan 22 22:41:08 host-109 kernel: [<ffffffffa000a730>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffff81609a56>] wait_for_completion_io+0x116/0x170
Jan 22 22:41:08 host-109 kernel: [<ffffffff810a94d0>] ? wake_up_state+0x20/0x20
Jan 22 22:41:08 host-109 kernel: [<ffffffffa000afbf>] dm_io+0x26f/0x2e0 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa000a8d0>] ? dm_io_client_destroy+0x30/0x30 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa000a730>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa00278d6>] disk_resume+0x166/0x300 [dm_log]
Jan 22 22:41:08 host-109 kernel: [<ffffffff8126ec6b>] ? cred_has_capability+0x6b/0x120
Jan 22 22:41:08 host-109 kernel: [<ffffffffa002d412>] mirror_resume+0x32/0x60 [dm_mirror]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa00069ae>] dm_table_resume_targets+0x8e/0xe0 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa0003ee1>] dm_resume+0xc1/0x100 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa00091ab>] dev_suspend+0x12b/0x250 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa0009080>] ? table_load+0x380/0x380 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffffa0009ac5>] ctl_ioctl+0x255/0x500 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffff81254fe4>] ? SYSC_semtimedop+0x264/0xd10
Jan 22 22:41:08 host-109 kernel: [<ffffffffa0009d83>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jan 22 22:41:08 host-109 kernel: [<ffffffff811d9b75>] do_vfs_ioctl+0x2e5/0x4c0
Jan 22 22:41:08 host-109 kernel: [<ffffffff8126efae>] ? file_has_perm+0xae/0xc0
Jan 22 22:41:08 host-109 kernel: [<ffffffff81253a01>] ? sem_rcu_free+0x21/0x30
Jan 22 22:41:08 host-109 kernel: [<ffffffff811d9df1>] SyS_ioctl+0xa1/0xc0
Jan 22 22:41:08 host-109 kernel: [<ffffffff81613de9>] system_call_fastpath+0x16/0x1b
Jan 22 22:41:08 host-109 kernel: INFO: task jbd2/dm-7-8:406 blocked for more than 120 seconds.
Jan 22 22:41:08 host-109 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 22 22:41:08 host-109 kernel: jbd2/dm-7-8     D ffff88003fc13680     0   406      2 0x00000080
Jan 22 22:41:08 host-109 kernel: ffff88000012faf8 0000000000000046 ffff8800228ccfa0 ffff88000012ffd8
Jan 22 22:41:08 host-109 kernel: ffff88000012ffd8 ffff88000012ffd8 ffff8800228ccfa0 ffff88003fc13f48
Jan 22 22:41:08 host-109 kernel: ffff88003ff687e8 0000000000000002 ffffffff81155f10 ffff88000012fb70
Jan 22 22:41:08 host-109 kernel: Call Trace:
Jan 22 22:41:08 host-109 kernel: [<ffffffff81155f10>] ? wait_on_page_read+0x60/0x60
Jan 22 22:41:08 host-109 kernel: [<ffffffff8160956d>] io_schedule+0x9d/0x130
Jan 22 22:41:08 host-109 kernel: [<ffffffff81155f1e>] sleep_on_page+0xe/0x20
Jan 22 22:41:08 host-109 kernel: [<ffffffff81607340>] __wait_on_bit+0x60/0x90
Jan 22 22:41:08 host-109 kernel: [<ffffffff81155ca6>] wait_on_page_bit+0x86/0xb0
Jan 22 22:41:08 host-109 kernel: [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40
Jan 22 22:41:08 host-109 kernel: [<ffffffff81155de1>] filemap_fdatawait_range+0x111/0x1b0
Jan 22 22:41:08 host-109 kernel: [<ffffffff81155ea7>] filemap_fdatawait+0x27/0x30
Jan 22 22:41:08 host-109 kernel: [<ffffffffa0269a9f>] jbd2_journal_commit_transaction+0xa8f/0x19a0 [jbd2]
Jan 22 22:41:08 host-109 kernel: [<ffffffff810ad726>] ? __dequeue_entity+0x26/0x40
Jan 22 22:41:08 host-109 kernel: [<ffffffff810125c6>] ? __switch_to+0x136/0x4a0
Jan 22 22:41:08 host-109 kernel: [<ffffffffa026eda9>] kjournald2+0xc9/0x260 [jbd2]
Jan 22 22:41:08 host-109 kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Jan 22 22:41:08 host-109 kernel: [<ffffffffa026ece0>] ? commit_timeout+0x10/0x10 [jbd2]
Jan 22 22:41:08 host-109 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan 22 22:41:08 host-109 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 22 22:41:08 host-109 kernel: [<ffffffff81613d3c>] ret_from_fork+0x7c/0xb0
Jan 22 22:41:08 host-109 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140




Version-Release number of selected component (if applicable):
3.10.0-223.el7.x86_64

lvm2-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
lvm2-libs-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
lvm2-cluster-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-libs-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-event-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-event-libs-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-persistent-data-0.4.1-2.el7    BUILT: Wed Nov 12 12:39:46 CST 2014
cmirror-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015


How reproducible:
Often