Bug 1719469 - cmirror gets stuck once failed non primary leg is repaired resync'ed and then written to.
Keywords:
Status: CLOSED DUPLICATE of bug 1711427
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1711427
 
Reported: 2019-06-11 21:43 UTC by Corey Marthaler
Modified: 2021-09-03 12:37 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-19 21:14:46 UTC
Target Upstream Version:
Embargoed:


Attachments:
clvmd debugging from null-02 (1.08 MB, text/plain), 2019-06-18 14:41 UTC, David Teigland
clvmd debugging from null-03 (1.31 MB, text/plain), 2019-06-18 14:41 UTC, David Teigland
clvmd debug output from null-02 (1.83 MB, text/plain), 2019-06-18 16:21 UTC, David Teigland
clvmd debug output from null-03 (1.67 MB, text/plain), 2019-06-18 16:22 UTC, David Teigland
cmirrord debug output from null-02 (6.86 MB, text/plain), 2019-06-18 16:31 UTC, David Teigland
clvmd debug output from null-02 (451.25 KB, text/plain), 2019-06-18 17:12 UTC, David Teigland
clvmd debug output from null-03 (324.42 KB, text/plain), 2019-06-18 17:13 UTC, David Teigland
clvmd debug output from null-02 (649.85 KB, text/plain), 2019-06-18 17:37 UTC, David Teigland
clvmd debug output from null-03 (471.64 KB, text/plain), 2019-06-18 17:38 UTC, David Teigland
clvmd debug output from null-02 (490.95 KB, text/plain), 2019-06-18 19:42 UTC, David Teigland
clvmd debug output from null-03 (338.19 KB, text/plain), 2019-06-18 19:43 UTC, David Teigland

Description Corey Marthaler 2019-06-11 21:43:03 UTC
Description of problem:
harding-02: lvcreate --type mirror --mirrorlog disk -m 2 -n mirror_1 -L 2G revolution_9  /dev/mapper/mpatha1:0-9999 /dev/mapper/mpathb1:0-9999 /dev/mapper/mpathc1:0-9999 /dev/mapper/mpathd1:0-150
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 34.86% )
   0/1 mirror(s) are fully synced: ( 61.43% )
   0/1 mirror(s) are fully synced: ( 88.38% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Sleeping 15 sec
Creating gfs2 on top of mirror(s) on harding-02...
mkfs.gfs2 -o align=0 -J 32M -j 2 -p lock_dlm -t HARDING:gfs1 /dev/revolution_9/mirror_1 -O
Warning: device is not properly aligned. This may harm performance.
Mounting mirrored gfs2 filesystems on harding-02...
Mounting mirrored gfs2 filesystems on harding-03...

================================================================================
Iteration 0.1 started at Mon Jun 10 15:44:15 CDT 2019
================================================================================
ACTUAL LEG ORDER: /dev/mapper/mpatha1 /dev/mapper/mpathb1 /dev/mapper/mpathc1
Scenario kill_non_primary_leg: Kill non primary leg
********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/mapper/mpatha1 /dev/mapper/mpathb1 /dev/mapper/mpathc1
* log devices:        /dev/mapper/mpathd1
* failpv(s):          /dev/mapper/mpathb1
* failnode(s):        harding-02 harding-03
* lvmetad:            0
* leg fault policy:   allocate
* log fault policy:   allocate
*************************************************

Current mirror/raid device structure(s):
  LV                  Attr       LSize    Cpy%Sync Devices                                                       
   mirror_1            mwi-aom---    2.00g 100.00   mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
   [mirror_1_mimage_0] iwi-aom---    2.00g          /dev/mapper/mpatha1(0)                                        
   [mirror_1_mimage_1] iwi-aom---    2.00g          /dev/mapper/mpathb1(0)                                        
   [mirror_1_mimage_2] iwi-aom---    2.00g          /dev/mapper/mpathc1(0)                                        
   [mirror_1_mlog]     lwi-aom---    4.00m          /dev/mapper/mpathd1(0)                                        
   home                -wi-ao---- <200.52g          /dev/sdb1(0)                                                  
   home                -wi-ao---- <200.52g          /dev/sdc1(0)                                                  
   home                -wi-ao---- <200.52g          /dev/sda2(7155)                                               
   root                -wi-ao----   50.00g          /dev/sda2(10792)                                              
   swap                -wi-ao----  <27.95g          /dev/sda2(0)                                                  


PV=/dev/mapper/mpathb1
        mirror_1_mimage_1: 4.1
PV=/dev/mapper/mpathb1
        mirror_1_mimage_1: 4.1

Writing verification files (checkit) to mirror(s) on...
        ---- harding-02 ----

        ---- harding-03 ----


<start name="harding-02_mirror_1"  pid="31689" time="Mon Jun 10 15:44:30 2019 -0500" type="cmd" />
<start name="harding-03_mirror_1"  pid="31690" time="Mon Jun 10 15:44:30 2019 -0500" type="cmd" />
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- harding-02 ----
        ---- harding-03 ----

Failing path sdm on harding-02
Failing path sde on harding-02
Failing path sdu on harding-02
Failing path sdac on harding-02
Failing path sde on harding-03
Failing path sdu on harding-03
Failing path sdm on harding-03
Failing path sdac on harding-03
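The "Failing path" lines above come from the test harness; the harness internals aren't shown here, but a common way to fail individual multipath path members is to offline the underlying SCSI devices through sysfs. A dry-run sketch (the `fail_paths` helper name is mine, not the harness's; it only prints the commands it would run):

```shell
# Dry-run sketch: print the commands that would offline each SCSI path member
# on a given node.  Offlining the block device makes multipathd mark the path
# as failed.  Pipe through `sh` (or swap echo for eval) to actually run them.
fail_paths() {
    node=$1; shift
    for dev in "$@"; do
        echo "ssh $node 'echo offline > /sys/block/$dev/device/state'"
    done
}

fail_paths harding-02 sdm sde sdu sdac
fail_paths harding-03 sde sdu sdm sdac
```

Re-enabling the paths later is the mirror operation with `running` in place of `offline`, matching the "Enabling path" lines further down.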

Attempting I/O to cause mirror down conversion(s) on harding-02
dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0689112 s, 609 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
  LV                     Attr       LSize    Cpy%Sync Devices                                     
   mirror_1               cwi-aom---    2.00g 84.38    mirror_1_mimagetmp_3(0),mirror_1_mimage_3(0)
   [mirror_1_mimage_0]    iwi-aom---    2.00g          /dev/mapper/mpatha1(0)                      
   [mirror_1_mimage_2]    iwi-aom---    2.00g          /dev/mapper/mpathc1(0)                      
   [mirror_1_mimage_3]    Iwi-aom---    2.00g          /dev/mapper/mpathe1(0)                      
   [mirror_1_mimagetmp_3] mwi-aom---    2.00g 100.00   mirror_1_mimage_0(0),mirror_1_mimage_2(0)   
   [mirror_1_mlog]        lwi-aom---    4.00m          /dev/mapper/mpathd1(0)                      



Verifying FAILED device /dev/mapper/mpathb1 is *NOT* in the volume(s)
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
olog: 1
Verifying LOG device(s) /dev/mapper/mpathd1 *ARE* in the mirror(s)
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
Verifying LEG device /dev/mapper/mpatha1 *IS* in the volume(s)
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
Verifying LEG device /dev/mapper/mpathc1 *IS* in the volume(s)
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
verify the dm devices associated with /dev/mapper/mpathb1 have been removed as expected
Checking REMOVAL of mirror_1_mimage_1 on:  harding-02 harding-03
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of mirror_1_mimage_3 on:  harding-02 harding-03

Verify that the mirror image order remains the same after the down conversion
EXPECTED LEG ORDER: /dev/mapper/mpatha1 /dev/mapper/mpathc1 unknown
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
  Couldn't find device with uuid kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj.
ACTUAL LEG ORDER: /dev/mapper/mpatha1 /dev/mapper/mpathc1 /dev/mapper/mpathe1
Verifying files (checkit) on mirror(s) on...
        ---- harding-02 ----

        ---- harding-03 ----

Enabling path sdm on harding-02
Enabling path sde on harding-02
Enabling path sdu on harding-02
Enabling path sdac on harding-02
Enabling path sde on harding-03
Enabling path sdu on harding-03
Enabling path sdm on harding-03
Enabling path sdac on harding-03

-------------------------------------------------------------------------------
Force a vgreduce to either clean up the corrupt additional LV or lack of PV MDA
-------------------------------------------------------------------------------
  Scan of VG revolution_9 from /dev/mapper/mpathb1 found metadata seqno 3 vs previous 8.
  Scan of VG revolution_9 from /dev/mapper/mpathb1 found metadata seqno 3 vs previous 8.
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 8
Recreating PVs /dev/mapper/mpathb1 and then extending back into revolution_9
harding-03 pvcreate /dev/mapper/mpathb1
  Scan of VG revolution_9 from /dev/mapper/mpathb1 found metadata seqno 9 vs previous 10.
  Scan of VG revolution_9 from /dev/mapper/mpathb1 found metadata seqno 9 vs previous 10.
  Scan of VG revolution_9 from /dev/mapper/mpathb1 found metadata seqno 9 vs previous 10.
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 10
  WARNING: Removing PV /dev/mapper/mpathb1 (kX34mx-VIEy-mivD-R5HS-KLM4-qJBV-1eJXTj) that no longer belongs to VG revolution_9
harding-03 vgextend revolution_9 /dev/mapper/mpathb1
<fail name="harding-03_mirror_1"  pid="31690" time="Mon Jun 10 15:45:37 2019 -0500" type="cmd" duration="67" ec="1" />
ALL STOP!
<stop name="harding-02_mirror_1"  pid="31689" time="Mon Jun 10 15:45:37 2019 -0500" type="cmd" duration="67" ec="1" />
  Error locking on node 2: Command timed out
  Error locking on node 1: Command timed out
  Can't get lock for orphan PVs.
extension of /dev/mapper/mpathb1 back into revolution_9 failed

Verify that each of the raid repairs finished successfully

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
sync percent for mirror_1 hasn't changed in the past minute
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   [...]
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
   0/1 mirror(s) are fully synced: ( 99.41% )
sync percent for mirror_1 hasn't changed in the past 5+ minutes
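The stall check the harness appears to be doing is simple: sample Cpy%Sync (e.g. from `lvs --noheadings -o copy_percent revolution_9/mirror_1`) once a minute and flag the LV when the value stops moving short of 100%. A minimal sketch (the `sync_stalled` helper is hypothetical; the 99.41 samples are the ones from the log above):

```shell
# Classify a series of copy_percent samples taken at fixed intervals:
# "in-sync" if the first sample is already 100.00, "progressing" if any
# sample differs from the first, "stalled" if they are all identical.
sync_stalled() {
    first=$1
    [ "$first" = "100.00" ] && { echo in-sync; return 1; }
    for pct in "$@"; do
        [ "$pct" = "$first" ] || { echo progressing; return 1; }
    done
    echo stalled
}

sync_stalled 99.41 99.41 99.41 99.41    # prints "stalled"
```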


Jun 10 15:45:36 harding-02 kernel: device-mapper: multipath: Reinstating path 65:192.
Jun 10 15:45:36 harding-02 kernel: sd 8:0:2:1: alua: port group 02 state A preferred supports tolUsNA
Jun 10 15:45:36 harding-02 multipathd: 65:192: reinstated
Jun 10 15:45:36 harding-02 multipathd: mpathb: remaining active paths: 4
Jun 10 15:45:36 harding-02 lvm[12022]: revolution_9/mirror_1: Converted: 100.00%
Jun 10 15:45:37 harding-02 kernel: GFS2: fsid=HARDING:gfs1.0: Error -5 writing to journal, jid=0
Jun 10 15:45:37 harding-02 kernel: GFS2: fsid=HARDING:gfs1.0: error -5: withdrawing the file system to prevent further damage.
Jun 10 15:45:37 harding-02 kernel: GFS2: fsid=HARDING:gfs1.0: Error -5 writing to journal, jid=0
Jun 10 15:45:37 harding-02 kernel: VFS:Filesystem freeze failed
Jun 10 15:45:37 harding-02 kernel: GFS2: fsid=HARDING:gfs1.0: about to withdraw this file system
Jun 10 15:45:37 harding-02 kernel: GFS2: fsid=HARDING:gfs1.0: telling LM to unmount
Jun 10 15:45:37 harding-02 kernel: GFS2: fsid=HARDING:gfs1.0: withdrawn
Jun 10 15:45:37 harding-02 kernel: CPU: 14 PID: 18605 Comm: gfs2_logd Kdump: loaded Not tainted 3.10.0-1053.el7.x86_64 #1
Jun 10 15:45:37 harding-02 kernel: Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Jun 10 15:45:37 harding-02 kernel: Call Trace:
Jun 10 15:45:37 harding-02 kernel: [<ffffffff937789ea>] dump_stack+0x19/0x1b
Jun 10 15:45:37 harding-02 kernel: [<ffffffffc0b4b966>] gfs2_lm_withdraw+0x146/0x180 [gfs2]
Jun 10 15:45:37 harding-02 kernel: [<ffffffffc0b303d5>] gfs2_logd+0x315/0x340 [gfs2]
Jun 10 15:45:37 harding-02 kernel: [<ffffffff930c60d0>] ? wake_up_atomic_t+0x30/0x30
Jun 10 15:45:37 harding-02 kernel: [<ffffffffc0b300c0>] ? gfs2_meta_syncfs+0x50/0x50 [gfs2]
Jun 10 15:45:37 harding-02 kernel: [<ffffffff930c4fe1>] kthread+0xd1/0xe0
Jun 10 15:45:37 harding-02 kernel: [<ffffffff930d3df4>] ? finish_task_switch+0x54/0x1c0
Jun 10 15:45:37 harding-02 kernel: [<ffffffff930c4f10>] ? insert_kthread_work+0x40/0x40
Jun 10 15:45:37 harding-02 kernel: [<ffffffff9378bd37>] ret_from_fork_nospec_begin+0x21/0x21
Jun 10 15:45:37 harding-02 kernel: [<ffffffff930c4f10>] ? insert_kthread_work+0x40/0x40
Jun 10 15:45:42 harding-02 kernel: Buffer I/O error on dev dm-23, logical block 16969, lost async page write
Jun 10 15:46:37 harding-02 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 18998) has timed out
Jun 10 15:46:51 harding-02 lvmpolld: W: #011LVPOLL: PID 18998: STDERR: '  Error locking on node 1: Command timed out'
Jun 10 15:47:52 harding-02 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 18998) has timed out
Jun 10 15:47:52 harding-02 lvmpolld: W: #011LVPOLL: PID 18998: STDERR: '  Failed to lock logical volume revolution_9/mirror_1.'
Jun 10 15:47:52 harding-02 lvmpolld: W: #011LVPOLL: PID 18998: STDERR: '  Error locking on node 1: Command timed out'
Jun 10 15:48:35 harding-02 kernel: INFO: task clvmd:9142 blocked for more than 120 seconds.
Jun 10 15:48:35 harding-02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 10 15:48:35 harding-02 kernel: clvmd           D ffff93bfad64d230     0  9142      1 0x00000080
Jun 10 15:48:35 harding-02 kernel: Call Trace:
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9377e959>] schedule+0x29/0x70
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9377c2e1>] schedule_timeout+0x221/0x2d0
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9377dbe7>] __down_common+0xaa/0x104
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9377dc5e>] __down+0x1d/0x1f
Jun 10 15:48:35 harding-02 kernel: [<ffffffff930caf81>] down+0x41/0x50
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc028c91b>] dm_rh_stop_recovery+0x2b/0x40 [dm_region_hash]
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc02a14d6>] mirror_presuspend+0xa6/0x175 [dm_mirror]
Jun 10 15:48:35 harding-02 kernel: [<ffffffff93308a3b>] ? cred_has_capability+0x6b/0x120
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc027323d>] dm_table_presuspend_targets+0x4d/0x60 [dm_mod]
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc026da87>] __dm_suspend+0x47/0x240 [dm_mod]
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc02702fd>] dm_suspend+0xad/0xc0 [dm_mod]
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc0276167>] dev_suspend+0x197/0x260 [dm_mod]
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc027722e>] ctl_ioctl+0x24e/0x550 [dm_mod]
Jun 10 15:48:35 harding-02 kernel: [<ffffffff930d655f>] ? ttwu_do_activate+0x6f/0x80
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc0275fd0>] ? table_load+0x390/0x390 [dm_mod]
Jun 10 15:48:35 harding-02 kernel: [<ffffffffc027753e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9325e220>] do_vfs_ioctl+0x3a0/0x5a0
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9324b20b>] ? __fput+0xeb/0x260
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9325e4c1>] SyS_ioctl+0xa1/0xc0
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9378be15>] ? system_call_after_swapgs+0xa2/0x146
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9378bede>] system_call_fastpath+0x25/0x2a
Jun 10 15:48:35 harding-02 kernel: [<ffffffff9378be21>] ? system_call_after_swapgs+0xae/0x146


Jun 10 15:45:36 harding-03 qarshd[18131]: Running cmdline: pvcreate /dev/mapper/mpathb1
Jun 10 15:45:37 harding-03 dmeventd[11505]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Jun 10 15:45:37 harding-03 kernel: GFS2: fsid=HARDING:gfs1.1: Error -5 writing to journal, jid=1
Jun 10 15:45:37 harding-03 kernel: GFS2: fsid=HARDING:gfs1.1: error -5: withdrawing the file system to prevent further damage.
Jun 10 15:45:37 harding-03 systemd: Started qarsh Per-Connection Server (10.15.80.218:38084).
Jun 10 15:45:37 harding-03 qarshd[18150]: Talking to peer ::ffff:10.15.80.218:38084 (IPv6)
Jun 10 15:45:37 harding-03 kernel: GFS2: fsid=HARDING:gfs1.1: Error -5 writing to journal, jid=1
Jun 10 15:45:37 harding-03 kernel: VFS:Filesystem freeze failed
Jun 10 15:45:37 harding-03 kernel: GFS2: fsid=HARDING:gfs1.1: about to withdraw this file system
Jun 10 15:45:37 harding-03 kernel: GFS2: fsid=HARDING:gfs1.1: telling LM to unmount
Jun 10 15:45:37 harding-03 corosync[8611]: [CPG   ] *** 0x564c1fb382d0 can't mcast to group  state:0, error:12
Jun 10 15:45:37 harding-03 kernel: GFS2: fsid=HARDING:gfs1.1: withdrawn
Jun 10 15:45:37 harding-03 kernel: CPU: 15 PID: 17369 Comm: gfs2_logd Kdump: loaded Not tainted 3.10.0-1053.el7.x86_64 #1
Jun 10 15:45:37 harding-03 kernel: Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Jun 10 15:45:37 harding-03 kernel: Call Trace:
Jun 10 15:45:37 harding-03 kernel: [<ffffffffb59789ea>] dump_stack+0x19/0x1b
Jun 10 15:45:37 harding-03 kernel: [<ffffffffc06ca966>] gfs2_lm_withdraw+0x146/0x180 [gfs2]
Jun 10 15:45:37 harding-03 kernel: [<ffffffffc06af3d5>] gfs2_logd+0x315/0x340 [gfs2]
Jun 10 15:45:37 harding-03 kernel: [<ffffffffb52c60d0>] ? wake_up_atomic_t+0x30/0x30
Jun 10 15:45:37 harding-03 kernel: [<ffffffffc06af0c0>] ? gfs2_meta_syncfs+0x50/0x50 [gfs2]
Jun 10 15:45:37 harding-03 kernel: [<ffffffffb52c4fe1>] kthread+0xd1/0xe0
Jun 10 15:45:37 harding-03 kernel: [<ffffffffb52c4f10>] ? insert_kthread_work+0x40/0x40
Jun 10 15:45:37 harding-03 kernel: [<ffffffffb598bd37>] ret_from_fork_nospec_begin+0x21/0x21
Jun 10 15:45:37 harding-03 kernel: [<ffffffffb52c4f10>] ? insert_kthread_work+0x40/0x40
Jun 10 15:45:37 harding-03 qarshd[18150]: Running cmdline: vgextend revolution_9 /dev/mapper/mpathb1
Jun 10 15:48:51 harding-03 kernel: INFO: task clvmd:9160 blocked for more than 120 seconds.
Jun 10 15:48:51 harding-03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 10 15:48:51 harding-03 kernel: clvmd           D ffff9c53eeda8000     0  9160      1 0x00000080
Jun 10 15:48:51 harding-03 kernel: Call Trace:
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597e959>] schedule+0x29/0x70
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c2e1>] schedule_timeout+0x221/0x2d0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597dbe7>] __down_common+0xaa/0x104
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597dc5e>] __down+0x1d/0x1f
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52caf81>] down+0x41/0x50
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00bf91b>] dm_rh_stop_recovery+0x2b/0x40 [dm_region_hash]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00d44d6>] mirror_presuspend+0xa6/0x175 [dm_mirror]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb5508a3b>] ? cred_has_capability+0x6b/0x120
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00a623d>] dm_table_presuspend_targets+0x4d/0x60 [dm_mod]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00a0a87>] __dm_suspend+0x47/0x240 [dm_mod]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00a32fd>] dm_suspend+0xad/0xc0 [dm_mod]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00a9167>] dev_suspend+0x197/0x260 [dm_mod]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00aa22e>] ctl_ioctl+0x24e/0x550 [dm_mod]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52d655f>] ? ttwu_do_activate+0x6f/0x80
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00a8fd0>] ? table_load+0x390/0x390 [dm_mod]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc00aa53e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb545e220>] do_vfs_ioctl+0x3a0/0x5a0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb545e4c1>] SyS_ioctl+0xa1/0xc0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb598be15>] ? system_call_after_swapgs+0xa2/0x146
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb598bede>] system_call_fastpath+0x25/0x2a
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb598be21>] ? system_call_after_swapgs+0xae/0x146
Jun 10 15:48:51 harding-03 kernel: INFO: task gfs2_logd:17369 blocked for more than 120 seconds.
Jun 10 15:48:51 harding-03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 10 15:48:51 harding-03 kernel: gfs2_logd       D ffff9c57ea7f62a0     0 17369      2 0x00000080
Jun 10 15:48:51 harding-03 kernel: Call Trace:
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52e364c>] ? update_curr+0x14c/0x1e0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52e031e>] ? account_entity_dequeue+0xae/0xd0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c920>] ? bit_wait+0x50/0x50
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597e959>] schedule+0x29/0x70
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c2e1>] schedule_timeout+0x221/0x2d0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52e441e>] ? dequeue_task_fair+0x41e/0x660
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb522b59e>] ? __switch_to+0xce/0x580
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c920>] ? bit_wait+0x50/0x50
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597decd>] io_schedule_timeout+0xad/0x130
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597df68>] io_schedule+0x18/0x20
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c931>] bit_wait_io+0x11/0x50
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c457>] __wait_on_bit+0x67/0x90
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c920>] ? bit_wait+0x50/0x50
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb597c5c1>] out_of_line_wait_on_bit+0x81/0xb0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52c6190>] ? wake_bit_function+0x40/0x40
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb548124a>] __wait_on_buffer+0x2a/0x30
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc06ad77a>] gfs2_ail1_wait+0xaa/0xc0 [gfs2]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc06af3a0>] gfs2_logd+0x2e0/0x340 [gfs2]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52c60d0>] ? wake_up_atomic_t+0x30/0x30
Jun 10 15:48:51 harding-03 kernel: [<ffffffffc06af0c0>] ? gfs2_meta_syncfs+0x50/0x50 [gfs2]
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52c4fe1>] kthread+0xd1/0xe0
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52c4f10>] ? insert_kthread_work+0x40/0x40
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb598bd37>] ret_from_fork_nospec_begin+0x21/0x21
Jun 10 15:48:51 harding-03 kernel: [<ffffffffb52c4f10>] ? insert_kthread_work+0x40/0x40

[root@harding-02 ~]# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000 
change        member 2 joined 1 remove 0 failed 0 seq 1,1
members       1 2 


[root@harding-03 ~]# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000 
change        member 2 joined 1 remove 0 failed 0 seq 2,2
members       1 2 


[root@harding-02 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000001 gr PW rq IV pid 19003 master 0 "V_revolution_9"
id 00000005 gr IV rq PR pid 18882 master 0 "V_revolution_9"
id 00000002 gr PW rq IV pid 19003 master 0 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA"
id 00000004 gr CR rq IV pid 9142 master 2 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT"
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff93bcf99e4280 Name (len=14) "V_revolution_9"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000001 PW
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:00000005 -- (PR)
/sys/kernel/debug/dlm/clvmd:00000006 -- (PR) Remote:   2 00000005
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff93c7ac371f40 Name (len=64) "SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000002 PW
/sys/kernel/debug/dlm/clvmd:00000003 CR Remote:   2 00000001
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff93bcf99e43c0 Name (len=64) "SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 2
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000004 CR Master:     00000003
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff93bcf99e4280 0 0 0 0 0 0 14 str V_revolution_9
/sys/kernel/debug/dlm/clvmd_all:lkb 1 0 0 19003 0 0 1 2 4 -1 0 0 0 0 4262509774825 0
/sys/kernel/debug/dlm/clvmd_all:lkb 5 0 0 18882 0 0 1 1 -1 3 0 0 0 0 4277310195962 0
/sys/kernel/debug/dlm/clvmd_all:lkb 6 2 5 7216 0 0 10001 1 -1 3 0 0 0 0 83666696272317 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff93c7ac371f40 0 0 0 0 0 0 64 str SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA
/sys/kernel/debug/dlm/clvmd_all:lkb 2 0 0 19003 0 5 1 2 4 -1 0 0 0 0 4262546902849 0
/sys/kernel/debug/dlm/clvmd_all:lkb 3 2 1 9160 0 1 10001 2 1 -1 0 0 0 0 4084761163046 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff93bcf99e43c0 2 0 0 0 0 0 64 str SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT
/sys/kernel/debug/dlm/clvmd_all:lkb 4 2 3 9142 0 1 1 2 1 -1 0 0 0 0 4231963236873 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:1 0 0 19003 0 0 1 2 4 -1 79433230394 0 14 "V_revolution_9"
/sys/kernel/debug/dlm/clvmd_locks:5 0 0 18882 0 0 1 1 -1 3 79418429978 0 14 "V_revolution_9"
/sys/kernel/debug/dlm/clvmd_locks:6 2 5 7216 0 0 10001 1 -1 3 29043905 0 14 "V_revolution_9"
/sys/kernel/debug/dlm/clvmd_locks:2 0 0 19003 0 5 1 2 4 -1 79433193281 0 64 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA"
/sys/kernel/debug/dlm/clvmd_locks:3 2 1 9160 0 1 10001 2 1 -1 79610979024 0 64 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA"
/sys/kernel/debug/dlm/clvmd_locks:4 2 3 9142 0 1 1 2 1 -1 79463776976 2 64 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT"
/sys/kernel/debug/dlm/clvmd_toss:version 4 rsb 2
/sys/kernel/debug/dlm/clvmd_toss:rsb ffff93c7ac372580 2 2 1 1 4298929573 0 10 str P_#orphans



[root@harding-03 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000005 gr IV rq PR pid 7216 master 1 "V_revolution_9"
id 00000001 gr CR rq IV pid 9160 master 1 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA"
id 00000004 gr PW rq IV pid 18156 master 0 "P_#orphans"
id 00000002 gr CR rq IV pid 9160 master 0 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT"
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff9c5be9f90000 Name (len=14) "V_revolution_9"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 1
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:00000005 -- (PR) Master:     00000006
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff9c53e17b0b40 Name (len=64) "SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 1
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000001 CR Master:     00000003
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff9c5be9f90140 Name (len=10) "P_#orphans"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000004 PW
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff9c5bee7cd180 Name (len=64) "SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000003 CR Remote:   1 00000004
/sys/kernel/debug/dlm/clvmd:00000002 CR
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff9c5be9f90000 1 0 0 0 0 0 14 str V_revolution_9
/sys/kernel/debug/dlm/clvmd_all:lkb 5 1 6 7216 0 0 1 1 -1 3 0 0 0 0 83650438144450 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff9c53e17b0b40 1 0 0 0 0 0 64 str SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA
/sys/kernel/debug/dlm/clvmd_all:lkb 1 1 3 9160 0 1 1 2 1 -1 0 0 0 0 4068503064293 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff9c5be9f90140 0 0 0 0 0 0 10 str P_#orphans
/sys/kernel/debug/dlm/clvmd_all:lkb 4 0 0 18156 0 0 1 2 4 -1 0 0 0 0 4246521220895 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff9c5bee7cd180 0 0 0 0 0 0 64 str SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT
/sys/kernel/debug/dlm/clvmd_all:lkb 3 1 4 9142 0 1 10001 2 1 -1 0 0 0 0 4215704793911 0
/sys/kernel/debug/dlm/clvmd_all:lkb 2 0 0 9160 0 1 1 2 1 -1 0 0 0 0 4215701710165 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:5 1 6 7216 0 0 1 1 -1 3 29040794 1 14 "V_revolution_9"
/sys/kernel/debug/dlm/clvmd_locks:1 1 3 9160 0 1 1 2 1 -1 79610975881 1 64 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02Vpg0ck53kUwQmKVbWe7KQ0FO4R00ZhAKA"
/sys/kernel/debug/dlm/clvmd_locks:4 0 0 18156 0 0 1 2 4 -1 79432957746 0 10 "P_#orphans"
/sys/kernel/debug/dlm/clvmd_locks:3 1 4 9142 0 1 10001 2 1 -1 79463774181 0 64 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT"
/sys/kernel/debug/dlm/clvmd_locks:2 0 0 9160 0 1 1 2 1 -1 79463777269 0 64 "SkOqo4G1p8muTIl0dfSWksJqHiOMq02VXEV30auTytzjIG5zqTRnVdra1k3OJlqT"


[root@harding-02 ~]# dmsetup ls
revolution_9-mirror_1   (253:23)
mpathe  (253:10)
mpathe1 (253:11)
mpathd  (253:8)
revolution_9-mirror_1_mimagetmp_3       (253:21)
mpathc  (253:6)
mpathb  (253:4)
mpathh1 (253:17)
mpathc1 (253:7)
revolution_9-mirror_1_mlog      (253:19)
mpatha  (253:2)
rhel_harding--02-home   (253:18)
mpathf1 (253:13)
mpatha1 (253:3)
revolution_9-mirror_1_mimage_3  (253:24)
rhel_harding--02-swap   (253:1)
rhel_harding--02-root   (253:0)
revolution_9-mirror_1_mimage_2  (253:22)
mpathd1 (253:9)
mpathh  (253:16)
revolution_9-mirror_1_mimage_0  (253:20)
mpathg  (253:14)
mpathg1 (253:15)
mpathb1 (253:5)
mpathf  (253:12)

[root@harding-03 ~]# dmsetup ls
revolution_9-mirror_1   (253:23)
mpathe  (253:10)
mpathe1 (253:11)
mpathd  (253:8)
revolution_9-mirror_1_mimagetmp_3       (253:21)
mpathc  (253:6)
mpathb  (253:4)
mpathh1 (253:17)
mpathc1 (253:7)
revolution_9-mirror_1_mlog      (253:19)
mpatha  (253:2)
mpathf1 (253:13)
rhel_harding--03-home   (253:18)
mpatha1 (253:3)
revolution_9-mirror_1_mimage_3  (253:24)
revolution_9-mirror_1_mimage_2  (253:22)
mpathd1 (253:9)
rhel_harding--03-swap   (253:1)
rhel_harding--03-root   (253:0)
mpathh  (253:16)
revolution_9-mirror_1_mimage_0  (253:20)
mpathg  (253:14)
mpathg1 (253:15)
mpathb1 (253:5)
mpathf  (253:12)

[root@harding-02 ~]# dmsetup status
revolution_9-mirror_1: 0 4194304 mirror 2 253:21 253:24 1018/1024 1 DA 1 clustered-core
mpathe: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:240 A 0 0 1 65:240 A 0 0 1 E 0 2 2 8:112 A 0 0 1 65:112 A 0 0 1 
mpathe1: 0 524287920 linear 
mpathd: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:96 A 0 0 1 65:96 A 0 0 1 E 0 2 2 8:224 A 0 0 1 65:224 A 0 0 1 
revolution_9-mirror_1_mimagetmp_3: 0 4194304 error 
mpathc: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:208 A 0 0 1 65:208 A 0 0 1 E 0 2 2 8:80 A 0 0 1 65:80 A 0 0 1 
mpathb: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:64 A 0 0 1 65:64 A 1 0 1 E 0 2 2 8:192 A 0 0 1 65:192 A 1 0 1 
mpathh1: 0 524287920 linear 
mpathc1: 0 524287920 linear 
revolution_9-mirror_1_mlog: 0 8192 linear 
mpatha: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:176 A 0 0 1 65:176 A 0 0 1 E 0 2 2 8:48 A 0 0 1 65:48 A 0 0 1 
rhel_harding--02-home: 0 195362816 linear 
rhel_harding--02-home: 195362816 195362816 linear 
rhel_harding--02-home: 390725632 29794304 linear 
mpathf1: 0 524287920 linear 
mpatha1: 0 524287920 linear 
revolution_9-mirror_1_mimage_3: 0 4194304 linear 
rhel_harding--02-swap: 0 58613760 linear 
rhel_harding--02-root: 0 104857600 linear 
revolution_9-mirror_1_mimage_2: 0 4194304 linear 
mpathd1: 0 524287920 linear 
mpathh: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:160 A 0 0 1 65:160 A 0 0 1 E 0 2 2 65:32 A 0 0 1 66:32 A 0 0 1 
revolution_9-mirror_1_mimage_0: 0 4194304 linear 
mpathg: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 65:16 A 0 0 1 66:16 A 0 0 1 E 0 2 2 8:144 A 0 0 1 65:144 A 0 0 1 
mpathg1: 0 524287920 linear 
mpathb1: 0 524287920 linear 
mpathf: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:128 A 0 0 1 65:128 A 0 0 1 E 0 2 2 65:0 A 0 0 1 66:0 A 0 0 1 


[root@harding-03 ~]# dmsetup status
revolution_9-mirror_1: 0 4194304 mirror 2 253:21 253:24 1018/1024 1 DA 1 clustered-core
mpathe: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 65:112 A 0 0 1 65:240 A 0 0 1 E 0 2 2 8:112 A 0 0 1 8:240 A 0 0 1 
mpathe1: 0 524287920 linear 
mpathd: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:96 A 0 0 1 8:224 A 0 0 1 E 0 2 2 65:96 A 0 0 1 65:224 A 0 0 1 
revolution_9-mirror_1_mimagetmp_3: 0 4194304 error 
mpathc: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 65:80 A 0 0 1 65:208 A 0 0 1 E 0 2 2 8:80 A 0 0 1 8:208 A 0 0 1 
mpathb: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:64 A 1 0 1 8:192 A 1 0 1 E 0 2 2 65:64 A 1 0 1 65:192 A 1 0 1 
mpathh1: 0 524287920 linear 
mpathc1: 0 524287920 linear 
revolution_9-mirror_1_mlog: 0 8192 linear 
mpatha: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 65:48 A 0 0 1 65:176 A 0 0 1 E 0 2 2 8:48 A 0 0 1 8:176 A 0 0 1 
mpathf1: 0 524287920 linear 
rhel_harding--03-home: 0 195362816 linear 
rhel_harding--03-home: 195362816 195362816 linear 
rhel_harding--03-home: 390725632 29794304 linear 
mpatha1: 0 524287920 linear 
revolution_9-mirror_1_mimage_3: 0 4194304 linear 
revolution_9-mirror_1_mimage_2: 0 4194304 linear 
mpathd1: 0 524287920 linear 
rhel_harding--03-swap: 0 58613760 linear 
rhel_harding--03-root: 0 104857600 linear 
mpathh: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:160 A 0 0 1 65:32 A 0 0 1 E 0 2 2 65:160 A 0 0 1 66:32 A 0 0 1 
revolution_9-mirror_1_mimage_0: 0 4194304 linear 
mpathg: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 65:144 A 0 0 1 66:16 A 0 0 1 E 0 2 2 8:144 A 0 0 1 65:16 A 0 0 1 
mpathg1: 0 524287920 linear 
mpathb1: 0 524287920 linear 
mpathf: 0 524288000 multipath 2 0 1 0 2 1 A 0 2 2 8:128 A 0 0 1 65:0 A 0 0 1 E 0 2 2 65:128 A 0 0 1 66:0 A 0 0 1 



[root@harding-02 ~]# ps -ef | grep cmirror
root      9117     1  0 Jun10 ?        00:00:03 /usr/sbin/cmirrord
root     17296 16703  0 13:51 ttyS0    00:00:00 grep --color=auto cmirror


[root@harding-03 ~]# ps -ef | grep cmirror
root      7820  7248  0 13:51 ttyS0    00:00:00 grep --color=auto cmirror
root      9111     1  0 Jun10 ?        00:00:02 /usr/sbin/cmirrord


Version-Release number of selected component (if applicable):

3.10.0-1048.el7.x86_64

lvm2-2.02.185-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
lvm2-libs-2.02.185-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
lvm2-cluster-2.02.185-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
lvm2-lockd-2.02.185-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
lvm2-python-boom-0.9-17.el7.2    BUILT: Mon May 13 04:37:00 CDT 2019
cmirror-2.02.185-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
device-mapper-1.02.158-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
device-mapper-libs-1.02.158-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
device-mapper-event-1.02.158-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
device-mapper-event-libs-1.02.158-1.el7    BUILT: Mon May 13 04:36:30 CDT 2019
device-mapper-persistent-data-0.8.1-1.el7    BUILT: Sat May  4 14:53:53 CDT 2019



How reproducible:
Every time

Comment 2 Corey Marthaler 2019-06-11 21:44:35 UTC
This existed in rhel7.6 as well.

3.10.0-957.el7.x86_64

lvm2-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
lvm2-libs-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
lvm2-cluster-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
lvm2-lockd-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
lvm2-python-boom-0.9-11.el7    BUILT: Mon Sep 10 04:49:22 CDT 2018
cmirror-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-libs-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-event-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-event-libs-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 05:07:18 CST 2017


Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
sync percent for mirror_1 hasn't changed in the past minute
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   0/1 mirror(s) are fully synced: ( 99.51% )
   [...]
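The harness output above polls the mirror's sync percentage and flags a stall once it stops advancing. A minimal sketch of that stall check (a hypothetical helper, not the actual test harness) could look like:

```python
def is_stalled(samples, window=6):
    """Return True if the last `window` sync-percent samples are identical
    and below 100%, i.e. the copy percent has stopped advancing (as in the
    output above, stuck at 99.51%)."""
    if len(samples) < window:
        return False
    tail = samples[-window:]
    return len(set(tail)) == 1 and tail[0] < 100.0

# The percentages reported by the harness above:
history = [99.51] * 6
print(is_stalled(history))  # True: sync percent hasn't changed
```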

Comment 3 Corey Marthaler 2019-06-12 21:56:48 UTC
This doesn't require re-enabling the failed device and bringing it back into the VG; the deadlock and GFS withdrawal happen before that takes place.

Comment 4 Corey Marthaler 2019-06-12 21:58:42 UTC
This issue does not exist in rhel7.5, so it appears to be a regression from 7.5 to 7.6.

Comment 5 Corey Marthaler 2019-06-13 18:33:10 UTC
I ran 20 iterations of this scenario on 7.5 and saw no issues, adding "Regression" keyword.

3.10.0-862.el7.x86_64

lvm2-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-libs-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-cluster-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
cmirror-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-libs-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-event-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-event-libs-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 05:07:18 CST 2017

Comment 8 David Teigland 2019-06-17 18:24:48 UTC
Here's an attempt at reproducing with the same kind of test.  I don't see any problems below or in /var/log/messages (which contains a lot of cmirror debugging), but I'm not entirely sure what I should be looking for.  I can provide lvm, clvmd, or cmirror debugging messages if anyone wants to check for anything specific.


Create cmirror LV
-----------------

[root@null-02 ~]# lvcreate --type mirror --mirrorlog disk -m 2 -n gv2 -L1G cc
  Logical volume "gv2" created.

[root@null-02 ~]# dmsetup status
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_1: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 3 253:4 253:5 253:6 1593/2048 1 AAA 3 clustered-disk 253:3 A
cc-gv2_mlog: 0 8192 linear

[root@null-02 ~]# dmsetup table
cc-gv2_mimage_2: 0 2097152 linear 8:96 384
cc-gv2_mimage_1: 0 2097152 linear 8:64 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 3 253:4 0 253:5 0 253:6 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:80 384

[root@null-02 ~]# lvs -a -o+devices cc
  LV             VG Attr       LSize Log        Cpy%Sync Devices                             
  gv2            cc mwi-a-m--- 1.00g [gv2_mlog] 100.00   gv2_mimage_0(0),gv2_mimage_1(0),gv2_mimage_2(0)
  [gv2_mimage_0] cc iwi-aom--- 1.00g                     /dev/sdb(0)                        
  [gv2_mimage_1] cc iwi-aom--- 1.00g                     /dev/sde(0)                        
  [gv2_mimage_2] cc iwi-aom--- 1.00g                     /dev/sdg(0)                        
  [gv2_mlog]     cc lwi-aom--- 4.00m                     /dev/sdf(0)                        

[root@null-03 ~]# dmsetup status
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_1: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 3 253:4 253:5 253:6 2048/2048 1 AAA 3 clustered-disk 253:3 A
cc-gv2_mlog: 0 8192 linear

[root@null-03 ~]# dmsetup table
cc-gv2_mimage_2: 0 2097152 linear 8:96 384
cc-gv2_mimage_1: 0 2097152 linear 8:64 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 3 253:4 0 253:5 0 253:6 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:80 384

[root@null-03 ~]# lvs -a -o+devices cc
  LV             VG Attr       LSize Log        Cpy%Sync Devices                             
  gv2            cc mwi-a-m--- 1.00g [gv2_mlog] 100.00   gv2_mimage_0(0),gv2_mimage_1(0),gv2_mimage_2(0)
  [gv2_mimage_0] cc iwi-aom--- 1.00g                     /dev/sdb(0)                         
  [gv2_mimage_1] cc iwi-aom--- 1.00g                     /dev/sde(0)                         
  [gv2_mimage_2] cc iwi-aom--- 1.00g                     /dev/sdg(0)                         
  [gv2_mlog]     cc lwi-aom--- 4.00m                     /dev/sdf(0)                       



Create file system
------------------

[root@null-02 ~]# mkfs.gfs2 -o align=0 -J 32M -j 2 -p lock_dlm -t test:gv2 /dev/cc/gv2
It appears to contain an existing filesystem (gfs2)
/dev/cc/gv2 is a symbolic link to /dev/dm-7
This will destroy any data on /dev/dm-7
Are you sure you want to proceed? [y/n] y
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done
Creating quota file: Done
Writing superblock and syncing: Done
Device:                    /dev/cc/gv2
Block size:                4096
Device size:               1.00 GB (262144 blocks)
Filesystem size:           1.00 GB (262142 blocks)
Journals:                  2
Journal size:              32MB
Resource groups:           6
Locking protocol:          "lock_dlm"
Lock table:                "test:gv2"
UUID:                      a17fbd83-72cc-4224-b6d4-a8cfe8a2d41e


Mount and write to file system
------------------------------

[root@null-02 ~]# mount /dev/cc/gv2 /mnt
[root@null-02 ~]# cd /mnt
[root@null-02 mnt]# rsync -a --exclude *.o /root/lvm.git/ lvm-02.git/

[root@null-03 ~]# mount /dev/cc/gv2 /mnt
[root@null-03 ~]# cd /mnt
[root@null-03 mnt]# rsync -a --exclude *.o /root/lvm.git/ lvm-03.git/


Fail device for one mirror leg
------------------------------

[root@null-02 ~]# echo offline > /sys/block/sde/device/state
[root@null-03 ~]# echo offline > /sys/block/sde/device/state


Write to FS to trigger IO errors and recovery
----------------------------------------------

[root@null-02 ~]# dd if=/dev/zero of=/mnt/1M.2 bs=1M
^C^C^C168+0 records in
168+0 records out
176160768 bytes (176 MB) copied, 11.7896 s, 14.9 MB/s

[root@null-03 ~]# dd if=/dev/zero of=/mnt/1M.3 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00144624 s, 725 MB/s

[root@null-03 ~]# ls -l /mnt
total 173428
-rw-r--r--.  1 root root 176160768 Jun 17 06:41 1M.2
-rw-r--r--.  1 root root   1048576 Jun 17 06:47 1M.3
drwxrwxr-x. 25 3890 3890      3864 Jun 17 06:33 lvm-02.git
drwxrwxr-x. 25 3890 3890      3864 Jun 17 06:39 lvm-03.git


Check for failed mirror leg
---------------------------

[root@null-02 ~]# dmsetup table
cc-gv2_mimage_2: 0 2097152 linear 8:96 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 2 253:4 0 253:6 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:80 384

[root@null-02 ~]# dmsetup status
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A
cc-gv2_mlog: 0 8192 linear

[root@null-02 ~]# lvs -a -o+devices cc
  Couldn't find device with uuid zkOVWF-FQOi-XodY-u2z3-iCAs-Aj3u-PekSvj.
  LV             VG Attr       LSize Log        Cpy%Sync Devices
  gv2            cc mwi-aom--- 1.00g [gv2_mlog] 100.00   gv2_mimage_0(0),gv2_mimage_2(0)
  [gv2_mimage_0] cc iwi-aom--- 1.00g                     /dev/sdb(0)
  [gv2_mimage_2] cc iwi-aom--- 1.00g                     /dev/sdg(0)
  [gv2_mlog]     cc lwi-aom--- 4.00m                     /dev/sdf(0)

[root@null-03 ~]# dmsetup table
cc-gv2_mimage_2: 0 2097152 linear 8:96 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 2 253:4 0 253:6 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:80 384

[root@null-03 ~]# dmsetup status
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A
cc-gv2_mlog: 0 8192 linear

[root@null-03 ~]# lvs -a -o+devices cc
  Couldn't find device with uuid zkOVWF-FQOi-XodY-u2z3-iCAs-Aj3u-PekSvj.
  LV             VG Attr       LSize Log        Cpy%Sync Devices
  gv2            cc mwi-aom--- 1.00g [gv2_mlog] 100.00   gv2_mimage_0(0),gv2_mimage_2(0)
  [gv2_mimage_0] cc iwi-aom--- 1.00g                     /dev/sdb(0)
  [gv2_mimage_2] cc iwi-aom--- 1.00g                     /dev/sdg(0)
  [gv2_mlog]     cc lwi-aom--- 4.00m                     /dev/sdf(0)

Comment 9 David Teigland 2019-06-17 19:15:09 UTC
In my previous comment, mirror_image_fault_policy was "remove".  I changed it to "allocate" and things no longer work.

After failing the device I watched the dm state on both machines and copied the states below.  Perhaps the sequence of changes is meaningful to someone.  If anyone is interested, I have clvmd debugging (with lvm debugging included) in a terminal that I can search through, but didn't manage to get it saved to a file.



DM info from null-02
--------------------

[root@null-02 ~]# echo offline > /sys/block/sdg/device/state

[root@null-02 ~]# dmsetup table
cc-gv2_mimage_3: 0 2097152 linear 8:64 384
cc-gv2_mimage_2: 0 2097152 linear 8:80 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 3 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtYFmV17rBFakydtKSV5eRzwh4yZx8ePQo clustered-core 1024 2 253:5 0 253:8 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:32 384
cc-gv2_mimagetmp_3: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 2 253:4 0 253:6 0 1 handle_errors

[root@null-02 ~]# dmsetup status
cc-gv2_mimage_3: 0 2097152 linear
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 2 253:5 253:8 2048/2048 1 AA 1 clustered-core
cc-gv2_mlog: 0 8192 linear
cc-gv2_mimagetmp_3: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A

[root@null-02 ~]# dmsetup status
cc-gv2_mimage_3: 0 2097152 linear
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 2 253:5 253:8 2047/2048 1 RA 1 clustered-core
cc-gv2_mlog: 0 8192 linear
cc-gv2_mimagetmp_3: 0 2097152 error

[root@null-02 ~]# dmsetup table
cc-gv2_mimage_3: 0 2097152 linear 8:64 384
cc-gv2_mimage_2: 0 2097152 linear 8:80 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 3 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtYFmV17rBFakydtKSV5eRzwh4yZx8ePQo clustered-core 1024 2 253:5 0 253:8 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:32 384
cc-gv2_mimagetmp_3: 0 2097152 error

[root@null-02 ~]# dmsetup status
cc-gv2_mimage_3: 0 2097152 linear
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 2 253:5 253:8 2047/2048 1 RA 1 clustered-core
cc-gv2_mlog: 0 8192 linear
cc-gv2_mimagetmp_3: 0 2097152 error

[root@null-02 ~]# dmsetup table
cc-gv2_mimage_3: 0 2097152 linear 8:64 384
cc-gv2_mimage_2: 0 2097152 linear 8:80 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 3 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtYFmV17rBFakydtKSV5eRzwh4yZx8ePQo clustered-core 1024 2 253:5 0 253:8 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:32 384
cc-gv2_mimagetmp_3: 0 2097152 error



DM info from null-03
--------------------

[root@null-03 ~]# echo offline > /sys/block/sdg/device/state

[root@null-03 ~]# dmsetup status
cc-gv2_mimage_3: 0 2097152 linear
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 2 253:5 253:8 2048/2048 1 AA 1 clustered-core
cc-gv2_mlog: 0 8192 linear
cc-gv2_mimagetmp_3: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A

[root@null-03 ~]# dmsetup table
cc-gv2_mimage_3: 0 2097152 linear 8:64 384
cc-gv2_mimage_2: 0 2097152 linear 8:80 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:32 384
cc-gv2_mimagetmp_3: 0 2097152 error

[root@null-03 ~]# dmsetup status
cc-gv2_mimage_3: 0 2097152 linear
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA userspace 1 COM_FAILURE
cc-gv2_mlog: 0 8192 linear
cc-gv2_mimagetmp_3: 0 2097152 error

[root@null-03 ~]# dmsetup table
cc-gv2_mimage_3: 0 2097152 linear 8:64 384
cc-gv2_mimage_2: 0 2097152 linear 8:80 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:32 384
cc-gv2_mimagetmp_3: 0 2097152 error

[root@null-03 ~]# dmsetup status
cc-gv2_mimage_3: 0 2097152 linear
cc-gv2_mimage_2: 0 2097152 linear
cc-gv2_mimage_0: 0 2097152 linear
cc-gv2: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA userspace 1 COM_FAILURE
cc-gv2_mlog: 0 8192 linear
cc-gv2_mimagetmp_3: 0 2097152 error

[root@null-03 ~]# dmsetup table
cc-gv2_mimage_3: 0 2097152 linear 8:64 384
cc-gv2_mimage_2: 0 2097152 linear 8:80 384
cc-gv2_mimage_0: 0 2097152 linear 8:16 384
cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors
cc-gv2_mlog: 0 8192 linear 8:32 384
cc-gv2_mimagetmp_3: 0 2097152 error

Comment 10 David Teigland 2019-06-17 22:03:56 UTC
Repeated this without gfs2 on top of the LV.  I ran dd against gv3 to trigger the recovery.

Before:
  gv3               cc mwi-a-m--- 1.00g [gv3_mlog] 100.00                     gv3_mimage_0(0),gv3_mimage_1(0),gv3_mimage_2(0)
  [gv3_mimage_0]    cc iwi-aom--- 1.00g                                       /dev/sdb(256)
  [gv3_mimage_1]    cc iwi-aom--- 1.00g                                       /dev/sdf(256)
  [gv3_mimage_2]    cc iwi-aom--- 1.00g                                       /dev/sde(256)
  [gv3_mlog]        cc lwi-aom--- 4.00m                                       /dev/sdg(0)


# echo offline > /sys/block/sdf/device/state

After:
  gv3               cc mwi-a-m--- 1.00g [gv3_mlog] 100.00                     gv3_mimage_0(0),gv3_mimage_2(0),gv3_mimage_3(0)
  [gv3_mimage_0]    cc iwi-aom--- 1.00g                                       /dev/sdb(256)
  [gv3_mimage_2]    cc iwi-aom--- 1.00g                                       /dev/sde(256)
  [gv3_mimage_3]    cc iwi-aom--- 1.00g                                       /dev/sdc(1)
  [gv3_mlog]        cc lwi-aom--- 4.00m                                       /dev/sdg(0)


After:

[root@null-02 ~]# dmsetup status
cc-gv3_mimage_3: 0 2097152 linear 
cc-gv3_mimage_2: 0 2097152 linear 
cc-gv3_mimage_0: 0 2097152 linear 
cc-gv3: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA 3 clustered-disk 253:3 A
cc-gv3_mlog: 0 8192 linear 

[root@null-02 ~]# dmsetup table
cc-gv3_mimage_3: 0 2097152 linear 8:32 8576
cc-gv3_mimage_2: 0 2097152 linear 8:64 2097536
cc-gv3_mimage_0: 0 2097152 linear 8:16 2097536
cc-gv3: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDturrDY8xC8hBXeD7Obr1m3BywPXVmcCk4 clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors
cc-gv3_mlog: 0 8192 linear 8:96 384

[root@null-03 ~]# dmsetup status
cc-gv3_mimage_3: 0 2097152 linear 
cc-gv3_mimage_2: 0 2097152 linear 
cc-gv3_mimage_0: 0 2097152 linear 
cc-gv3: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA 3 clustered-disk 253:3 A
cc-gv3_mlog: 0 8192 linear 

[root@null-03 ~]# dmsetup table
cc-gv3_mimage_3: 0 2097152 linear 8:32 8576
cc-gv3_mimage_2: 0 2097152 linear 8:64 2097536
cc-gv3_mimage_0: 0 2097152 linear 8:16 2097536
cc-gv3: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDturrDY8xC8hBXeD7Obr1m3BywPXVmcCk4 clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors
cc-gv3_mlog: 0 8192 linear 8:96 384

Comment 11 Heinz Mauelshagen 2019-06-18 12:18:07 UTC
Dave,

'dmsetup ls --tree' would be helpful to see the temporary deeper device trees that come with 'mirror' layout changes (e.g. during repair).

Your examples in c#8 and c#9 show those deeper trees, i.e. clustered-core on top of clustered-disk plus the new replacement leg.
It can be deduced that mimage_3 is the new leg, because the lvm2 'mirror' runtime does not do in-place replacements but rather
increments suffixes when adding new legs. 'dmsetup ls --tree' would show those (interim) dependencies better.

In c#9, the primary leg of the temporary clustered-core log 'mirror' takes read error(s), which it shouldn't,
because that leg is composed of mimage_0 and mimage_2, forming a presumably intact clustered-disk log 'mirror'.

So why does it throw read error(s)?
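The deduction above — that the highest mimage suffix identifies the newly added leg, since the 'mirror' runtime only ever appends legs with incremented suffixes — can be sketched as a small helper (illustrative only; the function name and inputs are assumptions):

```python
import re

def newest_leg(dm_names):
    """Given dmsetup device names, return the mimage with the highest
    numeric suffix. lvm2's 'mirror' runtime never replaces a leg in place;
    it adds a new leg with the next suffix, so the max suffix is newest."""
    legs = {}
    for name in dm_names:
        m = re.search(r'_mimage_(\d+)$', name)
        if m:
            legs[int(m.group(1))] = name
    return legs[max(legs)] if legs else None

# Device names as seen in the c#9 'dmsetup table' output:
names = ["cc-gv2_mimage_0", "cc-gv2_mimage_2", "cc-gv2_mimage_3",
         "cc-gv2_mlog", "cc-gv2"]
print(newest_leg(names))  # cc-gv2_mimage_3
```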

Comment 12 Jonathan Earl Brassow 2019-06-18 13:44:35 UTC
Solve the read error and you will solve the hang.

When the read from the primary fails, the sync for that region fails.  The sync process will move on and do the rest of the mirror, but it will never go back and redo the initial region, leaving the mirror below 100% sync forever.
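A minimal sketch of the resync behavior described above (not the kernel code; region handling is heavily simplified): a region whose primary-leg read fails is skipped and never revisited, so the in-sync count stays short of the total forever:

```python
def resync(total_regions, failing_regions):
    """Simulate one resync pass over a mirror's regions. A region whose
    primary-leg read fails is skipped; the pass never returns to it, so
    the mirror stays below full sync indefinitely."""
    in_sync = 0
    for region in range(total_regions):
        if region in failing_regions:
            continue  # read from primary failed; region left out of sync
        in_sync += 1
    return in_sync, total_regions

# One failed region out of 1024 (compare the 1018/1024-style counts in
# the dmsetup status output earlier in this report):
done, total = resync(1024, {7})
print(f"{done}/{total}")  # 1023/1024 — stuck below 100% forever
```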

Comment 13 David Teigland 2019-06-18 14:39:30 UTC
Repeated the steps in comment 8, but I did not write to the file system after failing the device (I noticed that recovery had already been triggered.)  In this case things seemed to work.

Until now I hadn't noticed that dmeventd is leaving some messages about the event.  Here's what I found this time:

Jun 18 03:10:22 null-03 dmeventd[18342]: Secondary mirror device 253:5 has failed (D).
Jun 18 03:10:22 null-03 dmeventd[18342]: Device failure in cc-gv4.
Jun 18 03:10:22 null-03 dmeventd[18342]: WARNING: Not using lvmetad because of repair.
Jun 18 03:10:22 null-03 dmeventd[18342]: Couldn't find device with uuid 9DqAOn-76Yc-asP1-cMyK-der4-pI9G-ZVyQGr.
Jun 18 03:10:22 null-03 dmeventd[18342]: WARNING: Disabling lvmetad cache for repair command.
Jun 18 03:10:27 null-03 dmeventd[18342]: Mirror status: 1 of 3 images failed.
Jun 18 03:10:27 null-03 dmeventd[18342]: WARNING: Trying to up-convert to 3 images, 1 logs.
Jun 18 03:10:27 null-03 dmeventd[18342]: /run/lvm/lvmpolld.socket: connect failed: No such file or directory
Jun 18 03:10:27 null-03 dmeventd[18342]: WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld.
Jun 18 03:10:27 null-03 dmeventd[18342]: WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state.
Jun 18 03:10:27 null-03 dmeventd[18342]: cc/gv4: Converted: 1.17%
Jun 18 03:10:42 null-03 dmeventd[18342]: cc/gv4: Converted: 69.92%
Jun 18 03:10:57 null-03 dmeventd[18342]: cc/gv4: Converted: 100.00%
Jun 18 03:11:01 null-03 dmeventd[18342]: Logical volume cc/gv4 converted.
Jun 18 03:11:01 null-03 dmeventd[18342]: Repair of mirrored device cc-gv4 finished successfully.
Jun 18 03:11:01 null-03 dmeventd[18342]: cc-gv4 is now in-sync.

Looking back at these messages from previous tests, I do not see the lvmpolld warnings, which means, I suspect, that previous instances were using lvmpolld.  This could be a clue that using lvmpolld is a problem.

Comment 14 David Teigland 2019-06-18 14:41:11 UTC
Created attachment 1581749 [details]
clvmd debugging from null-02

for test in previous comment

Comment 15 David Teigland 2019-06-18 14:41:51 UTC
Created attachment 1581750 [details]
clvmd debugging from null-03

Comment 16 David Teigland 2019-06-18 16:19:34 UTC
I've switched over to the simpler test that Jon showed us: just adding legs to an existing mirror and looking for dm error messages in the kernel logs.  This worked as a much simpler way of reproducing (assuming it's the same root issue).


[root@null-03 ~]# lvcreate --type mirror -n lv1 -L2G -m1 cc
  Logical volume "lv1" created.

[root@null-03 ~]# lvconvert -m2 cc/lv1
  Logical volume cc/lv1 being converted.
  /run/lvm/lvmpolld.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld.
  WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state.
  cc/lv1: Converted: 0.59%
  cc/lv1: Converted: 25.00%
  cc/lv1: Converted: 49.22%
  cc/lv1: Converted: 73.24%
  cc/lv1: Converted: 97.66%
  cc/lv1: Converted: 100.00%
  Logical volume cc/lv1 converted.


Jun 18 04:55:16 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1 for events.
Jun 18 04:55:16 null-03 kernel: [66824.862710] device-mapper: raid1: Unable to read primary mirror during recovery
Jun 18 04:55:16 null-03 kernel: [66824.865747] device-mapper: raid1: Primary mirror (253:12) failed while out-of-sync: Reads may fail.
Jun 18 04:55:16 null-03 kernel: [66824.870731] device-mapper: raid1: Primary mirror (253:12) failed while out-of-sync: Reads may fail.
Jun 18 04:55:16 null-03 kernel: [66824.880482] device-mapper: raid1: Read failure on mirror device 253:12.  Trying alternative device.
Jun 18 04:55:16 null-03 kernel: [66824.880586] device-mapper: raid1: Mirror read failed.
Jun 18 04:55:16 null-03 kernel: [66824.905576] Buffer I/O error on dev dm-11, logical block 524272, async page read
Jun 18 04:55:19 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1_mimagetmp_2 for events.
Jun 18 04:55:19 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1 for events.
Jun 18 04:55:19 null-03 kernel: [66827.964811] device-mapper: raid1: Mirror read failed.
Jun 18 04:55:19 null-03 kernel: [66827.970632] Buffer I/O error on dev dm-11, logical block 524272, async page read
Jun 18 04:56:20 null-03 dmeventd[18342]: cc-lv1_mimagetmp_2 is now in-sync.
Jun 18 04:56:21 null-03 dmeventd[18342]: cc-lv1 is now in-sync.
Jun 18 04:56:39 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1_mimagetmp_2 for events.
Jun 18 04:56:39 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1 for events.
Jun 18 04:56:39 null-03 cmirrord[18281]: Log entry already exists: LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt9ejBHHU9CxFwcyRqzhdZQ0f9oaAOelKI
Jun 18 04:56:39 null-03 cmirrord[18281]: clog_resume:  Failed to create cluster CPG
Jun 18 04:56:39 null-03 kernel: [66907.129116] device-mapper: raid1: log presuspend failed
Jun 18 04:56:39 null-03 kernel: [66907.135853] device-mapper: raid1: log presuspend failed
Jun 18 04:56:39 null-03 kernel: [66907.144906] device-mapper: raid1: log postsuspend failed
Jun 18 04:56:39 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1 for events.
Jun 18 04:56:39 null-03 dmeventd[18342]: cc-lv1 is now in-sync.
Jun 18 04:56:39 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1 for events.
Jun 18 04:56:42 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1 for events.


Jun 18 04:48:58 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events.
Jun 18 04:49:25 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1 for events.
Jun 18 04:49:25 null-02 kernel: [66516.889274] device-mapper: raid1: Unable to read primary mirror during recovery
Jun 18 04:49:25 null-02 kernel: [66516.897430] device-mapper: raid1: Primary mirror (253:12) failed while out-of-sync: Reads may fail.
Jun 18 04:49:25 null-02 kernel: [66516.907741] device-mapper: raid1: Mirror read failed.
Jun 18 04:49:28 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1_mimagetmp_2 for events.
Jun 18 04:49:28 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events.
Jun 18 04:49:28 null-02 kernel: [66520.035985] device-mapper: raid1: Mirror read failed.
Jun 18 04:49:28 null-02 kernel: [66520.041901] Buffer I/O error on dev dm-11, logical block 524272, async page read
Jun 18 04:49:28 null-02 kernel: [66520.061904] device-mapper: raid1: Mirror read failed.
Jun 18 04:49:28 null-02 kernel: [66520.068474] device-mapper: raid1: Mirror read failed.
Jun 18 04:49:28 null-02 kernel: [66520.074374] Buffer I/O error on dev dm-11, logical block 524272, async page read
Jun 18 04:50:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1_mimagetmp_2 for events.
Jun 18 04:50:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1 for events.
Jun 18 04:50:47 null-02 kernel: [66599.133075] device-mapper: raid1: log presuspend failed
Jun 18 04:50:47 null-02 kernel: [66599.151707] device-mapper: raid1: log presuspend failed
Jun 18 04:50:47 null-02 kernel: [66599.157967] device-mapper: raid1: log postsuspend failed
Jun 18 04:50:47 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events.
Jun 18 04:50:48 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1 for events.
Jun 18 04:50:51 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events.

I have clvmd and cmirror debug logs that I will attach next.

Comment 17 David Teigland 2019-06-18 16:21:54 UTC
Created attachment 1581780 [details]
clvmd debug output from null-02

related to comment 16

Comment 18 David Teigland 2019-06-18 16:22:38 UTC
Created attachment 1581781 [details]
clvmd debug output from null-03

related to comment 16

Comment 19 David Teigland 2019-06-18 16:31:02 UTC
Created attachment 1581783 [details]
cmirrord debug output from null-02

related to comment 16

Comment 20 David Teigland 2019-06-18 17:11:09 UTC
Repeated comment 16 using lvm 2.02.176 so that working/non-working debug logs can be compared.

[root@null-02 ~]# lvcreate --type mirror -n lv3 -L2G -m1 cc
  Logical volume "lv3" created.

[root@null-02 ~]# lvs -a -o+devices cc
  LV             VG Attr       LSize Log        Cpy%Sync Devices
  lv3            cc mwi-a-m--- 2.00g [lv3_mlog] 100.00   lv3_mimage_0(0),lv3_mimage_1(0)
  [lv3_mimage_0] cc iwi-aom--- 2.00g                     /dev/sdb(0)
  [lv3_mimage_1] cc iwi-aom--- 2.00g                     /dev/sdg(0)
  [lv3_mlog]     cc lwi-aom--- 4.00m                     /dev/sde(0)

[root@null-02 ~]# lvconvert -m2 cc/lv3
  Logical volume cc/lv3 being converted.
  /run/lvm/lvmpolld.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld.
  WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state.
  cc/lv3: Converted: 1.17%
  cc/lv3: Converted: 35.94%
  cc/lv3: Converted: 70.12%
  cc/lv3: Converted: 100.00%
  Logical volume cc/lv3 converted.

[root@null-02 ~]# lvs -a -o+devices cc
  LV             VG Attr       LSize Log        Cpy%Sync Devices                                        
  lv3            cc mwi-a-m--- 2.00g [lv3_mlog] 100.00   lv3_mimage_0(0),lv3_mimage_1(0),lv3_mimage_2(0)
  [lv3_mimage_0] cc iwi-aom--- 2.00g                     /dev/sdb(0)                                    
  [lv3_mimage_1] cc iwi-aom--- 2.00g                     /dev/sdg(0)                                    
  [lv3_mimage_2] cc iwi-aom--- 2.00g                     /dev/sdf(0)                                    
  [lv3_mlog]     cc lwi-aom--- 4.00m                     /dev/sde(0)                                    


Jun 18 05:40:40 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events.
Jun 18 05:42:01 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3 for events.
Jun 18 05:42:01 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3_mimagetmp_2 for events.
Jun 18 05:42:01 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events.
Jun 18 05:42:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3_mimagetmp_2 for events.
Jun 18 05:42:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3 for events.
Jun 18 05:42:47 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events.
Jun 18 05:42:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3 for events.
Jun 18 05:42:50 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events.

Jun 18 05:46:31 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events.
Jun 18 05:47:33 null-03 dmeventd[18342]: cc-lv3 is now in-sync.
Jun 18 05:47:52 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3 for events.
Jun 18 05:47:52 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3_mimagetmp_2 for events.
Jun 18 05:47:53 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events.
Jun 18 05:48:36 null-03 dmeventd[18342]: cc-lv3 is now in-sync.
Jun 18 05:48:38 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3_mimagetmp_2 for events.
Jun 18 05:48:38 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3 for events.
Jun 18 05:48:38 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events.
Jun 18 05:48:38 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3 for events.
Jun 18 05:48:41 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events.


Attaching clvmd debug logs next.

Comment 21 David Teigland 2019-06-18 17:12:25 UTC
Created attachment 1581799 [details]
clvmd debug output from null-02

related to comment 20

Comment 22 David Teigland 2019-06-18 17:13:00 UTC
Created attachment 1581800 [details]
clvmd debug output from null-03

related to comment 20

Comment 23 David Teigland 2019-06-18 17:25:35 UTC
Looks like I uploaded the wrong files in comments 17 and 18; I'll have to rerun the test and collect the output again.

Comment 24 David Teigland 2019-06-18 17:34:47 UTC
Repeating the test in comment 16 to recollect clvmd debug logs.  I noticed that the kernel dm errors appear immediately after running the lvconvert -m2, so something goes wrong right at the start of the command.

[root@null-02 ~]# lvcreate --type mirror -n lv4 -L2G -m1 cc
  Logical volume "lv4" created.

[root@null-02 ~]# lvs -a -o+devices cc
  LV             VG Attr       LSize Log        Cpy%Sync Devices
  lv4            cc mwi-a-m--- 2.00g [lv4_mlog] 100.00   lv4_mimage_0(0),lv4_mimage_1(0)
  [lv4_mimage_0] cc iwi-aom--- 2.00g                     /dev/sdb(0)
  [lv4_mimage_1] cc iwi-aom--- 2.00g                     /dev/sdg(0)
  [lv4_mlog]     cc lwi-aom--- 4.00m                     /dev/sde(0)

[root@null-02 ~]# dmsetup ls --tree
cc-lv4 (253:6)
 ├─cc-lv4_mimage_1 (253:5)
 │  └─ (8:96)
 ├─cc-lv4_mimage_0 (253:4)
 │  └─ (8:16)
 └─cc-lv4_mlog (253:3)
    └─ (8:64)

[root@null-02 ~]# dmsetup table
cc-lv4_mimage_1: 0 4194304 linear 8:96 384
cc-lv4: 0 4194304 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt1oI7Pq0KGDfvmvpS09DS7nop9GbSKP01 clustered-disk 253:3 1024 2 253:4 0 253:5 0 1 handle_errors
cc-lv4_mimage_0: 0 4194304 linear 8:16 384
cc-lv4_mlog: 0 8192 linear 8:64 384
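
For readers following the before/after comparison, the fields of the cc-lv4 mirror table line above break down roughly as follows (an informal annotation based on the general dm mirror target table format; treat the labels as a sketch, not authoritative documentation):

```
cc-lv4: 0 4194304                   # logical start sector, length
  mirror                            # target type
  userspace 4 <LVM-uuid> clustered-disk 253:3 1024
                                    # log type "userspace" with 4 args:
                                    #   mirror UUID,
                                    #   "clustered-disk" (cmirrord log),
                                    #   log device 253:3,
                                    #   region size 1024 sectors
  2 253:4 0 253:5 0                 # 2 mirror legs as device:offset pairs
  1 handle_errors                   # 1 feature flag
```

After the lvconvert -m2 below, the same line reports 3 legs (253:4, 253:5, 253:8), which is the expected end state of the conversion.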

[root@null-02 ~]# lvconvert -m2 cc/lv4
  Logical volume cc/lv4 being converted.
  /run/lvm/lvmpolld.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld.
  WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state.
  cc/lv4: Converted: 0.59%
  cc/lv4: Converted: 34.96%
  cc/lv4: Converted: 68.75%
  cc/lv4: Converted: 100.00%
  Logical volume cc/lv4 converted.

[root@null-02 ~]# dmsetup status
rhel_null--02-swap: 0 16416768 linear 
rhel_null--02-root: 0 104857600 linear 
cc-lv4_mimage_2: 0 4194304 linear 
cc-lv4_mimage_1: 0 4194304 linear 
cc-lv4: 0 4194304 mirror 3 253:4 253:5 253:8 4096/4096 1 AAA 3 clustered-disk 253:3 A
cc-lv4_mimage_0: 0 4194304 linear 
cc-lv4_mlog: 0 8192 linear 
rhel_null--02-home: 0 854466560 linear 

[root@null-02 ~]# dmsetup table
rhel_null--02-swap: 0 16416768 linear 8:2 2048
rhel_null--02-root: 0 104857600 linear 8:2 870885376
cc-lv4_mimage_2: 0 4194304 linear 8:80 384
cc-lv4_mimage_1: 0 4194304 linear 8:96 384
cc-lv4: 0 4194304 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt1oI7Pq0KGDfvmvpS09DS7nop9GbSKP01 clustered-disk 253:3 1024 3 253:4 0 253:5 0 253:8 0 1 handle_errors
cc-lv4_mimage_0: 0 4194304 linear 8:16 384
cc-lv4_mlog: 0 8192 linear 8:64 384
rhel_null--02-home: 0 854466560 linear 8:2 16418816


Jun 18 06:06:23 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events.
Jun 18 06:08:08 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4 for events.
Jun 18 06:08:08 null-02 kernel: [71239.777158] device-mapper: raid1: Unable to read primary mirror during recovery
Jun 18 06:08:08 null-02 kernel: [71239.785568] device-mapper: raid1: Primary mirror (253:7) failed while out-of-sync: Reads may fail.
Jun 18 06:08:08 null-02 kernel: [71239.797233] device-mapper: raid1: Mirror read failed.
Jun 18 06:08:08 null-02 kernel: [71239.803463] device-mapper: raid1: Mirror read failed.
Jun 18 06:08:08 null-02 kernel: [71239.809550] Buffer I/O error on dev dm-6, logical block 524272, async page read
Jun 18 06:08:11 null-02 kernel: [71242.905454] device-mapper: raid1: Mirror read failed.
Jun 18 06:08:11 null-02 kernel: [71242.912274] device-mapper: raid1: Mirror read failed.
Jun 18 06:08:11 null-02 kernel: [71242.918414] Buffer I/O error on dev dm-6, logical block 524272, async page read
Jun 18 06:08:11 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4_mimagetmp_2 for events.
Jun 18 06:08:11 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events.
Jun 18 06:08:57 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4_mimagetmp_2 for events.
Jun 18 06:08:57 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4 for events.
Jun 18 06:08:57 null-02 kernel: [71288.625293] device-mapper: raid1: log presuspend failed
Jun 18 06:08:57 null-02 kernel: [71288.632268] device-mapper: raid1: log presuspend failed
Jun 18 06:08:57 null-02 kernel: [71288.638662] device-mapper: raid1: log postsuspend failed
Jun 18 06:08:57 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events.
Jun 18 06:08:58 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4 for events.
Jun 18 06:09:01 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events.


Jun 18 06:12:14 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events.
Jun 18 06:13:16 null-03 dmeventd[18342]: cc-lv4 is now in-sync.
Jun 18 06:14:00 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4 for events.
Jun 18 06:14:00 null-03 kernel: [71548.124980] device-mapper: raid1: Unable to read primary mirror during recovery
Jun 18 06:14:00 null-03 kernel: [71548.128941] device-mapper: raid1: Primary mirror (253:7) failed while out-of-sync: Reads may fail.
Jun 18 06:14:00 null-03 kernel: [71548.132884] device-mapper: raid1: Primary mirror (253:7) failed while out-of-sync: Reads may fail.
Jun 18 06:14:00 null-03 kernel: [71548.142573] device-mapper: raid1: Read failure on mirror device 253:7.  Trying alternative device.
Jun 18 06:14:00 null-03 kernel: [71548.162409] device-mapper: raid1: Mirror read failed.
Jun 18 06:14:00 null-03 kernel: [71548.168098] Buffer I/O error on dev dm-6, logical block 524272, async page read
Jun 18 06:14:03 null-03 kernel: [71551.224224] device-mapper: raid1: Mirror read failed.
Jun 18 06:14:03 null-03 kernel: [71551.230041] Buffer I/O error on dev dm-6, logical block 524272, async page read
Jun 18 06:14:03 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4_mimagetmp_2 for events.
Jun 18 06:14:03 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events.
Jun 18 06:14:47 null-03 dmeventd[18342]: cc-lv4 is now in-sync.
Jun 18 06:14:48 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4_mimagetmp_2 for events.
Jun 18 06:14:48 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4 for events.
Jun 18 06:14:48 null-03 cmirrord[18281]: Log entry already exists: LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt1oI7Pq0KGDfvmvpS09DS7nop9GbSKP01
Jun 18 06:14:48 null-03 cmirrord[18281]: clog_resume:  Failed to create cluster CPG
Jun 18 06:14:48 null-03 kernel: [71596.993354] device-mapper: raid1: log presuspend failed
Jun 18 06:14:48 null-03 kernel: [71597.007352] device-mapper: raid1: log presuspend failed
Jun 18 06:14:49 null-03 kernel: [71597.017959] device-mapper: raid1: log postsuspend failed
Jun 18 06:14:49 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events.
Jun 18 06:14:49 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4 for events.
Jun 18 06:14:52 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events.

Comment 25 David Teigland 2019-06-18 17:37:45 UTC
Created attachment 1581804 [details]
clvmd debug output from null-02

related to comment 24 (using lv4)

Comment 26 David Teigland 2019-06-18 17:38:28 UTC
Created attachment 1581805 [details]
clvmd debug output from null-03

related to comment 24 (using lv4)

Comment 27 David Teigland 2019-06-18 18:47:13 UTC
We can now compare the clvmd debug output from comment 21 (old, working) with the clvmd debug output from comment 25 (current, not working).  Others will have better insight into the differences than I do.  I'm just looking for differences without much understanding of what's happening, and I notice what seems to be a significant difference here:

old:
#activate/activate.c:1986   Monitoring cc/lv3
#activate/activate.c:1809   Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtUkZ6bljf2derSOBNfYrzL3XfXd52KIF9 for events

new:
#activate/activate.c:2015    Monitoring cc/lv4 with libdevmapper-event-lvm2mirror.so.
#activate/activate.c:1824    Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtIABo3wqHQ4X2IlVFykgFEh466n32unJc for ev
#activate/dev_manager.c:3230    Creating ACTIVATE origin-only tree for cc/lv4.
(followed by a significant amount of activation code being run that does not appear in the old code)

then later the same kind of thing,

old:
#activate/activate.c:1986   Monitoring cc/lv3_mimagetmp_2
#activate/activate.c:1809   Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtBZzuiGmdDcdhmygdrvdq3wnnW99x9t42 for events

new:
#activate/activate.c:2015    Monitoring cc/lv4_mimagetmp_2 with libdevmapper-event-lvm2mirror.so.
#activate/activate.c:1824    Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtkWX68H0Li66mMWTc39XtKRGRzwgADs2d for eve
#activate/dev_manager.c:3230    Creating ACTIVATE origin-only tree for cc/lv4_mimagetmp_2.
(followed by a significant amount of activation code being run that does not appear in the old code)

Each time that Monitoring occurs, the old code seems to do nothing, but the new code executes some significant activation-related code.
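
One way to make that comparison systematically (a sketch; the file names are placeholders for the attached debug outputs, not the actual attachment names) is to normalize the per-LV UUIDs before diffing, so that only the added activation calls stand out rather than every UUID difference:

```shell
# Strip the volatile LVM UUID suffixes so a diff of the two clvmd debug
# logs shows only structural differences (e.g. the extra
# "Creating ACTIVATE origin-only tree" lines in the new code path).
norm() { sed 's/LVM-[A-Za-z0-9]\{10,\}/LVM-<uuid>/g'; }

# Example usage against the attached logs (file names are placeholders):
#   diff -u <(norm < clvmd-lv3-old.log) <(norm < clvmd-lv4-new.log) | less
```

With the UUIDs masked, lines that differ only by LV name and UUID collapse together, and the genuinely new activation code paths show up as added lines in the diff.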

Comment 28 David Teigland 2019-06-18 19:41:32 UTC
Following instructions from Zdenek, I reverted commit a8921be641afe865c177e11b8859f4b937f76995 and applied the patch from https://bugzilla.redhat.com/show_bug.cgi?id=1628529#c54

The problem appears to be gone.

[root@null-02 ~]# lvcreate --type mirror -n lv5 -L2G -m1 cc
  Logical volume "lv5" created.

[root@null-02 ~]# lvs -a -o+devices cc
  LV             VG Attr       LSize Log        Cpy%Sync Devices                        
  lv5            cc mwi-a-m--- 2.00g [lv5_mlog] 100.00   lv5_mimage_0(0),lv5_mimage_1(0)
  [lv5_mimage_0] cc iwi-aom--- 2.00g                     /dev/sdb(0)                    
  [lv5_mimage_1] cc iwi-aom--- 2.00g                     /dev/sdg(0)                    
  [lv5_mlog]     cc lwi-aom--- 4.00m                     /dev/sde(0)                    

[root@null-02 ~]# lvconvert -m2 cc/lv5
  Logical volume cc/lv5 being converted.
  /run/lvm/lvmpolld.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld.
  WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state.
  cc/lv5: Converted: 1.56%
  cc/lv5: Converted: 35.74%
  cc/lv5: Converted: 69.92%
  cc/lv5: Converted: 100.00%
  Logical volume cc/lv5 converted.

[root@null-02 ~]# lvs -a -o+devices cc
  LV             VG Attr       LSize Log        Cpy%Sync Devices                                        
  lv5            cc mwi-a-m--- 2.00g [lv5_mlog] 100.00   lv5_mimage_0(0),lv5_mimage_1(0),lv5_mimage_2(0)
  [lv5_mimage_0] cc iwi-aom--- 2.00g                     /dev/sdb(0)                                    
  [lv5_mimage_1] cc iwi-aom--- 2.00g                     /dev/sdg(0)                                    
  [lv5_mimage_2] cc iwi-aom--- 2.00g                     /dev/sdf(0)                                    
  [lv5_mlog]     cc lwi-aom--- 4.00m                     /dev/sde(0)  


Jun 18 08:13:25 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events.
Jun 18 08:16:16 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5 for events.
Jun 18 08:16:16 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5_mimagetmp_2 for events.
Jun 18 08:16:16 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events.
Jun 18 08:17:03 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5_mimagetmp_2 for events.
Jun 18 08:17:03 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5 for events.
Jun 18 08:17:03 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events.
Jun 18 08:17:03 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5 for events.
Jun 18 08:17:06 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events.

Jun 18 08:19:16 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events.
Jun 18 08:20:17 null-03 dmeventd[18342]: cc-lv5 is now in-sync.
Jun 18 08:22:08 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5 for events.
Jun 18 08:22:08 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5_mimagetmp_2 for events.
Jun 18 08:22:08 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events.
Jun 18 08:22:52 null-03 dmeventd[18342]: cc-lv5 is now in-sync.
Jun 18 08:22:54 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5_mimagetmp_2 for events.
Jun 18 08:22:54 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5 for events.
Jun 18 08:22:54 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events.
Jun 18 08:22:54 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5 for events.
Jun 18 08:22:57 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events.

Comment 29 David Teigland 2019-06-18 19:42:59 UTC
Created attachment 1581876 [details]
clvmd debug output from null-02

related to comment 28 (lv5)

Comment 30 David Teigland 2019-06-18 19:43:43 UTC
Created attachment 1581877 [details]
clvmd debug output from null-03

related to comment 28 (lv5)

Comment 32 Zdenek Kabelac 2019-06-19 13:32:25 UTC
Pushed 'small' diff patch upstream:

https://www.redhat.com/archives/lvm-devel/2019-June/msg00069.html

Comment 33 Jonathan Earl Brassow 2019-06-19 21:06:45 UTC
I've tested this on my systems with the latest stable branch (top commit f3a87a2c2e339328ea4d7448f43d5317806a6b24) and it works again.  No more 'primary read failure' errors.

Comment 34 Jonathan Earl Brassow 2019-06-19 21:14:46 UTC
The fix in place for bug 1711427 addresses this bug.  Things are getting stuck because of the 'primary read failure'.  This issue is distilled a bit better in bug 1711427.  We will fix it there.

*** This bug has been marked as a duplicate of bug 1711427 ***
