Bug 444187 - cmirror copy percent stuck after upconvert
cmirror copy percent stuck after upconvert
Status: CLOSED DEFERRED
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror-kernel (Show other bugs)
4
All Linux
high Severity high
: ---
: ---
Assigned To: Jonathan Earl Brassow
Cluster QE
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-04-25 14:44 EDT by Corey Marthaler
Modified: 2013-09-23 11:32 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-23 11:32:04 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)

  None (edit)
Description Corey Marthaler 2008-04-25 14:44:41 EDT
Description of problem:
I saw this issue while running the cmirror failure scenario test helter_skelter
during 4.6.Z regression testing.

It had properly failed the secondary leg of a sync'ed two legged cmirror,
re-added that failed device, and upconverted the cmirror back to it's original
self. However after that upconvt, the syncing got stuck at 24.50%. The device
that had been failed was /dev/sde.

Here are the cmds that the test executed (some of them on multiple nodes):
qarshd[8006]: Running cmdline: echo running > /sys/block/sde/device/state
qarshd[8008]: Running cmdline: pvcreate /dev/sde1
qarshd[8011]: Running cmdline: vgextend helter_skelter /dev/sde1
qarshd[8014]: Running cmdline: lvconvert -m 1
helter_skelter/syncd_secondary_2legs_1 /dev/sdg1:0-1000 /dev/sde1:0-1000
/dev/sdh1:0-150
qarshd[8076]: Running cmdline: lvs -a -o +devices


  syncd_secondary_2legs_1            helter_skelter mwi-ao 800.00M             
      syncd_secondary_2legs_1_mlog  24.50
syncd_secondary_2legs_1_mimage_0(0),syncd_secondary_2legs_1_mimage_1(0)
  [syncd_secondary_2legs_1_mimage_0] helter_skelter iwi-ao 800.00M             
                                          /dev/sdg1(0)   
  [syncd_secondary_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M             
                                          /dev/sde1(0)   
  [syncd_secondary_2legs_1_mlog]     helter_skelter lwi-ao   4.00M             
                                          /dev/sdh1(0)   


[root@taft-04 ~]# dmsetup ls
helter_skelter-syncd_secondary_2legs_1_mlog     (253, 2)
helter_skelter-syncd_secondary_2legs_1_mimage_1 (253, 4)
helter_skelter-syncd_secondary_2legs_1_mimage_0 (253, 3)
VolGroup00-LogVol01     (253, 1)
helter_skelter-syncd_secondary_2legs_1  (253, 5)
VolGroup00-LogVol00     (253, 0)

[root@taft-04 ~]# dmsetup table
helter_skelter-syncd_secondary_2legs_1_mlog: 0 8192 linear 8:113 384
helter_skelter-syncd_secondary_2legs_1_mimage_1: 0 1638400 linear 8:65 384
helter_skelter-syncd_secondary_2legs_1_mimage_0: 0 1638400 linear 8:97 384
VolGroup00-LogVol01: 0 4063232 linear 8:2 138740096
helter_skelter-syncd_secondary_2legs_1: 0 1638400 mirror clustered_disk 4 253:2
1024 LVM-cC8UJX3m1XXl0YkXFyqz0M9Sn5OYmS4dGjhWTH2p7vcuD1IxuTzQrCz9adFECdn5
block_on_error 2 253:3 0 253:4 0
VolGroup00-LogVol00: 0 138739712 linear 8:2 384

[root@taft-04 ~]# dmsetup status
helter_skelter-syncd_secondary_2legs_1_mlog: 0 8192 linear
helter_skelter-syncd_secondary_2legs_1_mimage_1: 0 1638400 linear
helter_skelter-syncd_secondary_2legs_1_mimage_0: 0 1638400 linear
VolGroup00-LogVol01: 0 4063232 linear
helter_skelter-syncd_secondary_2legs_1: 0 1638400 mirror 2 253:3 253:4 395/1600
1 AA 3 clustered_disk 253:2 A
VolGroup00-LogVol00: 0 138739712 linear


[root@taft-04 ~]# dmsetup info
Name:              helter_skelter-syncd_secondary_2legs_1_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-cC8UJX3m1XXl0YkXFyqz0M9Sn5OYmS4dGjhWTH2p7vcuD1IxuTzQrCz9adFECdn5

Name:              helter_skelter-syncd_secondary_2legs_1_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-cC8UJX3m1XXl0YkXFyqz0M9Sn5OYmS4dKT2F2Ho3htZsHu2AxD0sywXWlhZZt8eG

Name:              helter_skelter-syncd_secondary_2legs_1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-cC8UJX3m1XXl0YkXFyqz0M9Sn5OYmS4dUYeASAijyVKQUyzHa5m8idBXAnDpti3c

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-dfWPWFeIMdkAIVi5lw0bYx42qTAQn1AWPNgWk6b75ZhbLyV4S6UFcdzCqlgwYc17

Name:              helter_skelter-syncd_secondary_2legs_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      11
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-cC8UJX3m1XXl0YkXFyqz0M9Sn5OYmS4dwwVGSScYzra36AgG6rgHnqTK4LHPPSCg

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-dfWPWFeIMdkAIVi5lw0bYx42qTAQn1AW8sS8p0w32HSh384Qo5b04OHYCDpoZU3M


The only failure messages that I saw in the console was the following:
Apr 24 22:29:41 taft-01 qarshd[24468]: Running cmdline: echo running >
/sys/block/sde/device/state
Apr 24 22:29:47 taft-01 [5594]: Monitoring mirror device
helter_skelter-syncd_secondary_2legs_1 for events
Apr 24 22:30:34 taft-01 kernel: dm-cmirror: Error listening for server(1)
response for adFECdn5: -110


Apr 24 22:30:35 taft-04 kernel: dm-cmirror: Error listening for server(1)
response for adFECdn5: -110


Version-Release number of selected component (if applicable):
2.6.9-67.0.15.ELsmp
cmirror-kernel-2.6.9-38.12
lvm2-cluster-2.02.27-2.el4_6.1
lvm2-2.02.27-2.el4_6.1
Comment 1 Corey Marthaler 2008-07-09 17:48:02 EDT
I appear to have reproduced this issue. Just like in the original comment, this
happened while doing an upconvert:

Up converting linear(s) back to mirror(s) on taft-01...
taft-01: lvconvert -m 2 helter_skelter/syncd_primary_3legs_1 /dev/sdh1:0-1000
/dev/sdg1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150
taft-01: lvconvert -m 2 helter_skelter/syncd_primary_3legs_2 /dev/sdh1:0-1000
/dev/sdg1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150
Verifying the up conversion from linear(s) to mirror(s)
Verifying device /dev/sdh1 *IS* in the mirror(s)
Verifying device /dev/sdg1 *IS* in the mirror(s)
Verifying device /dev/sde1 *IS* in the mirror(s)
Verifying device /dev/sdf1 *IS* in the mirror(s)

Waiting until all mirrors become fully syncd...
        1/2 mirror(s) are fully synced: ( 1=100.00% 2=99.50% )
        1/2 mirror(s) are fully synced: ( 1=100.00% 2=99.50% )
        1/2 mirror(s) are fully synced: ( 1=100.00% 2=99.50% )

I also saw the server errors again on taft-04:
dm-cmirror: Error listening for server(1) response for PWmuUdfP: -110


  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_1_mlog 100.00        
syncd_primary_3legs_1_mimage_0(0),syncd_primary_3legs_1_mimage_1(0),syncd_primary_3legs_1_mimage_2(0)
  [syncd_primary_3legs_1_mimage_0] helter_skelter iwi-ao 800.00M               
                                              /dev/sde1(0)            
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 800.00M               
                                              /dev/sdh1(0)            
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 800.00M               
                                              /dev/sdg1(0)            
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M               
                                              /dev/sdf1(0)            
  syncd_primary_3legs_2            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_2_mlog  99.50        
syncd_primary_3legs_2_mimage_0(0),syncd_primary_3legs_2_mimage_1(0),syncd_primary_3legs_2_mimage_2(0)
  [syncd_primary_3legs_2_mimage_0] helter_skelter Iwi-ao 800.00M               
                                              /dev/sde1(200)          
  [syncd_primary_3legs_2_mimage_1] helter_skelter Iwi-ao 800.00M               
                                              /dev/sdh1(200)          
  [syncd_primary_3legs_2_mimage_2] helter_skelter Iwi-ao 800.00M               
                                              /dev/sdg1(200)          
  [syncd_primary_3legs_2_mlog]     helter_skelter lwi-ao   4.00M               
                                              /dev/sdf1(1)   

[root@taft-01 ~]# dmsetup status
helter_skelter-syncd_primary_3legs_1_mimage_0: 0 1638400 linear
helter_skelter-syncd_primary_3legs_2: 0 1638400 mirror 3 253:8 253:9 253:10
1599/1600 1 AAA 3 clustered_disk 253:7 A
helter_skelter-syncd_primary_3legs_1_mlog: 0 8192 linear
helter_skelter-syncd_primary_3legs_1: 0 1638400 mirror 3 253:3 253:4 253:5
1600/1600 1 AAA 3 clustered_disk 253:2 A
helter_skelter-syncd_primary_3legs_2_mlog: 0 8192 linear
VolGroup00-LogVol01: 0 20447232 linear
helter_skelter-syncd_primary_3legs_2_mimage_2: 0 1638400 linear
VolGroup00-LogVol00: 0 122355712 linear
helter_skelter-syncd_primary_3legs_2_mimage_1: 0 1638400 linear
helter_skelter-syncd_primary_3legs_1_mimage_2: 0 1638400 linear
helter_skelter-syncd_primary_3legs_2_mimage_0: 0 1638400 linear
helter_skelter-syncd_primary_3legs_1_mimage_1: 0 1638400 linear


2.6.9-76.ELsmp

lvm2-2.02.37-3.el4    BUILT: Thu Jun 12 10:09:19 CDT 2008
lvm2-cluster-2.02.37-3.el4    BUILT: Thu Jun 12 10:22:07 CDT 2008
device-mapper-1.02.25-2.el4    BUILT: Mon Jun  9 09:28:41 CDT 2008
cmirror-1.0.1-1    BUILT: Tue Jan 30 17:28:02 CST 2007
cmirror-kernel-2.6.9-43.2    BUILT: Wed Jul  2 15:10:42 CDT 2008
Comment 2 Corey Marthaler 2009-02-17 10:40:30 EST
This is still an issue in 4.8. I hit this while running cmirror_lock_stress.

[root@hayes-03 ~]# lvs -a -o +devices lock_stress/hayes-03.16959
  LV             VG          Attr   LSize   Origin Snap%  Move Log                 Copy%  Convert Devices                                                                         
  hayes-03.16959 lock_stress mwi-a- 500.00M                    hayes-03.16959_mlog  35.20         hayes-03.16959_mimage_0(0),hayes-03.16959_mimage_1(0),hayes-03.16959_mimage_2(0)

[root@hayes-03 ~]# lvs -a -o +devices lock_stress/hayes-03.16959
  LV             VG          Attr   LSize   Origin Snap%  Move Log                 Copy%  Convert Devices                                                                         
  hayes-03.16959 lock_stress mwi-a- 500.00M                    hayes-03.16959_mlog  35.20         hayes-03.16959_mimage_0(0),hayes-03.16959_mimage_1(0),hayes-03.16959_mimage_2(0)


[root@hayes-03 ~]# dmsetup status
lock_stress-hayes--03.16959_mimage_2: 0 1024000 linear 
lock_stress-hayes--03.16959_mimage_1: 0 1024000 linear 
lock_stress-hayes--03.16959_mimage_0: 0 1024000 linear 
VolGroup00-LogVol01: 0 20578304 linear 
lock_stress-hayes--03.16959_mlog: 0 8192 linear 
VolGroup00-LogVol00: 0 135397376 linear 
lock_stress-hayes--03.16959: 0 1024000 mirror 3 253:10 253:11 253:13 354/1000 1 AAA 3 clustered_disk 253:9 A


dm-cmirror: Error listening for server(1) response for vOdrpq93: -110
dm-cmirror: Error listening for server(1) response for rtEp0mCj: -110
dm-cmirror: Error while getting resync work: bad region              
dm-cmirror: Error listening for server(2) response for UyRwqW5T: -110
dm-cmirror: Error while getting resync work: bad region 

2.6.9-80.ELsmp

lvm2-2.02.42-2.el4    BUILT: Thu Jan 15 12:48:25 CST 2009
lvm2-cluster-2.02.42-1.el4    BUILT: Tue Dec 16 13:16:35 CST 2008
device-mapper-1.02.28-1.el4    BUILT: Tue Dec 16 02:57:37 CST 2008
cmirror-1.0.1-1    BUILT: Tue Jan 30 17:28:02 CST 2007
cmirror-kernel-2.6.9-43.7.el4    BUILT: Tue Feb  3 16:00:34 CST 2009
Comment 3 Corey Marthaler 2009-04-27 10:15:54 EDT
Hit this during 4.8 LVM2 regression testing running the pvmove testcase. This issue basically blocks all rhel4 cmirror testing from passing since one of us in QA ends up hitting this everytime there's a new update or Z release.

dm-cmirror: Error listening for server(1) response for KrGHefKj: -110
dm-cmirror: Error listening for server(1) response for KrGHefKj: -110
dm-cmirror: [PXpzwh8L] Timed out waiting for log I/O: Write of bits  
dm-cmirror: [SRwmsT3c] Timed out waiting for log I/O: Write of bits
dm-cmirror: [kBLvM71Q] Timed out waiting for log I/O: Write of bits
dm-cmirror: Error listening for server(1) response for 1Cw6k8xf: -110
dm-cmirror: Error listening for server(1) response for 1Cw6k8xf: -110

Create a couple small segmented lvs and then pvmove them
Running lv_seg on hayes-02 to create the segmented volumes
Deactivating segment0 mirror
Moving data from pv /dev/etherd/e1.1p1
hayes-02: pvmove -v /dev/etherd/e1.1p1
    Finding volume group "mirror_sanity"
    Executing: /sbin/modprobe dm-cmirror
    Archiving volume group "mirror_sanity" metadata (seqno 341).
    Creating logical volume pvmove0
    Moving 3050 extents of logical volume mirror_sanity/segment0
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 342).
    Checking progress every 15 seconds
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 343).
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 344).
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 345).
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 346).
  /dev/etherd/e1.1p1: Moved: 4.8%
  /dev/etherd/e1.1p1: Moved: 9.2%
  /dev/etherd/e1.1p1: Moved: 13.6%
  /dev/etherd/e1.1p1: Moved: 18.1%
  /dev/etherd/e1.1p1: Moved: 22.6%
  /dev/etherd/e1.1p1: Moved: 90.0%
  /dev/etherd/e1.1p1: Moved: 90.7%
  /dev/etherd/e1.1p1: Moved: 91.3%
  /dev/etherd/e1.1p1: Moved: 91.9%
  /dev/etherd/e1.1p1: Moved: 92.6%
  /dev/etherd/e1.1p1: Moved: 92.6%
  /dev/etherd/e1.1p1: Moved: 92.6%
  /dev/etherd/e1.1p1: Moved: 92.6%
  /dev/ethepvmove appears stuck, copy percent hasn't changed in 1.5 minutes
rd/e1.1p1: Moved: 92.6%
  /dev/etherd/e1.1p1: Moved: 92.6%
  /dev/etherd/e1.1p1: Moved: 92.6%
  /dev/etherd/e1.1p1: Moved: 92.6%
Comment 6 Corey Marthaler 2011-01-18 17:13:16 EST
FWIW, this still exists in the latest rpms and continues to cause issues while running regression tests.

2.6.9-94.ELsmp

lvm2-2.02.42-9.el4    BUILT: Thu Oct 21 15:49:57 CDT 2010
lvm2-cluster-2.02.42-10.el4    BUILT: Tue Jan 18 06:17:17 CST 2011
device-mapper-1.02.28-3.el4    BUILT: Thu Mar  4 14:48:16 CST 2010
cmirror-1.0.2-1.el4    BUILT: Thu Feb 26 15:29:27 CST 2009
cmirror-kernel-2.6.9-43.14.el4    BUILT: Wed Dec 22 16:24:19 CST 2010
Comment 7 Lon Hohberger 2013-09-23 11:32:04 EDT
The Red Hat Cluster Suite product is past end-of-life; closing.

Note You need to log in before you can comment on or make changes to this bug.