Description of problem:
I had three multi-legged mirrors on a 4-node x86_64 cluster. Each of the three
had a GFS filesystem on top of it and each was running I/O (xiogen/xdoio). I
then failed /dev/sdh1, which was a leg in all three mirrors. That ended up
causing all clvmd operations to hang.

[root@link-08 ~]# lvs -a -o +devices
  LV                  VG     Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  mirror1             corey  mwi-ao 5.00G                     mirror1_mlog  100.00 mirror1_mimage_0(0),mirror1_mimage_1(0)
  [mirror1_mimage_0]  corey  iwi-ao 5.00G                                          /dev/sdh1(0)
  [mirror1_mimage_1]  corey  iwi-ao 5.00G                                          /dev/sda1(0)
  [mirror1_mlog]      corey  lwi-ao 4.00M                                          /dev/sdg1(0)
  mirror2             corey  mwi-ao 5.00G                     mirror2_mlog  100.00 mirror2_mimage_0(0),mirror2_mimage_1(0),mirror2_mimage_2(0)
  [mirror2_mimage_0]  corey  iwi-ao 5.00G                                          /dev/sdh1(1280)
  [mirror2_mimage_1]  corey  iwi-ao 5.00G                                          /dev/sdb1(0)
  [mirror2_mimage_2]  corey  iwi-ao 5.00G                                          /dev/sdc1(0)
  [mirror2_mlog]      corey  lwi-ao 4.00M                                          /dev/sda1(1280)
  mirror3             corey  mwi-ao 5.00G                     mirror3_mlog  100.00 mirror3_mimage_0(0),mirror3_mimage_1(0),mirror3_mimage_2(0),mirror3_mimage_3(0)
  [mirror3_mimage_0]  corey  iwi-ao 5.00G                                          /dev/sdh1(2560)
  [mirror3_mimage_1]  corey  iwi-ao 5.00G                                          /dev/sdd1(0)
  [mirror3_mimage_2]  corey  iwi-ao 5.00G                                          /dev/sde1(0)
  [mirror3_mimage_3]  corey  iwi-ao 5.00G                                          /dev/sdf1(0)
  [mirror3_mlog]      corey  lwi-ao 4.00M                                          /dev/sda1(1281)

After the failure:

[root@link-07 ~]# dmsetup table
corey-mirror1_mlog: 0 8192 linear 8:97 384
corey-mirror3_mimage_3: 0 10485760 linear 8:81 384
corey-mirror3_mimage_2: 0 10485760 linear 8:65 384
corey-mirror2_mlog: 0 8192 linear 8:1 10486144
corey-mirror3: 0 10485760 mirror clustered_disk 4 253:11 1024 LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNHaArgVnIEdototLjaZ1n3gGBfGLUE5UX block_on_error 4 253:12 0 253:13 0 253:14 0 253:15 0
corey-mirror3_mimage_1: 0 10485760 linear 8:49 384
corey-mirror2: 0 10485760 mirror clustered_disk 4 253:6 1024 LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNsm3bgSmAfNwdBCvVj5czrJoP1pfeWWtA block_on_error 3 253:7 0 253:8 0 253:9 0
corey-mirror2_mimage_2: 0 10485760 linear 8:33 384
corey-mirror3_mimage_0: 0 10485760 error
corey-mirror3_mlog: 0 8192 linear 8:1 10494336
corey-mirror2_mimage_1: 0 10485760 linear 8:17 384
corey-mirror1: 0 10485760 mirror clustered_disk 4 253:2 1024 LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN5YTb2ydV3fDD0WE0n8EAexTiiSO16TmN block_on_error 2 253:3 0 253:4 0
corey-mirror2_mimage_0: 0 10485760 error
corey-mirror1_mimage_1: 0 10485760 linear 8:1 384
corey-mirror1_mimage_0: 0 10485760 error
VolGroup00-LogVol01: 0 4063232 linear 3:2 151781760
VolGroup00-LogVol00: 0 151781376 linear 3:2 384

[root@link-07 ~]# dmsetup info
Name: corey-mirror1_mlog
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 2
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN5YTb2ydV3fDD0WE0n8EAexTiiSO16TmN

Name: corey-mirror3_mimage_3
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 15
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNRN3uVucbxgF6dUUyvce4F6Z56Q5xWbjU

Name: corey-mirror3_mimage_2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 14
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNXYFWtroAvH9vpDleSAD7xZ25yeWEm1MU

Name: corey-mirror2_mlog
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 6
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNsm3bgSmAfNwdBCvVj5czrJoP1pfeWWtA

Name: corey-mirror3
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 5
Major, minor: 253, 16
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNXQk2lh5R7BJyMeqiBZNLtN2b19Ln5jtl

Name: corey-mirror3_mimage_1
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 13
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNZ8j7Y5lKdS57yFTWnnoxM3tzuWCam2DL

Name: corey-mirror2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 2
Major, minor: 253, 10
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN7VblEpnv25VsheGU7csOAdAZ9r5GY0W1

Name: corey-mirror2_mimage_2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 9
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN84BMKoGOMZu2GMGL7rR5EBaVBYkVvbBd

Name: corey-mirror3_mimage_0
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 12
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNpvomnKVRe1Ktx5r1AanmniQWmcUy3Nh4

Name: corey-mirror3_mlog
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 11
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNHaArgVnIEdototLjaZ1n3gGBfGLUE5UX

Name: corey-mirror2_mimage_1
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 8
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNZbv7CvUlWG20vzlPTYhik4u43DhI7M98

Name: corey-mirror1
State: SUSPENDED
Tables present: LIVE & INACTIVE
Open count: 1
Event number: 2
Major, minor: 253, 5
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN2RIp5WgoVj46mrlYYU1fdrBKP4BjJuus

Name: corey-mirror2_mimage_0
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 7
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNQGetVoz6YbvBiX7cVlh8Oul5vPLZKVvv

Name: corey-mirror1_mimage_1
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 4
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNUgs5K6Msk6mfVRtM8vh3BD5x2JEmUDSm

Name: corey-mirror1_mimage_0
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 3
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNQ0xDdJtYU863Myf9i4VuJAQZXhjFKpyX

Name: VolGroup00-LogVol01
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 1
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xQjIYEP6InPgUU5nsDPYSZl5EAEKqRWcY

Name: VolGroup00-LogVol00
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 0
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xrapcuzOGNgADTzIRUNTk0MZBbtWAy

Version-Release number of selected component (if applicable):
2.6.9-49.ELlargesmp
cmirror-kernel-largesmp-2.6.9-24.0

I'll attach all the stack traces for the hung/running processes.
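For anyone decoding the 'dmsetup table' lines above, my reading of the mirror target arguments (informal labels, not authoritative documentation) is:

  <start> <length> mirror <log_type> <#log_args> <log_dev> <region_size> <log_uuid> [block_on_error] <#legs> <leg1_dev> <leg1_offset> ...

So the corey-mirror1 line describes a clustered mirror using 253:2 (corey-mirror1_mlog) as its log device with a 1024-sector region size, and two legs still in the map, 253:3 and 253:4 - while 253:3 (corey-mirror1_mimage_0) now maps to the 'error' target because of the failed /dev/sdh1.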
Created attachment 149388 [details] log from link-02
Created attachment 149389 [details] log from link-04
Created attachment 149390 [details] log from link-07
Created attachment 149391 [details] log from link-08
This bug is reproducible.

Name: corey-mirror4
State: SUSPENDED
Tables present: LIVE & INACTIVE
Open count: 1
Event number: 1
Major, minor: 253, 5
Number of targets: 1
UUID: LVM-r25APQaO2jVckDoLKsoAARInYt1p0mqop2zn17MV0vt2xIcSGS1tjxAfGr0KB9X5
After trying this a few times with complicated 3- and 4-legged cmirror
configurations, I attempted the simplest case: one cmirror, one GFS filesystem,
one failure. It appears to hang just like the more difficult cases above. This
failure scenario used to work greater than 90% of the time, so it appears that
a regression has been introduced into cmirror-kernel-largesmp-2.6.9-24.0.
That, or this is a largesmp issue. I'll attempt this now with smp.

[root@link-08 ~]# lvs -a -o +devices
  LV                  VG     Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror             feist  mwi-ao 8.00G                     cmirror_mlog  100.00 cmirror_mimage_0(0),cmirror_mimage_1(0)
  [cmirror_mimage_0]  feist  iwi-ao 8.00G                                          /dev/sdh1(0)
  [cmirror_mimage_1]  feist  iwi-ao 8.00G                                          /dev/sda1(0)
  [cmirror_mlog]      feist  lwi-ao 4.00M                                          /dev/sdg1(0)

[ killed /dev/sdh ]

[root@link-08 ~]# lvs -a -o +devices
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG     Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror             feist  mwi-so 8.00G                     cmirror_mlog   99.95 cmirror_mimage_0(0),cmirror_mimage_1(0)
  [cmirror_mimage_0]  feist  iwi-so 8.00G
  [cmirror_mimage_1]  feist  iwi-so 8.00G                                          /dev/sda1(0)
  [cmirror_mlog]      feist  lwi-so 4.00M                                          /dev/sdg1(0)

[root@link-08 ~]# lvs -a -o +devices
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
[HANG]

[root@link-08 ~]# dmsetup info
Name: feist-cmirror
State: SUSPENDED
Tables present: LIVE & INACTIVE
Open count: 1
Event number: 1
Major, minor: 253, 5
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFrP78jR8t27LUNs2TuoUI4bNvnf9VIJcG

Name: feist-cmirror_mlog
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 2
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFHuA3j3gV3DsqnwONzJrvi6wAHTW0402u

Name: feist-cmirror_mimage_1
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 4
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFgFCtJ639S8sFOi12g2F6JvO1ZBZxLW8w

Name: feist-cmirror_mimage_0
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 3
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFcIpRkYVU2XcQNyqdZkrjWlaXhdCh6q1m

Name: VolGroup00-LogVol01
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 1
Number of targets: 1
UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkT1h6aidVTpJgaACOvfJduKZ88jI03FEf

Name: VolGroup00-LogVol00
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 0
Number of targets: 1
UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkXAqtIL9TdeqFs3GAoSPDj3JVguOzhxCd
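For reference, a rough sketch of this simplest case as plain commands (the names match the setup above, but the GFS step is omitted and the failure-injection method is an assumption about what the test harness does, not a transcript):

  # two-way clustered mirror in VG feist; a third PV is listed so the
  # mirror log can land on its own device
  lvcreate -m 1 -n cmirror -L 8G feist /dev/sdh1 /dev/sda1 /dev/sdg1

  # fail the leg - one way is to offline the SCSI device backing it
  echo offline > /sys/block/sdh/device/state

  # the first lvs after the failure shows the suspended mirror;
  # the subsequent invocations are the ones that hang
  lvs -a -o +devices
  dmsetup info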
Reproduced on an smp cluster as well.
Reproduced this issue without I/O or GFS on top of the cmirror. It no longer appears that a proper down convert works after a cmirror leg failure. This issue will need to block the release of cluster mirrors.
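For anyone following along, a quick way to check whether the down-convert happened (this only inspects the LV layout, nothing more; <vg> is a placeholder):

  # after a successful down-conversion the LV is linear on the surviving leg:
  # its attr no longer starts with 'm' and the mimage/mlog sub-LVs are gone
  lvs -a -o lv_name,lv_attr,devices <vg>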
I can easily reproduce this. It appears that the suspended devices are not the problem, as those appear on the machines that are making progress. It's the machine that has a LIVE & INACTIVE table for the mirror (but no suspended devices) that is causing the problem. This machine also seems to be the log server (every time).
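A quick way to spot that node (nothing fancy - it just greps the same 'dmsetup info' output shown in the earlier comments):

  # print the Name/State lines for any mapping whose table is LIVE & INACTIVE
  dmsetup info | grep -B 2 'LIVE & INACTIVE'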
Reverting one of the previous changes put in for another bug seems to fix the problem. Not sure why yet...
Notes from check-in:

The problem here appears to be timeouts related to clvmd. During failures
under heavy load, clvmd commands (suspend/resume/activate/deactivate) can take
a long time, and clvmd assumes too quickly that they have failed. This results
in the fault handling being left half done. Further calls to vgreduce (by hand
or by dmeventd) will not help because the _on-disk_ version of the metadata is
consistent - that is, the faulty device has already been removed.

The most significant change in this patch is the removal of the
'is_remote_recovering' function. This function was designed to check whether a
remote node was recovering a region so that writes to the region could be
delayed. However, even with this function, it was possible for a remote node
to begin recovery on a region _after_ the function was called but before the
write (mark request) took place. Because of this, checking is done during the
mark request stage - rendering the call to 'is_remote_recovering' meaningless.
Given the useless nature of this function, it has been pulled. The benefits of
its removal are increased performance and much faster (more than an order of
magnitude) response during the mirror suspend process. The faster suspend
process leads to fewer clvmd timeouts and a reduced probability that bug
231230 will be triggered.

However, when a mirror device is reconfigured, the mirror sub-devices are
removed. This is done by activating them cluster-wide before their removal.
With high enough load during recovery, these operations can still take a long
time - even though they are linear devices. This too has the potential to
cause clvmd to time out and trigger bug 231230. There is no cluster logging
fix for this issue. The delay on the linear devices must be determined. A
temporary work-around would be to increase the timeout of clvmd (e.g. clvmd -t #).
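As a concrete illustration of that work-around (90 is an arbitrary example value, the default of 60 seconds is from memory - check 'clvmd -h' on the installed version - and how the daemon gets stopped and restarted is site-specific):

  # restart clvmd with a longer command timeout than the default
  clvmd -t 90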
assigned -> post
post -> modified
Initial tests seem to show that simple mirror failure cases once again work. However, before marking this verified: when a cmirror is not fully synced and a failure takes place, clvmd appears to hang. Is this a new bug, or an unresolved portion of this bug?
<insert the usual questions> Also, if you can capture the clvmd debug output, that might help us figure out what clvmd is stuck on. To do this, you can run 'clvmd -d >& clvmd_output.txt &'
I added my latest non-synced cmirror leg failure attempt to bz 217895, as it resulted in lost election results from the cmirror server and caused a node to be fenced.
Lately I can no longer recreate the hang mentioned in comment #14; however,
the mirror -> linear down conversion is still not taking place. Instead it
appears to be corrupted.

SCENARIO - [fail_non_syncd_primary_leg]
Creating mirror on link-08 using device /dev/sdh1 (that we will fail) for primary leg
lvcreate -m 1 -n fail_non_syncd_primary -L 500M helter_skelter /dev/sdh1:0-500 /dev/sdf1:0-500 /dev/sdb1:0-50
mirror is only 65.60% synced right now
Disabling device sdh on link-02
Disabling device sdh on link-04
Disabling device sdh on link-07
Disabling device sdh on link-08

Attempting I/O to cause mirror conversion
dd: writing to `/dev/helter_skelter/fail_non_syncd_primary': Input/output error
1+0 records in
0+0 records out

Verifying the down conversion from mirror to linear
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-5: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
/dev/dm-3: read failed after 0 of 4096 at 524222464: Input/output error
[...]
/dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
Couldn't find device with uuid 'Xhx3VZ-NGbq-1Mzy-n2rk-3xS1-aG2k-LJYjqz'.
Couldn't find all physical volumes for volume group helter_skelter.
Volume group "helter_skelter" not found

The down conversion didn't appear to work, as /dev/sdf1 should now be a part
of a linear volume.

I see these I/O errors on all the consoles:

device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
scsi2 (1:1): rejecting I/O to offline device
scsi2 (1:1): rejecting I/O to offline device
device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
scsi2 (1:1): rejecting I/O to offline device
scsi2 (1:1): rejecting I/O to offline device
device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
scsi2 (1:1): rejecting I/O to offline device
Jon - are you seeing any issues w/ non sync'ed cmirror failures, or am I still the only one?
I am not seeing this at all. I wonder if it has something to do with new LVM packages. [When you update to new versions of the software, please indicate that.] I will try updating my LVM/device-mapper packages.
I've updated and I still don't see it (after about ten tries). I'll keep trying, but I hope you can help me reproduce.
Jon and I have narrowed this issue down to the write size being used to trigger the down conversion.

The following will cause the down conversion to work properly:

  dd if=/dev/zero of=/dev/vg/cmirror bs=4M count=1

The following will not:

  dd if=/dev/zero of=/dev/vg/cmirror count=1
I've seen this now. It depends on the arguments (and tools) you use to do the
"write". Corey has been using 'dd if=/dev/zero of=/dev/vg/lv count=10'. The dd
actually does a read before writing if the request sizes are small enough.
Looking at the mirror code for reads:

static void do_reads(struct mirror_set *ms, struct bio_list *reads)
{
        struct bio *bio;
        struct mirror *m;

        while ((bio = bio_list_pop(reads))) {
                /*
                 * We can only read balance if the region is in sync.
                 */
                if (likely(rh_in_sync(&ms->rh, bio_to_region(&ms->rh, bio), 0)))
                        m = choose_mirror(ms);
                else {
                        m = ms->default_mirror;

                        /* If the default fails, we give up. */
                        if (unlikely(m && atomic_read(&m->error_count)))
                                m = NULL;
                }

                if (likely(m))
                        read_async_bio(m, bio);
                else
                        bio_endio(bio, bio->bi_size, -EIO);
        }
}

We see that we call rh_in_sync without the ability to block (the third
argument). Cluster mirroring (unlike single-machine mirroring) must always
block. Therefore, it is never allowed to 'choose_mirror'. Since the primary
device has failed, 'bio_endio(bio, bio->bi_size, -EIO);' is called. This fails
the read, and the writes from dd never take place... meaning an event never
gets triggered and the mirror does not get down-converted.

I believe it is a bug in the kernel that we do not allow blocking in
rh_in_sync. Since we do not, reads are never allowed to balance [even when the
mirrors are in sync]. There should also be a debate about whether or not to
trigger an event if a read fails... I don't think so, but that can be
discussed in another bug.

assigned -> modified
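To make that suggestion concrete, a minimal sketch follows - this is not the actual patch, the file path is my assumption about where do_reads() lives, and whether this call site is really allowed to sleep would need to be verified:

--- drivers/md/dm-raid1.c       (illustrative sketch, not a shipped change)
+++ drivers/md/dm-raid1.c
@@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
-               if (likely(rh_in_sync(&ms->rh, bio_to_region(&ms->rh, bio), 0)))
+               /* allow blocking so a cluster log can answer the in-sync query */
+               if (likely(rh_in_sync(&ms->rh, bio_to_region(&ms->rh, bio), 1)))
                        m = choose_mirror(ms);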
Marking this bug verified. The easy-to-reproduce cmirror leg failure regression no longer happens. All other, less reproducible cmirror failure cases should be documented in other BZs.
Fixed in current release (4.7).