Bug 1416099 - Raid1 fails on read while primary leg contains error sectors [NEEDINFO]
Summary: Raid1 fails on read while primary leg contains error sectors
Keywords:
Status: NEW
Alias: None
Product: LVM and device-mapper
Classification: Community
Component: device-mapper
Version: 2.02.169
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1388103 1476097
TreeView+ depends on / blocked
 
Reported: 2017-01-24 15:17 UTC by Zdenek Kabelac
Modified: 2018-04-04 19:44 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1476097 (view as bug list)
Environment:
Last Closed:
rule-engine: lvm-technical-solution?
heinzm: needinfo? (zkabelac)


Attachments (Terms of Use)
lvm2 test suite example (958 bytes, application/x-shellscript)
2017-01-24 15:17 UTC, Zdenek Kabelac
no flags Details
Test with mounted filesystem (1.17 KB, text/plain)
2017-01-25 23:04 UTC, Zdenek Kabelac
no flags Details
Pure mdadm test for lvm2 test suite (790 bytes, text/plain)
2017-01-27 18:31 UTC, Zdenek Kabelac
no flags Details

Description Zdenek Kabelac 2017-01-24 15:17:57 UTC
Created attachment 1243949 [details]
lvm2 test suite example

Description of problem:

Dm mdraid wrapper target (and possibly original md as well) fail when sector unaligned write happens on 'error sector':

Attached is a reproducer script from lvm2 test suite which  created some PVs
and places  raid image leg on precise place.
(Reproducer could be placed into 'tests/shell/' subdir and tested with
make check_local T=lvconvert-repair-raid-primary.sh  VERBOSE=1)

First it creates 10M LV on first extents on first PV.
Then 'upconvert' to 3leg raid1 device and waits for sync state.

Then it replaces  1st. sector of 1st. extent with error sector.
And then does a single write to raid LV which hits this sector.

This ends with this kernel error log:
(and 'dd' blocks)

[ 1758.022033] raid6: sse2x1   gen()  3738 MB/s
[ 1758.039036] raid6: sse2x1   xor()  3814 MB/s
[ 1758.056076] raid6: sse2x2   gen()  2292 MB/s
[ 1758.073039] raid6: sse2x2   xor()  4312 MB/s
[ 1758.090044] raid6: sse2x4   gen()  3460 MB/s
[ 1758.107024] raid6: sse2x4   xor()  5033 MB/s
[ 1758.107095] raid6: using algorithm sse2x1 gen() 3738 MB/s
[ 1758.107098] raid6: .... xor() 3814 MB/s, rmw enabled
[ 1758.107101] raid6: using ssse3x2 recovery algorithm
[ 1758.109969] async_tx: api initialized (async)
[ 1758.112968] xor: measuring software checksum speed
[ 1758.122019]    prefetch64-sse:  8568.000 MB/sec
[ 1758.132028]    generic_sse:  8184.000 MB/sec
[ 1758.132055] xor: using function: prefetch64-sse (8568.000 MB/sec)
[ 1758.212598] device-mapper: raid: Loading target version 1.9.1
[ 1758.257096] device-mapper: raid: Superblocks created for new raid set
[ 1758.272413] md/raid1:mdX: not clean -- starting background reconstruction
[ 1758.272420] md/raid1:mdX: active with 3 out of 3 mirrors
[ 1758.295708] mdX: bitmap file is out of date, doing full recovery
[ 1758.299044] md: resync of RAID array mdX
[ 1758.395378] md: mdX: resync done.
[ 1758.823082] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.823531] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.823703] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.825825] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.825985] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.827225] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.827381] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.828609] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.828771] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.830201] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.830359] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.832221] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.832391] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.833623] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.833785] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.835492] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.835649] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.836829] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1758.836987] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1758.838108] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.824399] handle_read_error: 3878 callbacks suppressed
[ 1763.824406] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.824561] raid1_end_read_request: 3879 callbacks suppressed
[ 1763.824565] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1763.826112] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.826272] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1763.827338] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.827496] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1763.828902] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.829115] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1763.830432] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.830593] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1763.831711] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.831871] md/raid1:mdX: dm-6: rescheduling sector 0
[ 1763.833163] md/raid1:mdX: redirecting sector 0 to other mirror: dm-6
[ 1763.833317] md/raid1:mdX: dm-6: rescheduling sector 0

to unlock from this case - user can use  'dmsetup remove_all --force' 
(in case only 'raid' devices were in table - otherwise one has
to user precise device names! to drop only raid1 devices)

Version-Release number of selected component (if applicable):
also seen with 4.10-rc4 kernel.

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Zdenek Kabelac 2017-01-25 23:04:57 UTC
Created attachment 1244510 [details]
Test with mounted filesystem

In this example we make 10M device, convert it raid1 and use 'modern' ext4 filesystem on top.  
Then mount filesystem to 'mnt'  mountpoint.

Then we make 'error' segments ~500 first sectors on 'primary' leg.
(With the hope  ext4 hits those for 'write' operation)

Then we simply write 'a'  to file  mnt/a.

Here is the outcome of 4.10-rc4 kernel:

dmsetup status
#lvconvert-repair-raid-primary.sh:41+ dmsetup status
@PREFIX@vg-LV1_rimage_1: 0 20480 linear
@PREFIX@pv4: 0 40960 linear
@PREFIX@vg-LV1_rimage_0: 0 20480 linear
@PREFIX@pv3: 0 40960 linear
@PREFIX@pv2: 0 40960 linear
@PREFIX@pv1: 0 2048 linear
@PREFIX@pv1: 2048 500 error
@PREFIX@pv1: 2548 38412 linear
@PREFIX@vg-LV1: 0 20480 raid raid1 2 AA 20480/20480 idle 0 0
@PREFIX@vg-LV1_rmeta_1: 0 1024 linear
@PREFIX@vg-LV1_rmeta_0: 0 1024 linear
dmsetup table
#lvconvert-repair-raid-primary.sh:42+ dmsetup table
@PREFIX@vg-LV1_rimage_1: 0 20480 linear 253:1 3072
@PREFIX@pv4: 0 40960 linear 7:5 124928
@PREFIX@vg-LV1_rimage_0: 0 20480 linear 253:0 2048
@PREFIX@pv3: 0 40960 linear 7:5 83968
@PREFIX@pv2: 0 40960 linear 7:5 43008
@PREFIX@pv1: 0 2048 linear 7:5 2048
@PREFIX@pv1: 2048 500 error
@PREFIX@pv1: 2548 38412 linear 7:5 4596
@PREFIX@vg-LV1: 0 20480 raid raid1 3 0 region_size 1024 2 253:5 253:6 253:7 253:8
@PREFIX@vg-LV1_rmeta_1: 0 1024 linear 253:1 2048
@PREFIX@vg-LV1_rmeta_0: 0 1024 linear 253:0 22528

echo a > mnt/a
#lvconvert-repair-raid-primary.sh:44+ echo a
/home/kabi/export/lvm2/test/shell/lvconvert-repair-raid-primary.sh: line 44: mnt/a: No space left on device
4,3252,74483736696,-;raid1_end_read_request: 2550 callbacks suppressed
3,3253,74483736703,-;md/raid1:mdX: dm-6: rescheduling sector 168
4,3254,74483736943,-;handle_read_error: 2551 callbacks suppressed
6,3255,74483736947,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3256,74483736962,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3257,74483738185,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3258,74483738202,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3259,74483739143,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3260,74483739160,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3261,74483739937,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3262,74483739954,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3263,74483740859,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3264,74483740880,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3265,74483741637,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3266,74483741652,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3267,74483742427,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3268,74483742442,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3269,74483743396,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3270,74483743412,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3271,74483744195,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
3,3272,74483744211,-;md/raid1:mdX: dm-6: rescheduling sector 168
6,3273,74483744910,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6
4,3274,74488737442,-;handle_read_error: 5957 callbacks suppressed
6,3275,74488737448,-;md/raid1:mdX: redirecting sector 168 to other mirror: dm-6

Comment 2 Zdenek Kabelac 2017-01-26 08:29:15 UTC
So as a result - it 'dies' on read of error sector on 'primary' leg.

The 'workaround' could be to mark 'leg' with error sectors to be 'writemostly'
(lvchange --writemostly  /dev/mostly_failing_device  VG/RaidLV)

Comment 3 Zdenek Kabelac 2017-01-27 18:31:35 UTC
Created attachment 1245247 [details]
Pure mdadm test for lvm2 test suite

Same issue reproduced without 'lvm2'  - pure mdadm with 'sector' failing device hits the very same problem.

Comment 4 Heinz Mauelshagen 2017-11-08 18:28:08 UTC
(In reply to Zdenek Kabelac from comment #3)
> Created attachment 1245247 [details]
> Pure mdadm test for lvm2 test suite
> 
> Same issue reproduced without 'lvm2'  - pure mdadm with 'sector' failing
> device hits the very same problem.

Test as of attachment doesn't fail (any more) on 4.14.0-rc8 kernel.

Comment 5 Heinz Mauelshagen 2017-11-09 13:18:17 UTC
Created attachment 1349926 [details]
Zdenek's scsi_debug based test

Comment 6 Heinz Mauelshagen 2017-11-09 13:20:25 UTC
(In reply to Heinz Mauelshagen from comment #5)
> Created attachment 1349926 [details]
> Zdenek's scsi_debug based test

Doesn't fail on 4.14.0-rc8!

Excerpt of kernel log showing properly redirected raid1 ios:
...
[  669.999842] md/raid1:md0: redirecting sector 4630 to other mirror: sdq2
[  670.003724] sd 3:0:0:0: [sdq] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  670.005017] sd 3:0:0:0: [sdq] tag#1 Sense Key : Medium Error [current] 
[  670.006021] sd 3:0:0:0: [sdq] tag#1 Add. Sense: Unrecovered read error
[  670.007021] sd 3:0:0:0: [sdq] tag#1 CDB: Read(10) 28 00 00 00 12 37 00 00 01 00
[  670.008021] print_req_error: critical medium error, dev sdq, sector 4663
[  670.009026] md/raid1:md0: sdq1: rescheduling sector 4631
[  670.014426] md/raid1:md0: redirecting sector 4631 to other mirror: sdq2
...

Comment 8 Nigel Croxon 2017-12-12 19:19:29 UTC
top of log removed.....
[ 3161.583234] EXT4-fs (md200): mounted filesystem with ordered data mode. Opts: (null)
[18103.269122] md200: detected capacity change from 119898832896 to 0
[18103.269129] md: md200 stopped.
[18117.313637] scsi_debug: host protection
[18117.313642] scsi host5: scsi_debug, version 1.82 [20100324], dev_size_mb=33, opts=0x0
[18117.313741] scsi 5:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
[18117.330174] sd 5:0:0:0: Attached scsi generic sg5 type 0
[18117.330491] sd 5:0:0:0: [sdf] 67584 512-byte logical blocks: (34.6 MB/33.0 MiB)
[18117.331493] sd 5:0:0:0: [sdf] Write Protect is off
[18117.331496] sd 5:0:0:0: [sdf] Mode Sense: 73 00 10 08
[18117.333522] sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
[18117.342522]  sdf: sdf1 sdf2
[18117.347599] sd 5:0:0:0: [sdf] Attached SCSI disk
[18117.425948] md200: WARNING: sdf2 appears to be on the same physical disk as sdf1.
[18117.425950] True protection against single-disk failure might be compromised.
[18117.426008] md/raid1:md200: not clean -- starting background reconstruction
[18117.426009] md/raid1:md200: active with 2 out of 2 mirrors
[18117.426021] md200: detected capacity change from 0 to 16711680
[18117.426094] md: resync of RAID array md200
[18117.537761] md: md200: resync done.
[18123.175534] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.175538] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.175540] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.175541] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 34 00 00 01 00
[18123.175543] blk_update_request: critical medium error, dev sdf, sector 4660
[18123.175544] md/raid1:md200: sdf1: rescheduling sector 4628
[18123.176536] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.176537] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.176538] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.176539] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 34 00 00 01 00
[18123.176540] blk_update_request: critical medium error, dev sdf, sector 4660
[18123.179539] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.179541] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.179542] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.179544] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 34 00 00 01 00
[18123.179545] blk_update_request: critical medium error, dev sdf, sector 4660
[18123.179561] md/raid1:md200: redirecting sector 4628 to other mirror: sdf2
[18123.181521] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.181523] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.181524] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.181525] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 35 00 00 01 00
[18123.181526] blk_update_request: critical medium error, dev sdf, sector 4661
[18123.181527] md/raid1:md200: sdf1: rescheduling sector 4629
[18123.182521] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.182522] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.182524] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.182525] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 35 00 00 01 00
[18123.182527] blk_update_request: critical medium error, dev sdf, sector 4661
[18123.185538] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.185540] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.185541] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.185543] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 35 00 00 01 00
[18123.185544] blk_update_request: critical medium error, dev sdf, sector 4661
[18123.185547] md/raid1:md200: redirecting sector 4629 to other mirror: sdf2
[18123.187517] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.187520] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.187521] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.187522] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 36 00 00 01 00
[18123.187523] blk_update_request: critical medium error, dev sdf, sector 4662
[18123.187525] md/raid1:md200: sdf1: rescheduling sector 4630
[18123.188539] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.188540] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.188542] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.188543] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 36 00 00 01 00
[18123.188544] blk_update_request: critical medium error, dev sdf, sector 4662
[18123.191526] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.191544] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.191545] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.191546] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 36 00 00 01 00
[18123.191547] blk_update_request: critical medium error, dev sdf, sector 4662
[18123.191563] md/raid1:md200: redirecting sector 4630 to other mirror: sdf2
[18123.193537] sd 5:0:0:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[18123.193540] sd 5:0:0:0: [sdf] Sense Key : Medium Error [current] 
[18123.193541] sd 5:0:0:0: [sdf] Add. Sense: Unrecovered read error
[18123.193542] sd 5:0:0:0: [sdf] CDB: Read(10) 28 00 00 00 12 37 00 00 01 00
[18123.193543] blk_update_request: critical medium error, dev sdf, sector 4663
[18123.193545] md/raid1:md200: sdf1: rescheduling sector 4631
[18123.197543] md/raid1:md200: redirecting sector 4631 to other mirror: sdf2
[18123.199537] md/raid1:md200: sdf1: rescheduling sector 4632
[18123.203543] md/raid1:md200: redirecting sector 4632 to other mirror: sdf2
[18123.205540] md/raid1:md200: sdf1: rescheduling sector 4633
[18123.209540] md/raid1:md200: redirecting sector 4633 to other mirror: sdf2
[18123.211539] md/raid1:md200: sdf1: rescheduling sector 4634
[18123.215540] md/raid1:md200: redirecting sector 4634 to other mirror: sdf2
[18123.217539] md/raid1:md200: sdf1: rescheduling sector 4635
[18123.221541] md/raid1:md200: redirecting sector 4635 to other mirror: sdf2
[18123.223539] md/raid1:md200: sdf1: rescheduling sector 4636
[18123.227543] md/raid1:md200: redirecting sector 4636 to other mirror: sdf2
[18123.229538] md/raid1:md200: sdf1: rescheduling sector 4637
[18123.233540] md/raid1:md200: redirecting sector 4637 to other mirror: sdf2
...Sleeping done...
Collecting some recent kernel messages...
Personalities : [raid1] 
md200 : active raid1 sdf2[1](W) sdf1[0]
      16320 blocks super 1.0 [2/2] [UU]
      
md126 : active raid1 sdb[1] sde[0]
      390699008 blocks super external:/md127/0 [2/2] [UU]
      
md127 : inactive sdb[1](S) sde[0](S)
      6306 blocks super external:imsm
       
unused devices: <none>
./mddev_scsi_debug.sh: line 80: WMOSTLY: unbound variable
[root@think ~]# 32640+0 records in
32640+0 records out
16711680 bytes (17 MB) copied, 32.742 s, 510 kB/s


Note You need to log in before you can comment on or make changes to this bug.