+++ This bug was initially created as a clone of Bug #1176062 +++

Description of problem:
I ran mkdir -p /mountpoint/a/b/c and then started dd if=/dev/zero of=/mountpoint/a/b/c/test.bak bs=1M. With dd still running, I ran replace-brick commit force. The replace-brick succeeded, but the write returned Input/output error.

Version-Release number of selected component (if applicable):
glusterfs-master or glusterfs-3.6.2beta1

How reproducible:

Steps to Reproduce:
1. Create a disperse 3 redundancy 1 volume:

Volume Name: test
Type: Disperse
Volume ID: bfdbfc8e-3dcc-4459-a1e4-9de17df03db5
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-1:/sda/
Brick2: node-1:/sdb/
Brick3: node-1:/sdc/
Options Reconfigured:
features.quota: on
performance.high-prio-threads: 64
performance.low-prio-threads: 64
performance.least-prio-threads: 64
performance.normal-prio-threads: 64
performance.io-thread-count: 64
server.allow-insecure: on
features.lock-heal: on
network.ping-timeout: 5
performance.client-io-threads: enable

2. mkdir -p /mountpoint/a/b/c
3. dd if=/dev/zero of=/mountpoint/a/b/c/test.bak bs=1M
4. gluster volume replace-brick test node-1:/sda node-1:/sdd commit force

Actual results:
replace-brick succeeds, but the dd write returns Input/output error.

Expected results:
replace-brick succeeds and the ongoing write continues without errors.
Additional info:

--- Additional comment from jiademing on 2014-12-19 11:30:26 CET ---

I tested persistent reads and they have the same problem (glusterfs-master or glusterfs-release-3.6.2beta1).

--- Additional comment from Anand Avati on 2015-01-07 12:51:33 CET ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from jiademing on 2015-01-09 09:08:45 CET ---

(In reply to Anand Avati from comment #2)
> REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> posted (#1) for review on master by Xavier Hernandez (xhernandez)

I tested this patch. After a forced replace-brick, the persistent write continues, but ls /mountpoint occasionally returns Input/output error. Once I stop the dd write, ls /mountpoint is OK.

--- Additional comment from jiademing on 2015-01-09 10:36:54 CET ---

(In reply to jiademing from comment #3)
> (In reply to Anand Avati from comment #2)
> > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > posted (#1) for review on master by Xavier Hernandez (xhernandez)
>
> I tested this patch. After a forced replace-brick, the persistent write
> continues, but ls /mountpoint occasionally returns Input/output error. Once
> I stop the dd write, ls /mountpoint is OK.

Error logs:

[2015-01-09 17:30:04.058135] E [ec-helpers.c:410:ec_loc_setup_path] 3-test-disperse-0: Invalid path '<gfid:060bd8ef-6e58-4fcd-ac21-2c0e85b70e54>' in loc
[2015-01-09 17:30:04.058165] I [dht-layout.c:663:dht_layout_normalize] 3-test-dht: Found anomalies in <gfid:060bd8ef-6e58-4fcd-ac21-2c0e85b70e54> (gfid = 060bd8ef-6e58-4fcd-ac21-2c0e85b70e54). Holes=1 overlaps=0
[2015-01-09 17:30:04.058187] W [fuse-resolve.c:147:fuse_resolve_gfid_cbk] 0-fuse: 060bd8ef-6e58-4fcd-ac21-2c0e85b70e54: failed to resolve (Input/output error)
[2015-01-09 17:30:04.058201] E [fuse-bridge.c:808:fuse_getattr_resume] 0-digioceanfs-fuse: 47449: GETATTR 6883340 (060bd8ef-6e58-4fcd-ac21-2c0e85b70e54) resolution failed

--- Additional comment from Xavier Hernandez on 2015-01-09 11:59:53 CET ---

(In reply to jiademing from comment #3)
> (In reply to Anand Avati from comment #2)
> > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > posted (#1) for review on master by Xavier Hernandez (xhernandez)
>
> I tested this patch. After a forced replace-brick, the persistent write
> continues, but ls /mountpoint occasionally returns Input/output error. Once
> I stop the dd write, ls /mountpoint is OK.

I've tried to do an ls of <mountpoint>, <mountpoint>/a, <mountpoint>/a/b and <mountpoint>/a/b/c while the dd was running in the background and replace-brick had completed. I haven't seen any Input/output error. However, I've seen that 'ls' sometimes takes more time than expected to complete. I'll try to see why.

The error logs you show seem to come from a different version of ec (program lines do not match the current code). I've tried it with current master with this patch added. What version are you trying?

--- Additional comment from jiademing on 2015-01-12 07:01:00 CET ---

(In reply to Xavier Hernandez from comment #5)
> (In reply to jiademing from comment #3)
> > (In reply to Anand Avati from comment #2)
> > > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > > posted (#1) for review on master by Xavier Hernandez (xhernandez)
> >
> > I tested this patch. After a forced replace-brick, the persistent write
> > continues, but ls /mountpoint occasionally returns Input/output error. Once
> > I stop the dd write, ls /mountpoint is OK.
>
> I've tried to do an ls of <mountpoint>, <mountpoint>/a, <mountpoint>/a/b and
> <mountpoint>/a/b/c while the dd was running in the background and
> replace-brick had completed. I haven't seen any Input/output error. However,
> I've seen that 'ls' sometimes takes more time than expected to complete.
> I'll try to see why.
>
> The error logs you show seem to come from a different version of ec (program
> lines do not match the current code). I've tried it with current master
> with this patch added. What version are you trying?

Sorry, I had merged this patch by hand. I then tried master + this patch, and it works.
REVIEW: http://review.gluster.org/9560 (ec: Fix failures with missing files) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)
REVIEW: http://review.gluster.org/9560 (ec: Fix failures with missing files) posted (#2) for review on release-3.6 by Xavier Hernandez (xhernandez)
COMMIT: http://review.gluster.org/9560 committed in release-3.6 by Raghavendra Bhat (raghavendra)
------
commit 1c14d8268b36e401ad7ac74ba3f082100fbe2bcc
Author: Xavier Hernandez <xhernandez>
Date:   Wed Jan 7 12:29:48 2015 +0100

    ec: Fix failures with missing files

    When a file does not exist on a brick but it does on others, there
    could be problems trying to access it because there was some loc_t
    structures with null 'pargfid' but 'name' was set. This forced
    inode resolution based on <pargfid>/name instead of <gfid> which
    would be the correct one. To solve this problem, 'name' is always
    set to NULL when 'pargfid' is not present.

    Another problem was caused by an incorrect management of errors
    while doing incremental locking. The only allowed error during an
    incremental locking was ENOTCONN, but missing files on a brick can
    be returned as ESTALE. This caused an EIO on the operation.

    This patch doesn't care of errors during an incremental locking.
    At the end of the operation it will check if there are enough
    successfully locked bricks to continue or not.

    This is a backport of http://review.gluster.org/9407/

    Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a
    BUG: 1183716
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/9560
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>
    Reviewed-by: Raghavendra Bhat <raghavendra>
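
For readers who want to see the two behaviours from the commit message side by side, here is a rough model in Python. It is only a sketch and not the actual C code in the ec translator: `Loc`, `normalize_loc`, `incremental_lock`, the `try_lock` callback, and the `min_ok` threshold are all hypothetical names introduced for illustration, and lock release on failure is left out.

```python
import errno
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Loc:
    """Hypothetical stand-in for the loc_t fields named in the commit."""
    gfid: Optional[str] = None
    pargfid: Optional[str] = None
    name: Optional[str] = None


def normalize_loc(loc: Loc) -> Loc:
    """Clear 'name' when 'pargfid' is absent.

    Without a parent gfid, <pargfid>/name cannot be resolved, so
    dropping the name leaves <gfid> as the only resolution key.
    """
    if loc.pargfid is None:
        loc.name = None
    return loc


def incremental_lock(
    bricks: Sequence[str],
    try_lock: Callable[[str], int],
    min_ok: int,
) -> List[str]:
    """Lock bricks one at a time, tolerating per-brick errors.

    try_lock(brick) is assumed to return 0 on success or an errno
    value (ENOTCONN, ESTALE, ...) on failure. No individual error
    aborts the loop; the decision is made once at the end by
    comparing the number of locked bricks with min_ok.
    """
    locked = [brick for brick in bricks if try_lock(brick) == 0]
    if len(locked) < min_ok:
        # Too few bricks answered: only now does the operation fail.
        raise OSError(errno.EIO, "not enough bricks locked")
    return locked


if __name__ == "__main__":
    # A freshly replaced brick reports ESTALE for a file it lacks.
    def fake_lock(brick: str) -> int:
        return errno.ESTALE if brick == "node-1:/sdd" else 0

    bricks = ["node-1:/sdd", "node-1:/sdb", "node-1:/sdc"]
    print(incremental_lock(bricks, fake_lock, min_ok=2))
```

In the example, `min_ok=2` is an assumption chosen to match the 2 + 1 layout of the volume in this report: one brick failing with ESTALE still leaves two locked bricks, so the operation proceeds instead of returning EIO.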