Bug 1183716 - Forced replace-brick causes an ongoing write (using dd) to return Input/output error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On: 1176062 1220011
Blocks: 1159529 glusterfs-3.6.3
 
Reported: 2015-01-19 14:49 UTC by Xavi Hernandez
Modified: 2015-08-07 11:33 UTC

Fixed In Version: glusterfs-3.6.3beta1
Clone Of: 1176062
Environment:
Last Closed: 2015-08-07 11:33:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Xavi Hernandez 2015-01-19 14:49:48 UTC
+++ This bug was initially created as a clone of Bug #1176062 +++

Description of problem:
    I ran mkdir -p /mountpoint/a/b/c and then dd if=/dev/zero of=/mountpoint/a/b/c/test.bak bs=1M. While dd was still writing, I ran replace-brick commit force. The replace-brick succeeded, but the write returned Input/output error.

Version-Release number of selected component (if applicable):
 glusterfs-master or glusterfs-3.6.2beta1

How reproducible:


Steps to Reproduce:
1. Create a disperse 3 redundancy 1 volume:

Volume Name: test
Type: Disperse
Volume ID: bfdbfc8e-3dcc-4459-a1e4-9de17df03db5
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-1:/sda/
Brick2: node-1:/sdb/
Brick3: node-1:/sdc/
Options Reconfigured:
features.quota: on
performance.high-prio-threads: 64
performance.low-prio-threads: 64
performance.least-prio-threads: 64
performance.normal-prio-threads: 64
performance.io-thread-count: 64
server.allow-insecure: on
features.lock-heal: on
network.ping-timeout: 5
performance.client-io-threads: enable

2. mkdir -p /mountpoint/a/b/c

3. dd if=/dev/zero of=/mountpoint/a/b/c/test.bak bs=1M

4. gluster volume replace-brick test node-1:/sda node-1:/sdd commit force

Actual results:

replace-brick succeeds, but the dd write returns Input/output error.

Expected results:

replace-brick succeeds and the persistent write continues without errors.

Additional info:

--- Additional comment from jiademing on 2014-12-19 11:30:26 CET ---

I tested persistent reads and they have the same problem (glusterfs-master or glusterfs-release-3.6.2beta1).

--- Additional comment from Anand Avati on 2015-01-07 12:51:33 CET ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from jiademing on 2015-01-09 09:08:45 CET ---

(In reply to Anand Avati from comment #2)
> REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> posted (#1) for review on master by Xavier Hernandez (xhernandez)

I tested this patch. After a forced replace-brick, the persistent write continues, but ls /mountpoint occasionally returns Input/output error. Once I stop the dd write, ls /mountpoint works.

--- Additional comment from jiademing on 2015-01-09 10:36:54 CET ---

(In reply to jiademing from comment #3)
> (In reply to Anand Avati from comment #2)
> > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > posted (#1) for review on master by Xavier Hernandez (xhernandez)
> 
> I tested this patch. After a forced replace-brick, the persistent write
> continues, but ls /mountpoint occasionally returns Input/output error.
> Once I stop the dd write, ls /mountpoint works.


Error logs:

[2015-01-09 17:30:04.058135] E [ec-helpers.c:410:ec_loc_setup_path] 3-test-disperse-0: Invalid path '<gfid:060bd8ef-6e58-4fcd-ac21-2c0e85b70e54>' in loc
[2015-01-09 17:30:04.058165] I [dht-layout.c:663:dht_layout_normalize] 3-test-dht: Found anomalies in <gfid:060bd8ef-6e58-4fcd-ac21-2c0e85b70e54> (gfid = 060bd8ef-6e58-4fcd-ac21-2c0e85b70e54). Holes=1 overlaps=0
[2015-01-09 17:30:04.058187] W [fuse-resolve.c:147:fuse_resolve_gfid_cbk] 0-fuse: 060bd8ef-6e58-4fcd-ac21-2c0e85b70e54: failed to resolve (Input/output error)
[2015-01-09 17:30:04.058201] E [fuse-bridge.c:808:fuse_getattr_resume] 0-digioceanfs-fuse: 47449: GETATTR 6883340 (060bd8ef-6e58-4fcd-ac21-2c0e85b70e54) resolution failed

--- Additional comment from Xavier Hernandez on 2015-01-09 11:59:53 CET ---

(In reply to jiademing from comment #3)
> (In reply to Anand Avati from comment #2)
> > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > posted (#1) for review on master by Xavier Hernandez (xhernandez)
> 
> I tested this patch. After a forced replace-brick, the persistent write
> continues, but ls /mountpoint occasionally returns Input/output error.
> Once I stop the dd write, ls /mountpoint works.

I've tried to do an ls of <mountpoint>, <mountpoint>/a, <mountpoint>/a/b and <mountpoint>/a/b/c while the dd was running in background and replace brick had completed. I haven't seen any Input/Output error. However I've seen that 'ls' sometimes takes more time than expected to complete. I'll try to see why.

The error logs you show seem to come from a different version of ec (the source line numbers do not match the current code). I've tried it with current master with this patch added. What version are you trying?

--- Additional comment from jiademing on 2015-01-12 07:01:00 CET ---

(In reply to Xavier Hernandez from comment #5)
> (In reply to jiademing from comment #3)
> > (In reply to Anand Avati from comment #2)
> > > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > > posted (#1) for review on master by Xavier Hernandez (xhernandez)
> > 
> > I tested this patch. After a forced replace-brick, the persistent write
> > continues, but ls /mountpoint occasionally returns Input/output error.
> > Once I stop the dd write, ls /mountpoint works.
> 
> I've tried to do an ls of <mountpoint>, <mountpoint>/a, <mountpoint>/a/b and
> <mountpoint>/a/b/c while the dd was running in background and replace brick
> had completed. I haven't seen any Input/Output error. However I've seen that
> 'ls' sometimes takes more time than expected to complete. I'll try to see
> why.
> 
> The error logs you show seem to come from a different version of ec (the
> source line numbers do not match the current code). I've tried it with
> current master with this patch added. What version are you trying?

Sorry, I had merged this patch manually. I then tried master + this patch, and it works.

Comment 1 Anand Avati 2015-02-03 08:48:11 UTC
REVIEW: http://review.gluster.org/9560 (ec: Fix failures with missing files) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 2 Anand Avati 2015-02-03 17:16:15 UTC
REVIEW: http://review.gluster.org/9560 (ec: Fix failures with missing files) posted (#2) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 3 Anand Avati 2015-02-11 09:20:43 UTC
COMMIT: http://review.gluster.org/9560 committed in release-3.6 by Raghavendra Bhat (raghavendra) 
------
commit 1c14d8268b36e401ad7ac74ba3f082100fbe2bcc
Author: Xavier Hernandez <xhernandez>
Date:   Wed Jan 7 12:29:48 2015 +0100

    ec: Fix failures with missing files
    
    When a file does not exist on a brick but it does on others, there
    could be problems trying to access it because there were some loc_t
    structures with null 'pargfid' but 'name' was set. This forced
    inode resolution based on <pargfid>/name instead of <gfid> which
    would be the correct one. To solve this problem, 'name' is always
    set to NULL when 'pargfid' is not present.
    
    Another problem was caused by incorrect management of errors
    while doing incremental locking. The only allowed error during
    incremental locking was ENOTCONN, but missing files on a brick can
    be returned as ESTALE. This caused an EIO on the operation.
    
    This patch ignores errors during incremental locking. At
    the end of the operation it will check if there are enough successfully
    locked bricks to continue or not.
    
    This is a backport of http://review.gluster.org/9407/
    
    Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a
    BUG: 1183716
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/9560
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>
    Reviewed-by: Raghavendra Bhat <raghavendra>
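
The first fix described in the commit message above (clearing 'name' when 'pargfid' is null) can be illustrated with a minimal sketch. This is not the actual ec code: sketch_loc_t, sketch_gfid_is_null and sketch_loc_normalize are hypothetical names, and the struct models only the three fields the commit message mentions, assuming a 16-byte identifier where all zeros means "not present".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for GlusterFS's loc_t: only the
 * fields mentioned in the commit message are modelled here. */
typedef struct {
    unsigned char gfid[16];    /* identifier of the inode itself */
    unsigned char pargfid[16]; /* identifier of the parent directory */
    const char *name;          /* entry name inside the parent, or NULL */
} sketch_loc_t;

/* True when all 16 bytes are zero (assumed meaning: "not present"). */
bool sketch_gfid_is_null(const unsigned char id[16])
{
    static const unsigned char zero[16];
    return memcmp(id, zero, sizeof(zero)) == 0;
}

/* Drop 'name' when there is no parent gfid, so that a later lookup
 * resolves by <gfid> instead of an unusable <pargfid>/name pair. */
void sketch_loc_normalize(sketch_loc_t *loc)
{
    assert(loc != NULL);
    if (sketch_gfid_is_null(loc->pargfid))
        loc->name = NULL;
}
```

With this normalization, a location that carries a name but no parent gfid ends up with name == NULL, while a location with a valid parent gfid keeps its name unchanged.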

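The second fix in the commit message (ignoring per-brick errors during incremental locking and deciding only at the end) can be sketched in the same spirit. The function names below are hypothetical, and the threshold is an assumption: the commit message only says "enough" bricks, so this sketch takes that to mean at least bricks minus redundancy, which is 2 for the 1 x (2 + 1) volume in this report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the bricks whose lock attempt succeeded. results[i] is 0 on
 * success or an errno value on failure. Any error (ENOTCONN, ESTALE,
 * ...) is simply skipped instead of aborting the whole operation. */
size_t sketch_count_locked(const int *results, size_t bricks)
{
    size_t locked = 0;

    assert(results != NULL);
    for (size_t i = 0; i < bricks; i++) {
        if (results[i] == 0)
            locked++;
    }
    return locked;
}

/* Decide once all attempts are done. Assumption: "enough" means at
 * least (bricks - redundancy) bricks were locked successfully. */
bool sketch_lock_succeeded(const int *results, size_t bricks,
                           size_t redundancy)
{
    assert(redundancy < bricks);
    return sketch_count_locked(results, bricks) >= bricks - redundancy;
}
```

Under this assumption, a 3-brick, redundancy-1 volume where the freshly replaced brick answers ESTALE still has two locked bricks and the operation can continue, whereas two failing bricks would not leave enough.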
