Bug 1220011 - Force replace-brick leads to a persistent write (using dd) returning Input/output error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1176062
Blocks: 1183716
 
Reported: 2015-05-09 05:08 UTC by Pranith Kumar K
Modified: 2015-05-14 17:47 UTC
CC: 5 users

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1176062
Environment:
Last Closed: 2015-05-14 17:29:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Pranith Kumar K 2015-05-09 05:08:29 UTC
+++ This bug was initially created as a clone of Bug #1176062 +++

Description of problem:
    I ran mkdir -p /mountpoint/a/b/c, then executed dd if=/dev/zero of=/mountpoint/a/b/c/test.bak bs=1M. While the write was running, I did a replace-brick commit force. The replace-brick succeeded, but the write returned Input/output error.

Version-Release number of selected component (if applicable):
 glusterfs-master or glusterfs-3.6.2beta1

How reproducible:


Steps to Reproduce:
1. Create a disperse 3 redundancy 1 volume:

Volume Name: test
Type: Disperse
Volume ID: bfdbfc8e-3dcc-4459-a1e4-9de17df03db5
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-1:/sda/
Brick2: node-1:/sdb/
Brick3: node-1:/sdc/
Options Reconfigured:
features.quota: on
performance.high-prio-threads: 64
performance.low-prio-threads: 64
performance.least-prio-threads: 64
performance.normal-prio-threads: 64
performance.io-thread-count: 64
server.allow-insecure: on
features.lock-heal: on
network.ping-timeout: 5
performance.client-io-threads: enable

2. mkdir -p /mountpoint/a/b/c

3. dd if=/dev/zero of=/mountpoint/a/b/c/test.bak bs=1M

4. gluster volume replace-brick test node-1:/sda node-1:/sdd commit force
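Put together, the steps above amount to the following reproduction sketch. The hostname, brick paths and mountpoint are the reporter's; the volume name test is taken from the volume info above, and the create/start/mount lines are assumed setup not shown explicitly in the report:

```shell
#!/bin/sh
# Reproduction sketch; assumes a node named node-1 with brick
# directories /sda /sdb /sdc and a spare /sdd, as in the report.
gluster volume create test disperse 3 redundancy 1 \
    node-1:/sda node-1:/sdb node-1:/sdc force
gluster volume start test
mount -t glusterfs node-1:/test /mountpoint

mkdir -p /mountpoint/a/b/c
dd if=/dev/zero of=/mountpoint/a/b/c/test.bak bs=1M &
DD_PID=$!

# Replace one brick while the write is still running.
gluster volume replace-brick test node-1:/sda node-1:/sdd commit force

# Before the fix, dd aborts here with "Input/output error".
wait "$DD_PID"
```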

Actual results:

replace-brick succeeds, but the dd write returns Input/output error.

Expected results:

replace-brick succeeds and the persistent write continues without errors.

Additional info:

--- Additional comment from jiademing on 2014-12-19 05:30:26 EST ---

I tested persistent reads as well; they have the same problem. (glusterfs-master or glusterfs-release-3.6.2beta1)

--- Additional comment from Anand Avati on 2015-01-07 06:51:33 EST ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#1) for review on master by Xavier Hernandez (xhernandez@datalab.es)

--- Additional comment from jiademing on 2015-01-09 03:08:45 EST ---

(In reply to Anand Avati from comment #2)
> REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> posted (#1) for review on master by Xavier Hernandez (xhernandez@datalab.es)

I tested this patch. After a forced replace-brick, persistent writes work, but ls /mountpoint occasionally returns Input/output error. When I stop the dd write, ls /mountpoint is OK.

--- Additional comment from jiademing on 2015-01-09 04:36:54 EST ---

(In reply to jiademing from comment #3)
> (In reply to Anand Avati from comment #2)
> > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > posted (#1) for review on master by Xavier Hernandez (xhernandez@datalab.es)
> 
> I tested this patch. After a forced replace-brick, persistent writes work,
> but ls /mountpoint occasionally returns Input/output error. When I stop the
> dd write, ls /mountpoint is OK.


Error logs:

[2015-01-09 17:30:04.058135] E [ec-helpers.c:410:ec_loc_setup_path] 3-test-disperse-0: Invalid path '<gfid:060bd8ef-6e58-4fcd-ac21-2c0e85b70e54>' in loc
[2015-01-09 17:30:04.058165] I [dht-layout.c:663:dht_layout_normalize] 3-test-dht: Found anomalies in <gfid:060bd8ef-6e58-4fcd-ac21-2c0e85b70e54> (gfid = 060bd8ef-6e58-4fcd-ac21-2c0e85b70e54). Holes=1 overlaps=0
[2015-01-09 17:30:04.058187] W [fuse-resolve.c:147:fuse_resolve_gfid_cbk] 0-fuse: 060bd8ef-6e58-4fcd-ac21-2c0e85b70e54: failed to resolve (Input/output error)
[2015-01-09 17:30:04.058201] E [fuse-bridge.c:808:fuse_getattr_resume] 0-digioceanfs-fuse: 47449: GETATTR 6883340 (060bd8ef-6e58-4fcd-ac21-2c0e85b70e54) resolution failed

--- Additional comment from Xavier Hernandez on 2015-01-09 05:59:53 EST ---

(In reply to jiademing from comment #3)
> (In reply to Anand Avati from comment #2)
> > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > posted (#1) for review on master by Xavier Hernandez (xhernandez@datalab.es)
> 
> I tested this patch. After a forced replace-brick, persistent writes work,
> but ls /mountpoint occasionally returns Input/output error. When I stop the
> dd write, ls /mountpoint is OK.

I've tried doing an ls of <mountpoint>, <mountpoint>/a, <mountpoint>/a/b and <mountpoint>/a/b/c while the dd was running in the background and after the replace-brick had completed. I haven't seen any Input/output error. However, I've seen that 'ls' sometimes takes longer than expected to complete. I'll try to see why.

The error logs you show seem to come from a different version of ec (the source line numbers do not match the current code). I've tried it on current master with this patch added. What version are you using?

--- Additional comment from jiademing on 2015-01-12 01:01:00 EST ---

(In reply to Xavier Hernandez from comment #5)
> (In reply to jiademing from comment #3)
> > (In reply to Anand Avati from comment #2)
> > > REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files)
> > > posted (#1) for review on master by Xavier Hernandez (xhernandez@datalab.es)
> > 
> > I tested this patch. After a forced replace-brick, persistent writes work,
> > but ls /mountpoint occasionally returns Input/output error. When I stop the
> > dd write, ls /mountpoint is OK.
> 
> I've tried to do an ls of <mountpoint>, <mountpoint>/a, <mountpoint>/a/b and
> <mountpoint>/a/b/c while the dd was running in background and replace brick
> had completed. I haven't seen any Input/Output error. However I've seen that
> 'ls' sometimes takes more time than expected to complete. I'll try to see
> why.
> 
> The error logs you show seem to come from a different version of ec (program
> lines do not match with current code). I've tried it with current master
> with this patch added. What version are you trying ?

Sorry, I had merged this patch manually. I then tried master plus this patch, and it works OK.

--- Additional comment from Anand Avati on 2015-05-03 07:38:51 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2015-05-03 22:58:27 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2015-05-04 00:24:07 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2015-05-06 10:30:42 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#5) for review on master by Xavier Hernandez (xhernandez@datalab.es)

--- Additional comment from Anand Avati on 2015-05-06 12:51:05 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#6) for review on master by Xavier Hernandez (xhernandez@datalab.es)

--- Additional comment from Anand Avati on 2015-05-07 03:34:33 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#7) for review on master by Xavier Hernandez (xhernandez@datalab.es)

--- Additional comment from Anand Avati on 2015-05-07 06:51:44 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#8) for review on master by Xavier Hernandez (xhernandez@datalab.es)

--- Additional comment from Anand Avati on 2015-05-08 02:51:22 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#9) for review on master by Vijay Bellur (vbellur@redhat.com)

--- Additional comment from Anand Avati on 2015-05-09 01:07:21 EDT ---

REVIEW: http://review.gluster.org/9407 (ec: Fix failures with missing files) posted (#10) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 1 Anand Avati 2015-05-09 05:13:53 UTC
REVIEW: http://review.gluster.org/10701 (ec: Fix failures with missing files) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 2 Anand Avati 2015-05-09 09:16:07 UTC
REVIEW: http://review.gluster.org/10701 (ec: Fix failures with missing files) posted (#2) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 3 Anand Avati 2015-05-10 00:30:53 UTC
COMMIT: http://review.gluster.org/10701 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 72f80aeba1268ed4836c10aee5fa41b6a04194e9
Author: Xavier Hernandez <xhernandez@datalab.es>
Date:   Wed Jan 7 12:29:48 2015 +0100

    ec: Fix failures with missing files
    
          Backport of http://review.gluster.com/9407
    
    When a file does not exist on a brick but does on the others, there
    could be problems accessing it, because some loc_t structures had a
    null 'pargfid' while 'name' was set. This forced inode resolution
    based on <pargfid>/name instead of <gfid>, which would be the
    correct one. To solve this problem, 'name' is always set to NULL
    when 'pargfid' is not present.
    
    Another problem was caused by incorrect error handling while doing
    incremental locking. The only error allowed during incremental
    locking was ENOTCONN, but a file missing on a brick can be reported
    as ESTALE. This caused an EIO on the operation.
    
    This patch ignores errors during incremental locking. At the end of
    the operation it checks whether enough bricks were successfully
    locked to continue or not.
    
    BUG: 1220011
    Change-Id: I4a1e6235d80e20ef7ef12daba0807b859ee5c435
    Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
    Reviewed-on: http://review.gluster.org/10701
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>

Comment 4 Niels de Vos 2015-05-14 17:29:40 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 5 Niels de Vos 2015-05-14 17:36:02 UTC
(Same closing notice as comment 4.)

Comment 6 Niels de Vos 2015-05-14 17:38:23 UTC
(Same closing notice as comment 4.)

Comment 7 Niels de Vos 2015-05-14 17:47:32 UTC
(Same closing notice as comment 4.)


