Bug 875035
| Field | Value |
|---|---|
| Summary | IO on nfs mount fails with "Input/output error" message when a distribute volume is changed to a distribute-replicate volume |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | glusterfs |
| Version | 2.0 |
| Status | CLOSED DUPLICATE |
| Severity | high |
| Priority | medium |
| Reporter | spandura |
| Assignee | shishir gowda <sgowda> |
| QA Contact | spandura |
| CC | amarts, jdarcy, nsathyan, redhat.bugs, rhs-bugs, shaines, vbellur |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2013-03-22 07:06:11 UTC |
Description (spandura, 2012-11-09 11:49:38 UTC)
Created attachment 641484: NFS Log
This test case passed on the update2 build:

[11/09/12 - 07:31:30 root@king ~]# rpm -qa | grep gluster
glusterfs-server-3.3.0.2rhs-30.el6rhs.x86_64

Seems to be an issue with self-heal not being considered for entries: in this particular case the file was present only on one node and was missing on the other. Pranith, assigning this to you; see if the issue is related to self-heal. If not, re-assign it to me.

This also happens with GlusterFS 3.3.1 using the fuse client to mount the gluster volume. I am using these RPMs from the main website:

glusterfs-swift-plugin-3.3.1-1.fc17.noarch
glusterfs-server-3.3.1-1.fc17.x86_64
glusterfs-devel-3.3.1-1.fc17.x86_64
glusterfs-debuginfo-3.3.1-1.fc17.x86_64
glusterfs-swift-proxy-3.3.1-1.fc17.noarch
glusterfs-swift-container-3.3.1-1.fc17.noarch
glusterfs-3.3.1-1.fc17.x86_64
glusterfs-swift-account-3.3.1-1.fc17.noarch
glusterfs-fuse-3.3.1-1.fc17.x86_64
glusterfs-swift-3.3.1-1.fc17.noarch
glusterfs-geo-replication-3.3.1-1.fc17.x86_64
glusterfs-rdma-3.3.1-1.fc17.x86_64
glusterfs-swift-doc-3.3.1-1.fc17.noarch
glusterfs-swift-object-3.3.1-1.fc17.noarch
glusterfs-vim-3.2.7-2.fc17.x86_64

I have the same issue when adding the 2nd brick to a 1-brick Distribute volume, and I also get an issue when adding bricks 3 and 4 to a replicated volume of just two bricks. In my experience it's something to do with the volume going from a single subvolume to two subvolumes. Adding subvolume 3 goes OK, and so does any subvolume added after the 3rd one.

To reproduce the error, create a gluster volume with one subvolume, mount it with the normal "mount -t glusterfs ip:/vol /path" command, and then run this in another session:

watch --interval=0 find /path

With that watch running every 0.1s, go and add your 2nd subvolume of either 1 brick (as in distribute) or two bricks (for replicate) and you should also get the error. It may help if you set up some VirtualBox nodes to test with, as I've found that slower systems reproduce this problem more quickly. Hope this helps, and please shout if you want me to test some RPMs or something.

Rich
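A minimal shell sketch of the reproduction steps Rich describes above. The host names (server1, server2), brick paths, volume names (testvol, repvol), and mount point are assumptions for illustration only, not taken from this report.

```
# All host names, brick paths, and volume names below are assumed for illustration.

# Scenario A: grow a 1-brick distribute volume to 2 bricks (one -> two subvolumes).
gluster volume create testvol server1:/bricks/b1
gluster volume start testvol

# On a client, mount the volume and keep lookups running in another session.
mount -t glusterfs server1:/testvol /mnt/testvol
watch --interval=0 find /mnt/testvol    # effectively re-runs every 0.1s

# While the watch loop is running, add the second subvolume:
gluster volume add-brick testvol server2:/bricks/b1

# Scenario B: grow a 2-brick replica-2 volume to 4 bricks (one -> two replica sets).
gluster volume create repvol replica 2 server1:/bricks/r1 server2:/bricks/r1
gluster volume start repvol
gluster volume add-brick repvol server1:/bricks/r2 server2:/bricks/r2

# During either transition, the report says lookups on the mount can fail with
# "Input/output error" until the new DHT layout is in place.
```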
I don't know if this will help, but I get lots of these in my logs when it happens. For reference, "md0" is the name of the gluster volume and "recon" is the only file in the volume.

[2013-01-15 16:34:40.795592] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 16499: LOOKUP() /recon => -1 (Invalid argument)
[2013-01-15 16:34:40.795660] W [dht-layout.c:186:dht_layout_search] 1-md0-dht: no subvolume for hash (value) = 3228047937
[2013-01-15 16:34:40.795670] E [dht-common.c:1372:dht_lookup] 1-md0-dht: Failed to get hashed subvol for /recon
[2013-01-15 16:34:40.795680] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 16500: LOOKUP() /recon => -1 (Invalid argument)
[2013-01-15 16:34:40.795827] W [dht-layout.c:186:dht_layout_search] 1-md0-dht: no subvolume for hash (value) = 3228047937
[2013-01-15 16:34:40.795843] E [dht-common.c:1372:dht_lookup] 1-md0-dht: Failed to get hashed subvol for /recon
[2013-01-15 16:34:40.795854] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 16501: LOOKUP() /recon => -1 (Invalid argument)
[2013-01-15 16:34:40.892750] I [client-handshake.c:1636:select_server_supported_programs] 1-md0-client-1: Using Program GlusterFS 3.3.1, Num (1298437), Version (330)
[2013-01-15 16:34:40.920663] I [client-handshake.c:1433:client_setvolume_cbk] 1-md0-client-1: Connected to 169.254.0.44:24009, attached to remote volume '/mnt/md0/brick1'.
[2013-01-15 16:34:40.920744] I [client-handshake.c:1445:client_setvolume_cbk] 1-md0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-01-15 16:34:40.921550] I [client-handshake.c:453:client_set_lk_version_cbk] 1-md0-client-1: Server lk version = 1

Hmm, I've updated to Fedora 18 and installed Gluster with these RPMs:

# rpm -qa | grep gluster
glusterfs-fuse-3.3.1-4.fc18.x86_64
glusterfs-rdma-3.3.1-4.fc18.x86_64
glusterfs-3.3.1-4.fc18.x86_64
glusterfs-geo-replication-3.3.1-4.fc18.x86_64
glusterfs-server-3.3.1-4.fc18.x86_64

And the problem doesn't seem to happen any more... well, at least I've not been able to reproduce it yet. I don't know what changed between 3.3.1-1 and 3.3.1-4, but it may have resolved my issue. I'll come back and post an update if I get it to error again.

Rich

OK, small update: the problem is still there, but it only lasts for a second or two now rather than the 20+ seconds before.

Conversion of a non-distribute volume to a distribute volume leads to the above errors. Fix http://review.gluster.org/3838 for bug 815227 handles this by adding the distribute xlator by default for any volume created (see the illustrative volfile sketch below). The fix should be available in release-3.4 or upstream master.

*** This bug has been marked as a duplicate of bug 815227 ***
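To illustrate what "adding the distribute xlator by default" means, here is a minimal sketch of a single-brick client volume graph that already contains a cluster/distribute (DHT) xlator. The xlator names follow the md0 volume seen in the log excerpt above; the host name, brick path, and option values are assumptions, not actual glusterd-generated output.

```
# Illustrative client-graph fragment (not generated output): with the fix, even a
# single-brick volume keeps a cluster/distribute xlator on top of its lone client
# subvolume, so growing to two subvolumes later only extends the existing DHT
# layout instead of inserting a brand-new translator into the graph.
volume md0-client-0
    type protocol/client
    option remote-host server1              # assumed host name
    option remote-subvolume /mnt/md0/brick0 # assumed brick path
    option transport-type tcp
end-volume

volume md0-dht
    type cluster/distribute
    subvolumes md0-client-0
end-volume
```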