Bug 875035
| Field | Value |
|---|---|
| Summary | IO on nfs mount fails with "Input/output error" message when a distribute volume is changed to a distribute-replicate volume |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | glusterfs |
| Version | 2.0 |
| Status | CLOSED DUPLICATE |
| Severity | high |
| Priority | medium |
| Reporter | spandura |
| Assignee | shishir gowda <sgowda> |
| QA Contact | spandura |
| CC | amarts, jdarcy, nsathyan, redhat.bugs, rhs-bugs, shaines, vbellur |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2013-03-22 07:06:11 UTC |
Description (spandura, 2012-11-09 11:49:38 UTC)
Created attachment 641484: NFS Log
This test case passed on the update2 build:

[11/09/12 - 07:31:30 root@king ~]# rpm -qa | grep gluster
glusterfs-server-3.3.0.2rhs-30.el6rhs.x86_64

Seems to be an issue with self-heal not being considered for entries: in this particular case the file was present only on one node and was missing on the other. Pranith, assigning this to you; see if the issue is related to self-heal. If not, re-assign it to me.

This also happens with GlusterFS 3.3.1 using the fuse client to mount the gluster volume. I am using these RPMs from the main website:

glusterfs-swift-plugin-3.3.1-1.fc17.noarch
glusterfs-server-3.3.1-1.fc17.x86_64
glusterfs-devel-3.3.1-1.fc17.x86_64
glusterfs-debuginfo-3.3.1-1.fc17.x86_64
glusterfs-swift-proxy-3.3.1-1.fc17.noarch
glusterfs-swift-container-3.3.1-1.fc17.noarch
glusterfs-3.3.1-1.fc17.x86_64
glusterfs-swift-account-3.3.1-1.fc17.noarch
glusterfs-fuse-3.3.1-1.fc17.x86_64
glusterfs-swift-3.3.1-1.fc17.noarch
glusterfs-geo-replication-3.3.1-1.fc17.x86_64
glusterfs-rdma-3.3.1-1.fc17.x86_64
glusterfs-swift-doc-3.3.1-1.fc17.noarch
glusterfs-swift-object-3.3.1-1.fc17.noarch
glusterfs-vim-3.2.7-2.fc17.x86_64

I have the same issue when adding the 2nd brick to a 1-brick Distribute volume, and I also get an issue when adding bricks 3 and 4 to a replicated volume of just two bricks. In my experience it's something to do with the volume going from a single subvolume to two subvolumes. Adding subvolume 3 goes OK, and so does any subvolume added after the 3rd one.

To reproduce the error, create a gluster volume with one subvolume, mount it with the normal "mount -t glusterfs ip:/vol /path" command, and then run this in another session:

watch --interval=0 find /path

With that watch running every 0.1s, go and add your 2nd subvolume of either 1 brick (as in distribute) or two bricks (for replicate) and you should also get the error. It may help if you set up some VirtualBox nodes to test with, as I've found that slower systems reproduce this problem more quickly. Hope this helps, and please shout if you want me to test some RPMs or something.

Rich
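A minimal shell sketch of the reproduction steps Rich describes above. The host names (server1, server2), brick paths, volume names (testvol, repvol), and mount point are assumptions for illustration only, not taken from this report.

```
# All host names, brick paths, and volume names below are assumed for illustration.

# Scenario A: grow a 1-brick distribute volume to 2 bricks (one -> two subvolumes).
gluster volume create testvol server1:/bricks/b1
gluster volume start testvol

# On a client, mount the volume and keep lookups running in another session.
mount -t glusterfs server1:/testvol /mnt/testvol
watch --interval=0 find /mnt/testvol    # effectively re-runs every 0.1s

# While the watch loop is running, add the second subvolume:
gluster volume add-brick testvol server2:/bricks/b1

# Scenario B: grow a 2-brick replica-2 volume to 4 bricks (one -> two replica sets).
gluster volume create repvol replica 2 server1:/bricks/r1 server2:/bricks/r1
gluster volume start repvol
gluster volume add-brick repvol server1:/bricks/r2 server2:/bricks/r2

# During either transition, the report says lookups on the mount can fail with
# "Input/output error" until the new DHT layout is in place.
```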
I don't know if this will help, but I get lots of these in my logs when it happens. For reference, "md0" is the name of the gluster volume and "recon" is the only file in the volume.

[2013-01-15 16:34:40.795592] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 16499: LOOKUP() /recon => -1 (Invalid argument)
[2013-01-15 16:34:40.795660] W [dht-layout.c:186:dht_layout_search] 1-md0-dht: no subvolume for hash (value) = 3228047937
[2013-01-15 16:34:40.795670] E [dht-common.c:1372:dht_lookup] 1-md0-dht: Failed to get hashed subvol for /recon
[2013-01-15 16:34:40.795680] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 16500: LOOKUP() /recon => -1 (Invalid argument)
[2013-01-15 16:34:40.795827] W [dht-layout.c:186:dht_layout_search] 1-md0-dht: no subvolume for hash (value) = 3228047937
[2013-01-15 16:34:40.795843] E [dht-common.c:1372:dht_lookup] 1-md0-dht: Failed to get hashed subvol for /recon
[2013-01-15 16:34:40.795854] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 16501: LOOKUP() /recon => -1 (Invalid argument)
[2013-01-15 16:34:40.892750] I [client-handshake.c:1636:select_server_supported_programs] 1-md0-client-1: Using Program GlusterFS 3.3.1, Num (1298437), Version (330)
[2013-01-15 16:34:40.920663] I [client-handshake.c:1433:client_setvolume_cbk] 1-md0-client-1: Connected to 169.254.0.44:24009, attached to remote volume '/mnt/md0/brick1'.
[2013-01-15 16:34:40.920744] I [client-handshake.c:1445:client_setvolume_cbk] 1-md0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-01-15 16:34:40.921550] I [client-handshake.c:453:client_set_lk_version_cbk] 1-md0-client-1: Server lk version = 1

Hmm, I've updated to Fedora 18 and installed Gluster with these RPMs:

# rpm -qa | grep gluster
glusterfs-fuse-3.3.1-4.fc18.x86_64
glusterfs-rdma-3.3.1-4.fc18.x86_64
glusterfs-3.3.1-4.fc18.x86_64
glusterfs-geo-replication-3.3.1-4.fc18.x86_64
glusterfs-server-3.3.1-4.fc18.x86_64

And the problem doesn't seem to happen any more... well, at least I've not been able to reproduce it yet. I don't know what changed between 3.3.1-1 and 3.3.1-4, but it may have resolved my issue. I'll come back and post an update if I get it to error again.

Rich

OK, small update: the problem is still there, but it only lasts for a second or two now rather than the 20+ seconds before.

Conversion of a non-distribute volume to a distribute volume leads to the above errors. Fix http://review.gluster.org/3838 for bug 815227 handles this by adding the distribute xlator by default for any volume created (see the illustrative volfile sketch below). The fix should be available in release-3.4 or upstream master.

*** This bug has been marked as a duplicate of bug 815227 ***
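To illustrate what "adding the distribute xlator by default" means, here is a minimal sketch of a single-brick client volume graph that already contains a cluster/distribute (DHT) xlator. The xlator names follow the md0 volume seen in the log excerpt above; the host name, brick path, and option values are assumptions, not actual glusterd-generated output.

```
# Illustrative client-graph fragment (not generated output): with the fix, even a
# single-brick volume keeps a cluster/distribute xlator on top of its lone client
# subvolume, so growing to two subvolumes later only extends the existing DHT
# layout instead of inserting a brand-new translator into the graph.
volume md0-client-0
    type protocol/client
    option remote-host server1              # assumed host name
    option remote-subvolume /mnt/md0/brick0 # assumed brick path
    option transport-type tcp
end-volume

volume md0-dht
    type cluster/distribute
    subvolumes md0-client-0
end-volume
```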