Bug 1248067 - fuse mount fails with "failed to fetch volume file"
Summary: fuse mount fails with "failed to fetch volume file"
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2015-07-29 14:12 UTC by Carlos Mestre González
Modified: 2017-03-08 11:00 UTC
CC List: 25 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 11:00:58 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
try to mount a volume and fails, also the mnt.log (2.24 KB, text/plain)
2015-07-31 08:52 UTC, Carlos Mestre González
mount volume pass in 3.7.2-3 (7.81 KB, text/plain)
2015-07-31 08:52 UTC, Carlos Mestre González
glusterd.log (390.88 KB, text/plain)
2015-08-04 10:04 UTC, Carlos Mestre González

Description Carlos Mestre González 2015-07-29 14:12:42 UTC
Description of problem:
Failed to add a gluster storage domain

Version-Release number of selected component (if applicable):
engine:
ovirt-engine-3.6.0-0.0.master.20150726172446.git65db93d.el6.noarch
host RHEL 7.1 7Server:
glusterfs-3.7.3-1.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Trying to add a glusterfs domain (replica 3)

Actual results:
Thread-7568::DEBUG::2015-07-29 16:22:23,450::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StoragePool.connectStorageServer' in bridge with {u'connectionParams': [{u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'10.35.160.6:/StorageGE2_volume04', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '********', u'port': u''}], u'storagepoolID': u'00000000-0000-0000-0000-000000000000', u'domainType': 7}
Thread-7568::DEBUG::2015-07-29 16:22:23,451::task::595::Storage.TaskManager.Task::(_updateState) Task=`0ee3da93-591c-4cf3-aa5c-70d3eade0810`::moving from state init -> state preparing
Thread-7568::INFO::2015-07-29 16:22:23,451::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=7, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'10.35.160.6:/StorageGE2_volume04', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '********', u'port': u''}], options=None)
Thread-7568::ERROR::2015-07-29 16:22:23,482::hsm::2455::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2452, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 213, in connect
    self.validate()
  File "/usr/share/vdsm/storage/storageServer.py", line 315, in validate
    replicaCount = self.volinfo['replicaCount']
  File "/usr/share/vdsm/storage/storageServer.py", line 311, in volinfo
    self._volinfo = self._get_gluster_volinfo()
  File "/usr/share/vdsm/storage/storageServer.py", line 329, in _get_gluster_volinfo
    self._volfileserver)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in glusterVolumeInfo
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
GlusterCmdExecFailedException: Command execution failed
error: Connection failed. Please check if gluster daemon is operational.
return code: 1
Thread-7568::DEBUG::2015-07-29 16:22:23,483::hsm::2474::Storage.HSM::(connectStorageServ:

Additional info:
I cannot even mount a gluster domain in the console:
[2015-07-29 14:03:39.460787] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs --volfile-server=10.35.160.6 --volfile-id=/StorageGE2_volume02 /mnt)
[2015-07-29 14:03:39.487540] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-29 14:03:39.496198] W [socket.c:642:__socket_rwv] 0-glusterfs: readv on 10.35.160.6:24007 failed (No data available)
[2015-07-29 14:03:39.497450] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f4764ba49e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f476496f9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f476496face] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f476497147c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f4764971c38] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2015-07-29 14:03:39.487940 (xid=0x1)
[2015-07-29 14:03:39.497494] E [glusterfsd-mgmt.c:1604:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/StorageGE2_volume02)
[2015-07-29 14:03:39.497546] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(saved_frames_unwind+0x205) [0x7f476496f9e5] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x430) [0x7f476507c230] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f4765076609] ) 0-: received signum (0), shutting down

So I'm guessing it's a problem with the gluster version I have; the package is from EPEL:
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/$basearch/

With the previous version, glusterfs-3.7.2-3.el7.x86_64, it worked. Marking as a regression.
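
For reference, the failing manual mount in the log above corresponds to a command along these lines (reconstructed from the glusterfs client args in the log; server address and volume name are the ones from this report):

# FUSE mount of the volume, equivalent to the glusterfs args logged above
mount -t glusterfs 10.35.160.6:/StorageGE2_volume02 /mnt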

Comment 1 Carlos Mestre González 2015-07-29 17:12:55 UTC
As stated above, this doesn't seem to happen with 3.7.2-3. I'm tentatively inclined to just close this and open the proper bug in the GlusterFS project. What version is supposed to be tested with the last build?

Comment 2 Allon Mureinik 2015-07-30 08:41:40 UTC
oVirt just attempts to mount the directory, and as Carlos stated, this issue stems from an error inside gluster.

Moving there in case the relevant stakeholders want/need to investigate.

Comment 5 Sahina Bose 2015-07-31 05:44:11 UTC
Is glusterd running on 10.35.160.6?
The error seems to indicate that it is not: "Command execution failed
error: Connection failed. Please check if gluster daemon is operational."

Could you also attach the glusterfs logs, as well as the gluster mount log from the client?
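
A quick way to check this on the server would be something like the following (a sketch, assuming a RHEL 7 host with systemd; adjust as needed):

# is the management daemon up?
systemctl status glusterd
# do the peers and volumes look healthy?
gluster peer status
gluster volume status

The client mount log should be under /var/log/glusterfs/ on the client, named after the mount point (e.g. mnt.log for a mount on /mnt).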

Comment 6 Carlos Mestre González 2015-07-31 08:52:02 UTC
Created attachment 1057990 [details]
try to mount a volume and fails, also the mnt.log

Comment 7 Carlos Mestre González 2015-07-31 08:52:48 UTC
Created attachment 1057991 [details]
mount volume pass in 3.7.2-3

Comment 8 Carlos Mestre González 2015-07-31 08:55:56 UTC
Hi Sahina,

I attached the logs from two different clients, with 3.7.2-3 and 3.7.3-1; the first one works but the latter does not. I also added the package versions.

The only log on the server where I can see anything is etc-glusterfs-glusterd.vol.log.
When I try to mount it from the client I get:
[2015-07-31 08:55:06.846132] E [rpcsvc.c:638:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request

and a lot of these:
[2015-07-31 08:55:07.881943] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5274336107d113fcd6650a75a4695d2b.socket failed (Invalid argument)
[2015-07-31 08:55:10.893913] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5274336107d113fcd6650a75a4695d2b.socket failed (Invalid argument)
[2015-07-31 08:55:13.895145] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5274336107d113fcd6650a75a4695d2b.socket failed (Invalid argument)

Comment 9 Carlos Mestre González 2015-08-03 08:14:01 UTC
Hi Sahina,

any update on this?

Comment 11 SATHEESARAN 2015-08-03 17:10:36 UTC
Carlos, 

1. Check whether glusterd is running on the node; in your case it's 10.35.160.6.

2. Could you provide volume information - the output of the 'gluster volume info' command?

3. Could you also attach the glusterd log file from the node 10.35.160.6?

We will try to help you at the earliest

Comment 12 SATHEESARAN 2015-08-04 03:41:27 UTC
(In reply to SATHEESARAN from comment #11)
> Carlos, 
> 
> 1. Check whether glusterd is running on the node; in your case it's
> 10.35.160.6.
> 
> 2. Could you provide volume information - the output of the 'gluster volume
> info' command?
> 
> 3. Could you also attach the glusterd log file from the node 10.35.160.6?

I missed mentioning the location - /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

> 
> We will try to help you at the earliest

Comment 13 Carlos Mestre González 2015-08-04 10:00:40 UTC
Client with 3.7.2-3 and 3.7.3-1:

root@storage-ge5-vdsm1 ~]# rpm -qa | grep glusterfs
glusterfs-3.7.2-3.el7.x86_64
glusterfs-fuse-3.7.2-3.el7.x86_64
glusterfs-libs-3.7.2-3.el7.x86_64
glusterfs-debuginfo-3.7.2-3.el7.x86_64
glusterfs-client-xlators-3.7.2-3.el7.x86_64
glusterfs-cli-3.7.2-3.el7.x86_64
glusterfs-server-3.7.2-3.el7.x86_64
glusterfs-api-devel-3.7.2-3.el7.x86_64
glusterfs-rdma-3.7.2-3.el7.x86_64
glusterfs-api-3.7.2-3.el7.x86_64
glusterfs-extra-xlators-3.7.2-3.el7.x86_64
glusterfs-devel-3.7.2-3.el7.x86_64
[root@storage-ge5-vdsm1 ~]# date; mount -t glusterfs 10.35.160.6:/StorageGE5_volume01 /mnt/
Tue Aug  4 09:45:37 IDT 2015
[root@storage-ge5-vdsm1 ~]# umount /mnt/

Enabled the following repos and updated to 3.7.3-1:
[ovirt-master-glusterfs-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/$basearch/
enabled=1
skip_if_unavailable=1
gpgcheck=1
gpgkey=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/pub.key

[ovirt-master-glusterfs-noarch-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/noarch
enabled=1
skip_if_unavailable=1
gpgcheck=1
gpgkey=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/pub.key


[root@storage-ge5-vdsm1 ~]# rpm -qa | grep glusterfs
glusterfs-libs-3.7.3-1.el7.x86_64
glusterfs-client-xlators-3.7.3-1.el7.x86_64
glusterfs-cli-3.7.3-1.el7.x86_64
glusterfs-devel-3.7.3-1.el7.x86_64
glusterfs-rdma-3.7.3-1.el7.x86_64
glusterfs-api-3.7.3-1.el7.x86_64
glusterfs-fuse-3.7.3-1.el7.x86_64
glusterfs-server-3.7.3-1.el7.x86_64
glusterfs-api-devel-3.7.3-1.el7.x86_64
glusterfs-debuginfo-3.7.3-1.el7.x86_64
glusterfs-3.7.3-1.el7.x86_64
glusterfs-extra-xlators-3.7.3-1.el7.x86_64
[root@storage-ge5-vdsm1 ~]# date; mount -t glusterfs 10.35.160.6:/StorageGE5_volume01 /mnt/
Tue Aug  4 09:55:03 IDT 2015
Mount failed. Please check the log file for more details.
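
(The log file the error refers to is the client-side mount log under /var/log/glusterfs/, named after the mount point; for a mount on /mnt/ that would be something like:

less /var/log/glusterfs/mnt.log
)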

Comment 14 Carlos Mestre González 2015-08-04 10:01:04 UTC
server:
[root@gluster-storage-03 ~]# ps xau | grep StorageGE2_volume01
root      2180  0.0  0.5 904304 21060 ?        Ssl  12:27   0:00 /usr/sbin/glusterfsd -s 10.35.160.6 --volfile-id StorageGE2_volume01.10.35.160.6.gluster_volumes-StorageGE2_volume01 -p /var/lib/glusterd/vols/StorageGE2_volume01/run/10.35.160.6-gluster_volumes-StorageGE2_volume01.pid -S /var/run/gluster/fbd825d57281527e79d845def2ecc5af.socket --brick-name /gluster_volumes/StorageGE2_volume01 -l /var/log/glusterfs/bricks/gluster_volumes-StorageGE2_volume01.log --xlator-option *-posix.glusterd-uuid=50891c26-f41d-4f71-9a2d-32998ed1e3a0 --brick-port 49353 --xlator-option StorageGE2_volume01-server.listen-port=49353
[root@gluster-storage-03 ~]# gluster volume info StorageGE5_volume01

Volume Name: StorageGE5_volume01
Type: Replicate
Volume ID: ac5c201d-9eb0-4ecd-bce2-16d41dba3c30
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.35.160.202:/gluster_volumes/StorageGE5_volume01
Brick2: 10.35.160.203:/gluster_volumes/StorageGE5_volume01
Brick3: 10.35.160.6:/gluster_volumes/StorageGE5_volume01
Options Reconfigured:
performance.readdir-ahead: on
storage.owner-gid: 36
storage.owner-uid: 36

[root@gluster-storage-03 ~]# rpm -qa | grep gluster
glusterfs-fuse-3.7.1-11.el7rhgs.x86_64
glusterfs-libs-3.7.1-11.el7rhgs.x86_64
glusterfs-3.7.1-11.el7rhgs.x86_64
glusterfs-api-3.7.1-11.el7rhgs.x86_64
glusterfs-server-3.7.1-11.el7rhgs.x86_64
glusterfs-client-xlators-3.7.1-11.el7rhgs.x86_64
glusterfs-cli-3.7.1-11.el7rhgs.x86_64

Comment 15 Carlos Mestre González 2015-08-04 10:04:23 UTC
Created attachment 1059000 [details]
glusterd.log

Comment 16 Kaushal 2015-08-05 07:36:38 UTC
This is happening because bind-insecure is enabled by default in the rpc library in 3.7.3.

The 3.7.3 clients attempt to connect to the 3.7.1 servers using insecure ports. The older servers still expect incoming connections to use secure ports (unless bind-insecure is turned on), and reject the connections from 3.7.3 clients.

Some solutions I can think of for this include:
- Turn bind-insecure off for clients by setting `client.bind-insecure` to `off` for the volumes. This will force 3.7.3 clients to use secure ports to establish the connection.

- Turn bind-insecure on for the bricks by setting `server.allow-insecure` to `on` for the volumes. This will enable the older servers to accept connections coming from insecure ports.

This should have been documented in the release notes for 3.7.3, but I missed this change when preparing the notes.
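
As a sketch, either workaround can be applied from the gluster CLI (the volume name below is a placeholder; repeat for each affected volume):

# option 1: force 3.7.3 clients back to secure (privileged) source ports
gluster volume set <VOLNAME> client.bind-insecure off

# option 2: let the older bricks accept connections from insecure ports
gluster volume set <VOLNAME> server.allow-insecure on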

Comment 17 Elad 2015-08-06 14:38:53 UTC
Hi Kaushal, 

I tried the solution you suggested - configuring both the server and the client to work in the same mode. First, I configured both as secure:


gluster volume set elad2 server.allow-insecure off
gluster volume set elad2 client.bind-insecure off

And then I also tried configuring them both as insecure:


gluster volume set elad2 server.allow-insecure on
gluster volume set elad2 client.bind-insecure on

I stopped the volume before changing the options and started it again afterwards.

This configuration did not solve the mount failure.

Can you please verify I did it the right way?

Thanks

Comment 18 SATHEESARAN 2015-08-10 04:28:13 UTC
The root cause and workaround for this issue are described in this mail thread [1].

Elad, Carlos,

I hope the workaround mentioned in the mail thread [1] helps.
Is this still a problem for you?

[1] - http://www.gluster.org/pipermail/gluster-devel/2015-August/046361.html

Comment 19 Carlos Mestre González 2015-08-10 10:28:13 UTC
Hi,

I've tried both approaches on one of the volumes, and even stopped and started the volume again just in case, with the same result:

# gluster volume info StorageGE2_volume01
 
Volume Name: StorageGE2_volume01
Type: Replicate
Volume ID: 8347806f-5075-4431-b143-984692370d8d
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.35.160.202:/gluster_volumes/StorageGE2_volume01
Brick2: 10.35.160.203:/gluster_volumes/StorageGE2_volume01
Brick3: 10.35.160.6:/gluster_volumes/StorageGE2_volume01
Options Reconfigured:
storage.owner-uid: 36
storage.owner-gid: 36
performance.readdir-ahead: on
server.allow-insecure: on
client.bind-insecure: off

from log:
[2015-08-10 10:26:30.619489] E [rpcsvc.c:638:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request

The mail states "on all volumes", but I have only tried it on one because we have quite a lot of them and I just wanted a quick test. Am I doing something wrong?

Comment 20 Kaushal 2015-08-18 07:02:17 UTC
Hi Carlos,

Could you also try setting `option rpc-auth-allow-insecure on` in /etc/glusterfs/glusterd.vol? You will need to restart GlusterD after making this change.

This one additional step is needed for both of the workarounds mentioned in the mail thread.

The clients need to connect to glusterd to fetch the volfile, and there is currently no way to specify the connection parameters for this initial connection. The new 3.7 clients will use the default insecure parameters to set up the connection, and the older glusterds will reject these incoming connections.
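
A minimal sketch of that change (the existing glusterd.vol varies between installs; only the rpc-auth-allow-insecure line is added inside the existing "volume management" block):

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # ... keep whatever other options are already present ...
    option rpc-auth-allow-insecure on
end-volume

# then restart the management daemon, e.g. on RHEL 7:
systemctl restart glusterd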

Comment 21 Aharon Canan 2015-08-18 08:27:27 UTC
After adding "option rpc-auth-allow-insecure on" to /etc/glusterfs/glusterd.vol and restarting the service, it works fine now.

[root@camel-vdsb ~]# mount -t glusterfs 10.35.160.6:/acanan02 /mnt/

[root@camel-vdsb ~]# rpm -qa |grep gluster
glusterfs-libs-3.7.3-1.el7.x86_64
glusterfs-fuse-3.7.3-1.el7.x86_64
glusterfs-client-xlators-3.7.3-1.el7.x86_64
glusterfs-api-3.7.3-1.el7.x86_64
glusterfs-3.7.3-1.el7.x86_64

Volume Name: acanan02
Type: Distribute
Volume ID: 8bb98061-eb50-44f7-bd03-cc011c31ac2c
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.35.160.6:/gluster_volumes/acanan02
Options Reconfigured:
performance.readdir-ahead: on
server.allow-insecure: on
client.bind-insecure: off

Comment 27 Sandro Bonazzola 2015-09-11 12:28:48 UTC
(In reply to Aharon Canan from comment #21)
> after adding "option rpc-auth-allow-insecure on" to
> /etc/glusterfs/glusterd.vol and service restart it works fine now.

Note that adding "option rpc-auth-allow-insecure on" to /etc/glusterfs/glusterd.vol is mandatory for Hosted Engine external glusterfs support (http://www.ovirt.org/Features/Self_Hosted_Engine_Gluster_Support), so I'm dropping bug #1083025 from the blocked bugs.

Comment 28 Kaushal 2017-03-08 11:00:58 UTC
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

