Red Hat Bugzilla – Bug 1249339
Gluster 3.7.3 client fails to mount volume on 3.7.2 servers
Last modified: 2017-03-08 05:49:39 EST
Created attachment 1058374 [details]
gv-homes-AD.log showing me trying to mount the volume before and after each server update
Description of problem:
After I upgraded my GlusterFS FUSE client from 3.5.5 to 3.7.3, I unmounted the gluster volume (/gv-homes-AD). I then tried to mount it again several times, but every attempt failed.
This error was produced repeatedly in the log (/var/log/glusterfs/gv-homes-AD.log):
E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x19a)[0x7f169c7e33fa] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f169c5ae4af] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f169c5ae5ce] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f169c5afd9c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f169c5b0548] ))))) 0-gv-homes-AD-client-0: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-01 16:37:27.513395 (xid=0x3)
Both Gluster servers for the 2-brick replicated volume (I am about to make the client machine a 3rd replica) were running 3.7.2. JoeJulian on IRC identified this as likely an RPC bug.
After I updated one of the servers to 3.7.3, the client was able to mount the volume, but the errors persisted. (The first server to be updated, death-star, is the one specified in /etc/fstab.)
After I updated the 2nd server, the errors went away. So I am good to go now, but I am reporting this as a bug/issue.
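For anyone triaging similar logs, here is a minimal sketch (not part of the original report) that pulls the RPC program, operation, and xid out of a "forced unwinding frame" entry like the one quoted above. The regular expression is based only on the log lines shown in this bug, so it is an assumption that other GlusterFS versions format the message the same way. The names `UNWIND_RE` and `parse_unwind` are hypothetical helpers.

```python
import re

# Matches the tail of a saved_frames_unwind entry as it appears in this report.
# Assumption: the message layout is the one quoted in this bug; other releases may differ.
UNWIND_RE = re.compile(
    r"(?P<xlator>\S+): forced unwinding frame "
    r"type\((?P<program>[^)]*)\) "
    r"op\((?P<op>\w+)\((?P<opnum>\d+)\)\) "
    r"called at (?P<called_at>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\(xid=(?P<xid>0x[0-9a-fA-F]+)\)"
)


def parse_unwind(line):
    """Return a dict describing the unwound frame, or None if the line does not match."""
    match = UNWIND_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["opnum"] = int(fields["opnum"])  # procedure number as an integer
    return fields


# Tail of the log entry quoted in the description above.
sample = (
    "0-gv-homes-AD-client-0: forced unwinding frame type(GF-DUMP) "
    "op(DUMP(1)) called at 2015-08-01 16:37:27.513395 (xid=0x3)"
)

if __name__ == "__main__":
    print(parse_unwind(sample))
```

Grouping the parsed entries by `program` and `op` gives a quick view of which RPC call was in flight when the connection was torn down (DUMP here).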
Version-Release number of selected component (if applicable):
Client is F21 x86_64.
It was running 3.5.5-2.fc21.x86_64 (provided by Fedora.)
Then I updated it to 3.7.3-1.fc21 from download.gluster.org.
Both servers are RHEL 7.1 x86_64.
They were running 3.7.2-1.el7 from download.gluster.org.
Then I updated them to 3.7.3-1.el7 from download.gluster.org.
The mount command failed multiple times in a row, but I fixed the issue by updating the servers.
death-star:/gv-homes-AD /gv-homes-AD glusterfs acl 0 0
Is there a firewall blocking the traffic? Can you provide the output of
# iptables -L
from all the bricks and client node?
Sorry, I have already updated all 3 machines.
I think I had the firewall disabled on all of them at the time. I know that I later verified that the firewall was disabled on all of them by running that command.
We are seeing the same issue. The 3.7.3 client to a 3.7.2 server fails to mount. I didn't see any log messages on the server side.
Log From Client:
[2015-08-03 14:36:55.678451] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs --volfile-server=gluster1 --volfile-server=gluster2 --volfile-id=general1 /gluster/general1)
[2015-08-03 14:36:55.689039] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-08-03 14:36:55.726060] W [socket.c:642:__socket_rwv] 0-glusterfs: readv on 10.x.x.x:24007 failed (No data available)
[2015-08-03 14:36:55.726894] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f084bb7bf46] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f084b94c54e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f084b94c65e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f084b94df1c] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f084b94e6b8] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2015-08-03 14:36:55.707046 (xid=0x1)
[2015-08-03 14:36:55.726988] E [glusterfsd-mgmt.c:1604:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:general1)
[2015-08-03 14:36:55.727131] W [glusterfsd.c:1219:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x205) [0x7f084b94c575] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x400) [0x7f084c051f00] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f084c04c589] ) 0-: received signum (0), shutting down
[2015-08-03 14:36:55.727252] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting '/gluster/general1'.
[2015-08-03 14:36:55.740108] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f084b30f182] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f084c04c6f5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f084c04c589] ) 0-: received signum (15), shutting down
Downgrading the client to 3.7.2 fixes the issue.
I just ran into this same issue with oVirt 4.0 on CentOS 7 when trying to connect a new host to an existing Gluster storage domain.
If I manually mount with the client at 3.7.3, I see the "failed to fetch volume file" error. I was able to manually downgrade the client to 3.7.2 (as suggested in comment #4), and with that version the issue does not occur.
This is now a one-year-old issue which affects new hosts using the current, just-released version of oVirt.
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.
Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.