Bug 1191176

Summary: Since 3.6.2: failed to get the 'volume file' from server
Product: [Community] GlusterFS
Reporter: bugzilla.redhat.com
Component: doc
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.6.2
CC: bugs, bugzilla.redhat.com, kaushal, lmohanty, monotek23, ndevos, pkarampu, rabhat, redhat.bugs, romeo.r, smohan
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text:
GlusterFS 3.6.2 introduced improvements which required a change to the on-disk names of the volfiles. This change requires that the volfiles for existing volumes be regenerated, or clients will not be able to mount these volumes. The volfiles can be regenerated by running 'glusterd --xlator-option *.upgrade=on -N'. This command is normally run as part of the post-upgrade steps when upgrading GlusterFS using binary packages. This currently happens only with RPMs on CentOS and Fedora; users on other distributions need to do this manually for now.
Story Points: ---
Clone Of:
Clones: 1249124 (view as bug list)
Environment:
Last Closed: 2016-04-19 12:50:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1249124    

Description bugzilla.redhat.com 2015-02-10 15:46:21 UTC
Description of problem:

I upgraded to Glusterfs-3.6.2 today, and it didn't work. Downgrading to 3.6.1 fixes the issue.

All clients report the following (this happens on localhost as well):

I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/bin/glusterfs: Started running /usr/bin/glusterfs version  (args: /usr/bin/glusterfs --volfile-server=xxx --volfile-id=/xxx /xxx)
E [glusterfsd-mgmt.c:1494:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
E [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/xxx)
W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down
I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/shared'.
W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down

The `gluster volume info` output looks fine and all volumes come up. I'd like to debug this further but am not sure how.

Comment 1 bugzilla.redhat.com 2015-02-12 09:54:13 UTC
Are you missing any information? This is a critical bug: glusterfs does not work after upgrading to 3.6.2. Don't you have tests for this? How can I debug this? Does 3.6.2 work for other people?

Comment 2 Richard 2015-02-16 11:36:29 UTC
I've kept off the 3.5 and 3.6 versions and stuck with 3.4 (glusterfs-3.4.6-1.el6.x86_64) as that works best for me.

On 3.6.x (any version, including betas) I can't get my volumes to remount after cleanly shutting down all servers and rebooting them. I kept getting errors about ACLs in my logs, so I gave up on it.

In 3.5.x I can't peer probe a 2nd node after creating the 1st one, which is obviously a non-starter for using that.

There are bug fixes that made it into 3.4 that just don't seem to be landing in the 3.5 or 3.6 releases :-(

Comment 3 Lalatendu Mohanty 2015-02-17 12:15:07 UTC
Did you upgrade all the nodes and then try to mount the volume? Can you please tell us more about the upgrade sequence you followed? Or was it just a yum update on the gluster nodes?

As of now, having different versions of glusterfs in one gluster cluster is not a supported use case, in case that is what you were trying to do.

Comment 4 Niels de Vos 2015-02-17 12:42:17 UTC
This sounds like an issue that Pranith debugged over email for a different user. It should not happen on RPM-based installations, but it could be an issue in Debian/Ubuntu packaging.

> hey, I found the root cause. I am seeing the following log '[2015-02-09
> 09:48:04.312800] E [glusterd-handshake.c:771:__server_getspec] 0-glusterd:
> Unable to stat FILENAME_REDACTED (No such file or directory)' This means the
> configuration file for the mount is not available in the directory. In 3.6.2
> filenames of the vol-files i.e. configuration files changed. How did you
> upgrade to 3.6.2? Is it rpm based installation or something else?  In any
> case, this is what you need to do: stop the volume. killall the glusterds on
> the machines of the cluster.  run 'glusterd --xlator-option *.upgrade=on -N'
> on all the machines. Then restart glusterds on the machines of the cluster
> and everything should be operational after that.
> 
> Pranith.
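The recovery steps from the quoted email can be sketched as a short shell sequence. This is only an illustration, not an official script: the volume name 'myvol' and the DRY_RUN guard are assumptions added here so the sequence can be shown without touching a live cluster.

```shell
#!/bin/sh
# Sketch of the recovery steps from the quoted email. By default this only
# prints the commands (DRY_RUN=1); unset DRY_RUN to actually execute them
# on each machine of the cluster.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

run gluster volume stop myvol                     # 1. stop the volume ('myvol' is a placeholder)
run killall glusterd                              # 2. kill glusterd on every machine in the cluster
run glusterd --xlator-option '*.upgrade=on' -N    # 3. regenerate the volfiles (run on all machines)
run glusterd                                      # 4. restart glusterd everywhere
```

Steps 2–4 have to be repeated on every node; step 3 exits on its own because of -N combined with upgrade mode.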

Comment 5 bugzilla.redhat.com 2015-02-18 08:48:41 UTC
(In reply to Lalatendu Mohanty from comment #3)
> Did you upgrade all the nodes and then try to mount the volume? Can you
> please tell us more about the upgrade sequence you followed? Or was it
> just a yum update on the gluster nodes?

I use Arch Linux; I've upgraded all nodes, stopped all volumes, rebooted all machines, and started the volumes successfully. Mounting simply doesn't work anymore.

(In reply to Niels de Vos from comment #4)
> This sounds like an issue that Pranith debugged over email for a different
> user. It should not happen on RPM based installation, but it could be an
> issue in Debian/Ubuntu packaging.

You're right; this fixes the issue! Is this command supposed to be run after each upgrade, or was this in a changelog somewhere?

Thanks!

Comment 6 Kaushal 2015-02-18 09:53:29 UTC
The names of volfiles on disk were changed for improved RDMA support. This change was introduced in 3.6.2. For reference, the commit IDs of the changes are:
 50952cd rdma: Client volfile name change for supporting rdma
 605db00 rdma: Wrong volfile fetch on fuse mounting tcp,rdma volume via rdma

This requires that the volfiles of existing volumes be regenerated, so that they use the new names. Without the regeneration, glusterd would be looking for files by the new names on a volfile fetch request but would not find them, which would lead to a mount failure.

This regeneration is done by running glusterd in upgrade mode, 'glusterd --xlator-option *.upgrade=on -N'.
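The naming side of that regeneration can be illustrated in a throwaway directory. This is purely a sketch: real volfiles live under /var/lib/glusterd/vols and are regenerated by glusterd itself, not by renaming; the 'www' volume name mirrors the listing later in this bug.

```shell
# Illustration only: the pre-3.6.2 volfile names and the tcp-transport names
# glusterd generates from 3.6.2 on, shown in a scratch directory.
tmp=$(mktemp -d)
touch "$tmp/www-fuse.vol" "$tmp/trusted-www-fuse.vol"   # pre-3.6.2 names
for old in "$tmp"/*-fuse.vol; do
    new="${old%-fuse.vol}.tcp-fuse.vol"                 # post-3.6.2 name for tcp transport
    cp "$old" "$new"                                    # old files are kept alongside; see comment 13
done
ls "$tmp"
```

After the loop the directory holds both generations of names, matching the directory listing reported in comment 13.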

RPM upgrades run this command as part of their post-update step. As we mainly test on RPMs, we didn't hit the issues you faced.

I suggest you open a bug on the Arch Linux package, to have a post-upgrade step added. I'd be happy to open the bug on your behalf.

I'll also make sure we add a release note stating this change for 3.6.3 at least.

Comment 7 Kaushal 2015-02-18 10:03:00 UTC
Changing component to doc, as we need a proper release note.

Comment 8 bugzilla.redhat.com 2015-02-18 10:12:40 UTC
(In reply to Kaushal from comment #6)
> The names of volfiles on disk was changed for improved rdma support. This
> change was introduced in 3.6.2. For reference the commit-ids of the changes
> are,
>  50952cd rdma: Client volfile name change for supporting rdma
>  605db00 rdma: Wrong volfile fetch on fuse mounting tcp,rdma volume via rdma

Thanks; since I checked all commits from 3.6.1 to 3.6.2, I had a hunch it would be related to this. But I didn't find any mention of the need to manually upgrade the volume files in the commits or the blog post. Is there another channel I missed, or should I check the RPM sources for this?

Comment 9 bugzilla.redhat.com 2015-02-18 10:16:05 UTC
(In reply to Kaushal from comment #6)
> RPM upgrades run this command as a part of the post-update. As we mainly
> test on RPMs, we didn't hit the issues faced by you guys.
> 
> I suggest you open a bug on the Arch Linux package, to have a post upgrade
> step added. I'd be happy to open the bug on your behalf.

I have reported this at https://bugs.archlinux.org/task/43872

Comment 10 Raghavendra Bhat 2015-05-20 11:39:27 UTC
This has been addressed in glusterfs-3.6.3.

Comment 11 Roman 2015-07-14 12:05:38 UTC
Hi,

3.6.4 is out, but the .deb packages are still affected.

Comment 12 André Bauer 2015-07-29 18:42:33 UTC
IMHO this also affects 3.7.3. I could not find anything in the postinst script of the deb packages that handles the volfile rename.

The postinst script of the deb packages should get something like:

#!/bin/bash
VOL_DIR="/var/lib/glusterd/vols"
for VOLUME in $(find "${VOL_DIR}" -iname '*-fuse.vol' ! -iname '*.tcp-fuse.vol'); do
    cp "${VOLUME}" "${VOLUME}.dpkg-pre3.6.2"
    mv "${VOLUME}" "${VOLUME%-fuse.vol}.tcp-fuse.vol"
done

This is untested, because I'm still on 3.5.
I don't know whether a gluster restart or something else is needed as well.

Comment 13 André Bauer 2016-03-11 22:04:55 UTC
Just did the upgrade from 3.5.8 to 3.7.8 like this:

Stopped all volumes.
Upgrade.
Started all volumes.

Everything works so far.

The only thing I'm curious about is that I have 2 versions of the volume files.

For my www volume I have:

-rw-------  1 root root 2384 Mär 11 19:58 trusted-www-fuse.vol
-rw-------  1 root root 2600 Mär 11 22:40 trusted-www.tcp-fuse.vol
-rw-------  1 root root 1928 Mär 11 19:58 www-fuse.vol
-rw-------  1 root root 2144 Mär 11 22:40 www.tcp-fuse.vol

Should I delete the old files?

Nevertheless... I think the bug can be closed...

Comment 14 Raghavendra Bhat 2016-03-11 22:07:58 UTC

Thanks for the update.

No, you should not delete those volfiles.

Regards,
Raghavendra

Comment 15 Kaleb KEITHLEY 2016-04-19 12:50:34 UTC
If the debian packages aren't doing the correct post-inst handling, you could open a new bug (or reopen this one) against the packaging component.

It seems like this is resolved for now, closing it.