Bug 1249124

Summary: postinst script needed to filter vol files after fuse/tcp-fuse rename in 3.6.3
Product: [Community] GlusterFS    Reporter: Kaleb KEITHLEY <kkeithle>
Component: packaging    Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED WONTFIX    QA Contact:
Severity: low    Docs Contact:
Priority: unspecified
Version: 3.6.3    CC: bugs, bugzilla.redhat.com, kaushal, kkeithle, lmohanty, monotek23, ndevos, pkarampu, rabhat, redhat.bugs, romeo.r, smohan
Target Milestone: ---    Keywords: RFE, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1191176    Environment:
Last Closed: 2016-08-23 12:42:37 UTC    Type: Bug
Regression: ---    Mount Type: ---
Documentation: ---    CRM:
Verified Versions:    Category: ---
oVirt Team: ---    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---    Target Upstream Version:
Embargoed:
Bug Depends On: 1191176
Bug Blocks:

Description Kaleb KEITHLEY 2015-07-31 14:40:17 UTC
+++ This bug was initially created as a clone of Bug #1191176 +++

Description of problem:

I upgraded to GlusterFS 3.6.2 today, and it didn't work. Downgrading to 3.6.1 fixes the issue.

All clients log the following; this happens on localhost as well:

I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/bin/glusterfs: Started running /usr/bin/glusterfs version  (args: /usr/bin/glusterfs --volfile-server=xxx --volfile-id=/xxx /xxx)
E [glusterfsd-mgmt.c:1494:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
E [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/xxx)
W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down
I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/shared'.
W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down

The `gluster volume info` output looks fine and all volumes come up. I'd like to debug this further but am not sure how.

--- Additional comment from  on 2015-02-12 04:54:13 EST ---

Are you missing information? This is a critical bug; glusterfs does not work after upgrading to 3.6.2; don't you have tests for this? How can I debug this? Does 3.6.2 work for other people?

--- Additional comment from Richard on 2015-02-16 06:36:29 EST ---

I've kept off the 3.5 and 3.6 versions and stuck with 3.4 (glusterfs-3.4.6-1.el6.x86_64), as that works best for me.

On 3.6.x (any version, including the betas) I can't get my volumes to remount after cleanly shutting down all servers and rebooting them. I get a constant error about ACLs in my logs, so I gave up on it.

On 3.5.x I can't peer probe a second node after creating the first one, which is obviously a non-starter for using that release.

There are bug fixes that made it into 3.4 that just don't seem to be landing in the 3.5 or 3.6 releases :-(

--- Additional comment from Lalatendu Mohanty on 2015-02-17 07:15:07 EST ---

Did you upgrade all the nodes and then try to mount the volume? Can you please tell us more about the upgrade sequence you followed? Or was it just a yum update on the gluster nodes?

I think that, as of now, running different versions of glusterfs in one cluster is not a supported use case, in case that is what you were trying to do.

--- Additional comment from Niels de Vos on 2015-02-17 07:42:17 EST ---

This sounds like an issue that Pranith debugged over email for a different user. It should not happen on RPM-based installations, but it could be an issue in the Debian/Ubuntu packaging.

> hey, I found the root cause. I am seeing the following log '[2015-02-09
> 09:48:04.312800] E [glusterd-handshake.c:771:__server_getspec] 0-glusterd:
> Unable to stat FILENAME_REDACTED (No such file or directory)' This means the
> configuration file for the mount is not available in the directory. In 3.6.2
> filenames of the vol-files i.e. configuration files changed. How did you
> upgrade to 3.6.2? Is it rpm based installation or something else?  In any
> case, this is what you need to do: stop the volume. killall the glusterds on
> the machines of the cluster.  run 'glusterd --xlator-option *.upgrade=on -N'
> on all the machines. Then restart glusterds on the machines of the cluster
> and everything should be operational after that.
> 
> Pranith.
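The steps quoted above can be sketched as a short runbook. This is a sketch to be adapted, not a tested procedure; "myvol" is a hypothetical volume name, and the per-volume commands must be repeated for each volume in the cluster:

```shell
# On one node: stop every volume ("myvol" is a placeholder name).
gluster volume stop myvol

# On every machine in the cluster: stop glusterd, regenerate the
# volfiles under the new 3.6.2 names, then restart the daemon.
killall glusterd
glusterd --xlator-option '*.upgrade=on' -N
glusterd

# On one node again: start the volumes, then remount the clients.
gluster volume start myvol
```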

--- Additional comment from  on 2015-02-18 03:48:41 EST ---

(In reply to Lalatendu Mohanty from comment #3)
> Have you upgraded all the nodes and then try to mount the volume?  Can you
> please tell us more about the upgrade sequence you followed?. Or is it just
> yum update on gluster nodes.

I use Arch Linux; I've upgraded all nodes, stopped all volumes, rebooted all machines, and started the volumes successfully. Mounting simply doesn't work anymore.

(In reply to Niels de Vos from comment #4)
> This sounds like an issue that Pranith debugged over email for a different
> user. It should not happen on RPM based installation, but it could be an
> issue in Debian/Ubuntu packaging.

You're right; this fixes the issue! Is this command supposed to be run after each upgrade, or was this in a changelog somewhere?

Thanks!

--- Additional comment from Kaushal on 2015-02-18 04:53:29 EST ---

The names of the volfiles on disk were changed for improved RDMA support. This change was introduced in 3.6.2. For reference, the commit IDs of the changes are:
 50952cd rdma: Client volfile name change for supporting rdma
 605db00 rdma: Wrong volfile fetch on fuse mounting tcp,rdma volume via rdma

This requires that the volfiles of existing volumes be regenerated, so that they use the new names. Without the regeneration, glusterd would be looking for files by the new names on a volfile fetch request but would not find them, which would lead to a mount failure.

This regeneration is done by running glusterd in upgrade mode, 'glusterd --xlator-option *.upgrade=on -N'.

RPM upgrades run this command as part of the post-update scriptlet. As we mainly test on RPMs, we didn't hit the issues you faced.
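For reference, the RPM post-update step boils down to invoking glusterd in upgrade mode. A minimal sketch of such a scriptlet (an approximation, not the actual glusterfs.spec contents) might look like:

```shell
# Hypothetical package post-upgrade scriptlet (sketch only).
# Regenerate the volfiles so they use the post-3.6.2 names.
if [ -d /var/lib/glusterd ]; then
    glusterd --xlator-option '*.upgrade=on' -N
fi
```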

I suggest you open a bug on the Arch Linux package, to have a post upgrade step added. I'd be happy to open the bug on your behalf.

I'll also make sure we add a release note stating this change for 3.6.3 at least.

--- Additional comment from Kaushal on 2015-02-18 05:03:00 EST ---

Changing component to doc, as we need a proper release note.

--- Additional comment from  on 2015-02-18 05:12:40 EST ---

(In reply to Kaushal from comment #6)
> The names of volfiles on disk was changed for improved rdma support. This
> change was introduced in 3.6.2. For reference the commit-ids of the changes
> are,
>  50952cd rdma: Client volfile name change for supporting rdma
>  605db00 rdma: Wrong volfile fetch on fuse mounting tcp,rdma volume via rdma

Thanks; since I checked all commits from 3.6.1 to 3.6.2, I had a hunch it would be related to this. But I didn't find any mention of the need to manually regenerate the volume files in the commits or the blog post. Is there another channel I missed, or should I check the RPM sources for this?

--- Additional comment from  on 2015-02-18 05:16:05 EST ---

(In reply to Kaushal from comment #6)
> RPM upgrades run this command as a part of the post-update. As we mainly
> test on RPMs, we didn't hit the issues faced by you guys.
> 
> I suggest you open a bug on the Arch Linux package, to have a post upgrade
> step added. I'd be happy to open the bug on your behalf.

I have reported this @ https://bugs.archlinux.org/task/43872

--- Additional comment from Raghavendra Bhat on 2015-05-20 07:39:27 EDT ---

This has been addressed in glusterfs-3.6.3.

--- Additional comment from Roman on 2015-07-14 08:05:38 EDT ---

Hi,

3.6.4 is out, but .deb pkgs are still affected.

--- Additional comment from André Bauer on 2015-07-29 14:42:33 EDT ---

IMHO this also affects 3.7.3. I could not find anything in the postinst script of the .deb packages that handles the volfile rename.

The postinst script of the .deb packages should get something like:

#!/bin/bash
VOL_DIR="/var/lib/glusterd/vols"
find "${VOL_DIR}" -name '*-fuse.vol' ! -name '*.tcp-fuse.vol' |
while read -r VOLUME; do
    cp "${VOLUME}" "${VOLUME}.dpkg-pre3.6.2"
    mv "${VOLUME}" "${VOLUME%-fuse.vol}.tcp-fuse.vol"
done

This is untested, because I'm still on 3.5.
I don't know whether a gluster restart or anything else is needed.
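Rename logic of this kind can be exercised safely against a scratch directory before touching the real /var/lib/glusterd. The following sketch uses a temporary directory and a hypothetical volume name ("testvol"); the exclusion of '*.tcp-fuse.vol' keeps the rename idempotent if the script runs twice:

```shell
#!/bin/bash
# Dry-run of the fuse -> tcp-fuse volfile rename on a scratch directory.
# "testvol" is a hypothetical volume name; VOL_DIR is a temp dir, not
# the real /var/lib/glusterd/vols.
set -e
VOL_DIR="$(mktemp -d)"
touch "${VOL_DIR}/testvol-fuse.vol"

# Back up each pre-3.6.2 volfile, then move it to the new name.
find "${VOL_DIR}" -name '*-fuse.vol' ! -name '*.tcp-fuse.vol' |
while read -r VOLUME; do
    cp "${VOLUME}" "${VOLUME}.dpkg-pre3.6.2"
    mv "${VOLUME}" "${VOLUME%-fuse.vol}.tcp-fuse.vol"
done

ls "${VOL_DIR}"
```

After the run, the directory holds testvol.tcp-fuse.vol plus the backup testvol-fuse.vol.dpkg-pre3.6.2, and the old testvol-fuse.vol is gone.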

Comment 1 Kaleb KEITHLEY 2015-07-31 14:42:23 UTC
We will consider a postinst script for both RPMs and dpkgs.

Comment 2 André Bauer 2015-09-02 12:07:48 UTC
Any news? 
It seems the 3.7.4 packages on Ubuntu Launchpad are still affected.

Comment 3 Kaleb KEITHLEY 2016-08-23 12:42:37 UTC
This seems to be an issue only when updating from < 3.6.2 to >= 3.6.3. In other words, ancient history; closing as WONTFIX.

Please reopen or file a new bug if it's still a problem.