Bug 1697820 - rhgs 3.5 server not compatible with 3.4 client
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Sanju
QA Contact: Kshithij Iyer
URL:
Whiteboard:
Depends On: 1697907 1698042 1698471
Blocks: 1696807
 
Reported: 2019-04-09 07:39 UTC by Sweta Anandpara
Modified: 2019-10-30 12:21 UTC
CC: 9 users

Fixed In Version: glusterfs-6.0-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 12:20:50 UTC
Target Upstream Version:


Attachments: None


Links:
Red Hat Product Errata RHEA-2019:3249 (last updated 2019-10-30 12:21:14 UTC)

Description Sweta Anandpara 2019-04-09 07:39:59 UTC
Description of problem:
=======================
Had a 6-node cluster running the glusterfs-6.0-1 build, with an n x 3 volume 'testvol' created. The op-version was set to 60000.
Mounting it on a client running glusterfs-3.12.2-47 failed with the error "glusterfs: failed to get the 'volume file' from server....Server is operating at an op-version which is not supported." After updating the client bits to glusterfs-6.0-1, the mount succeeded.
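
For quick triage of this class of failure, the op-versions on each side can be compared directly. A minimal sketch, using the same CLI surface shown elsewhere in this report (the 31305 figure for a 3.12.2 client comes from the glusterd.log excerpt further down):

# On any server node: the op-version the cluster is operating at and the
# maximum it could support (both also visible in 'gluster v get all all').
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version

# On the client: confirm the installed version. glusterfs-3.12.2 advertises
# a maximum op-version of 31305, below the 40000 this volume's options
# require, so the volfile request is rejected.
rpm -qa | grep gluster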


Version-Release number of selected component (if applicable):
============================================================
# rpm -qa | grep gluster
glusterfs-cli-6.0-1.el7rhgs.x86_64
glusterfs-cloudsync-plugins-6.0-1.el7rhgs.x86_64
tmp-rhs-tests-beaker-rhs-gluster-qe-libs-dev-bturner-3.0-0.noarch
python2-gluster-6.0-1.el7rhgs.x86_64
glusterfs-geo-replication-6.0-1.el7rhgs.x86_64
glusterfs-6.0-1.el7rhgs.x86_64
glusterfs-api-6.0-1.el7rhgs.x86_64
glusterfs-devel-6.0-1.el7rhgs.x86_64
glusterfs-client-xlators-6.0-1.el7rhgs.x86_64
glusterfs-fuse-6.0-1.el7rhgs.x86_64
glusterfs-events-6.0-1.el7rhgs.x86_64
glusterfs-rdma-6.0-1.el7rhgs.x86_64
glusterfs-thin-arbiter-6.0-1.el7rhgs.x86_64
glusterfs-libs-6.0-1.el7rhgs.x86_64
glusterfs-server-6.0-1.el7rhgs.x86_64
glusterfs-debuginfo-6.0-1.el7rhgs.x86_64
#

How reproducible:
=================
Always


Steps to Reproduce:
==================
1. Have an n-node (n > 1) cluster running an RHGS 3.5 interim build
2. Mount one of its volumes on the current live RHGS 3.4 (BU4) client - glusterfs-3.12.2-47

Actual results:
================
Mount fails.

# mount -t glusterfs gqas001.sbu.lab.eng.bos.redhat.com:testvol /mnt/tmp/
Mount failed. Please check the log file for more details.


Expected results:
=================
n-1 (and n-2) client compatibility should not break for RHGS 3.5; the mount from the 3.4 client should succeed.


Additional info:
================
Not attaching sosreports: these are perf machines, so the reports would be on the heavier side for a relatively straightforward issue. Please let me know if they are required, and I'll upload them.

Client logs:
------------

[root@dhcp46-85 ~]# rpm -qa | grep gluster
glusterfs-libs-3.12.2-47.el7.x86_64
glusterfs-client-xlators-3.12.2-47.el7.x86_64
glusterfs-3.12.2-47.el7.x86_64
glusterfs-fuse-3.12.2-47.el7.x86_64
[root@dhcp46-85 ~]# 
[root@dhcp46-85 yum.repos.d]# yum repolist
Loaded plugins: product-id, search-disabled-repos, subscription-manager
rh-gluster-3-client-for-rhel-7-server-rpms                                                                                                       | 4.0 kB  00:00:00     
rhel-7-server-rpms                                                                                                                               | 3.4 kB  00:00:00     
(1/6): rh-gluster-3-client-for-rhel-7-server-rpms/7Server/x86_64/group                                                                           |  124 B  00:00:01     
(2/6): rh-gluster-3-client-for-rhel-7-server-rpms/7Server/x86_64/updateinfo                                                                      |  87 kB  00:00:01     
(3/6): rh-gluster-3-client-for-rhel-7-server-rpms/7Server/x86_64/primary_db                                                                      | 120 kB  00:00:01     
(4/6): rhel-7-server-rpms/7Server/x86_64/group                                                                                                   | 774 kB  00:00:02     
(5/6): rhel-7-server-rpms/7Server/x86_64/updateinfo                                                                                              | 3.0 MB  00:00:02     
(6/6): rhel-7-server-rpms/7Server/x86_64/primary_db                                                                                              |  54 MB  00:00:07     
repo id                                                                               repo name                                                                   status
rh-gluster-3-client-for-rhel-7-server-rpms/7Server/x86_64                             Red Hat Storage Native Client for RHEL 7 (RPMs)                                252
rhel-7-server-rpms/7Server/x86_64                                                     Red Hat Enterprise Linux 7 Server (RPMs)                                    23,926
repolist: 24,178
[root@dhcp46-85 yum.repos.d]#
[root@dhcp46-85 yum.repos.d]# cd
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# mkdir /mnt/tmp
[root@dhcp46-85 ~]# ping gqas001.sbu.lab.eng.bos.redhat.com
PING gqas001.sbu.lab.eng.bos.redhat.com (10.16.156.0) 56(84) bytes of data.
64 bytes from gqas001.sbu.lab.eng.bos.redhat.com (10.16.156.0): icmp_seq=1 ttl=55 time=245 ms
64 bytes from gqas001.sbu.lab.eng.bos.redhat.com (10.16.156.0): icmp_seq=2 ttl=55 time=245 ms
^C
--- gqas001.sbu.lab.eng.bos.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 245.895/245.933/245.972/0.497 ms
[root@dhcp46-85 ~]# mount -t glusterfs gqas001.sbu.lab.eng.bos.redhat.com:testvol /mnt/tmp/
Mount failed. Please check the log file for more details.
[root@dhcp46-85 ~]# vim /var/log/glusterfs/mnt-tmp.log 
-bash: vim: command not found
[root@dhcp46-85 ~]#
[root@dhcp46-85 ~]# cat /var/log/glusterfs/mnt-tmp.log 
[2019-04-09 04:18:40.387646] I [MSGID: 100030] [glusterfsd.c:2646:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.2 (args: /usr/sbin/glusterfs --volfile-server=gqas001.sbu.lab.eng.bos.redhat.com --volfile-id=testvol /mnt/tmp)
[2019-04-09 04:18:40.461797] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-04-09 04:18:40.478763] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-04-09 04:18:40.478852] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-04-09 04:18:40.971968] E [glusterfsd-mgmt.c:1925:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2019-04-09 04:18:40.972037] E [glusterfsd-mgmt.c:2051:mgmt_getspec_cbk] 0-mgmt: Server is operating at an op-version which is not supported
[2019-04-09 04:18:40.974171] W [glusterfsd.c:1462:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90) [0x7f85a20c6a00] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x485) [0x563209920a55] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x563209919b2b] ) 0-: received signum (0), shutting down
[2019-04-09 04:18:40.974297] I [fuse-bridge.c:6611:fini] 0-fuse: Unmounting '/mnt/tmp'.
[2019-04-09 04:18:40.979281] I [fuse-bridge.c:6616:fini] 0-fuse: Closing fuse connection to '/mnt/tmp'.
[2019-04-09 04:18:40.980417] W [glusterfsd.c:1462:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f85a115cdd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x563209919cc5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x563209919b2b] ) 0-: received signum (15), shutting down
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# cat /etc/yum.repos.d/glusterfs-6.repo 
[local]
name=glusterfs-6
baseurl=file:///home/glusterfs-6
enabled=1
gpgcheck=0
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# yum update glusterfs glusterfs-fuse
Loaded plugins: product-id, search-disabled-repos, subscription-manager
local                                                                                                                                            | 2.9 kB  00:00:00     
local/primary_db                                                                                                                                 | 7.3 kB  00:00:00     
Resolving Dependencies
--> Running transaction check
---> Package glusterfs.x86_64 0:3.12.2-47.el7 will be updated
---> Package glusterfs.x86_64 0:6.0-1.el7 will be an update
--> Processing Dependency: glusterfs-libs(x86-64) = 6.0-1.el7 for package: glusterfs-6.0-1.el7.x86_64
---> Package glusterfs-fuse.x86_64 0:3.12.2-47.el7 will be updated
---> Package glusterfs-fuse.x86_64 0:6.0-1.el7 will be an update
--> Processing Dependency: glusterfs-client-xlators(x86-64) = 6.0-1.el7 for package: glusterfs-fuse-6.0-1.el7.x86_64
--> Running transaction check
---> Package glusterfs-client-xlators.x86_64 0:3.12.2-47.el7 will be updated
---> Package glusterfs-client-xlators.x86_64 0:6.0-1.el7 will be an update
---> Package glusterfs-libs.x86_64 0:3.12.2-47.el7 will be updated
---> Package glusterfs-libs.x86_64 0:6.0-1.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================================================================
 Package                                              Arch                               Version                                Repository                         Size
========================================================================================================================================================================
Updating:
 glusterfs                                            x86_64                             6.0-1.el7                              local                             599 k
 glusterfs-fuse                                       x86_64                             6.0-1.el7                              local                             122 k
Updating for dependencies:
 glusterfs-client-xlators                             x86_64                             6.0-1.el7                              local                             825 k
 glusterfs-libs                                       x86_64                             6.0-1.el7                              local                             390 k

Transaction Summary
========================================================================================================================================================================
Upgrade  2 Packages (+2 Dependent packages)

Total download size: 1.9 M
Is this ok [y/d/N]: y
Downloading packages:
...
...
...
Complete!
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# rpm -qa | grep gluster
glusterfs-6.0-1.el7.x86_64
glusterfs-libs-6.0-1.el7.x86_64
glusterfs-client-xlators-6.0-1.el7.x86_64
glusterfs-fuse-6.0-1.el7.x86_64
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# mkdir /mnt/tmpNew
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# mount -t glusterfs gqas001.sbu.lab.eng.bos.redhat.com:testvol /mnt/tmpNew/
[root@dhcp46-85 ~]# 
[root@dhcp46-85 ~]# vi /var/log/glusterfs/mnt-tmpNew.log 
[root@dhcp46-85 ~]# mount | grep gluster
gqas001.sbu.lab.eng.bos.redhat.com:testvol on /mnt/tmpNew type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@dhcp46-85 ~]# 


Server logs:
-------------
[root@gqas001 ~]# rpm -qa | grep gluster
glusterfs-cli-6.0-1.el7rhgs.x86_64
glusterfs-cloudsync-plugins-6.0-1.el7rhgs.x86_64
tmp-rhs-tests-beaker-rhs-gluster-qe-libs-dev-bturner-3.0-0.noarch
python2-gluster-6.0-1.el7rhgs.x86_64
glusterfs-geo-replication-6.0-1.el7rhgs.x86_64
glusterfs-6.0-1.el7rhgs.x86_64
glusterfs-api-6.0-1.el7rhgs.x86_64
glusterfs-devel-6.0-1.el7rhgs.x86_64
glusterfs-client-xlators-6.0-1.el7rhgs.x86_64
glusterfs-fuse-6.0-1.el7rhgs.x86_64
glusterfs-events-6.0-1.el7rhgs.x86_64
glusterfs-rdma-6.0-1.el7rhgs.x86_64
glusterfs-thin-arbiter-6.0-1.el7rhgs.x86_64
glusterfs-libs-6.0-1.el7rhgs.x86_64
glusterfs-server-6.0-1.el7rhgs.x86_64
glusterfs-debuginfo-6.0-1.el7rhgs.x86_64
[root@gqas001 ~]# 
[root@gqas001 ~]# gluster pool list
UUID					Hostname                          	State
825da299-4e10-4a93-9f26-c30b6c49f1c9	gqas004.sbu.lab.eng.bos.redhat.com	Connected 
2466fcd1-78f0-4d66-bd18-28fed503e504	gqas009.sbu.lab.eng.bos.redhat.com	Connected 
528c8eae-9a54-4394-bcab-566495cc5a68	gqas010.sbu.lab.eng.bos.redhat.com	Connected 
e2a31da0-e9f0-479b-89ec-6dc4e316d299	gqas012.sbu.lab.eng.bos.redhat.com	Connected 
a9380115-b30b-475e-a22d-e69bee8a92d9	gqas014.sbu.lab.eng.bos.redhat.com	Connected 
4b69a70d-b81c-4aff-aa62-643b7a62b135	localhost                         	Connected 
[root@gqas001 ~]# 
[root@gqas001 ~]# gluster v get all all
Option                                  Value                                   
------                                  -----                                   
cluster.server-quorum-ratio             51                                      
cluster.enable-shared-storage           disable                                 
cluster.op-version                      60000                                   
cluster.max-op-version                  60000                                   
cluster.brick-multiplex                 disable                                 
cluster.max-bricks-per-process          250                                     
cluster.daemon-log-level                INFO                                    
[root@gqas001 ~]# 
[root@gqas001 ~]# gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 02e345d4-7567-4bdb-83c0-698cb70f275d
Status: Started
Snapshot Count: 0
Number of Bricks: 24 x 3 = 72
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/gluster/brick1/testvol
Brick2: gqas004.sbu.lab.eng.bos.redhat.com:/gluster/brick1/testvol
...
...
...
Brick70: gqas010.sbu.lab.eng.bos.redhat.com:/gluster/brick12/testvol
Brick71: gqas012.sbu.lab.eng.bos.redhat.com:/gluster/brick12/testvol
Brick72: gqas014.sbu.lab.eng.bos.redhat.com:/gluster/brick12/testvol
Options Reconfigured:
performance.cache-samba-metadata: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: off
[root@gqas001 ~]# 
[root@gqas001 ~]# cat /var/log/glusterfs/glusterd.log
...
...
[2019-04-09 03:59:15.632766] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2019-04-09 03:59:15.635367] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2019-04-09 04:18:40.848223] I [MSGID: 106022] [glusterd-handshake.c:868:_client_supports_volume] 0-glusterd: Client 10.70.46.85:1023 (1 -> 31305) doesn't support required op-version (40000). Rejecting volfile request. [Operation not supported]
[2019-04-09 07:30:50.013756] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2019-04-09 07:30:50.016410] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2019-04-09 07:31:36.587161] I [MSGID: 106487] [glusterd-handler.c:1498:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2019-04-09 07:32:00.278177] E [MSGID: 106061] [glusterd-utils.c:10290:glusterd_max_opversion_use_rsp_dict] 0-management: Maximum supported op-version not set in destination dictionary
...
...
[root@gqas001 ~]#
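
The MSGID 106022 rejection above is the server-side counterpart of the client's "failed to get the 'volume file'" error: glusterd compares the client's advertised op-version range (1 -> 31305) against the 40000 that this volume's options require and refuses the volfile request. A quick way to confirm this failure mode on a server node (a sketch; the path is the default glusterd log location on RHGS):

# Any hit means glusterd refused a volfile to a client whose maximum
# op-version falls below what the volume's options require.
grep "required op-version" /var/log/glusterfs/glusterd.log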

Comment 2 Atin Mukherjee 2019-04-10 11:11:51 UTC
There are two parts to this problem. What you have hit is explained in (1), but if we fix only (1) you would end up hitting (2), so we need to fix both. The root cause is self-explanatory from the commit messages, but I'll paste them here too.

1. upstream patch : https://review.gluster.org/#/c/22539/ (we still need to reach agreement on this approach)

With the group-metadata-cache group profile settings, turning on the performance.cache-invalidation option enables the cache-invalidation feature of both the md-cache and quick-read xlators. The intent of group-metadata-cache is to set md-cache's cache-invalidation feature, but quick-read gets affected by the same key. While md-cache's feature and its profile have existed since release-3.9, quick-read's cache-invalidation was only introduced in release-4; because of this op-version mismatch, applying the group profile on any cluster >= glusterfs-4 breaks backward compatibility with old clients. The proposed fix is to rename the quick-read key to 'quick-read-cache-invalidation' so that the two features have distinct identities. This in turn brings its own backward-compatibility challenge: if the feature is enabled in an existing cluster that is then upgraded to a version carrying this change, the old key becomes unidentified. As a workaround, users upgrading to release-7 can turn the option off, upgrade the cluster, and turn it back on under the new key. This needs to be documented once the patch is accepted.
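
Concretely, that workaround would look roughly like the sketch below. The performance.quick-read-cache-invalidation key name is assumed from the proposed rename and should be verified once the patch is accepted:

# Before upgrading the servers: turn the profile-controlled option off so
# the soon-to-be-renamed quick-read key is not left behind in the volfile.
gluster volume set testvol performance.cache-invalidation off

# ... upgrade all server nodes to the release carrying the rename ...

# After the upgrade: re-enable md-cache invalidation, and opt in to the
# quick-read side under its new, distinct key (name assumed, see above).
gluster volume set testvol performance.cache-invalidation on
gluster volume set testvol performance.quick-read-cache-invalidation on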

2. upstream patch : https://review.gluster.org/22536

Since ctime is a client-side feature, we can't blindly load the ctime xlator into the client graph when the feature has been explicitly turned off; doing so results in a backward-compatibility issue where an old client cannot mount a volume configured on a server that has the ctime feature.
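
For reference, "explicitly turned off" refers to the per-volume ctime switch; a short sketch of inspecting and toggling it, assuming the upstream features.ctime volume option key:

# Show the current ctime setting for the volume; after fix (2), 'off' here
# should keep the ctime xlator out of the client graph for old clients.
gluster volume get testvol features.ctime
gluster volume set testvol features.ctime off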

Comment 20 errata-xmlrpc 2019-10-30 12:20:50 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

