Description of problem:
=======================
Had a 4-node cluster with the GA'ed 3.2 build. Created 3 volumes - ozone (2x2), disp (2x(4+2)), and dist (2x1). Had all the volumes mounted on client C1 over fuse, and some files created.

Upgraded the servers to 3.3 (3.8.4-22), enabled eventing, and enabled performance.parallel-readdir on volumes ozone and dist. The clients _continued_ to be on 3.2. The existing mount points (even after remounting) stopped working. The volume on which parallel-readdir was not enabled, disp, is still accessible via the existing mount. A fresh mount of the same volumes on 3.3 clients works.

I suppose the new volume graph created after enabling this volume option is not recognised by clients running the older gluster version.

Sosreports copied at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

Version-Release number of selected component (if applicable):
=============================================================
3.8.4-22

How reproducible:
=================
2:2

Steps to Reproduce:
===================
1. Have a server and client on glusterfs 3.2.0, with a volume created and mounted.
2. Upgrade the server to 3.3.0. Do NOT update the client.
3. Enable performance.parallel-readdir and try to access the same volume from an existing mount point of the 3.2 client (a minimal command sketch is included at the end of this comment).

Actual results:
===============
Any command given on the mount hangs. When performance.parallel-readdir is disabled on the volume, it errors out with "Transport endpoint is not connected".

Client logs:
-----------
[2017-04-08 02:31:33.443345] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2017-04-08 02:40:01.960990] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-ozone-client-1: server 10.70.47.164:49152 has not responded in the last 42 seconds, disconnecting.
[2017-04-08 02:41:48.549405] I [socket.c:3446:socket_submit_request] 0-ozone-client-1: not connected (priv->connected = -1)
[2017-04-08 02:41:48.549462] W [rpc-clnt.c:1692:rpc_clnt_submit] 0-ozone-client-1: failed to submit rpc-request (XID: 0x14 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (ozone-client-1)
[2017-04-08 02:41:48.549481] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-ozone-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2017-04-08 02:41:49.569170] W [rpc-clnt.c:1692:rpc_clnt_submit] 0-ozone-client-1: failed to submit rpc-request (XID: 0x15 Program: GlusterFS 3.3, ProgVers: 330, Proc: 20) to rpc-transport (ozone-client-1)
[2017-04-08 02:41:49.569204] E [MSGID: 114031] [client-rpc-fops.c:2847:client3_3_opendir_cbk] 0-ozone-client-1: remote operation failed.
Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]

Additional info:
================
[root@dhcp47-165 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp47-164.lab.eng.blr.redhat.com
Uuid: afa697a0-2cc6-4705-892e-f5ec56a9f9de
State: Peer in Cluster (Connected)

Hostname: dhcp47-162.lab.eng.blr.redhat.com
Uuid: 95491d39-d83a-4053-b1d5-682ca7290bd2
State: Peer in Cluster (Connected)

Hostname: dhcp47-157.lab.eng.blr.redhat.com
Uuid: d0955c85-94d0-41ba-aea8-1ffde3575ea5
State: Peer in Cluster (Connected)

[root@dhcp47-165 ~]# gluster v list
disp
dist
ozone

[root@dhcp47-165 ~]# gluster v info

Volume Name: disp
Type: Distributed-Disperse
Volume ID: d7f56851-61a5-4211-8f3f-1c000a68eced
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/disp_0
Brick2: 10.70.47.164:/bricks/brick0/disp_1
Brick3: 10.70.47.162:/bricks/brick0/disp_2
Brick4: 10.70.47.157:/bricks/brick0/disp_3
Brick5: 10.70.47.165:/bricks/brick1/disp_4
Brick6: 10.70.47.164:/bricks/brick1/disp_5
Brick7: 10.70.47.162:/bricks/brick1/disp_6
Brick8: 10.70.47.157:/bricks/brick1/disp_7
Brick9: 10.70.47.165:/bricks/brick2/disp_8
Brick10: 10.70.47.164:/bricks/brick2/disp_9
Brick11: 10.70.47.162:/bricks/brick2/disp_10
Brick12: 10.70.47.157:/bricks/brick2/disp_11
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: hourly

Volume Name: dist
Type: Distribute
Volume ID: f9571010-d72a-4cec-a12c-f2819bf12c04
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/dist_0
Brick2: 10.70.47.164:/bricks/brick0/dist_1
Options Reconfigured:
performance.parallel-readdir: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: hourly

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: 8b736150-4fdd-4f00-9446-4ae89920f63b
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/ozone_0
Brick2: 10.70.47.164:/bricks/brick0/ozone_1
Brick3: 10.70.47.162:/bricks/brick0/ozone_2
Brick4: 10.70.47.157:/bricks/brick0/ozone_3
Options Reconfigured:
performance.parallel-readdir: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: hourly

[root@dhcp47-165 ~]# gluster v status
Status of volume: disp
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick0/disp_0    49154     0          Y       29903
Brick 10.70.47.164:/bricks/brick0/disp_1    49154     0          Y       23450
Brick 10.70.47.162:/bricks/brick0/disp_2    49153     0          Y       23257
Brick 10.70.47.157:/bricks/brick0/disp_3    49153     0          Y       23300
Brick 10.70.47.165:/bricks/brick1/disp_4    49155     0          Y       29922
Brick 10.70.47.164:/bricks/brick1/disp_5    49155     0          Y       23469
Brick 10.70.47.162:/bricks/brick1/disp_6    49154     0          Y       23276
Brick 10.70.47.157:/bricks/brick1/disp_7    49154     0          Y       23319
Brick 10.70.47.165:/bricks/brick2/disp_8    49156     0          Y       29942
Brick 10.70.47.164:/bricks/brick2/disp_9    49156     0          Y       23489
Brick 10.70.47.162:/bricks/brick2/disp_10   49155     0          Y       23296
Brick 10.70.47.157:/bricks/brick2/disp_11   49155     0          Y       23339
Self-heal Daemon on localhost               N/A       N/A        Y       29966
Bitrot Daemon on localhost                  N/A       N/A        Y       29977
Scrubber Daemon on localhost                N/A       N/A        Y       29989
Self-heal Daemon on dhcp47-164.lab.eng.blr.redhat.com   N/A   N/A   Y   23513
Bitrot Daemon on dhcp47-164.lab.eng.blr.redhat.com      N/A   N/A   Y   23524
Scrubber Daemon on dhcp47-164.lab.eng.blr.redhat.com    N/A   N/A   Y   23536
Self-heal Daemon on dhcp47-162.lab.eng.blr.redhat.com   N/A   N/A   Y   23318
Bitrot Daemon on dhcp47-162.lab.eng.blr.redhat.com      N/A   N/A   Y   23328
Scrubber Daemon on dhcp47-162.lab.eng.blr.redhat.com    N/A   N/A   Y   23339
Self-heal Daemon on dhcp47-157.lab.eng.blr.redhat.com   N/A   N/A   Y   23363
Bitrot Daemon on dhcp47-157.lab.eng.blr.redhat.com      N/A   N/A   Y   23373
Scrubber Daemon on dhcp47-157.lab.eng.blr.redhat.com    N/A   N/A   Y   23384

Task Status of Volume disp
------------------------------------------------------------------------------
There are no active volume tasks

Another transaction is in progress for dist. Please try again after sometime.

Status of volume: ozone
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick0/ozone_0   49152     0          Y       28852
Brick 10.70.47.164:/bricks/brick0/ozone_1   49152     0          Y       22614
Brick 10.70.47.162:/bricks/brick0/ozone_2   49152     0          Y       22428
Brick 10.70.47.157:/bricks/brick0/ozone_3   49152     0          Y       22576
Self-heal Daemon on localhost               N/A       N/A        Y       29966
Bitrot Daemon on localhost                  N/A       N/A        Y       29977
Scrubber Daemon on localhost                N/A       N/A        Y       29989
Self-heal Daemon on dhcp47-164.lab.eng.blr.redhat.com   N/A   N/A   Y   23513
Bitrot Daemon on dhcp47-164.lab.eng.blr.redhat.com      N/A   N/A   Y   23524
Scrubber Daemon on dhcp47-164.lab.eng.blr.redhat.com    N/A   N/A   Y   23536
Self-heal Daemon on dhcp47-162.lab.eng.blr.redhat.com   N/A   N/A   Y   23318
Bitrot Daemon on dhcp47-162.lab.eng.blr.redhat.com      N/A   N/A   Y   23328
Scrubber Daemon on dhcp47-162.lab.eng.blr.redhat.com    N/A   N/A   Y   23339
Self-heal Daemon on dhcp47-157.lab.eng.blr.redhat.com   N/A   N/A   Y   23363
Bitrot Daemon on dhcp47-157.lab.eng.blr.redhat.com      N/A   N/A   Y   23373
Scrubber Daemon on dhcp47-157.lab.eng.blr.redhat.com    N/A   N/A   Y   23384

Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp47-165 ~]# rpm -qa | grep gluster
glusterfs-libs-3.8.4-22.el7rhgs.x86_64
glusterfs-cli-3.8.4-22.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-22.el7rhgs.x86_64
glusterfs-rdma-3.8.4-22.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-3.8.4-22.el7rhgs.x86_64
glusterfs-api-3.8.4-22.el7rhgs.x86_64
glusterfs-events-3.8.4-22.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-22.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-22.el7rhgs.x86_64
glusterfs-server-3.8.4-22.el7rhgs.x86_64
python-gluster-3.8.4-22.el7rhgs.noarch
[root@dhcp47-165 ~]#
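For quick reference, a minimal command sketch of the reproduction described above. The volume name and server address are taken from this setup; the mount point /mnt/ozone is illustrative only.

    # on one of the upgraded (3.8.4-22) servers
    gluster volume set ozone performance.parallel-readdir on

    # on the 3.2 client, against an existing or freshly re-created fuse mount
    umount /mnt/ozone
    mount -t glusterfs 10.70.47.165:/ozone /mnt/ozone
    ls /mnt/ozone    # hangs while the option is enabled; after disabling it,
                     # access fails with "Transport endpoint is not connected"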
[qe@rhsqe-repo 1441992]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1441992]$ pwd
/home/repo/sosreports/1441992
[qe@rhsqe-repo 1441992]$ ll
total 49308
-rwxr-xr-x. 1 qe qe 12584772 Apr 13 14:33 sosreport-dhcp47-157-sysreg-prod-20170413034037.tar.xz
-rwxr-xr-x. 1 qe qe 12578400 Apr 13 14:33 sosreport-dhcp47-162-sysreg-prod-20170413034025.tar.xz
-rwxr-xr-x. 1 qe qe 12627140 Apr 13 14:34 sosreport-dhcp47-164-sysreg-prod-20170413034018.tar.xz
-rwxr-xr-x. 1 qe qe 12697080 Apr 13 14:33 sosreport-dhcp47-165-sysreg-prod-20170413034012.tar.xz
[qe@rhsqe-repo 1441992]$
RCA:

The 3.2 op-version is 31001, and the parallel-readdir op-version is 31000. The 3.2 code does not recognise the parallel-readdir feature, yet we are still able to enable it, because the 3.2 op-version is greater than parallel-readdir's op-version. Ideally we shouldn't pick features of a higher op-version in one release and then, in the next release, pick up a feature with a lower op-version.

We shouldn't allow enabling parallel-readdir until:
- the cluster op-version is that of 3.3
- all the clients and servers are upgraded to 3.3 - this is the condition that breaks here, as the parallel-readdir op-version is lower than that of 3.2.

If we allow setting parallel-readdir while there are older clients, then the older clients might crash or fail to mount, as they do not understand the new feature "parallel-readdir".

So the bug is: setting "parallel-readdir on" works even when older clients are connected, but it is expected to fail.

Possible solution:
Increase the op-version of parallel-readdir to be > 31001, downstream only.
Will wait for more discussion with the glusterd team before arriving at the solution.
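To make the comparison above concrete, here is a rough shell sketch of the gating check the RCA describes. The variable names are mine; the values are the ones quoted in the RCA. This is an illustration of the logic, not glusterd's actual code.

    cluster_op_version=31001   # RHGS 3.2
    option_op_version=31000    # performance.parallel-readdir

    # the volume-set is only rejected when the option needs an op-version
    # higher than the cluster's, so with these values the set goes through
    # even though 3.2 clients have no parallel-readdir xlator
    if [ "$option_op_version" -le "$cluster_op_version" ]; then
        echo "volume set allowed (current, broken behaviour)"
    else
        echo "volume set rejected"
    fi

Raising the option's op-version above 31001 (downstream only, as proposed) flips this comparison, so the set would be refused until the whole cluster is at the 3.3 op-version.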
(In reply to Poornima G from comment #3)
> RCA:
>
> The 3.2 op-version is 31001, and parallel readdir op-version is 31000. But
> the 3.2 code doesn't recognize parallel-readdir feature and still we will be
> able to enable the feature, because the 3.2 opversion is greater than
> parallel readdir's opversion. Ideally we should pick features of higher
> opversion in one release and then in the next release we pick a feature of
> lower opversion.
>
> We shouldn't allow enabling parallel readdir until:
> - cluster op-version is that of 3.3
> - all the clients and servers are upgraded to 3.3 - this condition is what
> is breaking as the parallel readdir opversion is lower than that of 3.2.
> If we allow setting parallel-readdir when there are older clients, then the
> older clients might crash or fail to mount as they do not understand the new
> feature "parallel-readdir".
>
> So the bug would be, setting "parallel-readdir on" is working even when
> older clients are connected, but it is expected to fail.
>
> Possible solution:
> Increase the op-version of parallel-readdir to be > 31001 only in downstream.
> Will wait for more discussion with glusterd team before arriving at the
> solution

Kaushal,

I don't have any other solution in mind apart from what Poornima is referring to. Do you think we can handle this in any other way? Unfortunately, this will again lead us to diverge the op-versions between upstream and downstream.
I cannot think of any other way. We will need to diverge op-versions again.

The original intent of syncing op-versions across upstream and downstream was to allow upstream clients to use downstream volumes. Diverging will break this, and I guess we're okay with that.

But now, we'll need someone to track these changes between upstream and downstream, and make sure they are made whenever we fork a downstream branch from upstream.
(In reply to Kaushal from comment #7)
> I cannot think of other way. We will need to diverge op-versions again.
>
> The original intent of syncing op-versions across upstream and downstream
> was to allow upstream clients to use downstream volumes. Diverging will
> break this, and I guess we're okay with that.
>
> But now, we'll need someone to track these changes between upstream and
> downstream, and make sure these changes are done whenever we fork a
> downstream branch from upstream.

Yes, that can be taken care of by the DOWNSTREAM ONLY tag in the downstream patches.
Patch posted at https://code.engineering.redhat.com/gerrit/104403
Build: 3.8.4-28

Followed the steps mentioned in the description. Unmount and remount after the server upgrade to 3.3 work fine. Pumped I/O from the mount point and tried to access the files from the client; was successfully able to access the files without any hangs.

Hence, moving this bug to Verified.
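For reference, a minimal sketch of the kind of check run here. The mount point, server address, and file names are illustrative, not taken from the actual verification run.

    # remount the volume on the 3.2 client after upgrading the servers to 3.8.4-28
    umount /mnt/ozone
    mount -t glusterfs 10.70.47.165:/ozone /mnt/ozone

    # pump some I/O and read it back; with the fix this completes without hangs
    for i in $(seq 1 50); do
        dd if=/dev/zero of=/mnt/ozone/file_$i bs=1M count=10
    done
    ls -lR /mnt/ozone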
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774