Bug 1084925 - [Upgrade]: while RHS upgrade for geo-rep setup from RHS-2.1.1 to RHS-3.0, IO on client failed with OSError: [Errno 116] Stale file handle.
Summary: [Upgrade]: while RHS upgrade for geo-rep setup from RHS-2.1.1 to RHS-3.0, IO ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-04-07 09:54 UTC by Vijaykumar Koppad
Modified: 2015-12-01 12:39 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-01 12:39:17 UTC
Embargoed:


Attachments (Terms of Use)
sosreport from one of the master nodes, where the monitor died after upgrade (15.45 MB, application/x-xz)
2014-06-18 09:53 UTC, Vijaykumar Koppad

Description Vijaykumar Koppad 2014-04-07 09:54:47 UTC
Description of problem: while upgrading a geo-rep setup from RHS-2.1.1 to RHS-3.0 (glusterfs-3.5qa2-0.304.git0c1d78f.el6rhs.x86_64.rpm), IO on the old (RHS-2.1.1 [glusterfs-3.4.0.44rhs]) client failed with OSError: [Errno 116] Stale file handle: '5341784d%%LXY2Y1W7YB'

Snippet from the client log file:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-04-06 16:09:20.219221] I [client-handshake.c:1676:select_server_supported_programs] 0-master-client-8: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-04-06 16:09:20.219499] I [client-handshake.c:1676:select_server_supported_programs] 0-master-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-04-06 16:09:20.229233] I [client-handshake.c:1474:client_setvolume_cbk] 0-master-client-8: Connected to 10.70.43.0:49154, attached to remote volume '/bricks/master_brick9'.
[2014-04-06 16:09:20.229261] I [client-handshake.c:1486:client_setvolume_cbk] 0-master-client-8: Server and Client lk-version numbers are not same, reopening the fds
[2014-04-06 16:09:20.229510] I [client-handshake.c:1676:select_server_supported_programs] 1-master-client-8: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-04-06 16:09:20.229832] I [client-handshake.c:1676:select_server_supported_programs] 1-master-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-04-06 16:09:20.230548] I [client-handshake.c:1474:client_setvolume_cbk] 1-master-client-0: Connected to 10.70.43.0:49152, attached to remote volume '/bricks/master_brick1'.
[2014-04-06 16:09:20.230573] I [client-handshake.c:1486:client_setvolume_cbk] 1-master-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-04-06 16:09:20.238699] I [client-handshake.c:450:client_set_lk_version_cbk] 0-master-client-8: Server lk version = 1
[2014-04-06 16:09:20.239253] I [client-handshake.c:1474:client_setvolume_cbk] 1-master-client-8: Connected to 10.70.43.0:49154, attached to remote volume '/bricks/master_brick9'.
[2014-04-06 16:09:20.239380] I [client-handshake.c:1486:client_setvolume_cbk] 1-master-client-8: Server and Client lk-version numbers are not same, reopening the fds
[2014-04-06 16:09:20.240839] I [client-handshake.c:450:client_set_lk_version_cbk] 1-master-client-0: Server lk version = 1
[2014-04-06 16:09:20.241869] I [client-handshake.c:1474:client_setvolume_cbk] 0-master-client-4: Connected to 10.70.43.0:49153, attached to remote volume '/bricks/master_brick5'.
[2014-04-06 16:09:20.241895] I [client-handshake.c:1486:client_setvolume_cbk] 0-master-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2014-04-06 16:09:20.242415] I [client-handshake.c:1676:select_server_supported_programs] 1-master-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-04-06 16:09:20.243037] I [client-handshake.c:450:client_set_lk_version_cbk] 0-master-client-4: Server lk version = 1
[2014-04-06 16:09:20.243222] I [client-handshake.c:1474:client_setvolume_cbk] 1-master-client-4: Connected to 10.70.43.0:49153, attached to remote volume '/bricks/master_brick5'.
[2014-04-06 16:09:20.243245] I [client-handshake.c:1486:client_setvolume_cbk] 1-master-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2014-04-06 16:09:20.244121] I [client-handshake.c:450:client_set_lk_version_cbk] 1-master-client-4: Server lk version = 1
[2014-04-06 16:09:20.250322] I [client-handshake.c:450:client_set_lk_version_cbk] 1-master-client-8: Server lk version = 1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable): glusterfs-3.5qa2-0.304.git0c1d78f.el6rhs.x86_64.rpm


How reproducible: Did not try to reproduce the issue.


Steps to Reproduce:

Setup:

1. Create master and slave clusters with RHS-2.1.1

2. Create and start a geo-rep relationship between master and slave (6x2)

3. Keep creating data on master.

Actions:

Steps to upgrade master or slave node:

===============================================================================================

1. Kill all the gsync and gluster processes. The following commands can be used:

ps -aef | grep gluster | grep gsync | awk '{print $2}' | xargs kill -9

pkill glusterfsd

pkill glusterfs

pkill glusterd
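The kill sequence above can be collected into one safe-to-rerun script. This is only a sketch: it assumes `pgrep`/`pkill` from procps are available on the node, and it ignores "no process matched" exit statuses so the script can be re-run on a node where some daemons are already down.

```shell
#!/bin/sh
# Stop all gsync and gluster processes on this node before the upgrade.
# pgrep -f matches against the full command line, so the gsyncd worker
# processes (run under python) are caught even though their process name
# is not "gluster". The [c] bracket trick keeps the pattern from matching
# this script's own command line.
pgrep -f 'gluster.*gsyn[c]' | xargs -r kill -9

# pkill matches by process name; "|| true" ignores the nonzero exit
# status pkill returns when no such process is running.
pkill glusterfsd || true
pkill glusterfs  || true
pkill glusterd   || true
```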

2. Upgrade the node to RHS-3.0

3. Reboot the node

reboot

 

Repeat the above steps on each slave and master node, upgrading them in the following order.

a. Start with upgrading any slave.

b. During the upgrade, the geo-rep status will show that the session on one of the master nodes has gone faulty. That node is the next one to be upgraded.

c. Next, upgrade the replica pair of the slave node that was just upgraded. Make sure that both backend bricks are in sync during the upgrade of this node; if they are not in sync at the time of upgrade, you may end up in a split-brain situation.

d. As in step b, upgrade the master node whose session went faulty.
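Step b above can be scripted: scan the geo-rep status output for the node whose session has gone Faulty. A sketch only; the status output below is illustrative, not captured from this setup. On a live cluster it would come from `gluster volume geo-replication <master-vol> <slave-host>::<slave-vol> status`, and the field layout may differ between RHS releases.

```shell
#!/bin/sh
# Illustrative status output (node, role, brick, slave, status). On a
# live cluster this would be:
#   status_output=$(gluster volume geo-replication master slave::vol status)
status_output='node1  master  /bricks/master_brick1  slave::vol  Active
node2  master  /bricks/master_brick5  slave::vol  Faulty
node3  master  /bricks/master_brick9  slave::vol  Active'

# The next node to upgrade is the one whose session shows Faulty:
# match on the last field ($NF) and print the node name ($1).
next_node=$(printf '%s\n' "$status_output" | awk '$NF == "Faulty" {print $1}')
echo "next node to upgrade: $next_node"
```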


Actual results: During the upgrade there was an IO failure on the client.


Expected results: During a rolling upgrade, there should not be any IO failure on the client.


Additional info:

Comment 2 Vijaykumar Koppad 2014-06-18 09:53:01 UTC
Created attachment 909917 [details]
sosreport from one of the master nodes, where the monitor died after upgrade

