1160233 – [USS] : Rebalance process tries to connect to snapd and in case when snapd crashes it might affect rebalance process


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	snapshot
Sub Component:
Version:	rhgs-3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.0.3
Assignee:	Avra Sengupta
QA Contact:	senaik
Docs Contact:
URL:
Whiteboard:	USS
Depends On:
Blocks:	1162694 1164711

Reported:	2014-11-04 11:57 UTC by senaik
Modified:	2016-09-17 12:52 UTC
CC List:	6 users
Fixed In Version:	glusterfs-3.6.0.35-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1164711
Environment:
Last Closed:	2015-01-15 13:41:49 UTC
Embargoed:
Dependent Products:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:0038	0	normal	SHIPPED_LIVE	Red Hat Storage 3.0 enhancement and bug fix update #3	2015-01-15 18:35:28 UTC

Description senaik 2014-11-04 11:57:25 UTC

Description of problem:
======================
If snapd is down or has crashed, the rebalance process hangs because it keeps trying to connect to snapd. Checking the rebalance status shows it as "in progress" for a long time while it retries the connection to the crashed snapd.


Version-Release number of selected component (if applicable):
==============================================================
glusterfs 3.6.0.30

How reproducible:
================
1/1


Steps to Reproduce:
==================
1. Create a 2x2 distributed-replicate volume and start it.

2. Mount the volume over FUSE and NFS and generate some I/O.

3. While I/O is in progress, create some snapshots.

4. After the snapshots are created, cd to .snaps and access the snapshots. This resulted in a snapd crash (tracked by bz 1160138).

5. Check rebalance status 
gluster v rebalance vol2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                2      294Bytes          1508             0             0          in progress            6332.00
       snapshot14.lab.eng.blr.redhat.com                0        0Bytes         12838             0             0            completed             157.00
       snapshot15.lab.eng.blr.redhat.com               14         6.6KB          1540             0             0          in progress            6332.00
       snapshot16.lab.eng.blr.redhat.com                0        0Bytes         12828             0             0            completed             120.00
volume rebalance: vol2: success: 

The rebalance process is still in progress on 2 nodes and remains in this state because it keeps trying to connect to snapd, which has crashed.
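A stalled rebalance like the one above can be spotted from the status output itself. The following is a minimal sketch, not part of the original report: it assumes the column layout of the `gluster v rebalance vol2 status` output pasted above (size written without a space, run time in the last column). The function name `stalled_nodes` and the one-hour threshold are hypothetical choices for illustration.

```python
import re

# One data row of the status table as pasted in this report:
# node, rebalanced files, size, scanned, failures, skipped, status, run time.
# Assumption: the size column contains no spaces (e.g. "294Bytes", "6.6KB").
ROW = re.compile(
    r"^\s*(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)\s+"
    r"(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)\s+"
    r"(?P<secs>\d+(?:\.\d+)?)\s*$"
)


def stalled_nodes(status_output, max_secs=3600.0):
    """Return (node, run_time_secs) for rows still 'in progress' after max_secs."""
    stalled = []
    for line in status_output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or trailing summary line
        secs = float(match.group("secs"))
        if match.group("status").strip() == "in progress" and secs > max_secs:
            stalled.append((match.group("node"), secs))
    return stalled
```

Fed the table from step 5, this would report localhost and snapshot15.lab.eng.blr.redhat.com, both still in progress after 6332 seconds, while the two completed nodes are ignored.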

------------Part of rebalance log-------------

[2014-11-04 10:48:33.215941] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:33.222052] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:34.228429] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:34.234436] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:36.241848] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:36.248168] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:37.253538] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:37.259399] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:39.266618] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:39.272841] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:40.278413] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:40.284449] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:42.290661] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:42.297403] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:43.302585] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:43.309132] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
(END) 
-----------------------------------------------------------------
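The retry pattern in the log excerpt above can be summarised by counting refused connections per client. This is a rough sketch that assumes the exact line format shown in the excerpt (timestamp in brackets, level `E`, `socket_connect_finish`, then the client name). The helper name `refused_by_client` is hypothetical and is not part of GlusterFS.

```python
import re
from collections import Counter

# Matches error lines of the form seen in the excerpt, for example:
# [ts] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client:
#     connection to 127.0.0.1:49179 failed (Connection refused)
REFUSED = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E \[socket\.c:\d+:socket_connect_finish\] "
    r"(?P<client>\S+): connection to (?P<addr>\S+) failed \(Connection refused\)"
)


def refused_by_client(log_text):
    """Count 'Connection refused' errors per (client, address) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REFUSED.match(line.strip())
        if match:
            counts[(match.group("client"), match.group("addr"))] += 1
    return counts
```

A count that keeps growing for a `*-snapd-client` entry against the same address, as in the excerpt, suggests the process is retrying a connection to a snapd that is no longer listening.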

Actual results:
===============
Rebalance process hangs if snapd crashes


Expected results:
================
If snapd crashes, the rebalance process should not be affected. The rebalance process should not access snapshots at all; otherwise, a snapd crash can leave it hanging while it tries to connect to the crashed snapd.


Additional info:

Comment 4 Avra Sengupta 2014-12-01 07:18:20 UTC

Fixed with https://code.engineering.redhat.com/gerrit/37556

Comment 5 senaik 2014-12-03 12:27:15 UTC

Version: glusterfs 3.6.0.35
=======
While the rebalance process was in progress, snapd was stopped on different servers, and the rebalance process still completed successfully.

Marking the bug as 'Verified'

Comment 7 errata-xmlrpc 2015-01-15 13:41:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html
