Bug 1160233 - [USS] : Rebalance process tries to connect to snapd and in case when snapd crashes it might affect rebalance process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Avra Sengupta
QA Contact: senaik
URL:
Whiteboard: USS
Depends On:
Blocks: 1162694 1164711
Reported: 2014-11-04 11:57 UTC by senaik
Modified: 2016-09-17 12:52 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.6.0.35-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1164711
Environment:
Last Closed: 2015-01-15 13:41:49 UTC
Embargoed:




Links
System: Red Hat Product Errata | ID: RHBA-2015:0038 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: Red Hat Storage 3.0 enhancement and bug fix update #3 | Last Updated: 2015-01-15 18:35:28 UTC

Description senaik 2014-11-04 11:57:25 UTC
Description of problem:
======================
If snapd is down or has crashed, the rebalance process hangs because it keeps trying to connect to snapd. Rebalance status reports the operation as in progress for a long time while the process retries the connection to the crashed snapd.


Version-Release number of selected component (if applicable):
==============================================================
glusterfs 3.6.0.30

How reproducible:
================
1/1


Steps to Reproduce:
==================
1. Create a 2x2 distributed-replicate volume and start it.

2. Mount the volume over FUSE and NFS, and generate some I/O.

3. While I/O is in progress, create some snapshots.

4. After the snapshots complete, cd to .snaps and access the snapshots. This results in a snapd crash (tracked by bz 1160138).

5. Check the rebalance status:
gluster v rebalance vol2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                2      294Bytes          1508             0             0          in progress            6332.00
       snapshot14.lab.eng.blr.redhat.com                0        0Bytes         12838             0             0            completed             157.00
       snapshot15.lab.eng.blr.redhat.com               14         6.6KB          1540             0             0          in progress            6332.00
       snapshot16.lab.eng.blr.redhat.com                0        0Bytes         12828             0             0            completed             120.00
volume rebalance: vol2: success: 

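The rebalance process is still in progress on 2 nodes (localhost and snapshot15), with a run time of 6332 seconds versus 157 and 120 seconds on the nodes that completed. It remains in that state because it keeps trying to connect to the crashed snapd.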
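For triage, the status output above can be checked mechanically. The following is a minimal Python sketch, assuming the column layout printed by glusterfs 3.6.0.30 in this report; `stuck_rebalance_nodes` and the one-hour `min_runtime` threshold are hypothetical names chosen for illustration, not part of the gluster CLI. It flags nodes that still report `in progress` after a long run time, which is the symptom described here.

```python
import re
from typing import List, Tuple

# Matches one data row of `gluster volume rebalance <vol> status` output,
# assuming the column order shown in this report (glusterfs 3.6.0.30):
# Node, Rebalanced-files, size, scanned, failures, skipped, status, run time.
ROW = re.compile(
    r"^\s*(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)"
    r"\s+(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)"
    r"\s+(?P<runtime>\d+(?:\.\d+)?)\s*$"
)


def stuck_rebalance_nodes(output: str, min_runtime: float = 3600.0) -> List[Tuple[str, float]]:
    """Hypothetical helper: return (node, runtime) for rows still 'in progress'
    whose run time is at least min_runtime seconds. Header, separator and
    summary lines do not match ROW and are skipped."""
    stuck = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        runtime = float(m.group("runtime"))
        if m.group("status") == "in progress" and runtime >= min_runtime:
            stuck.append((m.group("node"), runtime))
    return stuck


if __name__ == "__main__":
    # Rows copied from the status output in this report.
    sample = """\
  Node Rebalanced-files   size   scanned   failures   skipped   status   run time in secs
  ---------   -----------   -----------   -----------   -----------   -----------   ------------   --------------
  localhost   2   294Bytes   1508   0   0   in progress   6332.00
  snapshot14.lab.eng.blr.redhat.com   0   0Bytes   12838   0   0   completed   157.00
  snapshot15.lab.eng.blr.redhat.com   14   6.6KB   1540   0   0   in progress   6332.00
  snapshot16.lab.eng.blr.redhat.com   0   0Bytes   12828   0   0   completed   120.00
volume rebalance: vol2: success:
"""
    for node, secs in stuck_rebalance_nodes(sample):
        print(f"{node}: in progress for {secs:.0f}s")
```

The threshold is a judgment call: the completed nodes here finished in about two minutes, so anything running for an hour is worth inspecting in the rebalance log.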

------------Part of rebalance log-------------

[2014-11-04 10:48:33.215941] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:33.222052] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:34.228429] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:34.234436] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:36.241848] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:36.248168] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:37.253538] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:37.259399] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:39.266618] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:39.272841] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:40.278413] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:40.284449] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:42.290661] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:42.297403] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:43.302585] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:43.309132] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
-----------------------------------------------------------------
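The log excerpt shows two snapd client translators (`2-vol2-snapd-client` and `6-vol2-snapd-client`; the numeric prefix is presumably the graph ID) alternating failed reconnects to 127.0.0.1:49179, each retrying roughly every 3 seconds. A minimal Python sketch for measuring this from a rebalance log follows, assuming the log line format shown in this report; `refused_attempts` and `retry_intervals` are hypothetical helpers, not gluster tooling.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

# Matches the error lines shown in the rebalance log above, e.g.
# "[2014-11-04 10:48:33.222052] E [socket.c:2169:socket_connect_finish]
#  2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)"
FAILED = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] E \[[^\]]+\] (?P<xl>\S+): "
    r"connection to (?P<addr>\S+) failed"
)


def refused_attempts(log: str) -> Dict[str, List[datetime]]:
    """Hypothetical helper: timestamps of failed connects, keyed by client translator."""
    attempts: Dict[str, List[datetime]] = defaultdict(list)
    for line in log.splitlines():
        m = FAILED.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            attempts[m.group("xl")].append(ts)
    return dict(attempts)


def retry_intervals(times: List[datetime]) -> List[float]:
    """Seconds between consecutive failed attempts of one client translator."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Fed the excerpt above, each client yields gaps of about 3 seconds, and nothing in the excerpt indicates the retries ever stop. This is consistent with the rebalance status never leaving `in progress` while snapd is down.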

Actual results:
===============
The rebalance process hangs if snapd crashes.


Expected results:
================
A snapd crash should not affect the rebalance process. Rebalance should not access snapshots at all; otherwise, if snapd crashes, rebalance can hang while it keeps retrying the connection to the crashed snapd.


Additional info:

Comment 4 Avra Sengupta 2014-12-01 07:18:20 UTC
Fixed with https://code.engineering.redhat.com/gerrit/37556

Comment 5 senaik 2014-12-03 12:27:15 UTC
Version: glusterfs 3.6.0.35
=======
While the rebalance process was in progress, stopped snapd on different servers; the rebalance process completed successfully.

Marking the bug as 'Verified'

Comment 7 errata-xmlrpc 2015-01-15 13:41:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

