Bug 1160233 - [USS] : Rebalance process tries to connect to snapd and in case when snapd crashes it might affect rebalance process
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: snapshot
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assigned To: Avra Sengupta
QA Contact: senaik
Whiteboard: USS
Keywords: ZStream
Depends On:
Blocks: 1162694 1164711
 
Reported: 2014-11-04 06:57 EST by senaik
Modified: 2016-09-17 08:52 EDT
Fixed In Version: glusterfs-3.6.0.35-1
Doc Type: Bug Fix
Clones: 1164711
Last Closed: 2015-01-15 08:41:49 EST
Type: Bug

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2015:0038
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage 3.0 enhancement and bug fix update #3
Last Updated: 2015-01-15 13:35:28 EST

Description senaik 2014-11-04 06:57:25 EST
Description of problem:
======================
If snapd is down or has crashed, the rebalance process hangs because it keeps trying to connect to snapd. Checking the rebalance status shows it as "in progress" for a long time, since it is still trying to connect to the crashed snapd.


Version-Release number of selected component (if applicable):
==============================================================
glusterfs 3.6.0.30

How reproducible:
================
1/1


Steps to Reproduce:
==================
1. Create a 2x2 distributed-replicate volume and start it.

2. FUSE and NFS mount the volume and generate some I/O.

3. While I/O is in progress, create some snapshots.

4. After the snapshots are completed, cd to .snaps and access the snapshots. This resulted in a snapd crash (tracked by bz 1160138).

5. Check rebalance status 
gluster v rebalance vol2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                2      294Bytes          1508             0             0          in progress            6332.00
       snapshot14.lab.eng.blr.redhat.com                0        0Bytes         12838             0             0            completed             157.00
       snapshot15.lab.eng.blr.redhat.com               14         6.6KB          1540             0             0          in progress            6332.00
       snapshot16.lab.eng.blr.redhat.com                0        0Bytes         12828             0             0            completed             120.00
volume rebalance: vol2: success: 

The rebalance process is in progress on 2 nodes and remains in that state because it keeps trying to connect to snapd, which has crashed.
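
For anyone triaging a similar hang, the status table above can be checked mechanically. The following is a minimal sketch, assuming the column layout shown in this report; the function names and the time threshold are hypothetical and not part of gluster.

```python
import re

# One data row of the status table: node, rebalanced files, size, scanned,
# failures, skipped, status text, run time in seconds.
# The layout is assumed from the output pasted in this report.
ROW_RE = re.compile(
    r"^\s*(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)"
    r"\s+(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>[a-z ]+?)"
    r"\s+(?P<secs>\d+(?:\.\d+)?)\s*$"
)


def parse_rebalance_status(output):
    """Return one dict per node row; header, separator and summary lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match:
            row = match.groupdict()
            row["secs"] = float(row["secs"])
            rows.append(row)
    return rows


def long_running_nodes(output, min_secs):
    """Nodes still 'in progress' after at least min_secs (hypothetical threshold)."""
    return [
        row["node"]
        for row in parse_rebalance_status(output)
        if row["status"] == "in progress" and row["secs"] >= min_secs
    ]


# Rows copied from the status output in this report.
SAMPLE = """\
     Node Rebalanced-files   size   scanned   failures   skipped   status   run time in secs
---------      -----------   -----------   -----------   -----------   -----------
localhost                2      294Bytes          1508             0             0          in progress            6332.00
snapshot14.lab.eng.blr.redhat.com                0        0Bytes         12838             0             0            completed             157.00
snapshot15.lab.eng.blr.redhat.com               14         6.6KB          1540             0             0          in progress            6332.00
snapshot16.lab.eng.blr.redhat.com                0        0Bytes         12828             0             0            completed             120.00
volume rebalance: vol2: success:
"""

if __name__ == "__main__":
    print(long_running_nodes(SAMPLE, min_secs=3600))
```

With a one-hour threshold this flags the two nodes that the report describes as stuck. A long run time alone does not prove a hang, so the result is only a hint to go and read the rebalance log.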

------------Part of rebalance log-------------

[2014-11-04 10:48:33.215941] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:33.222052] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:34.228429] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:34.234436] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:36.241848] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:36.248168] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:37.253538] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:37.259399] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:39.266618] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:39.272841] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:40.278413] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:40.284449] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:42.290661] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:42.297403] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:43.302585] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:43.309132] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
-----------------------------------------------------------------
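
The retry pattern in the log excerpt above (port reconfigured, then "Connection refused", repeated every few seconds) can be summarized per client. This is a small sketch that assumes the log line layout seen in this excerpt; the helper name is hypothetical.

```python
import re
from collections import Counter

# Layout assumed from the excerpt:
# [timestamp] LEVEL [file:line:function] client-name: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[(?P<src>[^\]]+)\]"
    r"\s+(?P<client>[^:]+):\s+(?P<msg>.*)$"
)


def refused_by_client(log_text):
    """Count error-level 'Connection refused' lines per client name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_RE.match(line.strip())
        if match and match["level"] == "E" and "Connection refused" in match["msg"]:
            counts[match["client"]] += 1
    return counts


# First four lines of the rebalance log excerpt in this report.
SAMPLE_LOG = """\
[2014-11-04 10:48:33.215941] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 2-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:33.222052] E [socket.c:2169:socket_connect_finish] 2-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
[2014-11-04 10:48:34.228429] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 6-vol2-snapd-client: changing port to 49179 (from 0)
[2014-11-04 10:48:34.234436] E [socket.c:2169:socket_connect_finish] 6-vol2-snapd-client: connection to 127.0.0.1:49179 failed (Connection refused)
"""

if __name__ == "__main__":
    for client, count in refused_by_client(SAMPLE_LOG).items():
        print(client, count)
```

A steadily growing count for a snapd client while the rebalance status stays "in progress" matches the symptom described in this bug.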

Actual results:
===============
Rebalance process hangs if snapd crashes


Expected results:
================
If snapd crashes, the rebalance process should not be affected. The rebalance process should not access snapshots at all; otherwise a snapd crash can leave it hanging while it tries to reconnect to snapd.


Additional info:
Comment 4 Avra Sengupta 2014-12-01 02:18:20 EST
Fixed with https://code.engineering.redhat.com/gerrit/37556
Comment 5 senaik 2014-12-03 07:27:15 EST
Version: glusterfs 3.6.0.35
=======
While the rebalance process was in progress, stopped snapd on different servers; the rebalance process completed successfully.

Marking the bug as 'Verified'
Comment 7 errata-xmlrpc 2015-01-15 08:41:49 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html
