Created attachment 807426 [details]
Attaching engine log

Description of problem:
The status dialog hangs when glusterd goes down on one of the nodes while a rebalance is in progress.

Version-Release number of selected component (if applicable):
rhsc-2.1.1-0.0.2.master.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1. Create a distributed volume and start it.
2. Start rebalance on the volume.
3. Go to one of the servers and stop glusterd (steps 1-3 are sketched from the CLI side below).
4. Click on the Status button.

Actual results:
The status dialog opens and says "fetching data". The console hangs until it is reloaded.

Expected results:
The status dialog should open and display the same output as the CLI.

Additional info:
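A minimal sketch of the gluster-side steps (1-3); the volume name, hostnames and brick paths are placeholders, and step 4 is performed from the RHSC console:

    import subprocess

    def gluster(*args):
        # Run a gluster CLI command and raise if it fails.
        subprocess.run(["gluster"] + list(args), check=True)

    # 1. Create a distributed volume and start it (placeholder hosts/bricks).
    gluster("volume", "create", "vol_dis",
            "server1:/bricks/b1", "server2:/bricks/b2")
    gluster("volume", "start", "vol_dis")

    # 2. Start rebalance on the volume.
    gluster("volume", "rebalance", "vol_dis", "start")

    # 3. On one of the other servers, stop glusterd (outside this script), e.g.:
    #    service glusterd stop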
Created attachment 807427 [details]
Attaching vdsm log

Created attachment 807428 [details]
Attaching vdsm node2 log

Created attachment 807429 [details]
Attaching vdsm node3 log

Created attachment 807430 [details]
Attaching vdsm node4 log
1) Even after bringing glusterd back up on the node where it was stopped, the rebalance is still shown as running in the console, even though it has completed according to the gluster CLI.
2) When the user stops rebalance on the volume and clicks on status, the status dialog still hangs.
Thread-3227::DEBUG::2013-10-03 21:33:46,583::BindingXMLRPC::981::vds::(wrapper) return volumeRebalanceStatus with {'status': {'message': 'Done', 'code': 0}, 'hosts': [{'totalSizeMoved': 6291456000, 'status': 'COMPLETED', 'filesSkipped': 0, 'name': 'localhost', 'filesMoved': 6, 'filesFailed': 0, 'filesScanned': 69}, {'totalSizeMoved': 3145728000, 'status': 'COMPLETED', 'filesSkipped': 0, 'name': '10.70.37.80', 'filesMoved': 3, 'filesFailed': 0, 'filesScanned': 63}, {'totalSizeMoved': 7340032000, 'status': 'COMPLETED', 'filesSkipped': 0, 'name': '10.70.37.135', 'filesMoved': 7, 'filesFailed': 0, 'filesScanned': 76}, {'totalSizeMoved': 0, 'status': 'COMPLETED', 'filesSkipped': 0, 'name': '10.70.37.103', 'filesMoved': 0, 'filesFailed': 0, 'filesScanned': 60}], 'summary': {'totalSizeMoved': 16777216000, 'status': 'COMPLETED', 'filesSkipped': 0, 'filesFailed': 0, 'filesMoved': 16, 'filesScanned': 268}}

The above statement from the vdsm log suggests that the reported file sizes are larger than an int can hold, so this looks like an integer overflow error. Can you please check the size of the data on the bricks?
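The arithmetic behind the overflow suspicion, using the 'summary' value from the log above (that the field is a 32-bit signed integer is an assumption here, not confirmed from the code):

    INT32_MAX = 2**31 - 1  # 2147483647

    total_size_moved = 16777216000  # bytes; 'totalSizeMoved' from the summary above
    print(total_size_moved > INT32_MAX)        # True -> too large for a signed 32-bit int
    print(total_size_moved / float(1024**3))   # 15.625 -> roughly 15.6 GiB actually moved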
The data on the mount point was around 60GB.
Resolved in CB3 build
1) The status dialog still hangs when glusterd goes down on one of the nodes.
2) Once glusterd is back up, the rebalance icon and the tasks pane get updated as completed, but clicking on status shows "No rebalance ever happened on this volume".
3) I am able to see the following error in the engine logs:

2013-10-17 20:20:52,180 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRebalanceStatusQuery] (ajp-/127.0.0.1:8702-6) Query GetGlusterVolumeRebalanceStatusQuery failed. Exception message is VdcBLLException: Command execution failed.

Please find the sos reports at the link below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1015394/
The above exception, "Query GetGlusterVolumeRebalanceStatusQuery failed. Exception message is VdcBLLException: Command execution failed.", indicates that for some reason the query is not going through. Hence, the current build (CB6) handles such failures with a message stating "Could not fetch data". If this is not the expected behaviour, please let me know what the expected behaviour should be.
Seems like this is a glusterfs bug.

[root@localhost ~]# gluster volume rebalance vol_dis status
        Node  Rebalanced-files       size    scanned   failures    skipped      status  run time in secs
   ---------       -----------  ---------  ---------  ---------  ---------  ----------  ----------------
   localhost                 0     0Bytes        150          0          0   completed              1.00
10.70.37.155                 0     0Bytes        150          0          0   completed              1.00
10.70.37.155                 0     0Bytes        150          0          0   completed              1.00
 10.70.37.95                 1   1000.0MB        150          0          0   completed             26.00
volume rebalance: vol_dis: success:

[root@localhost ~]# gluster volume rebalance vol_dis status --xml
[root@localhost ~]# echo $?
2

Because, as can be seen above, there is no status XML returned from glusterfs: the --xml query prints nothing and exits with code 2.
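For illustration, a minimal sketch (not the actual vdsm or engine code) of guarding the rebalance-status XML parse against exactly this behaviour, where the --xml query prints nothing and exits with a non-zero code:

    import subprocess
    import xml.etree.ElementTree as ET

    def rebalance_status_xml(volume):
        # Ask gluster for the rebalance status in XML form.
        cmd = ["gluster", "volume", "rebalance", volume, "status", "--xml"]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # As seen above, the command can print nothing and exit with code 2
        # when glusterd is down on a peer, so fail fast instead of trying to
        # parse empty output.
        if proc.returncode != 0 or not proc.stdout.strip():
            raise RuntimeError("no rebalance status XML returned (exit code %d)"
                               % proc.returncode)
        return ET.fromstring(proc.stdout)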
Works fine with CB10 and the glusterfs build glusterfs-server-3.4.0.47.1u2rhs-1.el6rhs.x86_64. When glusterd goes down on any of the nodes in the cluster, the status dialog does not hang and the status is fetched correctly. Will reopen the bug if this is seen again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html