Bug 1012393

Summary: [RHSC] - console hangs when clicked on rebalance status
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: RamaKasturi <knarra>
Component: rhsc
Assignee: Timothy Asir <tjeyasin>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: urgent
Priority: high
Version: 2.1
CC: dpati, dtsang, kmayilsa, knarra, mmahoney, pprakash, rhs-bugs, sabose, ssampat, tjeyasin
Keywords: ZStream
Target Release: RHGS 2.1.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: CB6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-02-25 07:44:52 UTC
Attachments:
  - Attaching engine log
  - vdsm log from server3
  - Attaching screen shot for the same

Description RamaKasturi 2013-09-26 11:57:59 UTC
Description of problem:
The Rebalance Status dialog hangs.

Version-Release number of selected component (if applicable):
rhsc-2.1.1-0.0.1.master.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Select the volume on which you want to start rebalance.
2. Start rebalance on the volume.
3. Immediately click the Rebalance Status button.

Actual results:
The Rebalance Status dialog opens and the console hangs, i.e. no other operation can be performed until the console is reloaded.

Expected results:
The Rebalance Status dialog should open and show the status details; the console should not hang.

Additional info

Comment 2 RamaKasturi 2013-10-03 10:22:40 UTC
The console also hangs in the following scenario:

1) Start rebalance on a volume.
2) Click the status button once rebalance has started.
3) Now close the rebalance status dialog.
4) Open the status dialog again.
5) The console hangs and the dialog keeps saying "fetching data".

Comment 3 Kanagaraj 2013-10-03 10:32:13 UTC
The following error is seen in engine.log:

2013-10-03 21:33:47,026 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) START, GetGlusterVolumeRebalanceStatusVDSCommand(HostName = server3, HostId = 072715ba-7ddb-42bc-9903-0102d43491ea), log id: 31b61312
2013-10-03 21:33:47,201 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) Command GetGlusterVolumeRebalanceStatusVDS execution failed. Exception: VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.OverflowError'>:int exceeds XML-RPC limits
2013-10-03 21:33:47,202 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) FINISH, GetGlusterVolumeRebalanceStatusVDSCommand, log id: 31b61312
2013-10-03 21:33:47,243 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRebalanceStatusQuery] (ajp-/127.0.0.1:8702-5) Query GetGlusterVolumeRebalanceStatusQuery failed. Exception message is VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.OverflowError'>:int exceeds XML-RPC limits (Failed with error VDS_NETWORK_ERROR and code 5022)

Comment 4 RamaKasturi 2013-10-03 10:44:24 UTC
Created attachment 807004 [details]
Attaching engine log

Comment 5 RamaKasturi 2013-10-03 10:45:02 UTC
Created attachment 807005 [details]
vdsm log from server3

Comment 6 Kanagaraj 2013-10-03 10:46:10 UTC
Thread-2866::DEBUG::2013-10-03 21:21:02,400::BindingXMLRPC::974::vds::(wrapper) client [10.70.37.86]::call volumeRebalanceStatus with ('vol_dis',) {}
Thread-2866::DEBUG::2013-10-03 21:21:02,512::BindingXMLRPC::981::vds::(wrapper) return volumeRebalanceStatus with {'status': {'message': 'Done', 'code': 0}, 'hosts': [{'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': 'localhost', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 29}, {'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': '10.70.37.80', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 12}, {'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': '10.70.37.135', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 46}, {'totalSizeMoved': 0, 'status': 'COMPLETED', 'filesSkipped': 0, 'name': '10.70.37.103', 'filesMoved': 0, 'filesFailed': 0, 'filesScanned': 60}], 'summary': {'totalSizeMoved': 3145728000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'filesFailed': 0, 'filesMoved': 3, 'filesScanned': 147}}
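
For reference, the summary value totalSizeMoved = 3145728000 above is larger than 2**31 - 1 (2147483647), the ceiling of the XML-RPC <int> type, and marshalling it is what raises the OverflowError. A minimal sketch that reproduces the failure outside vdsm, assuming nothing beyond the stock standard-library marshaller (xmlrpclib on Python 2, xmlrpc.client on Python 3):

try:
    from xmlrpclib import dumps, MAXINT        # Python 2, as on the RHS nodes
except ImportError:
    from xmlrpc.client import dumps, MAXINT    # Python 3 equivalent

total_size_moved = 3145728000                  # 'summary' value from the log above
assert total_size_moved > MAXINT               # MAXINT == 2147483647 (2**31 - 1)

try:
    dumps((total_size_moved,))                 # marshalling the bare int...
except OverflowError as err:
    print(err)                                 # ...raises the "int exceeds XML-RPC limits" error

print(dumps((str(total_size_moved),)))         # succeeds once the value is a string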

Comment 7 Kanagaraj 2013-10-03 11:04:30 UTC
[root@localhost ~]# vdsClient -s 0 glusterVolumeRebalanceStatus volumeName=vol_dis
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 2538, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClientGluster.py", line 139, in do_glusterVolumeRebalanceStatus
    status = self.s.glusterVolumeRebalanceStatus(volumeName)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<type 'exceptions.OverflowError'>:int exceeds XML-RPC limits">

Comment 8 RamaKasturi 2013-10-04 06:43:34 UTC
Checking the status on a volume whose rebalance has already completed also makes the status dialog hang.

Comment 9 Timothy Asir 2013-10-10 11:11:10 UTC
Sent a patch to return the values as strings instead of integers, similar to other existing methods, to avoid the int overflow issue in the XML-RPC layer.

Comment 10 Timothy Asir 2013-10-10 11:13:52 UTC
Patch sent to VDSM url: http://gerrit.ovirt.org/#/c/19863/

Provide rebalance status values as strings to avoid an overflow error
when the rebalance status values exceed the XML-RPC limits.
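
The idea, sketched below as a hypothetical illustration in Python (the helper name and key list are assumptions, not the actual gerrit change), is to stringify the numeric counters on the vdsm side before the status dict crosses the XML-RPC layer, so values above 2**31 - 1 never reach the integer marshaller:

import copy

# Hypothetical sketch of the approach described above, not the actual patch:
# convert the numeric rebalance counters to strings before returning them
# over XML-RPC, so sizes above 2**31 - 1 no longer overflow the marshaller.
NUMERIC_KEYS = ('totalSizeMoved', 'filesMoved', 'filesScanned',
                'filesFailed', 'filesSkipped')

def stringify_counters(status):
    """Return a copy of the rebalance status dict with counters as strings."""
    result = copy.deepcopy(status)
    for entry in result.get('hosts', []) + [result.get('summary', {})]:
        for key in NUMERIC_KEYS:
            if key in entry:
                entry[key] = str(entry[key])
    return result

# e.g. stringify_counters(status)['summary']['totalSizeMoved'] == '3145728000'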

Comment 11 RamaKasturi 2013-10-22 12:49:17 UTC
I am still able to reproduce the above issue.

Providing the link to the sosreports here:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1012393/

Comment 12 Prasanth 2013-10-22 12:57:25 UTC
I'm also hitting the same issue:

--------
2013-10-22 17:53:47,045 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-7) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:47,171 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-6) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:48,259 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-10) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:48,473 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-2) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:50,845 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-14) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:51,132 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-3) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:51,477 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-15) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
--------

Comment 13 Sahina Bose 2013-10-23 11:02:39 UTC
Prasanth, Are you facing this error on Rebalance status?

Comment 14 Prasanth 2013-10-23 13:04:34 UTC
(In reply to Sahina Bose from comment #13)
> Prasanth, Are you facing this error on Rebalance status?

No, in Remove Brick status. Is a different bug needed to track it, or is it due to the same issue?

Comment 15 Timothy Asir 2013-10-25 10:46:08 UTC
As per the current design, vdsm internally calls the gluster CLI XML command to get the rebalance status. However, the glusterfs installed on the (test) node does not provide any (valid) output for the gluster volume rebalance status XML command, whereas the plain gluster volume rebalance CLI provides appropriate status messages.

Could you please copy the output of the following command:
* gluster volume rebalance <volumename> status --xml

Comment 16 RamaKasturi 2013-10-25 11:00:42 UTC
[root@localhost ~]# gluster vol rebalance vol_dis status --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>115</opErrno>
  <opErrstr/>
  <volRebalance>
    <task-id>6f0edef0-e046-4e97-9f82-bd239d6edd72</task-id>
    <op>3</op>
    <nodeCount>4</nodeCount>
    <node>
      <nodeName>localhost</nodeName>
      <id>2d487988-8371-4553-b0ce-4301eb82f1b7</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>0</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>7.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.140</nodeName>
      <id>dd0ee43d-7c59-454a-a68e-34328a6d6a98</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>23</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>7.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.43</nodeName>
      <id>346b0d17-e300-44e0-a50c-8230d10e6113</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>2</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>6.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.75</nodeName>
      <id>2303e0a6-ccd9-4653-9a0d-030a9ebf10de</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>274</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>10.00</runtime>
    </node>
    <aggregate>
      <files>0</files>
      <size>0</size>
      <lookups>8396</lookups>
      <failures>0</failures>
      <skipped>299</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>10.00</runtime>
    </aggregate>
  </volRebalance>
</cliOutput>
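
For reference, a minimal sketch of pulling the aggregate counters out of CLI XML like the output above using the standard library (an illustrative assumption about the parsing step, not vdsm's actual code):

import xml.etree.ElementTree as ET

def parse_rebalance_aggregate(cli_xml):
    """Extract the <aggregate> counters from 'gluster volume rebalance
    <vol> status --xml' output; a sketch, not vdsm's actual parser."""
    root = ET.fromstring(cli_xml)
    aggregate = root.find('volRebalance/aggregate')
    return dict((child.tag, child.text) for child in aggregate)

# Fed the output above, this returns
# {'files': '0', 'size': '0', 'lookups': '8396', 'failures': '0',
#  'skipped': '299', 'status': '3', 'statusStr': 'completed', 'runtime': '10.00'}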

Comment 17 RamaKasturi 2013-10-31 09:59:44 UTC
Could you please tell me what was causing the issue and what the fix is?

Comment 18 Timothy Asir 2013-11-12 08:08:50 UTC
It is because vdsm received simultaneous calls one immediately after another, which resulted in the problem. To fix this issue, we now display an information message like "Could not fetch rebalance status of volume: <volumeName>".
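
Illustratively, in Python rather than the actual engine code (which is Java), the behaviour described amounts to something like the hypothetical sketch below: if the status query fails, return a user-visible message instead of leaving the dialog stuck on "fetching data":

def show_rebalance_status(volume_name, fetch_status):
    """Hypothetical sketch of the behaviour described above: surface a
    failed status query as a message instead of blocking the dialog."""
    try:
        return fetch_status(volume_name)
    except Exception:
        return {'error': 'Could not fetch rebalance status of volume: %s'
                         % volume_name}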

Comment 19 RamaKasturi 2013-11-12 11:22:16 UTC
Verified in CB7. (The About dialog does not have a title.) The status dialog no longer hangs and works fine.

When rebalance is started and the status button is clicked immediately, it says "Could not fetch rebalance status of the volume: <volumeName>".

Comment 20 RamaKasturi 2013-11-12 11:22:49 UTC
Created attachment 822867 [details]
Attaching screen shot for the same.

Comment 22 errata-xmlrpc 2014-02-25 07:44:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html