Bug 1012393 - [RHSC] - console hangs when clicked on rebalance status
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: RHGS 2.1.2
Assigned To: Timothy Asir
QA Contact: RamaKasturi
Keywords: ZStream
Reported: 2013-09-26 07:57 EDT by RamaKasturi
Modified: 2015-05-13 12:33 EDT
CC List: 10 users

Fixed In Version: CB6
Doc Type: Bug Fix
Last Closed: 2014-02-25 02:44:52 EST
Type: Bug

Attachments
- Attaching engine log (725.02 KB, text/x-log), 2013-10-03 06:44 EDT, RamaKasturi
- vdsm log from server3 (1.08 MB, text/x-log), 2013-10-03 06:45 EDT, RamaKasturi
- Attaching screen shot for the same (192.22 KB, image/png), 2013-11-12 06:22 EST, RamaKasturi


External Trackers
- oVirt gerrit 19863 (Priority: None, Status: None, Last Updated: Never)
Description RamaKasturi 2013-09-26 07:57:59 EDT
Description of problem:
The Rebalance Status dialog hangs.

Version-Release number of selected component (if applicable):
rhsc-2.1.1-0.0.1.master.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Select the volume on which you want to start rebalance.
2. Start rebalance on the volume.
3. Immediately click the Rebalance Status button.

Actual results:
The Rebalance Status dialog opens and the console hangs, i.e. no other operation can be performed until the console is reloaded.

Expected results:
The Rebalance Status dialog should open and show the status details; the console should not hang.

Additional info:
Comment 2 RamaKasturi 2013-10-03 06:22:40 EDT
The console also hangs in the following scenario:

1) Start rebalance on a volume.
2) Click the status button once rebalance has started.
3) Now close the rebalance status dialog.
4) Open the status dialog again.
5) The console hangs and it keeps saying "fetching data".
Comment 3 Kanagaraj 2013-10-03 06:32:13 EDT
The following error log is seen in the engine.log

2013-10-03 21:33:47,026 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) START, GetGlusterVolumeRebalanceStatusVDSCommand(HostName = server3, HostId = 072715ba-7ddb-42bc-9903-0102d43491ea), log id: 31b61312
2013-10-03 21:33:47,201 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) Command GetGlusterVolumeRebalanceStatusVDS execution failed. Exception: VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.OverflowError'>:int exceeds XML-RPC limits
2013-10-03 21:33:47,202 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) FINISH, GetGlusterVolumeRebalanceStatusVDSCommand, log id: 31b61312
2013-10-03 21:33:47,243 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRebalanceStatusQuery] (ajp-/127.0.0.1:8702-5) Query GetGlusterVolumeRebalanceStatusQuery failed. Exception message is VdcBLLExcept
ion: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.OverflowError'>:int exceeds XML-RPC limits (Failed with error VDS_NETWORK_ERROR and code 5022)
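The failing value here is the summary totalSizeMoved of 3145728000 (visible in comment 6), which is above the 32-bit signed maximum of 2147483647 that the XML-RPC <int> type allows. A minimal reproduction of the marshalling failure, shown with Python 3's xmlrpc.client for convenience (the stack at the time used Python 2's xmlrpclib, which enforces the same limit):

```python
import xmlrpc.client

# XML-RPC's <int> is a 32-bit signed value, so marshalling anything
# above 2**31 - 1 raises OverflowError, as seen in the engine/vdsm logs.
try:
    xmlrpc.client.dumps((3145728000,))  # summary totalSizeMoved from comment 6
    raised = False
except OverflowError:
    raised = True

# The same number marshals without trouble once it is sent as a string,
# which is the direction the eventual fix takes.
payload = xmlrpc.client.dumps(("3145728000",))
print(raised, "3145728000" in payload)
```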
Comment 4 RamaKasturi 2013-10-03 06:44:24 EDT
Created attachment 807004 [details]
Attaching engine log
Comment 5 RamaKasturi 2013-10-03 06:45:02 EDT
Created attachment 807005 [details]
vdsm log from server3
Comment 6 Kanagaraj 2013-10-03 06:46:10 EDT
Thread-2866::DEBUG::2013-10-03 21:21:02,400::BindingXMLRPC::974::vds::(wrapper) client [10.70.37.86]::call volumeRebalanceStatus with ('vol_dis',) {}
Thread-2866::DEBUG::2013-10-03 21:21:02,512::BindingXMLRPC::981::vds::(wrapper) return volumeRebalanceStatus with {'status': {'message': 'Done', 'code': 0}, 'hosts': [{'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': 'localhost', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 29}, {'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': '10.70.37.80', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 12}, {'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': '10.70.37.135', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 46}, {'totalSizeMoved': 0, 'status': 'COMPLETED', 'filesSkipped': 0, 'name': '10.70.37.103', 'filesMoved': 0, 'filesFailed': 0, 'filesScanned': 60}], 'summary': {'totalSizeMoved': 3145728000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'filesFailed': 0, 'filesMoved': 3, 'filesScanned': 147}}
Comment 7 Kanagaraj 2013-10-03 07:04:30 EDT
[root@localhost ~]# vdsClient -s 0 glusterVolumeRebalanceStatus volumeName=vol_dis
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 2538, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClientGluster.py", line 139, in do_glusterVolumeRebalanceStatus
    status = self.s.glusterVolumeRebalanceStatus(volumeName)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<type 'exceptions.OverflowError'>:int exceeds XML-RPC limits">
Comment 8 RamaKasturi 2013-10-04 02:43:34 EDT
Checking the status of a volume on which rebalance has already completed also makes the status dialog hang.
Comment 9 Timothy Asir 2013-10-10 07:11:10 EDT
Sent a patch to return the values as strings instead of integers, similar to other existing methods, to avoid the int overflow issue in the XML-RPC layer.
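The actual change is the gerrit patch linked in the next comment; as a minimal sketch of the idea, a hypothetical helper (stringify_ints is an illustrative name, not VDSM's) that converts every int in the status dict to its decimal string before it reaches the XML-RPC layer could look like:

```python
def stringify_ints(value):
    """Recursively turn ints into decimal strings so the XML-RPC
    marshaller never sees a value above the 32-bit limit.
    Illustrative helper only; the real fix is the gerrit patch."""
    if isinstance(value, bool):        # bool is an int subclass; keep as-is
        return value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, dict):
        return {k: stringify_ints(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_ints(v) for v in value]
    return value

# The summary from comment 6 that overflowed the <int> type:
status = {'summary': {'totalSizeMoved': 3145728000, 'filesMoved': 3,
                      'status': 'IN PROGRESS'}}
safe = stringify_ints(status)
print(safe['summary']['totalSizeMoved'])  # now a string, safe to marshal
```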
Comment 10 Timothy Asir 2013-10-10 07:13:52 EDT
Patch sent to VDSM url: http://gerrit.ovirt.org/#/c/19863/

Provide rebalance status values as strings to avoid an overflow error
when rebalance status values exceed the XML-RPC limits.
Comment 11 RamaKasturi 2013-10-22 08:49:17 EDT
I am still able to reproduce the above issue.

Providing the link to the SOS reports here:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1012393/
Comment 12 Prasanth 2013-10-22 08:57:25 EDT
I'm also hitting the same issue:

--------
2013-10-22 17:53:47,045 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-7) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:47,171 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-6) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:48,259 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-10) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:48,473 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-2) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:50,845 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-14) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:51,132 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-3) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:51,477 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-15) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
--------
Comment 13 Sahina Bose 2013-10-23 07:02:39 EDT
Prasanth, are you facing this error on Rebalance status?
Comment 14 Prasanth 2013-10-23 09:04:34 EDT
(In reply to Sahina Bose from comment #13)
> Prasanth, are you facing this error on Rebalance status?

No, in Remove Brick status. Do we need a different bug to track it, or is it due to the same issue?
Comment 15 Timothy Asir 2013-10-25 06:46:08 EDT
As per the current design, VDSM internally calls the gluster CLI XML command to get the rebalance status. But the glusterfs installed on the (test) node does not provide any (valid) output for the gluster volume rebalance status XML command, whereas the plain gluster volume rebalance CLI provides an appropriate status message.

Could you please copy the output of the following command:
* gluster volume rebalance <volumename> status --xml
Comment 16 RamaKasturi 2013-10-25 07:00:42 EDT
[root@localhost ~]# gluster vol rebalance vol_dis status --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>115</opErrno>
  <opErrstr/>
  <volRebalance>
    <task-id>6f0edef0-e046-4e97-9f82-bd239d6edd72</task-id>
    <op>3</op>
    <nodeCount>4</nodeCount>
    <node>
      <nodeName>localhost</nodeName>
      <id>2d487988-8371-4553-b0ce-4301eb82f1b7</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>0</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>7.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.140</nodeName>
      <id>dd0ee43d-7c59-454a-a68e-34328a6d6a98</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>23</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>7.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.43</nodeName>
      <id>346b0d17-e300-44e0-a50c-8230d10e6113</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>2</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>6.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.75</nodeName>
      <id>2303e0a6-ccd9-4653-9a0d-030a9ebf10de</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>274</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>10.00</runtime>
    </node>
    <aggregate>
      <files>0</files>
      <size>0</size>
      <lookups>8396</lookups>
      <failures>0</failures>
      <skipped>299</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>10.00</runtime>
    </aggregate>
  </volRebalance>
</cliOutput>
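Per comment 15, VDSM builds its status reply by parsing exactly this --xml output. As a rough illustration only (not VDSM's actual parser), the aggregate section can be read with the standard library's ElementTree:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the cliOutput above, kept inline so the sketch runs.
CLI_XML = """<cliOutput>
  <opRet>0</opRet>
  <volRebalance>
    <aggregate>
      <files>0</files>
      <size>0</size>
      <lookups>8396</lookups>
      <failures>0</failures>
      <skipped>299</skipped>
      <statusStr>completed</statusStr>
      <runtime>10.00</runtime>
    </aggregate>
  </volRebalance>
</cliOutput>"""

def parse_rebalance_aggregate(xml_text):
    """Extract the aggregate rebalance counters from the CLI XML.
    Hypothetical parsing sketch, not the code VDSM ships."""
    agg = ET.fromstring(xml_text).find('./volRebalance/aggregate')
    return {
        'statusStr': agg.findtext('statusStr'),
        'lookups': int(agg.findtext('lookups')),
        'skipped': int(agg.findtext('skipped')),
        'failures': int(agg.findtext('failures')),
    }

print(parse_rebalance_aggregate(CLI_XML))
```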
Comment 17 RamaKasturi 2013-10-31 05:59:44 EDT
Could you please tell me what was causing the issue and what is the fix?
Comment 18 Timothy Asir 2013-11-12 03:08:50 EST
It is because VDSM received simultaneous calls one after another, which resulted in the problem. To fix this issue, we now display a message like "Could not fetch rebalance status of volume: <volumeName>".
Comment 19 RamaKasturi 2013-11-12 06:22:16 EST
Verified in CB7. (The About dialog does not have a title.) The status dialog no longer hangs and works fine.

When rebalance is started and the status button is clicked immediately, it says "Could not fetch rebalance status of the volume: <volumeName>".
Comment 20 RamaKasturi 2013-11-12 06:22:49 EST
Created attachment 822867 [details]
Attaching screen shot for the same.
Comment 22 errata-xmlrpc 2014-02-25 02:44:52 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html
