Bug 1012393
| Field | Value |
|---|---|
| Summary | [RHSC] - console hangs when clicked on rebalance status |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | rhsc |
| Version | 2.1 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | high |
| Reporter | RamaKasturi <knarra> |
| Assignee | Timothy Asir <tjeyasin> |
| QA Contact | RamaKasturi <knarra> |
| CC | dpati, dtsang, kmayilsa, knarra, mmahoney, pprakash, rhs-bugs, sabose, ssampat, tjeyasin |
| Keywords | ZStream |
| Target Release | RHGS 2.1.2 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | CB6 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2014-02-25 07:44:52 UTC |
Description
RamaKasturi
2013-09-26 11:57:59 UTC
The console hangs in the following scenario as well:

1. Start rebalance on a volume.
2. Click on the status button once rebalance has started.
3. Close the rebalance status dialog.
4. Open the status dialog again.
5. The console hangs, and the dialog permanently says "fetching data".

The following error is seen in engine.log:

```
2013-10-03 21:33:47,026 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) START, GetGlusterVolumeRebalanceStatusVDSCommand(HostName = server3, HostId = 072715ba-7ddb-42bc-9903-0102d43491ea), log id: 31b61312
2013-10-03 21:33:47,201 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) Command GetGlusterVolumeRebalanceStatusVDS execution failed. Exception: VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.OverflowError'>:int exceeds XML-RPC limits
2013-10-03 21:33:47,202 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeRebalanceStatusVDSCommand] (ajp-/127.0.0.1:8702-5) FINISH, GetGlusterVolumeRebalanceStatusVDSCommand, log id: 31b61312
2013-10-03 21:33:47,243 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRebalanceStatusQuery] (ajp-/127.0.0.1:8702-5) Query GetGlusterVolumeRebalanceStatusQuery failed. Exception message is VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.OverflowError'>:int exceeds XML-RPC limits (Failed with error VDS_NETWORK_ERROR and code 5022)
```
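The "int exceeds XML-RPC limits" error comes from Python's XML-RPC marshaller: the protocol's `<int>` element is a signed 32-bit value, so any counter above 2147483647 cannot be serialized. A minimal sketch of the failure mode, assuming Python 2's standard-library xmlrpclib (the same module shown in the vdsClient traceback later in this report):

```python
# Reproduction sketch (assumption: 64-bit Python 2, where 3145728000 is an
# int; on Python 3 the module is xmlrpc.client and raises the same error).
import xmlrpclib

XMLRPC_INT_MAX = 2 ** 31 - 1  # XML-RPC <int> is signed 32-bit

xmlrpclib.dumps((XMLRPC_INT_MAX,))  # marshals fine
try:
    # A byte counter such as totalSizeMoved = 3145728000 (seen in the
    # vdsm log below) does not fit.
    xmlrpclib.dumps((3145728000,))
except OverflowError as e:
    print(e)  # int exceeds XML-RPC limits
```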
Created attachment 807004 [details]
Attaching engine log
Created attachment 807005 [details]
vdsm log from server3
The vdsm log shows the call returning successfully on the vdsm side:

```
Thread-2866::DEBUG::2013-10-03 21:21:02,400::BindingXMLRPC::974::vds::(wrapper) client [10.70.37.86]::call volumeRebalanceStatus with ('vol_dis',) {}
Thread-2866::DEBUG::2013-10-03 21:21:02,512::BindingXMLRPC::981::vds::(wrapper) return volumeRebalanceStatus with {'status': {'message': 'Done', 'code': 0}, 'hosts': [{'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': 'localhost', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 29}, {'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': '10.70.37.80', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 12}, {'totalSizeMoved': 1048576000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'name': '10.70.37.135', 'filesMoved': 1, 'filesFailed': 0, 'filesScanned': 46}, {'totalSizeMoved': 0, 'status': 'COMPLETED', 'filesSkipped': 0, 'name': '10.70.37.103', 'filesMoved': 0, 'filesFailed': 0, 'filesScanned': 60}], 'summary': {'totalSizeMoved': 3145728000, 'status': 'IN PROGRESS', 'filesSkipped': 0, 'filesFailed': 0, 'filesMoved': 3, 'filesScanned': 147}}
```

Calling the same verb through vdsClient reproduces the failure on the client side:

```
[root@localhost ~]# vdsClient -s 0 glusterVolumeRebalanceStatus volumeName=vol_dis
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 2538, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClientGluster.py", line 139, in do_glusterVolumeRebalanceStatus
    status = self.s.glusterVolumeRebalanceStatus(volumeName)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<type 'exceptions.OverflowError'>:int exceeds XML-RPC limits">
```

Checking the status of a volume on which rebalance has already completed also makes the status dialog hang.

Will send a patch to return the values as strings instead of integers, similar to other existing methods, to avoid the int overflow issue in the XML-RPC layer.

Patch sent to VDSM, url: http://gerrit.ovirt.org/#/c/19863/ ("Provide rebalance status values as strings to avoid overflow error when rebalance status values exceed the XML-RPC limits"). A sketch of the approach follows.
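A minimal sketch of the string-cast approach the patch describes; the helper name and exact dict layout here are illustrative, taken from the field names in the vdsm log excerpt above, not the actual VDSM change:

```python
# Illustrative sketch (not the actual VDSM patch): serialize the large
# counters as strings so the XML-RPC marshaller never sees an out-of-range
# int. The hypothetical helper builds one entry of the 'hosts' list above.
def _rebalance_status_entry(name, status, size_moved, files_moved,
                            files_scanned, files_failed, files_skipped):
    return {
        'name': name,
        'status': status,
        # str() keeps values like 3145728000 within the 32-bit wire limit;
        # the engine side parses them back into longs.
        'totalSizeMoved': str(size_moved),
        'filesMoved': str(files_moved),
        'filesScanned': str(files_scanned),
        'filesFailed': str(files_failed),
        'filesSkipped': str(files_skipped),
    }
```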
I am still able to reproduce the above issue. Providing the link to the sosreports here: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1012393/

I'm also hitting the same issue:

```
2013-10-22 17:53:47,045 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-7) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:47,171 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-6) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:48,259 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-10) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:48,473 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-2) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:50,845 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-14) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:51,132 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-3) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
2013-10-22 17:53:51,477 ERROR [org.ovirt.engine.core.bll.gluster.GetGlusterVolumeRemoveBricksStatusQuery] (ajp-/127.0.0.1:8702-15) Query GetGlusterVolumeRemoveBricksStatusQuery failed. Exception message is GLUSTER_VOLUME_ID_INVALID
```

Prasanth, are you facing this error on Rebalance status?

(In reply to Sahina Bose from comment #13)
> Prasanth, are you facing this error on Rebalance status?

No, in Remove Brick status. Do we need a different bug to track it, or is it due to the same issue?

As per the current design, vdsm internally calls the gluster CLI XML command to get the rebalance status. But the glusterfs installed on the (test) node does not provide any (valid) output for the `gluster volume rebalance <volumename> status --xml` command, whereas the plain `gluster volume rebalance` CLI provides appropriate status messages. Could you please copy the output of the following command:

* gluster volume rebalance <volumename> status --xml

```
[root@localhost ~]# gluster vol rebalance vol_dis status --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>115</opErrno>
  <opErrstr/>
  <volRebalance>
    <task-id>6f0edef0-e046-4e97-9f82-bd239d6edd72</task-id>
    <op>3</op>
    <nodeCount>4</nodeCount>
    <node>
      <nodeName>localhost</nodeName>
      <id>2d487988-8371-4553-b0ce-4301eb82f1b7</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>0</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>7.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.140</nodeName>
      <id>dd0ee43d-7c59-454a-a68e-34328a6d6a98</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>23</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>7.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.43</nodeName>
      <id>346b0d17-e300-44e0-a50c-8230d10e6113</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>2</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>6.00</runtime>
    </node>
    <node>
      <nodeName>10.70.37.75</nodeName>
      <id>2303e0a6-ccd9-4653-9a0d-030a9ebf10de</id>
      <files>0</files>
      <size>0</size>
      <lookups>2099</lookups>
      <failures>0</failures>
      <skipped>274</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>10.00</runtime>
    </node>
    <aggregate>
      <files>0</files>
      <size>0</size>
      <lookups>8396</lookups>
      <failures>0</failures>
      <skipped>299</skipped>
      <status>3</status>
      <statusStr>completed</statusStr>
      <runtime>10.00</runtime>
    </aggregate>
  </volRebalance>
</cliOutput>
```
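For reference on the code path described above (vdsm invoking the gluster CLI with `--xml` and parsing the result), a minimal sketch of extracting per-node status from this cliOutput document; it uses xml.etree.ElementTree and is illustrative, not vdsm's actual parser:

```python
# Illustrative parser for the cliOutput shown above (not vdsm's real code).
import xml.etree.ElementTree as ET

def parse_rebalance_status(xml_text):
    root = ET.fromstring(xml_text)
    nodes = []
    for node in root.findall('./volRebalance/node'):
        nodes.append({
            'name': node.findtext('nodeName'),
            'status': node.findtext('statusStr'),
            # counters are kept as strings, consistent with the fix above
            'files': node.findtext('files'),
            'size': node.findtext('size'),
            'skipped': node.findtext('skipped'),
        })
    aggregate = root.find('./volRebalance/aggregate')
    summary = aggregate.findtext('statusStr') if aggregate is not None else None
    return nodes, summary
```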
Could you please tell me what was causing the issue and what the fix is?

It is because vdsm received simultaneous calls one immediately after another, which resulted in the problem. To fix this issue, we now display a message like "Could not fetch rebalance status of volume: <volumeName>" instead. A sketch of this behavior follows.
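The behavior just described (failing fast with a message instead of hanging when calls overlap) can be sketched as a non-blocking guard. This is an illustration of the stated behavior under that assumption, not the actual engine code, and the function names are hypothetical:

```python
# Hypothetical sketch of the fixed behavior: if a status fetch is already
# in flight, return the user-facing message instead of queuing another call.
import threading

_fetch_lock = threading.Lock()

def rebalance_status_or_message(volume_name, fetch_status):
    if not _fetch_lock.acquire(False):  # non-blocking acquire
        return 'Could not fetch rebalance status of volume: %s' % volume_name
    try:
        return fetch_status(volume_name)
    finally:
        _fetch_lock.release()
```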
Verified in CB7. (About dialog does not have a title.) The status dialog no longer hangs and works fine. When rebalance is started and status is clicked immediately, it says "Could not fetch rebalance status of the volume: <volumeName>".

Created attachment 822867 [details]
Attaching screen shot for the same.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html