Description of problem:
On bigmem hosts, getVdsStats fails due to an issue in mom. See additional info.

Version-Release number of selected component (if applicable):
3.6.1

How reproducible:
danken claims 100%

Steps to Reproduce:
1. Install and run Vdsm+mom on a bigmem host
2. Run vdsClient -s 0 getVdsStats

Actual results:
Fails

Expected results:
Works

Additional info:
vdsm.log trace:

jsonrpc.Executor/7::ERROR::2015-12-30 16:35:17,706::__init__::526::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 521, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 277, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1384, in getStats
    stats.update(self._cif.mom.getKsmStats())
  File "/usr/share/vdsm/momIF.py", line 68, in getKsmStats
    stats = self._mom.getStatistics()['host']
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1306, in single_request
    return self.parse_response(response)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1482, in parse_response
    return u.close()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 794, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<type 'exceptions.OverflowError'>:int exceeds XML-RPC limits">

After logging host_stats in mom/MOMFuncs we see:

2015-12-30 16:47:54,796 - mom.RPCServer - INFO - host_stats {'swap_out': 0, 'swap_usage': 0, 'ksmd_cpu_usage': 0, 'anon_pages': 18073368, 'ksm_shareable': 20518684, 'ksm_pages_unshared': 0, 'swap_total': 16383996, 'ksm_pages_sharing': 0, 'cpu_count': 16, 'swap_in': 0, 'ksm_pages_to_scan': 100, 'mem_free': 178111592, 'ksm_merge_across_nodes': 1, 'ksm_pages_volatile': 0, 'mem_available': 197909444, 'ksm_pages_shared': 0, 'ksm_full_scans': 0, 'ksm_run': 0, 'ksm_sleep_millisecs': 20, 'mem_unused': 177132524}
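For context, the Fault wraps an OverflowError raised by xmlrpclib's marshaller when an int falls outside the signed 32-bit range. A minimal reproducer sketch on a 64-bit Python 2.7 interpreter (the multiplied value below is made up for illustration, not taken from this host):

import xmlrpclib

# A stat that fits in a signed 32-bit int marshals fine
# (178111592 is the mem_free value from the log above).
print xmlrpclib.dumps(({'mem_free': 178111592},))

# Anything above 2**31 - 1, e.g. the same stat expressed in bytes,
# is rejected by the marshaller.
try:
    xmlrpclib.dumps(({'mem_free': 178111592 * 1024},))
except OverflowError as e:
    print e  # int exceeds XML-RPC limits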
As a quick hack to get the host running, I've edited /usr/lib/python2.7/site-packages/mom/MOMFuncs.py:

    def getStatistics(self):
        self.logger.info("getStatistics()")
        host_stats = self.threads['host_monitor'].interrogate().statistics[-1]
        host_stats = dict((k, str(v)) for (k, v) in host_stats.iteritems())
        guest_stats = {}
        guest_entities = self.threads['guest_manager'].interrogate().values()
        for entity in guest_entities:
            d = dict((k, str(v) if isinstance(v, int) else v)
                     for (k, v) in entity.statistics[-1].iteritems())
            guest_stats[entity.properties['name']] = d
        ret = {'host': host_stats, 'guests': guest_stats}
        return ret

and /usr/share/vdsm/momIF.py:

    def getKsmStats(self):
        """
        Get information about KSM and convert memory data from page
        based values to MiB.
        """
        ret = {}
        try:
            stats = self._mom.getStatistics()['host']
            stats = dict((k, int(v)) for (k, v) in stats.iteritems())
            ret['ksmState'] = bool(stats['ksm_run'])
            ret['ksmPages'] = stats['ksm_pages_to_scan']
            ret['ksmMergeAcrossNodes'] = bool(stats['ksm_merge_across_nodes'])
            ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
            ret['memShared'] /= Mbytes
            ret['ksmCpu'] = stats['ksmd_cpu_usage']
        except (AttributeError, socket.error):
            self.log.warning("MOM not available, KSM stats will be missing.")
        return ret
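Not part of the edit above, just a standalone sketch (with a made-up mem_free value, 64-bit Python 2.7 assumed) of why the stringify-then-int() round trip is safe: XML-RPC strings are not subject to the 32-bit limit, so the values survive the mom -> vdsm hop and can be converted back on the vdsm side.

import xmlrpclib

# Hypothetical bigmem value, larger than 2**31 - 1, so it cannot be
# marshalled as an XML-RPC int.
big_stats = {'mem_free': 182386270208}

# mom side of the hack: stringify before the value hits xmlrpclib.
wire = xmlrpclib.dumps((dict((k, str(v)) for (k, v) in big_stats.iteritems()),))

# vdsm side of the hack (momIF.getKsmStats): convert back to int.
params, _ = xmlrpclib.loads(wire)
restored = dict((k, int(v)) for (k, v) in params[0].iteritems())
assert restored == big_stats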
I do not see any number big enough in the log. XML-RPC supports only signed 32-bit ints - the maximum value is about two billion (ten digits). We will obviously have to stringify these values.
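For reference, the limit the marshaller enforces is exposed by the module itself (assuming the stock Python 2.7 xmlrpclib):

>>> import xmlrpclib
>>> xmlrpclib.MAXINT, xmlrpclib.MININT
(2147483647L, -2147483648L)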
The long integers in comment 0 showed up only after I added logging to getStatistics().
Verified on:
Red Hat Enterprise Virtualization Manager Version: 3.6.3-0.1.el6
mom-0.5.2-1.el7ev.noarch

The verification was run on a PPC host with 251 GB of memory.

Verification steps: run vdsClient -s 0 getVdsStats; the command worked.
Did this work in 3.6.2, or was it broken there as well? I need this function, I have some hosts with far more RAM, and I'm currently in the process of deploying 3.6.2. If this does not work in 3.6.2, I would have to wait for the 3.6.3 release. Can someone confirm or deny whether this bug is present in 3.6.2? Thanks!