Bug 1294833 - XMLRPC API of mom breaks on host with 193270 MiB ram [NEEDINFO]
Status: CLOSED CURRENTRELEASE
Product: mom
Classification: oVirt
Component: Core
Version: 0.5.1
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.3
Target Release: 0.5.2
Assigned To: Martin Sivák
QA Contact: Shira Maximov
Keywords: Regression
Depends On:
Blocks: 1302001
Reported: 2015-12-30 10:00 EST by Oved Ourfali
Modified: 2016-02-18 06:12 EST

See Also:
Fixed In Version: mom-0.5.2-1
Doc Type: Bug Fix
Doc Text:
Cause: VDSM uses XML-RPC to communicate with MoM in oVirt 3.6. XML-RPC supports only 32-bit signed integers (int32). Consequence: On hosts with a large enough amount of memory, a statistics value overflows the int32 type and XML-RPC reports an error. Fix: MoM was configured to use the i8 XML-RPC extension for transferring big numbers. Result: VDSM can properly retrieve statistics from MoM.
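A minimal sketch of the overflow and of the i8 extension named in the Doc Text, written against Python 3's xmlrpc.client (the successor of the xmlrpclib module used by VDSM and MoM here). The class I8Marshaller and both helper functions are hypothetical names introduced for illustration; this is not the patch that was merged in mom, only one way the technique could look. The sample value (the host's 193270 MiB expressed in bytes) is also an assumption, since the report does not say which statistic overflowed.

```python
import xmlrpc.client as xc


class I8Marshaller(xc.Marshaller):
    """Hypothetical marshaller that emits <i8> for ints outside int32."""

    # Copy the dispatch table so the stock Marshaller stays untouched.
    dispatch = dict(xc.Marshaller.dispatch)

    def dump_int(self, value, write):
        # Values that fit in int32 keep the standard <int> tag;
        # anything larger is sent with the <i8> extension tag.
        tag = "int" if xc.MININT <= value <= xc.MAXINT else "i8"
        write("<value><%s>%d</%s></value>\n" % (tag, value, tag))

    dispatch[int] = dump_int


def stock_marshaller_overflows(value):
    """Return True if the standard marshaller rejects the value."""
    try:
        xc.dumps((value,))
    except OverflowError:
        return True
    return False


def roundtrip_with_i8(value):
    """Marshal with <i8> where needed, then parse with the stock loader."""
    xml = I8Marshaller().dumps((value,))
    params, _method = xc.loads(xml)
    return xml, params[0]


if __name__ == "__main__":
    big = 193270 * 1024 * 1024  # assumed example: host RAM in bytes
    print(stock_marshaller_overflows(big))
    print(roundtrip_with_i8(big)[1] == big)
```

The stock unmarshaller already understands the i8 tag, so in this sketch only the sending side needs to change.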
Story Points: ---
Clone Of:
Clones: 1302001
Environment:
Last Closed: 2016-02-18 06:12:43 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
s.kieske: needinfo? (msivak)
rule-engine: ovirt‑3.6.z+
rule-engine: blocker+
mgoldboi: planning_ack+
dfediuck: devel_ack+
mavital: testing_ack+


External Trackers
Tracker      | ID    | Priority | Status | Summary                                                 | Last Updated
oVirt gerrit | 51396 | master   | MERGED | Convert big numbers to i8 type in the XML-RPC interface | 2016-01-25 10:38 EST

Description Oved Ourfali 2015-12-30 10:00:33 EST
Description of problem:
On bigmem hosts, getVdsStats fails due to an issue in mom.
See "Additional info".

Version-Release number of selected component (if applicable):
3.6.1

How reproducible:
danken claims 100%

Steps to Reproduce:
1. Install and run Vdsm+mom on a bigmem host
2. Run vdsClient -s 0 getVdsStats


Actual results:
Fails

Expected results:
Works

Additional info:

vdsm.log trace:
jsonrpc.Executor/7::ERROR::2015-12-30 16:35:17,706::__init__::526::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 521, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 277, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1384, in getStats
    stats.update(self._cif.mom.getKsmStats())
  File "/usr/share/vdsm/momIF.py", line 68, in getKsmStats
    stats = self._mom.getStatistics()['host']
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1306, in single_request
    return self.parse_response(response)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1482, in parse_response
    return u.close()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 794, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<type 'exceptions.OverflowError'>:int exceeds XML-RPC limits">
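The fault string in the trace has the "type:message" shape that Python's standard XML-RPC dispatcher produces when a server-side exception is turned into a Fault. The following sketch, assuming Python 3 and no running MoM, reproduces the same failure in-process. get_statistics is a hypothetical stand-in for mom's getStatistics() with a made-up value, and _marshaled_dispatch is an internal (underscore-prefixed) stdlib method used here only to avoid opening a socket.

```python
import xmlrpc.client as xc
from xmlrpc.server import SimpleXMLRPCDispatcher


def get_statistics():
    # Hypothetical stand-in for mom's getStatistics(); the value is made up
    # and only needs to exceed the int32 maximum of 2**31 - 1.
    return {"host": {"mem_free": 2**31}}


def call_in_process(func, name):
    """Send one XML-RPC request through the stdlib dispatcher, no network."""
    dispatcher = SimpleXMLRPCDispatcher()
    dispatcher.register_function(func, name)
    request = xc.dumps((), methodname=name)
    # Internal hook that turns a request body into a response body.
    response = dispatcher._marshaled_dispatch(request.encode("utf-8"))
    # loads() raises xmlrpc.client.Fault if the response is a fault.
    return xc.loads(response)[0][0]


def fault_for_big_value():
    """Return the Fault raised on the client side, or None if none occurs."""
    try:
        call_in_process(get_statistics, "getStatistics")
    except xc.Fault as fault:
        return fault
    return None


if __name__ == "__main__":
    print(fault_for_big_value())
```

The OverflowError is raised while the server marshals its response, which is why the client sees a Fault rather than a local exception.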


After logging host_stats in mom/MOMFuncs we see:

2015-12-30 16:47:54,796 - mom.RPCServer - INFO - host_stats {'swap_out': 0, 'swap_usage': 0, 'ksmd_cpu_usage': 0, 'anon_pages': 18073368, 'ksm_shareable': 20518684, 'ksm_pages_unshared': 0, 'swap_total': 16383996, 'ksm_pages_sharing': 0, 'cpu_count': 16, 'swap_in': 0, 'ksm_pages_to_scan': 100, 'mem_free': 178111592, 'ksm_merge_across_nodes': 1, 'ksm_pages_volatile': 0, 'mem_available': 197909444, 'ksm_pages_shared': 0, 'ksm_full_scans': 0, 'ksm_run': 0, 'ksm_sleep_millisecs': 20, 'mem_unused': 177132524}
Comment 2 Dan Kenigsberg 2016-01-03 10:50:58 EST
As a quick hack to get the host running, I've edited /usr/lib/python2.7/site-packages/mom/MOMFuncs.py:

    def getStatistics(self):
        self.logger.info("getStatistics()")
        host_stats = self.threads['host_monitor'].interrogate().statistics[-1]
        host_stats = dict((k, str(v)) for (k, v) in host_stats.iteritems())
        guest_stats = {}
        guest_entities = self.threads['guest_manager'].interrogate().values()
        for entity in guest_entities:
            d = dict((k, str(v) if isinstance(v, int) else v) for (k, v) in entity.statistics[-1].iteritems())
            guest_stats[entity.properties['name']] = d
        ret = {'host': host_stats, 'guests': guest_stats}
        return ret

and /usr/share/vdsm/momIF.py:

    def getKsmStats(self):
        """
        Get information about KSM and convert memory data from page
        based values to MiB.
        """

        ret = {}

        try:
            stats = self._mom.getStatistics()['host']
            stats = dict((k, int(v)) for (k, v) in stats.iteritems())
            ret['ksmState'] = bool(stats['ksm_run'])
            ret['ksmPages'] = stats['ksm_pages_to_scan']
            ret['ksmMergeAcrossNodes'] = bool(stats['ksm_merge_across_nodes'])
            ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
            ret['memShared'] /= Mbytes
            ret['ksmCpu'] = stats['ksmd_cpu_usage']
        except (AttributeError, socket.error):
            self.log.warning("MOM not available, KSM stats will be missing.")

        return ret
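The idea behind this workaround (send integers as strings, convert them back on the receiving side) can be sketched in isolation. This is a Python 3 illustration with hypothetical helper names, not the code from either file, and it assumes every value in the host statistics is an integer.

```python
import xmlrpc.client as xc


def stringify_ints(stats):
    """Sender side: turn ints into strings so no int32 limit applies."""
    # bool is a subclass of int, so it is excluded explicitly.
    return {
        k: str(v) if isinstance(v, int) and not isinstance(v, bool) else v
        for k, v in stats.items()
    }


def parse_ints(stats):
    """Receiver side: convert the string values back to ints."""
    return {k: int(v) for k, v in stats.items()}


def roundtrip(stats):
    """Marshal and unmarshal the stats the way an XML-RPC response would."""
    wire = xc.dumps((stringify_ints(stats),), methodresponse=True)
    (received,), _method = xc.loads(wire)
    return parse_ints(received)


if __name__ == "__main__":
    print(roundtrip({"ksm_run": 0, "mem_free": 2**40}))
```

The cost of this approach is that both ends must agree on which fields are numeric, which is why it is a stopgap rather than a fix.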
Comment 3 Martin Sivák 2016-01-04 06:38:56 EST
I do not see any big enough number in the log. XML-RPC supports signed 32-bit ints; the max value is about two billion (ten digits).

We will obviously have to stringify it.
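The observation about the log can be checked mechanically. A small sketch, with the host_stats values copied from the description and a hypothetical helper name:

```python
INT32_MAX = 2**31 - 1   # 2147483647
INT32_MIN = -2**31

# host_stats as logged by mom.RPCServer in the description.
HOST_STATS = {
    'swap_out': 0, 'swap_usage': 0, 'ksmd_cpu_usage': 0,
    'anon_pages': 18073368, 'ksm_shareable': 20518684,
    'ksm_pages_unshared': 0, 'swap_total': 16383996,
    'ksm_pages_sharing': 0, 'cpu_count': 16, 'swap_in': 0,
    'ksm_pages_to_scan': 100, 'mem_free': 178111592,
    'ksm_merge_across_nodes': 1, 'ksm_pages_volatile': 0,
    'mem_available': 197909444, 'ksm_pages_shared': 0,
    'ksm_full_scans': 0, 'ksm_run': 0, 'ksm_sleep_millisecs': 20,
    'mem_unused': 177132524,
}


def out_of_int32_range(stats):
    """Return the entries whose integer value does not fit in int32."""
    return {
        k: v for k, v in stats.items()
        if isinstance(v, int) and not INT32_MIN <= v <= INT32_MAX
    }


if __name__ == "__main__":
    print(out_of_int32_range(HOST_STATS))
    print(max(HOST_STATS.values()))
```

Every logged host value fits in int32, so the overflowing number must come from data that is not shown in that log line.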
Comment 4 Dan Kenigsberg 2016-01-04 09:07:13 EST
The long integers of comment 0 showed up only after I added logging to getStatistics().
Comment 5 Shira Maximov 2016-02-07 10:07:34 EST
Verified on:
Red Hat Enterprise Virtualization Manager Version: 3.6.3-0.1.el6
mom-0.5.2-1.el7ev.noarch

The verification was run on a PPC host with 251 GB of memory.

Verification steps:
Run vdsClient -s 0 getVdsStats; the command worked.
Comment 6 Sven Kieske 2016-02-11 09:30:57 EST
Did this work in 3.6.2, or was it broken? I think I need this function, and I have some hosts with way more RAM. I'm currently in the process of deploying 3.6.2.

So if this does not work in 3.6.2, I would have to wait for the 3.6.3 release.

Can someone confirm or deny whether this bug is present in 3.6.2?

Thanks!
