Description of problem: jsonrpc pass unicode objects to vdsm apis when it should pass plain strings: 2015-11-13 20:17:15,690 INFO [dispatcher] (jsonrpc.Executor/0) Run and protect: connectStoragePool(spUUID=u'591475db-6fa9-455d-9c05-7f6e30fb06d5', hostID=1, msdUUID=u'6c77adb1-74fc-4fa9-a0ac-3b5a4b789318', masterVersion=3, domainsMap={u'869b6036-a9ca-43eb-9287-828803822783': u'active', u'6267179b-bd65-43c1-8a1e-f33cf12582fa': u'attached', u'e43d9b66-aef1-49d7-bfb4-bc06ddc9f025': u'attached', u'6c77adb1-74fc-4fa9-a0ac-3b5a4b789318': u'active'}, options=None) [logUtils:48(wrapper)] In this example, arguments like spUUID are pass as a unicode object u'591475db-6fa9-455d-9c05-7f6e30fb06d5' instead of the plain string: '591475db-6fa9-455d-9c05-7f6e30fb06d5'. These unicode strings may cause failures when combining or formatting them with non-ascii strings: >>> "ascii" + "\xd7\x90" 'ascii\xd7\x90' >>> u"ascii" + "\xd7\x90" Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal not in range(128) Since u"ascii" is unicode string, Python try to convert "\xd7\x90" assuming the default encoding (ascii), and fails. Another example is logging: >>> import logging >> logging.error("ascii=%s utf8=%s", u"ascii", "\xd7\x90") Traceback (most recent call last): File "/usr/lib64/python2.7/logging/__init__.py", line 859, in emit msg = self.format(record) File "/usr/lib64/python2.7/logging/__init__.py", line 732, in format return fmt.format(record) File "/usr/lib64/python2.7/logging/__init__.py", line 471, in format record.message = record.getMessage() File "/usr/lib64/python2.7/logging/__init__.py", line 335, in getMessage msg = msg % self.args UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal not in range(128) Again, works fine with using string object: >>> logging.error("ascii=%s utf8=%s", "ascii", "\xd7\x90") ERROR:root:ascii=ascii utf8=א The root cause is using the json library from the Python standard library. >>> import json >>> json.loads(json.dumps({"key": unicode("\xd7\x90", 'utf8')})) {u'key': u'\u05d0'} Here the string "key" was converted to u"key". When running on Python 2.6, we used the simplejson library, which does not have this issue: >>> json.loads(json.dumps({"key": unicode("\xd7\x90", 'utf8')})) {'key': u'\u05d0'} Note that "key" is a plain string as it should be. We do use simplejson when it is available: # lib/vdsm/compat.py try: # on RHEL/Centos 6.x, the JSON module in the python standard # library does not include significant speedups: # stdlib is based on simplejson 1.9, speedups were added on 2.0.9. # In general, speedups are first found on the # simplejson package. import simplejson as json json # make pyflakes happy except ImportError: # no big deal, fallback to standard libary import json json # yep, this is needed twice. But we do not require simplejson since this patch: commit fb1c44af320fbeb99f1cba406eae70e653b6de2f Author: Yaniv Bronhaim <ybronhei> Date: Sun Jun 14 14:03:26 2015 +0300 Remove el6 specific code from spec Change-Id: Ifef95b7c61b8bd8c86ef2642e74275da87659cb8 Signed-off-by: Yaniv Bronhaim <ybronhei> Reviewed-on: https://gerrit.ovirt.org/42317 Continuous-Integration: Jenkins CI Reviewed-by: Dima Kuznetsov <dkuznets> Reviewed-by: Dan Kenigsberg <danken> Since the library is not available, we use fallback to builtin json. Version-Release number of selected component (if applicable): Since 3.5, when running on EL7 or Fedora (both use Python 2.7) How reproducible: Always
Setting severity to high since this can fail some flows when non-ascii values are sent from engine. Changing component to core, since jsonrpc is not the root cause, it just happens when using it instead of xmlrpc.
Nir, can I assign to you? Should it be on post?
Taken
(In reply to Nir Soffer from comment #3) > Taken Thanks. Highly appreciated!
(In reply to Nir Soffer from comment #1) > Setting severity to high since this can fail some flows when non-ascii > values are sent from engine. This seems the underlying bug: Engine sending unadorned binary data as strings violates jsonrpc spec. If we take the suggested patch as a stop-gap measure to avoid a 3.6 regression, we should include a logging mechanism to catch such cases and fix them in Engine.
(In reply to Dan Kenigsberg from comment #5) > (In reply to Nir Soffer from comment #1) > > Setting severity to high since this can fail some flows when non-ascii > > values are sent from engine. > > This seems the underlying bug: Engine sending unadorned binary data as > strings violates jsonrpc spec. JSON string is unicode [1] - can you show where the jsonrpc spec [2] forbids unicode strings? Also vdsm schema does not mention anything about unicode values. If we forbid them it should be documented. > If we take the suggested patch as a stop-gap > measure to avoid a 3.6 regression, we should include a logging mechanism > to catch such cases and fix them in Engine. Warning about unicode values is expensive, requires recursive search in arguments dict in jsonrpc, and manual checking of the argument in xmlrpc. We can enhance the password protection code to warn about unicode values. But I think we better spend time on useful features and handle this in the schema. [1] http://www.json.org/ [2] http://www.jsonrpc.org/specification
We are investigating a different direction, that will allow using the built-in json library safely, and be more compatible with Python 3.
String to use the json built-in library expose this issue, but it is not the root cause. The root cause is mixing unicode and utf8 encoded strings, while python default encoding in Python 2 is ascii. Each time a string is encoded using str() or decoded using unicode() or x.decode() without specifying the encoding, python default to 'ascii'. It seems that changing the default encoding is a quick fix avoiding these issues, until vdsm is fixed to never mix unicode and encoded strings.
Missing backport to 3.6 branch, moving to post
Nir - what's the status of this one?
(In reply to Oved Ourfali from comment #11) > Nir - what's the status of this one? Fixed in master. This fix is required for supporting non-ascii vm name, but libvirt does does not support it yet. See https://gerrit.ovirt.org/48570 Do you want to backport it?
Okay. Moving to 3.6.3, hoping this will get into libvirt by then. If you can backport the merged patch that would be great.
(In reply to Oved Ourfali from comment #13) > Okay. Moving to 3.6.3, hoping this will get into libvirt by then. If bug 1285720 is resolved, this fix is good enough to allow vms with non-acsii names. > If you can backport the merged patch that would be great. Patch is ready for merging to 3.6.
Verified with: vdsm-4.17.18-0.el7ev.noarch From vdsm log: Thread-47::INFO::2016-01-21 13:04:06,489::logUtils::48::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='00000001-0001-0001-0001-0000000001a8', hostID=1, msdUUID='ad2d4bfe-5683-4d40-84e1-80522e030890', masterVersion=1, domainsMap={'ad2d4bfe-5683-4d40-84e1-80522e030890': 'active'}, options=None)