Bug 1562704
| Summary: | httpd pulp wsgi segfaults when exval contains non-ascii characters | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | ||||
| Component: | Pulp | Assignee: | satellite6-bugs <satellite6-bugs> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | Katello QA List <katello-qa-list> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 6.3.0 | CC: | alexandre.chanu, bkearney, hasuzuki, jortel, jpasqual, phess, pmoravec, ttereshc | ||||
| Target Milestone: | Unspecified | Keywords: | PrioBumpField, Reopened, Triaged | ||||
| Target Release: | Unused | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-05-01 13:05:18 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | | | | | | | |
To QE: please let me know if you need a full reproducer. It would look like this:
- have a RHEL6 machine with the Polish (pl) localisation (I expect other localisations will work as well; any one whose "transaction failed" error string contains non-ASCII characters, like the one below, should do)
- have redhat-lsb-core-4.0-3.el6.x86_64 installed there
- via katello-agent, try to install/upgrade redhat-lsb-core-4.0-7.el6.x86_64 there (first try it directly via yum and check that you get a yum error like:

  Błędy testu transakcji: file /usr/sbin/redhat_lsb_trigger.x86_64 from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64
  file /usr/share/man/man1/lsb_release.1.gz from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64

  "Błędy testu transakcji" is Polish for "Transaction test errors".)

If the above does not work as a reproducer and a similar one is needed, let me know.

Created attachment 1438737 [details]
Small unicode reproducer.
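The localised error string above is enough on its own to trigger the agent-side error. A minimal Python 3 sketch, assuming that the agent's implicit Python 2 `str(inst)` conversion of a unicode message behaves like an explicit `.encode("ascii")`; `ascii_encode_error` is a hypothetical helper. It shows that the resulting UnicodeEncodeError carries five constructor arguments, which match the `xargs` list (`"ascii"`, the message, `1`, `3`, `"ordinal not in range(128)"`) in the stuck pulp.task message from the original description:

```python
# Sketch (Python 3): model Python 2's implicit ASCII conversion with .encode("ascii").
# "Błędy testu transakcji" is the start of the localised yum error quoted above.
MESSAGE = "B\u0142\u0119dy testu transakcji"


def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised when encoding text as ASCII, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = ascii_encode_error(MESSAGE)
print(err)       # 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
print(err.args)  # five arguments: encoding, object, start, end, reason
```

The two non-ASCII characters "ł" and "ę" sit at positions 1 and 2, which is why both the traceback and `xargs` report the range 1 to 3.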
I can't reproduce the bug in any way other than by artificially copying the journal file content. I don't see a way to _generate_ such content, i.e. which Satellite/pulp action could do so.

I even tried to:
- have Sat 6.2
- generate a package install request while goferd was down
- upgrade Sat to 6.3 (this was mimicked by carefully copying and updating the jrnl file content from a 6.2 machine to a 6.3 one)
- let goferd read the request there and complain with non-ASCII chars

but still no luck :(

Since the underlying case is closed and we don't have a reproducer, I am closing this BZ.

(In reply to Pavel Moravec from comment #12)
> [...] Since the underlying case is closed and we dont have a reproducer, I am
> closing this BZ.

Some such scenario must exist. Now a customer using localization (sic!) upgraded 6.4 to 6.5 and hit the same issue (a double-check is pending).

Reopening, since it happens again and I have made some progress.
Here is the underlying problem: the way the gofer library builds an instance of an exception after importing its class does not work well (cf. gofer/rmi/dispatcher.py and gofer/common.py):

```
>>> target = "ArithmeticError"
>>> m = __import__("exceptions", fromlist=[target])
>>> T = getattr(m, target)
>>> inst = T.__new__(T)
>>> inst
ArithmeticError()
>>> print(inst)

>>> target = "UnicodeEncodeError"
>>> m = __import__("exceptions", fromlist=[target])
>>> T = getattr(m, target)
>>> inst = T.__new__(T)
>>> inst
UnicodeEncodeError()
>>> print(inst)
Segmentation fault (core dumped)
```

And even inst.__dict__.update(state or {}) isn't sufficient, since state is too short to fill UnicodeEncodeError, which requires 5 arguments:

```
>>> i = UnicodeEncodeError()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: function takes exactly 5 arguments (0 given)
>>>
```

So a reproducer:
- force goferd to reply with reply.exval = "UnicodeEncodeError"
- rmi exception evaluation, and the RemoteException class in particular, will fill the exception incorrectly
- an attempt to call str(..) on the improperly built exception will cause the segfault

So the only missing step is to force goferd to respond with UnicodeEncodeError (or any other exception that requires more than 2 params in its constructor). Of the built-in exceptions, just these three fail to be "rebuilt" the way the gofer rmi class does it:
- UnicodeDecodeError
- UnicodeEncodeError
- UnicodeTranslateError

OK, finally I have a reliable reproducer:
1) Have Sat 6.5 (or 6.4, or any other affected release).
2) Have a RHEL6 host registered. Not RHEL7, but RHEL6.
3) Remove any "redhat-lsb*" package from the client: yum remove "redhat-lsb*"
4) Try to install redhat-lsb-4.0-3.el6 from the command line, just to make sure it fails:
```
yum install redhat-lsb-4.0-3.el6
..
```
```
Transaction Check Error:
  file /usr/sbin/redhat_lsb_trigger.x86_64 conflicts between attempted installs of redhat-lsb-4.0-3.el6.x86_64 and redhat-lsb-core-4.0-7.el6.x86_64
  file /usr/share/man/man1/lsb_release.1.gz conflicts between attempted installs of redhat-lsb-4.0-3.el6.x86_64 and redhat-lsb-core-4.0-7.el6.x86_64
```
5) Try the same via goferd:
```
hammer -u admin -p mysecretpassword host package install --packages "redhat-lsb-4.0-3.el6" --host rhel6u8-5.gsslab.brq2.redhat.com
```
6) Wait a few minutes until you get "Error: 500 Internal Server Error".
7) Check for the segfaulting pulp WSGI script on Satellite:
```
Aug 5 10:26:16 provisioning kernel: httpd[9658]: segfault at 10 ip 00007fba3b99ba8f sp 00007fba175d4c30 error 4 in libpython2.7.so.1.0[7fba3b939000+17e000]
```
and for one stuck message in the pulp.task queue:
```
# qpid-stat -b amqps://localhost:5671 --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -q | grep pulp.task
  pulp.task Y 1 24 23 2.50k 52.8k 50.3k 0 1
#
```
(One can open /var/lib/qpidd/.qpidd/qls/jrnl2/pulp.task/*.jrnl and search for the latest "retval" string to see the stuck message that the WSGI script fails to process.)

The reason is that goferd on RHEL6 really goes through Content.install from /usr/lib/python2.6/site-packages/katello/agent/goferd/plugin.py:344, which sets exval as explained in earlier updates.

Jeff, can you reproduce it now?

Summary of the bug (well, 2 bugs, in fact): basically it happens any time:
- a package install/update/remove happens on RHEL6 via katello-agent
- it fails for any reason (insufficient disk space, a missing dependency or similar)
- _and_ the error message contains some non-ASCII characters due to the localisation used

Then encoding or decoding the UTF error string fails, which raises UnicodeDecodeError or UnicodeEncodeError inside goferd (this happens on RHEL6 only, since goferd on RHEL7 follows a different call path).
Goferd then reports this exception to Satellite, and Satellite, when trying to rebuild the exception, fails to initialize these particular exception classes. An attempt to print the instance, or even to convert it to a string, then fails due to the incomplete instance initialisation, which leads to the pulp WSGI script segfault.

Since the script fails to decode/process the message from goferd, the message remains unread. So whenever one restarts httpd, pulp or anything else, the WSGI script attempts to process the same message and segfaults again, until one removes the message as we suggested as a workaround.

Two underlying bugs here:
- goferd processing the UTF error string should not raise an encode/decode error
- the WSGI script on Satellite should handle UnicodeDecodeError exceptions from goferd properly

(In reply to Pavel Moravec from comment #16)
> OK, finally I have a reliable reproducer:
> ..
> 5) Try the same via goferd:

This assumes goferd is run with a localised LANG, like:
```
service goferd stop
LANG=pl_PL goferd -f &
```

Tried to reproduce using:
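```python
from gofer.rmi.dispatcher import RemoteException, Return

try:
    # U+0411 (Cyrillic Be) has no ISO-8859-15 mapping, so this raises UnicodeEncodeError
    u"\u0411".encode("iso-8859-15")
except UnicodeEncodeError:  # no ", e" / "as e" binding, so the syntax is valid on Python 2.4 through 3.6
    returned = Return.exception()            # capture the active exception as a gofer Return
    re = RemoteException.instance(returned)  # rebuild the exception from the marshalled reply
    str(re)                                  # the step that segfaults on Satellite
```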
And also with a functional test in which a gofer plugin raises UnicodeEncodeError in an RMI method; both on Python 2.4, 2.7 and 3.6.
The gofer mechanism for propagating exceptions is to JSON marshal/unmarshal the exception. There is no special handling of specific exception types. The most likely explanation is that the UnicodeEncodeError is being raised (by YUM) on the content host (agent) with fewer arguments than are needed to reconstruct it on the server. I'm not yet sure how, though. I reviewed the Python documentation for the versions mentioned, and it seems UnicodeEncodeError has always taken 5 arguments. I no longer have access to a Satellite to reproduce it that way. Can someone provide me with access to a Satellite reproducer?
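To illustrate the reconstruction question, here is a minimal Python 3 sketch of a JSON round trip that rebuilds the exception by calling its class with all marshalled arguments, rather than `T.__new__(T)` followed by a `__dict__` update. The payload keys mirror the `xmodule`/`xclass`/`xargs` fields of the pulp.task message in the original description; the functions `marshal_exception` and `rebuild_exception` and the `RuntimeError` fallback are hypothetical and are not gofer's API.

```python
import json


def marshal_exception(exc):
    """Serialize an exception into a JSON payload (hypothetical format).

    The keys mirror the xmodule/xclass/xargs fields of the pulp.task message;
    gofer's real wire format carries more (e.g. xstate).
    """
    return json.dumps({
        "xmodule": type(exc).__module__,  # "builtins" on Python 3 ("exceptions" on Python 2)
        "xclass": type(exc).__name__,
        "xargs": list(exc.args),
    })


def rebuild_exception(payload):
    """Rebuild an exception from marshal_exception() output.

    Calling the class with *xargs runs the normal initializer, so classes such
    as UnicodeEncodeError get their encoding/object/start/end/reason fields set.
    If the class cannot be found, is not an exception, or rejects the
    arguments, fall back to a RuntimeError carrying the original class name
    and arguments.
    """
    data = json.loads(payload)
    xclass, xargs = data["xclass"], data["xargs"]
    try:
        module = __import__(data["xmodule"], fromlist=[xclass])
        cls = getattr(module, xclass)
        if not (isinstance(cls, type) and issubclass(cls, BaseException)):
            raise TypeError("%s is not an exception class" % xclass)
        return cls(*xargs)
    except (ImportError, AttributeError, TypeError, ValueError):
        return RuntimeError("%s%r" % (xclass, tuple(xargs)))


# Usage: round-trip the same kind of error the localised yum message produces.
try:
    "B\u0142\u0119dy".encode("ascii")
except UnicodeEncodeError as exc:
    original = exc

rebuilt = rebuild_exception(marshal_exception(original))
print(type(rebuilt).__name__, rebuilt)
```

With all five `xargs` present, the constructor path yields a fully initialised UnicodeEncodeError; a payload with fewer arguments (the scenario suspected above) degrades to a plain exception instead of a half-built one.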
The issue is in the gofer library; Pulp is not its maintainer. Feel free to reach out to Jeff Ortel. Pulp 2 is in maintenance mode and currently accepts only critical/security issues. The main focus is on Pulp 3, and some of the requests will be satisfied in the newer version. We have evaluated this request, and while we recognize that it is a valid request, we do not expect this to be implemented in Pulp 2. As this issue is not relevant for Pulp 3, we are therefore closing it as WONTFIX.
Created attachment 1416109 [details]
backtrace from reproducer

Description of problem:
(noticed at a customer, proper reproducer pending)

When the pulp.task queue has a message like:

```json
{
  "data": {
    "consumer_id": "173383a7-170c-46d3-a725-735e3db49519",
    "task_id": "2fd82f7c-def8-43ce-a409-db28ca8dcd79"
  },
  "result": {
    "exval": "Traceback (most recent call last):\n\n File \"/usr/lib/python2.6/site-packages/gofer/rmi/dispatcher.py\", line 454, in __call__\n retval = self.method(*self.args, **self.kwargs)\n\n File \"/usr/lib/gofer/plugins/katelloplugin.py\", line 357, in install\n report = dispatcher.install(conduit, units, options)\n\n File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/dispatcher.py\", line 68, in install\n _report.set_failed(report.LastExceptionDetails())\n\n File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/report.py\", line 196, in __init__\n self['message'] = str(inst)\n\nUnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)\n",
    "xargs": [
      "ascii",
      "B\u0142\u0119dy testu transakcji: file /usr/sbin/redhat_lsb_trigger.x86_64 from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n file /usr/share/man/man1/lsb_release.1.gz from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n",
      1,
      3,
      "ordinal not in range(128)"
    ],
    "xclass": "UnicodeEncodeError",
    "xmodule": "exceptions",
    "xstate": {
      "trace": "Traceback (most recent call last):\n\n File \"/usr/lib/python2.6/site-packages/gofer/rmi/dispatcher.py\", line 454, in __call__\n retval = self.method(*self.args, **self.kwargs)\n\n File \"/usr/lib/gofer/plugins/katelloplugin.py\", line 357, in install\n report = dispatcher.install(conduit, units, options)\n\n File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/dispatcher.py\", line 68, in install\n _report.set_failed(report.LastExceptionDetails())\n\n File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/report.py\", line 196, in __init__\n self['message'] = str(inst)\n\nUnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)\n"
    }
  },
  "routing": [
    null,
    "pulp.task"
  ],
  "sn": "b6a70f3c-0fa4-47a5-aced-34d02bd95bc1",
  "timestamp": "2018-03-19T14:37:03Z",
  "version": "2.0"
}
```

then the pulp WSGI script segfaults with this backtrace:

```
f=f@entry=Frame 0x7f4a20053300, for file /usr/lib/python2.7/site-packages/gofer/rmi/async.py, line 246, in __unicode__ (self=<Failed(origin=None
f=f@entry=Frame 0x563adf7a51e0, for file /usr/lib/python2.7/site-packages/gofer/common.py, line 42, in utf8 (thing=<Failed(origin=None
f=f@entry=Frame 0x7f4a20053140, for file /usr/lib/python2.7/site-packages/gofer/rmi/async.py, line 254, in __str__ (self=<Failed(origin=None
f=f@entry=Frame 0x563add0ad270, for file /usr/lib64/python2.7/logging/__init__.py, line 328, in getMessage (self=<LogRecord(task_id=None
..
```

I *think* it is due to:

```
#4 0x00007f4a45f594e5 in PyObject_Unicode (
    v=exceptions.UnicodeEncodeError('ascii', u'B\u0142\u0119dy testu transakcji: file /usr/sbin/redhat_lsb_trigger.x86_64 from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n file /usr/share/man/man1/lsb_release.1.gz from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n', 1, 3, 'ordinal not in range(128)'))
    at /usr/src/debug/Python-2.7.5/Objects/object.c:514
```

since I *guess* in:

```python
def __unicode__(self):  # .. of class AsyncReply - used below
    s = list()
    s.append(self.__class__.__name__)
    s.append(' sn : %s' % self.sn)
    s.append(' origin : %s' % self.origin)
    s.append(' timestamp : %s' % self.timestamp)
    s.append(' user data : %s' % self.data)
    return '\n'.join(s)

# ..

def __unicode__(self):
    s = list()
    s.append(AsyncReply.__unicode__(self))  # this __unicode__ method does **not** return a unicode string(?)
    s.append(' exval: %s' % unicode(self.exval))  # .. so appending unicode after a str fails here
```

Version-Release number of selected component (if applicable):
python-gofer-2.7.7-3.el7sat.noarch

How reproducible:
100% (once I provide a reproducer)

Steps to Reproduce:
n/a for now (it should be "try to install a package with a conflict on a system with localization, such that the error string is non-ASCII")

Actual results:
/var/log/httpd/error_log full of:
```
[Mon Apr 02 09:14:40.627315 2018] [core:notice] [pid 9651] AH00052: child pid 9680 exit signal Segmentation fault (11)
```
No pulp-related task can run (500 ISE received).
hammer ping fails on pulp and pulp_auth.

Expected results:
no segfaults
pulp-related tasks work
hammer ping works

Additional info:
attaching bt from my reproducer (gdb.txt)
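On the server side, the reply formatting quoted in the description could be made defensive so that an `exval` whose string conversion fails does not break logging. A minimal sketch, assuming Python 3; `safe_text` and `describe_failed_reply` are hypothetical helpers, not part of gofer. Note that `try`/`except` only catches Python-level exceptions: the crash reported here is a segfault inside libpython, which no Python handler can intercept, so this would complement, not replace, building the exception correctly in the first place.

```python
def safe_text(value):
    """Return a printable text form of value without letting conversion errors escape.

    Tries str() first, then repr(), then a generic placeholder.
    """
    for convert in (str, repr):
        try:
            return convert(value)
        except Exception:
            continue
    return "<unprintable %s object>" % type(value).__name__


def describe_failed_reply(sn, origin, timestamp, data, exval):
    """Build a multi-line description in the spirit of the __unicode__ methods quoted above."""
    lines = [
        "Failed",
        " sn : %s" % safe_text(sn),
        " origin : %s" % safe_text(origin),
        " timestamp : %s" % safe_text(timestamp),
        " user data : %s" % safe_text(data),
        " exval: %s" % safe_text(exval),
    ]
    return "\n".join(lines)


class UnprintableExval(Exception):
    """Hypothetical stand-in for an exception whose __str__ raises."""

    def __str__(self):
        raise UnicodeEncodeError("ascii", "\u0142", 0, 1, "ordinal not in range(128)")


# Usage with values taken from the pulp.task message above.
out = describe_failed_reply(
    "b6a70f3c-0fa4-47a5-aced-34d02bd95bc1", None, "2018-03-19T14:37:03Z", {}, UnprintableExval()
)
print(out)
```

Here `str()` on the stand-in raises, so the helper falls back to `repr()`, which for an exception with no arguments yields the class name followed by `()`.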