
Bug 1562704

Summary: httpd pulp wsgi segfaults when exval contains non-ascii characters
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WONTFIX
QA Contact: Katello QA List <katello-qa-list>
Severity: high
Priority: urgent
Version: 6.3.0
CC: alexandre.chanu, bkearney, hasuzuki, jortel, jpasqual, phess, pmoravec, ttereshc
Target Milestone: Unspecified
Keywords: PrioBumpField, Reopened, Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2020-05-01 13:05:18 UTC
Type: Bug
Attachments:
  backtrace from reproducer (flags: none)

Description Pavel Moravec 2018-04-02 07:33:42 UTC
Created attachment 1416109 [details]
backtrace from reproducer

Description of problem:
(noticed at a customer, proper reproducer pending)

When the pulp.task queue holds a message like:

{
    "data": {
        "consumer_id": "173383a7-170c-46d3-a725-735e3db49519",
        "task_id": "2fd82f7c-def8-43ce-a409-db28ca8dcd79"
    },
    "result": {
        "exval": "Traceback (most recent call last):\n\n  File \"/usr/lib/python2.6/site-packages/gofer/rmi/dispatcher.py\", line 454, in __call__\n    retval = self.method(*self.args, **self.kwargs)\n\n  File \"/usr/lib/gofer/plugins/katelloplugin.py\", line 357, in install\n    report = dispatcher.install(conduit, units, options)\n\n  File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/dispatcher.py\", line 68, in install\n    _report.set_failed(report.LastExceptionDetails())\n\n  File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/report.py\", line 196, in __init__\n    self['message'] = str(inst)\n\nUnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)\n",
        "xargs": [
            "ascii",
            "B\u0142\u0119dy testu transakcji:   file /usr/sbin/redhat_lsb_trigger.x86_64 from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n  file /usr/share/man/man1/lsb_release.1.gz from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n",
            1,
            3,
            "ordinal not in range(128)"
        ],
        "xclass": "UnicodeEncodeError",
        "xmodule": "exceptions",
        "xstate": {
            "trace": "Traceback (most recent call last):\n\n  File \"/usr/lib/python2.6/site-packages/gofer/rmi/dispatcher.py\", line 454, in __call__\n    retval = self.method(*self.args, **self.kwargs)\n\n  File \"/usr/lib/gofer/plugins/katelloplugin.py\", line 357, in install\n    report = dispatcher.install(conduit, units, options)\n\n  File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/dispatcher.py\", line 68, in install\n    _report.set_failed(report.LastExceptionDetails())\n\n  File \"/usr/lib/python2.6/site-packages/pulp/agent/lib/report.py\", line 196, in __init__\n    self['message'] = str(inst)\n\nUnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)\n"
        }
    },
    "routing": [
        null,
        "pulp.task"
    ],
    "sn": "b6a70f3c-0fa4-47a5-aced-34d02bd95bc1",
    "timestamp": "2018-03-19T14:37:03Z",
    "version": "2.0"
}

then the pulp WSGI script segfaults with this backtrace:

    f=f@entry=Frame 0x7f4a20053300, for file /usr/lib/python2.7/site-packages/gofer/rmi/async.py, line 246, in __unicode__ (self=<Failed(origin=None
    f=f@entry=Frame 0x563adf7a51e0, for file /usr/lib/python2.7/site-packages/gofer/common.py, line 42, in utf8 (thing=<Failed(origin=None
    f=f@entry=Frame 0x7f4a20053140, for file /usr/lib/python2.7/site-packages/gofer/rmi/async.py, line 254, in __str__ (self=<Failed(origin=None
    f=f@entry=Frame 0x563add0ad270, for file /usr/lib64/python2.7/logging/__init__.py, line 328, in getMessage (self=<LogRecord(task_id=None
..

I *think* it is due to:

#4  0x00007f4a45f594e5 in PyObject_Unicode (
    v=exceptions.UnicodeEncodeError('ascii', u'B\u0142\u0119dy testu transakcji:   file /usr/sbin/redhat_lsb_trigger.x86_64 from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n  file /usr/share/man/man1/lsb_release.1.gz from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64\n', 1, 3, 'ordinal not in range(128)')) at /usr/src/debug/Python-2.7.5/Objects/object.c:514

since, I *guess*, in:

    def __unicode__(self):    # .. of class AsyncReply - used below
        s = list()
        s.append(self.__class__.__name__)
        s.append('  sn : %s' % self.sn)
        s.append('  origin : %s' % self.origin)
        s.append('  timestamp : %s' % self.timestamp)
        s.append('  user data : %s' % self.data)
        return '\n'.join(s)

..

    def __unicode__(self):
        s = list()
        s.append(AsyncReply.__unicode__(self))          # this __unicode__ method does **not** return unicode string(?)
        s.append('  exval: %s' % unicode(self.exval))   # .. so appending unicode after a str fails here



Version-Release number of selected component (if applicable):
python-gofer-2.7.7-3.el7sat.noarch


How reproducible:
100% (once I provide a reproducer)


Steps to Reproduce:
n/a for now (it should be: "try to install a package with a conflict on a system with a non-English localization, so that the error string contains non-ASCII characters")


Actual results:
/var/log/httpd/error_log full of:
[Mon Apr 02 09:14:40.627315 2018] [core:notice] [pid 9651] AH00052: child pid 9680 exit signal Segmentation fault (11)

No pulp-related task can run (500 ISE received)

hammer ping fails on pulp and pulp_auth



Expected results:
no segfaults

pulp-related tasks work

hammer ping works



Additional info:
attaching bt from my reproducer (gdb.txt)

Comment 3 Pavel Moravec 2018-04-02 09:02:24 UTC
To QE:
please let me know if you need a full reproducer, which should look like this:

- have a RHEL6 machine with the .pl localisation (I expect other localisations will work as well, simply any one that has non-ASCII chars in the "transaction failed" error string like below)

- have redhat-lsb-core-4.0-3.el6.x86_64 installed there

- via katello-agent, try to install/upgrade redhat-lsb-core-4.0-7.el6.x86_64 there

(try that first via yum directly, to check that you get a yum error like:

B\u0142\u0119dy testu transakcji:   file /usr/sbin/redhat_lsb_trigger.x86_64 from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64
  file /usr/share/man/man1/lsb_release.1.gz from install of redhat-lsb-core-4.0-7.el6.x86_64 conflicts with file from package redhat-lsb-4.0-3.el6.x86_64
)

If the above doesn't work as a reproducer and a similar one is needed, let me know.

Comment 10 Jeff Ortel 2018-05-18 16:38:08 UTC
Created attachment 1438737 [details]
Small unicode reproducer.

Comment 12 Pavel Moravec 2018-06-23 12:36:44 UTC
I can't reproduce the bug in any way other than artificially copying the journal file content. I don't see how to _generate_ such content, i.e. which Satellite/pulp action could do so.

I even tried to:
- have Sat6.2
- generate a package install request while goferd was down
- upgrade Sat to 6.3 (well, this was mimicked by smart copying & updating of the jrnl file content from the 6.2 machine to the 6.3 one)
- let goferd read the request there and complain with non-ASCII chars

but still no luck :(

Since the underlying case is closed and we don't have a reproducer, I am closing this BZ.

Comment 13 Pavel Moravec 2019-08-02 12:10:32 UTC
(In reply to Pavel Moravec from comment #12)

Some such scenario must exist. A customer using localization (sic!) has now upgraded from 6.4 to 6.5 and hit the same problem (a double-check is pending).

Comment 14 Pavel Moravec 2019-08-02 20:45:44 UTC
Reopening since it happened again and I have made some progress.

Here is the underlying problem: the way the gofer library builds an instance of an exception after importing its class does not work well (cf. gofer/rmi/dispatcher.py and gofer/common.py):

>>> target = "ArithmeticError"
>>> m = __import__("exceptions", fromlist=[target])
>>> T = getattr(m, target)
>>> inst = T.__new__(T)
>>> inst
ArithmeticError()
>>> print(inst)

>>> target = "UnicodeEncodeError"
>>> m = __import__("exceptions", fromlist=[target])
>>> T = getattr(m, target)
>>> inst = T.__new__(T)
>>> inst
UnicodeEncodeError()
>>> print(inst)
Segmentation fault (core dumped)

And even the

inst.__dict__.update(state or {})

isn't sufficient, since state holds too few entries to fill UnicodeEncodeError, which requires 5 arguments:

>>> i = UnicodeEncodeError()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: function takes exactly 5 arguments (0 given)
>>>


So a reproducer:
- force goferd to reply with reply.exval = "UnicodeEncodeError"
- rmi exception evaluation, and the RemoteException class in particular, will fill the exception wrongly
- an attempt to call str(..) on the improperly built exception will cause the segfault


So the only missing step: force goferd to respond with UnicodeEncodeError (or any other exception that requires more than 2 params in its constructor).
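
A minimal Python 3 sketch of a more defensive rebuild, for illustration only. It assumes the receiving side has the marshalled class name and xargs at hand; rebuild_exception and RemoteError are hypothetical names, not part of gofer. The idea is to call the real constructor with the marshalled arguments instead of bypassing it via __new__, and to fall back to a plain wrapper when that is not possible:

```python
import builtins


class RemoteError(Exception):
    """Hypothetical fallback used when the original class cannot be rebuilt."""


def rebuild_exception(xclass, xargs):
    """Rebuild an exception from its marshalled class name and arguments."""
    target = getattr(builtins, xclass, None)
    if not (isinstance(target, type) and issubclass(target, BaseException)):
        # Unknown or non-exception name: wrap the details instead.
        return RemoteError("%s%r" % (xclass, tuple(xargs)))
    try:
        # Go through the constructor so the C-level fields get initialised.
        return target(*xargs)
    except TypeError:
        # Wrong number or type of arguments: never hand back a
        # half-initialised instance.
        return RemoteError("%s%r" % (xclass, tuple(xargs)))


# xargs as seen in the stuck message (text shortened here).
xargs = ["ascii", u"B\u0142\u0119dy testu transakcji", 1, 3,
         "ordinal not in range(128)"]
ok = rebuild_exception("UnicodeEncodeError", xargs)
bad = rebuild_exception("UnicodeEncodeError", [])
print(type(ok).__name__, "->", str(ok))
print(type(bad).__name__, "->", str(bad))
```

With all five arguments the original class is rebuilt and str() is safe; with too few arguments the caller gets a RemoteError rather than an uninitialised UnicodeEncodeError.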

Comment 15 Pavel Moravec 2019-08-05 07:24:18 UTC
(of the built-in exceptions, only these three fail to be "rebuilt" the way the gofer rmi class does it:

UnicodeDecodeError
UnicodeEncodeError
UnicodeTranslateError

)
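
A rough way to get a similar list on a given interpreter, sketched in Python 3. This only checks which built-in exception classes reject a no-argument constructor call, which is a proxy for "cannot be rebuilt without its arguments"; it does not replicate the __new__-based path in gofer, and the helper name is hypothetical. Newer Python versions may list further classes besides the three Unicode errors:

```python
import builtins


def classes_needing_arguments():
    """Return names of built-in exception classes that reject a no-argument call."""
    names = []
    for name, obj in vars(builtins).items():
        if not (isinstance(obj, type) and issubclass(obj, BaseException)):
            continue
        try:
            obj()  # instantiating an exception does not raise it
        except TypeError:
            names.append(name)
    return sorted(names)


print(classes_needing_arguments())
```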

Comment 16 Pavel Moravec 2019-08-05 08:36:10 UTC
OK, finally I have a reliable reproducer:

1) Have Sat6.5 (or 6.4 or any other affected release)

2) Have a RHEL6 host registered. Not RHEL7 but RHEL6.

3) Remove any "redhat-lsb*" package from the client:
yum remove "redhat-lsb*"

4) Try to install redhat-lsb-4.0-3.el6 from the command line, just to ensure it fails with a transaction check error:
yum install redhat-lsb-4.0-3.el6 
..
Transaction Check Error:
  file /usr/sbin/redhat_lsb_trigger.x86_64 conflicts between attempted installs of redhat-lsb-4.0-3.el6.x86_64 and redhat-lsb-core-4.0-7.el6.x86_64
  file /usr/share/man/man1/lsb_release.1.gz conflicts between attempted installs of redhat-lsb-4.0-3.el6.x86_64 and redhat-lsb-core-4.0-7.el6.x86_64

5) Try the same via goferd:

hammer -u admin -p mysecretpassword host package install --packages "redhat-lsb-4.0-3.el6" --host rhel6u8-5.gsslab.brq2.redhat.com

6) Wait a few minutes to get "Error: 500 Internal Server Error"
7) Check for the segfaulting pulp WSGI script on Satellite:

Aug  5 10:26:16 provisioning kernel: httpd[9658]: segfault at 10 ip 00007fba3b99ba8f sp 00007fba175d4c30 error 4 in libpython2.7.so.1.0[7fba3b939000+17e000]

.. and one stuck message in pulp.task queue:
# qpid-stat -b amqps://localhost:5671 --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -q | grep pulp.task
  pulp.task                                                          Y                      1    24     23    2.50k  52.8k    50.3k        0     1
#

(one can open /var/lib/qpidd/.qpidd/qls/jrnl2/pulp.task/*.jrnl and search for the latest occurrence of the string "retval" to see the stuck message that the WSGI script fails to process)


The reason is that goferd on RHEL6 really goes through Content.install in /usr/lib/python2.6/site-packages/katello/agent/goferd/plugin.py:344, which sets exval as explained in earlier comments.



Jeff, could you reproduce it now?

Comment 17 Pavel Moravec 2019-08-05 08:48:58 UTC
Summary of the bug (well, 2 bugs, in fact):

Basically it happens any time when:

- a package install/update/remove happens on RHEL6 via katello-agent
- it fails for any reason (insufficient disk space, missing dependency or similar)
- _and_ the error message contains, due to the localisation used, some non-ASCII characters

Then encoding or decoding the UTF error string fails, which raises UnicodeDecodeError or UnicodeEncodeError inside goferd (this happens on RHEL6 only, since goferd on RHEL7 follows a different call path). Goferd then reports this exception to Satellite, and Satellite, trying to rebuild the exception, fails to initialize these particular exception classes. An attempt to print the class instance (or even to show the instance as a string) then raises an unexpected exception due to insufficient instance initialisation, which leads to the pulp WSGI script segfault.

Since the script fails to decode/process the message from goferd, the message remains unread. So whenever one restarts httpd or pulp or anything else, the WSGI script attempts to process the same message and segfaults again, until one removes the message as we suggested as a workaround.


Two underlying bugs here:
- goferd processing the UTF error string should not raise an encode/decode error
- the WSGI script on Satellite should handle UnicodeDecodeError exceptions from goferd properly
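
To illustrate the first of the two bugs, here is a small Python 3 sketch. In Python 2, str() on an object whose message is unicode performs an implicit ASCII encode; the sketch makes that encode explicit and shows that it yields the same arguments as the xargs in the stuck message. safe_text is a hypothetical helper showing one tolerant alternative, not something goferd provides:

```python
def safe_text(message, encoding="ascii"):
    """Convert a message for a narrow-encoding sink without raising."""
    # backslashreplace keeps the information as \uXXXX escapes.
    return message.encode(encoding, errors="backslashreplace").decode(encoding)


# Start of the localized yum error from the description.
localized = u"B\u0142\u0119dy testu transakcji"

failure_args = None
try:
    localized.encode("ascii")  # strict mode, like the implicit conversion
except UnicodeEncodeError as exc:
    failure_args = exc.args

print(failure_args)
print(safe_text(localized))
```

The captured arguments are the codec name, the original text, the start and end offsets (1 and 3) and the reason, which matches the "position 1-2" wording in exval.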

Comment 18 Pavel Moravec 2019-08-05 08:54:28 UTC
(In reply to Pavel Moravec from comment #16)
> OK, finally I have a reliable reproducer:
> 
..
> 5) Try the same via goferd:


This assumes goferd is run with a localised LANG, like:

service goferd stop
LANG=pl_PL goferd -f &

Comment 19 Jeff Ortel 2020-01-20 18:11:19 UTC
Tried to reproduce using:

from gofer.rmi.dispatcher import RemoteException, Return
try:
    u"\u0411".encode("iso-8859-15")
except UnicodeEncodeError, e:
    returned = Return.exception()
    re = RemoteException.instance(returned)
    str(re)

Also tried a functional test in which a gofer plugin raises UnicodeEncodeError in an RMI method.

Tried on Python 2.4, 2.7 and 3.6.

The gofer mechanism for propagating exceptions is to JSON marshal/unmarshal the exception. There is no special handling of specific types of exceptions. The most likely explanation is that the UnicodeEncodeError is being raised (by YUM) on the content host (agent) with fewer arguments than needed to reconstruct it on the server. Not sure how, though. I reviewed the Python documentation for the versions mentioned and it seems UnicodeEncodeError has always had 5 arguments. I no longer have access to a Satellite to reproduce that way. Can someone provide me with access to a Satellite reproducer?
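
One way to check the "fewer arguments" hypothesis in isolation is to marshal a real UnicodeEncodeError to JSON and count what arrives. The Python 3 sketch below is only an approximation: marshal_exception is a hypothetical helper, and the field names xclass, xmodule and xargs are borrowed from the message in the description, not taken from the gofer code:

```python
import json


def marshal_exception(exc):
    """Turn an exception into a JSON-friendly dict."""
    return {
        "xclass": type(exc).__name__,
        "xmodule": type(exc).__module__,
        # args of a UnicodeEncodeError are plain strings and ints, so they
        # serialise directly; other exception types may need conversion.
        "xargs": list(exc.args),
    }


wire = None
try:
    u"\u0411".encode("iso-8859-15")
except UnicodeEncodeError as exc:
    wire = json.dumps(marshal_exception(exc))

received = json.loads(wire)
print(received["xclass"], len(received["xargs"]))
```

If all five arguments survive the round trip here, any loss would have to happen elsewhere, for example where the exception instance is rebuilt.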

Comment 22 Tanya Tereshchenko 2020-05-01 13:05:18 UTC
The issue is in the gofer library, Pulp is not a maintainer for it. Feel free to reach out to Jeff Ortel.

Pulp 2 is in maintenance mode and currently accepts only critical/security issues. The main focus is on Pulp 3, and some of the requests will be satisfied in the newer version. We have evaluated this request, and while we recognize that it is a valid request, we do not expect this to be implemented in Pulp 2. As this issue is not relevant for Pulp 3, we are therefore closing this out as WONTFIX.