Bug 1478149

Summary:	Loading module from gi.repository hangs process
Product:	Red Hat Enterprise Linux 7	Reporter:	John Dennis <jdennis>
Component:	pygobject3	Assignee:	Matthew Barnes <mbarnes>
Status:	CLOSED WONTFIX	QA Contact:	Desktop QE <desktop-qa-list>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	7.4	CC:	afazekas, antoine, apevec, fhubik, jjoyce, lhh, nkinder, nlevinki, rcritten, srevivo
Target Milestone:	rc	Keywords:	ZStream
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1475969	Environment:
Last Closed:	2019-10-28 08:44:42 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1475969, 1478173
Bug Blocks:

Description John Dennis 2017-08-03 18:02:23 UTC

+++ This bug was initially created as a clone of Bug #1475969 +++

Description of problem:

tempest is failing with:
stdout: 2017-07-26 11:46:30.409 13362 INFO tempest [-] Using tempest config file /etc/tempest/tempest.conf
2017-07-26 11:46:30.491 13362 INFO __main__ [-] Reading defaults from file '/home/cloud-user/tempest-dir/etc/default-overrides.conf'
2017-07-26 11:46:30.492 13362 INFO __main__ [-] Adding options from deployer-input file '/home/cloud-user/tempest-dir/etc/deployer-input-juno.conf'
2017-07-26 11:46:30.492 13362 DEBUG __main__ [-] Setting [volume-feature-enabled] bootable = false set tools/config_tempest.py:467
2017-07-26 11:46:30.492 13362 DEBUG __main__ [-] Setting [compute-feature-enabled] preserve_ports = false set tools/config_tempest.py:467
2017-07-26 11:46:30.492 13362 DEBUG __main__ [-] Setting [compute-feature-enabled] live_migrate_paused_instances = false set tools/config_tempest.py:467
2017-07-26 11:46:30.493 13362 DEBUG __main__ [-] Setting [data_processing] catalog_type = data_processing set tools/config_tempest.py:467
2017-07-26 11:46:30.493 13362 DEBUG __main__ [-] Setting [identity] uri = http://192.168.1.13:5000/v2.0/ set tools/config_tempest.py:467
2017-07-26 11:46:30.493 13362 DEBUG __main__ [-] Setting [identity] admin_password = redhat set tools/config_tempest.py:467
2017-07-26 11:46:30.493 13362 DEBUG __main__ [-] Setting [scenario] img_dir = /home/cloud-user/tempest-dir/etc set tools/config_tempest.py:467
2017-07-26 11:46:30.493 13362 DEBUG __main__ [-] Setting [image] http_image = http://rhos-qe-mirror-qeos.usersys.redhat.com/images/cirros-0.3.5-x86_64-uec.tar.gz set tools/config_tempest.py:467
2017-07-26 11:46:30.493 13362 DEBUG __main__ [-] Setting [identity] uri_v3 = http://192.168.1.13:5000/v3/ set tools/config_tempest.py:467
2017-07-26 11:51:34.990 13362 INFO tempest.lib.common.rest_client [-] Request (main): 500 POST http://192.168.1.13:5000/v2.0/tokens
2017-07-26 11:51:34.990 13362 CRITICAL tempest [-] Unhandled error: IdentityError: Got identity error
Details: Unexpected status code 500
2017-07-26 11:51:34.990 13362 ERROR tempest Traceback (most recent call last):
2017-07-26 11:51:34.990 13362 ERROR tempest   File "tools/config_tempest.py", line 842, in <module>
2017-07-26 11:51:34.990 13362 ERROR tempest     main()
2017-07-26 11:51:34.990 13362 ERROR tempest   File "tools/config_tempest.py", line 155, in main
2017-07-26 11:51:34.990 13362 ERROR tempest     clients = ClientManager(conf, not args.non_admin)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "tools/config_tempest.py", line 417, in __init__
2017-07-26 11:51:34.990 13362 ERROR tempest     tenant_name)['id']
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/common/identity.py", line 20, in get_tenant_by_name
2017-07-26 11:51:34.990 13362 ERROR tempest     tenants = client.list_tenants()['tenants']
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/services/identity/v2/json/tenants_client.py", line 56, in list_tenants
2017-07-26 11:51:34.990 13362 ERROR tempest     resp, body = self.get('tenants')
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/common/rest_client.py", line 275, in get
2017-07-26 11:51:34.990 13362 ERROR tempest     return self.request('GET', url, extra_headers, headers)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/common/rest_client.py", line 633, in request
2017-07-26 11:51:34.990 13362 ERROR tempest     headers=headers, body=body)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/common/rest_client.py", line 527, in _request
2017-07-26 11:51:34.990 13362 ERROR tempest     method, url, headers, body, self.filters)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/auth.py", line 187, in auth_request
2017-07-26 11:51:34.990 13362 ERROR tempest     filters, method, url, headers, body)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/auth.py", line 272, in _decorate_request
2017-07-26 11:51:34.990 13362 ERROR tempest     auth_data = self.get_auth()
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/auth.py", line 149, in get_auth
2017-07-26 11:51:34.990 13362 ERROR tempest     self.set_auth()
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/auth.py", line 158, in set_auth
2017-07-26 11:51:34.990 13362 ERROR tempest     self.cache = self._get_auth()
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/auth.py", line 309, in _get_auth
2017-07-26 11:51:34.990 13362 ERROR tempest     token, auth_data = auth_func(**auth_params)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/services/identity/v2/token_client.py", line 104, in get_token
2017-07-26 11:51:34.990 13362 ERROR tempest     body = self.auth(user, password, tenant)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/services/identity/v2/token_client.py", line 54, in auth
2017-07-26 11:51:34.990 13362 ERROR tempest     resp, body = self.post(self.auth_url, body=body)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/common/rest_client.py", line 260, in post
2017-07-26 11:51:34.990 13362 ERROR tempest     return self.request('POST', url, extra_headers, headers, body)
2017-07-26 11:51:34.990 13362 ERROR tempest   File "/home/cloud-user/tempest-dir/tempest/lib/services/identity/v2/token_client.py", line 98, in request
2017-07-26 11:51:34.990 13362 ERROR tempest     'Unexpected status code {0}'.format(resp.status))
2017-07-26 11:51:34.990 13362 ERROR tempest IdentityError: Got identity error
2017-07-26 11:51:34.990 13362 ERROR tempest Details: Unexpected status code 500
2017-07-26 11:51:34.990 13362 ERROR tempest 


Version-Release number of selected component (if applicable):
openstack-keystone-2014.2.4-1.el7ost

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP 6 with packstack allinone
2. Run tempest

Actual results:
tempest fails with Unexpected status code 500

Expected results:
tempest runs

--- Additional comment from Jason Joyce on 2017-07-27 12:27:06 EDT ---

This is only appearing on RHEL 7.4, RHEL 7.3 seems to be running without error.

--- Additional comment from Attila Fazekas on 2017-07-28 06:46:29 EDT ---

In the above experiment keystone is not responsive,
you can find similar messages in the log:

>  [Fri Jul 28 06:23:36.857715 2017] [core:error] [pid 6813] [client 192.168.1.13:34040] Script timed out before returning headers: admin


No sign of database activity.
Restarting the httpd process does not helps.

On the same system the `stand alone` version of keystone (openstack-keystone) service was responsive. (manually started)

This kind of issue can happen, if some part of the code wants to
use eventlet in the mod_wsgi mode.

keystone likely was responsive when packstack created the endpoints,
some extra config or library added later in the job might be responsible for the issue.

--- Additional comment from Attila Fazekas on 2017-07-28 10:49:11 EDT ---

The keystoneclient used by keystone tries to do something with gnome-keyring when it is installed. The gnome-keyring was pulled by git.

Uninstalling  gnome-keyring  package helps.

The CLI version of the keystone client also throws the following warning:

/usr/lib/python2.7/site-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring


BTW, the service code was in a gi call.
Thread 8 (Thread 0x7f316298b700 (LWP 20239)):
#4 Waiting for a lock (e.g. GIL)
#10 Frame 0x7f315cda1bb0, for file /usr/lib64/python2.7/site-packages/gi/module.py, line 157, in __getattr__ (self=<IntrospectionModule(_namespace='GLib', __path__='/usr/lib64/girepository-1.0/GLib-2.0.typelib', Variant=<StructMeta(new_fixed_array=<gi.FunctionInfo at remote 0x7f315cd81458>, new_array=<gi.FunctionInfo at remote 0x7f315cd81298>, __module__='gi.repository.GLib', new_int64=<gi.FunctionInfo at remote 0x7f315cd815a8>, hash=<gi.FunctionInfo at remote 0x7f315cd81fb8>, get_uint32=<gi.FunctionInfo at remote 0x7f315cd81f10>, new_objv=<gi.FunctionInfo at remote 0x7f315cd81650>, get_uint16=<gi.FunctionInfo at remote 0x7f315cd81ed8>, get_byte=<gi.FunctionInfo at remote 0x7f315cd81ab0>, parse_error_print_context=<gi.FunctionInfo at remote 0x7f315cd8a378>, parse_error_quark=<gi.FunctionInfo at remote 0x7f315cd8a3b0>, new_uint16=<gi.FunctionInfo at remote 0x7f315cd81768>, get_int16=<gi.FunctionInfo at remote 0x7f315cd81c70>, equal=<gi.FunctionInfo at remote 0x7f315cd81a40>, new_signature=<gi.FunctionInfo at remot...(truncated)
    wrapper = enum_register_new_gtype_and_add(info)
#19 Frame 0x7f315cd12fa0, for file /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py, line 489, in <module> ()
    globals()[attr] = getattr(GLib.UserDirectory, 'DIRECTORY_' + n)
#31 Frame 0x7f315cd11b60, for file /usr/lib64/python2.7/importlib/__init__.py, line 37, in import_module (name='gi.overrides.GLib', package=None)
    __import__(name)
#35 Frame 0x7f315cd108f0, for file /usr/lib64/python2.7/site-packages/gi/overrides/__init__.py, line 125, in load_overrides (introspection_module=<IntrospectionModule(_namespace='GLib', __path__='/usr/lib64/girepository-1.0/GLib-2.0.typelib', Variant=<StructMeta(new_fixed_array=<gi.FunctionInfo at remote 0x7f315cd81458>, new_array=<gi.FunctionInfo at remote 0x7f315cd81298>, __module__='gi.repository.GLib', new_int64=<gi.FunctionInfo at remote 0x7f315cd815a8>, hash=<gi.FunctionInfo at remote 0x7f315cd81fb8>, get_uint32=<gi.FunctionInfo at remote 0x7f315cd81f10>, new_objv=<gi.FunctionInfo at remote 0x7f315cd81650>, get_uint16=<gi.FunctionInfo at remote 0x7f315cd81ed8>, get_byte=<gi.FunctionInfo at remote 0x7f315cd81ab0>, parse_error_print_context=<gi.FunctionInfo at remote 0x7f315cd8a378>, parse_error_quark=<gi.FunctionInfo at remote 0x7f315cd8a3b0>, new_uint16=<gi.FunctionInfo at remote 0x7f315cd81768>, get_int16=<gi.FunctionInfo at remote 0x7f315cd81c70>, equal=<gi.FunctionInfo at remote 0x7f315cd81a40>, new_sign...(truncated)
    override_mod = importlib.import_module(override_package_name)
#38 Frame 0x7f315cd403a0, for file /usr/lib64/python2.7/site-packages/gi/importer.py, line 146, in load_module (self=<DynamicImporter(path='gi.repository') at remote 0x7f315cd50b90>, fullname='gi.repository.GLib', path='gi.repository', namespace='GLib', stacklevel=4, introspection_module=<IntrospectionModule(_namespace='GLib', __path__='/usr/lib64/girepository-1.0/GLib-2.0.typelib', Variant=<StructMeta(new_fixed_array=<gi.FunctionInfo at remote 0x7f315cd81458>, new_array=<gi.FunctionInfo at remote 0x7f315cd81298>, __module__='gi.repository.GLib', new_int64=<gi.FunctionInfo at remote 0x7f315cd815a8>, hash=<gi.FunctionInfo at remote 0x7f315cd81fb8>, get_uint32=<gi.FunctionInfo at remote 0x7f315cd81f10>, new_objv=<gi.FunctionInfo at remote 0x7f315cd81650>, get_uint16=<gi.FunctionInfo at remote 0x7f315cd81ed8>, get_byte=<gi.FunctionInfo at remote 0x7f315cd81ab0>, parse_error_print_context=<gi.FunctionInfo at remote 0x7f315cd8a378>, parse_error_quark=<gi.FunctionInfo at remote 0x7f315cd8a3b0>, new_uint16=<gi.FunctionI...(truncated)
    dynamic_module = load_overrides(introspection_module)
#53 Frame 0x7f315cd3e190, for file /usr/lib64/python2.7/importlib/__init__.py, line 37, in import_module (name='gi.repository.GLib', package=None)
    __import__(name)
#57 Frame 0x7f315cd3db20, for file /usr/lib64/python2.7/site-packages/gi/importer.py, line 145, in load_module (self=<DynamicImporter(path='gi.repository') at remote 0x7f315cd50b90>, fullname='gi.repository.GObject', path='gi.repository', namespace='GObject', stacklevel=4, introspection_module=<IntrospectionModule(__name__='gi.repository.GObject', _namespace='GObject', _version='2.0', __path__='/usr/lib64/girepository-1.0/GObject-2.0.typelib') at remote 0x7f315cd50dd0>, dep='GLib-2.0')
    importlib.import_module('gi.repository.' + dep.split("-")[0])
#72 Frame 0x7f315cd35c20, for file /usr/lib64/python2.7/importlib/__init__.py, line 37, in import_module (name='gi.repository.GObject', package=None)
    __import__(name)
#76 Frame 0x7f315cd16090, for file /usr/lib64/python2.7/site-packages/gi/importer.py, line 145, in load_module (self=<DynamicImporter(path='gi.repository') at remote 0x7f315cd50b90>, fullname='gi.repository.GnomeKeyring', path='gi.repository', namespace='GnomeKeyring', stacklevel=4, introspection_module=<IntrospectionModule(__name__='gi.repository.GnomeKeyring', _namespace='GnomeKeyring', _version='1.0', __path__='/usr/lib64/girepository-1.0/GnomeKeyring-1.0.typelib') at remote 0x7f315cd50c50>, dep='GObject-2.0')
    importlib.import_module('gi.repository.' + dep.split("-")[0])
#92 Frame 0x7f315c71d900, for file /usr/lib/python2.7/site-packages/keyring/backends/Gnome.py, line 6, in <module> ()
    from gi.repository import GnomeKeyring
#104 Frame 0x7f315cc67a50, for file /usr/lib64/python2.7/importlib/__init__.py, line 37, in import_module (name='keyring.backends.Gnome', package='keyring.backends', level=1, character='G')
    __import__(name)
#108 Frame 0x7f315cc67860, for file /usr/lib/python2.7/site-packages/keyring/backend.py, line 114, in _load_backend (name='Gnome', package='keyring.backends')
    mod = importlib.import_module('.'+name, package)
#115 Frame 0x7f315cc67660, for file /usr/lib/python2.7/site-packages/keyring/backend.py, line 129, in _load_backends (backends=('file', 'Gnome', 'Google', 'keyczar', 'kwallet', 'multi', 'OS_X', 'pyfs', 'SecretService', 'Windows'))
    list(map(_load_backend, backends))
#119 Frame 0x7f315cc67470, for file /usr/lib/python2.7/site-packages/keyring/backend.py, line 137, in get_all_keyring ()
    _load_backends()
#124 Frame 0x7f315cc60c40, for file /usr/lib/python2.7/site-packages/keyring/util/__init__.py, line 24, in wrapper (args=(), kwargs={})
    func.always_returns = func(*args, **kwargs)
#128 Frame 0x7f315cc0fd80, for file /usr/lib/python2.7/site-packages/keyring/core.py, line 66, in _get_best_keyring ()
    keyrings = backend.get_all_keyring()
#131 Frame 0x7f315cc174d0, for file /usr/lib/python2.7/site-packages/keyring/core.py, line 58, in init_backend ()
    set_keyring(load_config() or _get_best_keyring())
#134 Frame 0x7f315cc08e40, for file /usr/lib/python2.7/site-packages/keyring/core.py, line 158, in <module> ()
    init_backend()
#147 Frame 0x7f315cbfc960, for file /usr/lib/python2.7/site-packages/keyring/__init__.py, line 12, in <module> ()
    from .core import (set_keyring, get_keyring, set_password, get_password,
#162 Frame 0x7f315ca45050, for file /usr/lib/python2.7/site-packages/keystoneclient/httpclient.py, line 40, in <module> ()
    import keyring
#175 Frame 0x7f315cb63b80, for file /usr/lib/python2.7/site-packages/keystoneclient/v2_0/client.py, line 20, in <module> ()
    from keystoneclient import httpclient
#188 Frame 0x7f315cb681d0, for file /usr/lib/python2.7/site-packages/keystoneclient/v2_0/__init__.py, line 2, in <module> ()
    from keystoneclient.v2_0.client import Client
#203 Frame 0x7f315c95a720, for file /usr/lib/python2.7/site-packages/keystoneclient/discover.py, line 21, in <module> ()
    from keystoneclient.v2_0 import client as v2_client
#216 Frame 0x7f315c962850, for file /usr/lib/python2.7/site-packages/keystoneclient/client.py, line 13, in <module> ()
    from keystoneclient import discover
#229 Frame 0x7f315c0df7b0, for file /usr/lib/python2.7/site-packages/keystoneclient/__init__.py, line 26, in <module> ()
    from keystoneclient import client
#244 Frame 0x7f315c945d80, for file /usr/lib/python2.7/site-packages/keystone/models/token_model.py, line 15, in <module> ()
    from keystoneclient.common import cms
#257 Frame 0x7f315c949380, for file /usr/lib/python2.7/site-packages/keystone/common/authorization.py, line 21, in <module> ()
    from keystone.models import token_model
#270 Frame 0x7f315c139300, for file /usr/lib/python2.7/site-packages/keystone/common/controller.py, line 18, in <module> ()
    from keystone.common import authorization
#283 Frame 0x7f315c141ef0, for file /usr/lib/python2.7/site-packages/keystone/assignment/controllers.py, line 26, in <module> ()
    from keystone.common import controller
#296 Frame 0x7f315c0a70a0, for file /usr/lib/python2.7/site-packages/keystone/assignment/__init__.py, line 15, in <module> ()
    from keystone.assignment import controllers  # noqa
#311 Frame 0x7f315c144e10, for file /usr/lib/python2.7/site-packages/keystone/backends.py, line 13, in <module> ()
    from keystone import assignment
#324 Frame 0x7f315c00cbd0, for file /var/www/cgi-bin/keystone/admin, line 28, in <module> ()
    from keystone import backends

Related packages: 

libgnome-keyring-3.12.0-1.el7 
                                     
python-keyring-4.0-1.el7ost.noarch
python-keystoneclient-0.11.1-2.el7ost.noarch
openstack-keystone-2014.2.4-1.el7ost.noarch
python-keystonemiddleware-1.3.2-3.el7ost.noarch
python-keystone-2014.2.4-1.el7ost.noarch

--- Additional comment from Filip Hubík on 2017-07-28 10:54:11 EDT ---

I deployed and stopped before playbooks which perform Tempest configuration and run, means before step of the job:

$ ir-tester tempest -v --inventory hosts -i install.yml -o test.yml --openstack-installer packstack --openstack-version 6 --deployer-input-method local                         --deployer-input-file etc/deployer-input-juno.conf --config-options="image.http_image=http://<mirror>/images/cirros-0.3.5-x86_64-uec.tar.gz" --setup-revision=liberty --tests=sanity

Looks like httpd is sane and restartable there and keystone responds to local and even remote requests fine, even after

$ systemctl restart httpd # keystone still responds as expected.

When I continue with mentioned command performing automation (Tempest stage), keystone commands stop to respond usually, HTTP 500 responses are being returned, "systemctl restart httpd" or "reboot" doesn't help from that point and deployment is in unusable state as described.

--- Additional comment from Filip Hubík on 2017-07-28 10:57:24 EDT ---

Confirmed Atilla's solution workarounded this issue also on my setup (https://bugzilla.redhat.com/show_bug.cgi?id=1475969#c3, packstack, nova, flatdhcp, qpid deployment).

--- Additional comment from Attila Fazekas on 2017-07-28 12:36:18 EDT ---

Using >=keyring-8.0 helps,
the other releases avoids this issue in another way because we never used newer than python-keyring-5.7.1-1.el7ost.noarch .

12/pike is not affected, I am unsure about the others ATM.

--- Additional comment from John Dennis on 2017-07-28 15:41:10 EDT ---

Shouldn't the component for this bug be changed from keystone to gnome-keyring?

--- Additional comment from Jason Joyce on 2017-07-28 16:02:20 EDT ---

I was able to reproduce with OSP 7 using a basic packstack --allinone install.

--- Additional comment from Jason Joyce on 2017-07-28 16:10:26 EDT ---

Changing the component to python-keyring.

--- Additional comment from Attila Fazekas on 2017-07-29 10:15:44 EDT ---

Failed to reproduce the issue with OSP 8 (and 9.) , so just 7 and 6 .

--- Additional comment from Attila Fazekas on 2017-07-29 11:14:36 EDT ---

Using the python-keystoneclient and it's dependencies from 8 can help.

Installing:
 python-monotonic                                 noarch                            0.3-1.el7ost                                     /python-monotonic-0.3-1.el7ost.noarch                                    12 k
Updating:
 python-keystoneclient                            noarch                            1:1.7.2-1.el7ost                                 /python-keystoneclient-1.7.2-1.el7ost.noarch                            3.0 M
 python-oslo-config                               noarch                            2:2.4.0-1.1.el7ost                               /python-oslo-config-2.4.0-1.1.el7ost.noarch                             759 k
 python-oslo-utils                                noarch                            2.8.0-1.el7ost                                   /python-oslo-utils-2.8.0-1.el7ost.noarch                                440 k
 python-stevedore                                 noarch                            1.8.0-1.el7ost                                   /python-stevedore-1.8.0-1.el7ost.noarch                                 221 k


Some of the above changes are not compatible with horizon.

Downgrading to :
python-keystoneclient-1.3.0-3.el7ost.noarch                                                                                                                                                   
python-oslo-utils-1.4.0-2.el7ost.noarch  

from:
python-keystoneclient-1.7.2-1.el7ost.noarch                                                                                                                                                       
python-oslo-utils-2.8.0-1.el7ost.noarch 

reintroduces the issue.

The actual component can be one of these two.

--- Additional comment from Rob Crittenden on 2017-07-31 21:28:55 EDT ---

One thing I noticed is that Gnome was updated to 3.22.0 in RHEL 7.4. Given the problem seems related to loading libgnome-keyring I wonder if that isn't related.

What I'm guessing is that the gnome keyring is trying to prompt for something but apache has closed stdin/stdout/stderr so it sorta freaks out and hangs for something that will never come.

--- Additional comment from Rob Crittenden on 2017-07-31 22:50:36 EDT ---

On my install I narrowed it down to where in python-gobject-base (/usr/lib64/python2.7/site-packages/gi/overrides/GLib.py) it is hanging:

for n in ['ERR', 'HUP', 'IN', 'NVAL', 'OUT', 'PRI']:
    globals()['IO_' + n] = getattr(GLib.IOCondition, n)
    __all__.append('IO_' + n)

I feel like I'm poking in the dark here though. To get this far I annotated the code with a print and ran apache like:

# strace -f -o /tmp/out -s 1024 /usr/sbin/httpd -X -k start -d /etc/httpd -f /etc/httpd/conf/httpd.conf

This keeps Apache single-process and makes it easier to capture output. It gets to the point where it tries getattr(Glib.IOCondition, 'IO_ERR') followed by a ton of:

select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>

and select timeouts until I kill it.

--- Additional comment from Rob Crittenden on 2017-08-01 13:50:39 EDT ---

I made a little progress on a workaround. If you create /var/lib/keystone/.local/share/python_keyring/keyringrc.cfg with the contents:

[backend]
default-keyring=keyring.backends.file.EncryptedKeyring

then keystone will work with SELinux in permissive mode.

In enforcing mode httpd doesn't have read access to this file:

type=AVC msg=audit(1501609484.063:13177): avc:  denied  { open } for  pid=10111 comm="httpd" path="/var/lib/keystone/.local/share/python_keyring/keyringrc.cfg" dev="dm-0" ino=396418 scontext=system_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:keystone_var_lib_t:s0 tclass=file
type=AVC msg=audit(1501609484.063:13177): avc:  denied  { read } for  pid=10111 comm="httpd" name="keyringrc.cfg" dev="dm-0" ino=396418 scontext=system_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:keystone_var_lib_t:s0 tclass=file

A temporary workaround for this is to use chcon to grant access to the file:

# chcon system_u:system_r:httpd_t:s0 /var/lib/keystone/.local/share/python_keyring/keyringrc.cfg

But on the next relabel this is likely to be lost.

--- Additional comment from John Dennis on 2017-08-02 20:52:29 EDT ---

Thanks to Rob Crittenden for doing the heavy lifting of figuring out it's some how related to libgnome-keyring

I did some further debugging on what happens when python-keyring initializes.

It attempts to load the following backends in it's backends directory:

file, Gnome, Google, keyczar, kwallet, multi, OS_X, pyfs, SecretService, Windows

Each backend Keyring class is supposed to implement a priority() method used to sort the backends in priority order in order to select the best backend. If the priority() method raises an exception the backend is eliminated from consideration. 

The backend priority list (even with libgnome-keyring installed) is:

[<keyring.backends.file.EncryptedKeyring, <keyring.backends.file.PlaintextKeyring]

Gnome is skipped because it's priority() checks for the existence of these environment variables: DISPLAY, DBUS_SESSION_BUS_ADDRESS which are not present.

So NOTHING in the gnome backend actually executes in any meaningful way.

Simply commenting out this import in backends/Gnome.py

        from gi.repository import GnomeKeyring

gets rid of the problem.

the gi.repository is a Gnome specific magic mechanism for loading Python modules. I looked at the gnome keyring source code and it's not obvious to me what is being invoked when it's loaded as a Python module, but clearly it's doing something during the module load that hangs the process.

The file /var/lib/keystone/.local/share/python_keyring/keyringrc.cfg is a python-keyring construct, as far as I can tell it has nothing to do with loading the Gnome backend so why creating this file mitigates the hang currently eludes me.

--- Additional comment from Rob Crittenden on 2017-08-03 09:43:35 EDT ---

It works because it skips the discovery process. You'll find where it loads and processes keyringrc.cfg in keyring/core.py::load_config(). If a backend is defined and is loadable then it is used, otherwise _get_best_keyring() is called and all available backends are examined.

--- Additional comment from John Dennis on 2017-08-03 09:55:51 EDT ---

Re comment #16, thanks Rob, of course that makes sense, the discovery process is short-circuited.

--- Additional comment from John Dennis on 2017-08-03 13:44:37 EDT ---

I was finally able to identify the Python call where the hang occurs by instrumenting the WSGI app to log every Python call. Here is the call sequence and the WSGI log messages indicating the process has stalled.

Call to _setup_methods on line 54 of /usr/lib64/python2.7/site-packages/gi/types.py from line 332 of /usr/lib64/python2.7/site-packages/gi/types.py
Call to _VariantCreator on line 87 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py from line 87 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py
Call to __getattr__ on line 135 of /usr/lib64/python2.7/site-packages/gi/module.py from line 90 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py
Call to __init__ on line 323 of /usr/lib64/python2.7/site-packages/gi/types.py from line 222 of /usr/lib64/python2.7/site-packages/gi/module.py
Call to _setup_fields on line 69 of /usr/lib64/python2.7/site-packages/gi/types.py from line 331 of /usr/lib64/python2.7/site-packages/gi/types.py
Call to _setup_methods on line 54 of /usr/lib64/python2.7/site-packages/gi/types.py from line 332 of /usr/lib64/python2.7/site-packages/gi/types.py
Call to Variant on line 229 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py from line 229 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py
Call to __init__ on line 323 of /usr/lib64/python2.7/site-packages/gi/types.py from line 229 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py
Call to deprecated_attr on line 227 of /usr/lib64/python2.7/site-packages/gi/overrides/__init__.py from line 488 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py
Call to __getattr__ on line 135 of /usr/lib64/python2.7/site-packages/gi/module.py from line 489 of /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py
mod_wsgi (pid=9627): Daemon process deadlock timer expired, stopping process 'keystone_main'.
mod_wsgi (pid=9627): Shutdown requested 'keystone_main'.
mod_wsgi (pid=9627): Aborting process 'keystone_main'.

Here is the relevant snippet of code from Glib.py:

# backwards compatible names from old static bindings                                                                                                                                                                                         
for n in ['DESKTOP', 'DOCUMENTS', 'DOWNLOAD', 'MUSIC', 'PICTURES',
          'PUBLIC_SHARE', 'TEMPLATES', 'VIDEOS']:
    attr = 'USER_DIRECTORY_' + n
    deprecated_attr("GLib", attr, "GLib.UserDirectory.DIRECTORY_" + n)
    globals()[attr] = getattr(GLib.UserDirectory, 'DIRECTORY_' + n)
    __all__.append(attr)

line 489 is:

    globals()[attr] = getattr(GLib.UserDirectory, 'DIRECTORY_' + n)

--- Additional comment from John Dennis on 2017-08-03 13:51:38 EDT ---

Here is what I think we should do:

Let's use Rob's workaround of creating the keyringrc.cfg as described in comment #14. That avoids python-keyring's autodiscovery process where it executes this module load:

from gi.repository import GnomeKeyring

That will get us to a working system in the shortest time.

However we really need to fix whatever is wrong in Gnome, to that end I'm going to clone this bug and change the component to one of the Gnome components so that the Gnome devs can start digging into the failure. This way we can keep the two bugs separate as each team works on their solution independently.

Comment 1 John Dennis 2017-08-03 18:25:58 UTC

I need to clarify where the hang occurs, it's in __getattr__ on line 135 of /usr/lib64/python2.7/site-packages/gi/module.py

Comment 2 John Dennis 2017-08-03 20:19:16 UTC

I figure it's worthwhile to capture this in the bug report. I had a heck of a time figuring out where things were hanging. I tried both strace and gdb and neither of those allowed me to pin-point where where the hang was. This is how I instrumented the WSGI script to get the call list:

import sys

def trace_calls(frame, event, arg):
    if event != 'call':
        return
    co = frame.f_code
    func_name = co.co_name
    if func_name == 'write':
        # Ignore write() calls from print statements
	return
    func_line_no = frame.f_lineno
    func_filename = co.co_filename
    caller = frame.f_back
    if caller:
        caller_line_no = caller.f_lineno
        caller_filename = caller.f_code.co_filename
        sys.stdout.write('Call to %s on line %s of %s from line %s of %s\n' % \
        (func_name, func_line_no, func_filename,
         caller_line_no, caller_filename))
    return

sys.settrace(trace_calls)

Comment 3 Dan Callaghan 2017-10-09 05:58:36 UTC

I hit this in my application as well, which was just importing keystoneclient... so the impact of this bug is likely quite large. Anything importing the offending module inside mod_wsgi will hang, it seems.

Btw this might already be obvious to people on the bug but thought I'd mention it...

You can debug the hang fairly easily in gdb. Setting processes=1 for the mod_wsgi worker will help. Then attach to the one and only mod_wsgi worker.

One gotcha is that the mod_wsgi worker still spawns a few different threads (even with threads=1) and the actual application code runs in thread 2 (not the main thread). This might be what was throwing some folks off.

(gdb) info threads
  Id   Target Id         Frame 
  4    Thread 0x7f038bfc5700 (LWP 988) "httpd" 0x00007f0398add7a3 in select () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7f038b7c4700 (LWP 989) "httpd" 0x00007f0398fc2a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5642bbde17e0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  2    Thread 0x7f038afc3700 (LWP 990) "httpd" 0x00007f0398fc2a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5642bbde17e0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
* 1    Thread 0x7f039a4e1880 (LWP 986) "httpd" 0x00007f0398adba3d in poll () at ../sysdeps/unix/syscall-template.S:81

Here thread 1 is wsgi_daemon_main (waiting in poll), thread 3 is wsgi_deadlock_thread (waiting for the GIL), thread 4 is wsgi_monitor_thread (waiting for the GIL). Thread 2 is the one running the actual application. The py-bt command in gdb gives a nice Python-level back trace:

#4 Waiting for a lock (e.g. GIL)
#10 Frame 0x7f036cef6920, for file /usr/lib64/python2.7/site-packages/gi/module.py, line 157, in __getattr__ (self=<IntrospectionModule(_namespace='GLib', __path__='/usr/lib64/girepository-1.0/GLib-2.0.typelib', Variant=<StructMeta(new_fixed_array=<gi.FunctionInfo at remote 0x7f036ceff148>, new_array=<gi.FunctionInfo at remote 0x7f036cebef80>, __module__='gi.repository.GLib', new_int64=<gi.FunctionInfo at remote 0x7f036ceff298>, hash=<gi.FunctionInfo at remote 0x7f036ceffca8>, get_uint32=<gi.FunctionInfo at remote 0x7f036ceffc00>, new_objv=<gi.FunctionInfo at remote 0x7f036ceff340>, get_uint16=<gi.FunctionInfo at remote 0x7f036ceffbc8>, get_byte=<gi.FunctionInfo at remote 0x7f036ceff7a0>, parse_error_print_context=<gi.FunctionInfo at remote 0x7f036cf00068>, parse_error_quark=<gi.FunctionInfo at remote 0x7f036cf000a0>, new_uint16=<gi.FunctionInfo at remote 0x7f036ceff458>, get_int16=<gi.FunctionInfo at remote 0x7f036ceff960>, equal=<gi.FunctionInfo at remote 0x7f036ceff730>, new_signature=<gi.FunctionInfo at remot...(truncated)
    wrapper = enum_register_new_gtype_and_add(info)
#19 Frame 0x7f036ceddf60, for file /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py, line 489, in <module> ()
    globals()[attr] = getattr(GLib.UserDirectory, 'DIRECTORY_' + n)

Very suspiciously, this thread is *also waiting on the GIL* so I wonder if something in the gobject stuff has introduced a deadlock (acquiring the GIL while it already holds it)?

Comment 4 Dan Callaghan 2017-10-09 06:58:10 UTC

Here is a minimal-ish reproducer, which will get you the above backtrace in gdb:

Install mod_wsgi and python-gobject-base (the latter is a transitive dep of subscription-manager so it'll be already installed already).

Create /opt/bz1478149/app.wsgi:

import gi.repository.GLib
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['It works\n']

Create /etc/httpd/conf.d/bz1478149.conf:

WSGIDaemonProcess bz1478149 display-name=bz1478149 processes=1 threads=1
WSGIScriptAlias /bz1478149 /opt/bz1478149/app.wsgi
<Directory /opt/bz1478149>
    Require all granted
    WSGIProcessGroup bz1478149
</Directory>

Start httpd.

Run: curl http://localhost/bz1478149
It will hang. (If you comment out the import statement in /opt/bz1478149/app.wsgi and restart httpd, you will see it responds immediately.)

Find mod_wsgi worker process using systemctl status httpd:

[...]
   CGroup: /system.slice/httpd.service
           ├─2009 /usr/sbin/httpd -DFOREGROUND
           ├─2010 bz1478149       -DFOREGROUND
           ├─2011 /usr/sbin/httpd -DFOREGROUND
           ├─2012 /usr/sbin/httpd -DFOREGROUND
[...]

Attach gdb: gdb /usr/sbin/httpd 2010

(gdb) info threads
  Id   Target Id         Frame 
  4    Thread 0x7f073df5f700 (LWP 2016) "httpd" 0x00007f074b66d7a3 in select () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7f073d75e700 (LWP 2017) "httpd" 0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  2    Thread 0x7f073cf5d700 (LWP 2018) "httpd" 0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
* 1    Thread 0x7f074d06f880 (LWP 2010) "httpd" 0x00007f074b66ba3d in poll () at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f073cf5d700 (LWP 2018))]
#0  0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
43	      err = lll_futex_wait (futex, expected, private);
(gdb) where
#0  0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
#1  do_futex_wait (sem=sem@entry=0x5648cad71610, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:223
#2  0x00007f074bb52a9f in __new_sem_wait_slow (sem=0x5648cad71610, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292
#3  0x00007f074bb52b3b in __new_sem_wait (sem=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:28
#4  0x00007f073f49d5f5 in PyThread_acquire_lock (lock=lock@entry=0x5648cad71610, waitflag=waitflag@entry=1) at /usr/src/debug/Python-2.7.5/Python/thread_pthread.h:323
#5  0x00007f073f469156 in PyEval_RestoreThread (tstate=tstate@entry=0x7f0738131ec0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:379
#6  0x00007f073f48c358 in PyGILState_Ensure () at /usr/src/debug/Python-2.7.5/Python/pystate.c:611
[...]

Comment 5 Dan Callaghan 2017-10-09 06:59:01 UTC

Since the offending code is in python-gobject-base, I guess the correct component for this bug is pygobject3 not pygobject2.

Comment 6 Dan Callaghan 2017-10-09 07:03:50 UTC

Also I noticed this in the Python docs:

"Note that the PyGILState_*() functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_*() API is unsupported."

https://docs.python.org/2/c-api/init.html#non-python-created-threads

pygenum.c had called into PyGILState_Ensure() when it hangs. And mod_wsgi does indeed create multiple interpreters with the Py_NewInterpreter() API. So that seems like a problem...