Bug 1478149
Summary: | Loading module from gi.repository hangs process | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | John Dennis <jdennis> |
Component: | pygobject3 | Assignee: | Matthew Barnes <mbarnes> |
Status: | CLOSED WONTFIX | QA Contact: | Desktop QE <desktop-qa-list> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.4 | CC: | afazekas, antoine, apevec, fhubik, jjoyce, lhh, nkinder, nlevinki, rcritten, srevivo |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1475969 | Environment: | |
Last Closed: | 2019-10-28 08:44:42 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1475969, 1478173 | ||
Bug Blocks: |
Description
John Dennis
2017-08-03 18:02:23 UTC
I need to clarify where the hang occurs, it's in __getattr__ on line 135 of /usr/lib64/python2.7/site-packages/gi/module.py I figure it's worthwhile to capture this in the bug report. I had a heck of a time figuring out where things were hanging. I tried both strace and gdb and neither of those allowed me to pin-point where where the hang was. This is how I instrumented the WSGI script to get the call list: import sys def trace_calls(frame, event, arg): if event != 'call': return co = frame.f_code func_name = co.co_name if func_name == 'write': # Ignore write() calls from print statements return func_line_no = frame.f_lineno func_filename = co.co_filename caller = frame.f_back if caller: caller_line_no = caller.f_lineno caller_filename = caller.f_code.co_filename sys.stdout.write('Call to %s on line %s of %s from line %s of %s\n' % \ (func_name, func_line_no, func_filename, caller_line_no, caller_filename)) return sys.settrace(trace_calls) I hit this in my application as well, which was just importing keystoneclient... so the impact of this bug is likely quite large. Anything importing the offending module inside mod_wsgi will hang, it seems. Btw this might already be obvious to people on the bug but thought I'd mention it... You can debug the hang fairly easily in gdb. Setting processes=1 for the mod_wsgi worker will help. Then attach to the one and only mod_wsgi worker. One gotcha is that the mod_wsgi worker still spawns a few different threads (even with threads=1) and the actual application code runs in thread 2 (not the main thread). This might be what was throwing some folks off. (gdb) info threads Id Target Id Frame 4 Thread 0x7f038bfc5700 (LWP 988) "httpd" 0x00007f0398add7a3 in select () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x7f038b7c4700 (LWP 989) "httpd" 0x00007f0398fc2a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5642bbde17e0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 2 Thread 0x7f038afc3700 (LWP 990) "httpd" 0x00007f0398fc2a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5642bbde17e0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 * 1 Thread 0x7f039a4e1880 (LWP 986) "httpd" 0x00007f0398adba3d in poll () at ../sysdeps/unix/syscall-template.S:81 Here thread 1 is wsgi_daemon_main (waiting in poll), thread 3 is wsgi_deadlock_thread (waiting for the GIL), thread 4 is wsgi_monitor_thread (waiting for the GIL). Thread 2 is the one running the actual application. The py-bt command in gdb gives a nice Python-level back trace: #4 Waiting for a lock (e.g. GIL) #10 Frame 0x7f036cef6920, for file /usr/lib64/python2.7/site-packages/gi/module.py, line 157, in __getattr__ (self=<IntrospectionModule(_namespace='GLib', __path__='/usr/lib64/girepository-1.0/GLib-2.0.typelib', Variant=<StructMeta(new_fixed_array=<gi.FunctionInfo at remote 0x7f036ceff148>, new_array=<gi.FunctionInfo at remote 0x7f036cebef80>, __module__='gi.repository.GLib', new_int64=<gi.FunctionInfo at remote 0x7f036ceff298>, hash=<gi.FunctionInfo at remote 0x7f036ceffca8>, get_uint32=<gi.FunctionInfo at remote 0x7f036ceffc00>, new_objv=<gi.FunctionInfo at remote 0x7f036ceff340>, get_uint16=<gi.FunctionInfo at remote 0x7f036ceffbc8>, get_byte=<gi.FunctionInfo at remote 0x7f036ceff7a0>, parse_error_print_context=<gi.FunctionInfo at remote 0x7f036cf00068>, parse_error_quark=<gi.FunctionInfo at remote 0x7f036cf000a0>, new_uint16=<gi.FunctionInfo at remote 0x7f036ceff458>, get_int16=<gi.FunctionInfo at remote 0x7f036ceff960>, equal=<gi.FunctionInfo at remote 0x7f036ceff730>, new_signature=<gi.FunctionInfo at remot...(truncated) wrapper = enum_register_new_gtype_and_add(info) #19 Frame 0x7f036ceddf60, for file /usr/lib64/python2.7/site-packages/gi/overrides/GLib.py, line 489, in <module> () globals()[attr] = getattr(GLib.UserDirectory, 'DIRECTORY_' + n) Very suspiciously, this thread is *also waiting on the GIL* so I wonder if something in the gobject stuff has introduced a deadlock (acquiring the GIL while it already holds it)? Here is a minimal-ish reproducer, which will get you the above backtrace in gdb: Install mod_wsgi and python-gobject-base (the latter is a transitive dep of subscription-manager so it'll be already installed already). Create /opt/bz1478149/app.wsgi: import gi.repository.GLib def application(environ, start_response): start_response('200 OK', [('Content-Type', 'text/plain')]) return ['It works\n'] Create /etc/httpd/conf.d/bz1478149.conf: WSGIDaemonProcess bz1478149 display-name=bz1478149 processes=1 threads=1 WSGIScriptAlias /bz1478149 /opt/bz1478149/app.wsgi <Directory /opt/bz1478149> Require all granted WSGIProcessGroup bz1478149 </Directory> Start httpd. Run: curl http://localhost/bz1478149 It will hang. (If you comment out the import statement in /opt/bz1478149/app.wsgi and restart httpd, you will see it responds immediately.) Find mod_wsgi worker process using systemctl status httpd: [...] CGroup: /system.slice/httpd.service ├─2009 /usr/sbin/httpd -DFOREGROUND ├─2010 bz1478149 -DFOREGROUND ├─2011 /usr/sbin/httpd -DFOREGROUND ├─2012 /usr/sbin/httpd -DFOREGROUND [...] Attach gdb: gdb /usr/sbin/httpd 2010 (gdb) info threads Id Target Id Frame 4 Thread 0x7f073df5f700 (LWP 2016) "httpd" 0x00007f074b66d7a3 in select () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x7f073d75e700 (LWP 2017) "httpd" 0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 2 Thread 0x7f073cf5d700 (LWP 2018) "httpd" 0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 * 1 Thread 0x7f074d06f880 (LWP 2010) "httpd" 0x00007f074b66ba3d in poll () at ../sysdeps/unix/syscall-template.S:81 (gdb) thread 2 [Switching to thread 2 (Thread 0x7f073cf5d700 (LWP 2018))] #0 0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 43 err = lll_futex_wait (futex, expected, private); (gdb) where #0 0x00007f074bb52a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x5648cad71610) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 #1 do_futex_wait (sem=sem@entry=0x5648cad71610, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:223 #2 0x00007f074bb52a9f in __new_sem_wait_slow (sem=0x5648cad71610, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292 #3 0x00007f074bb52b3b in __new_sem_wait (sem=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:28 #4 0x00007f073f49d5f5 in PyThread_acquire_lock (lock=lock@entry=0x5648cad71610, waitflag=waitflag@entry=1) at /usr/src/debug/Python-2.7.5/Python/thread_pthread.h:323 #5 0x00007f073f469156 in PyEval_RestoreThread (tstate=tstate@entry=0x7f0738131ec0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:379 #6 0x00007f073f48c358 in PyGILState_Ensure () at /usr/src/debug/Python-2.7.5/Python/pystate.c:611 [...] Since the offending code is in python-gobject-base, I guess the correct component for this bug is pygobject3 not pygobject2. Also I noticed this in the Python docs: "Note that the PyGILState_*() functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_*() API is unsupported." https://docs.python.org/2/c-api/init.html#non-python-created-threads pygenum.c had called into PyGILState_Ensure() when it hangs. And mod_wsgi does indeed create multiple interpreters with the Py_NewInterpreter() API. So that seems like a problem... |