Bug 1010357

Summary: lock up of Xorg on start when openssl-fips is installed
Product: Red Hat Enterprise Linux 7
Reporter: Matěj Cepl <mcepl>
Component: openssl
Assignee: Tomas Mraz <tmraz>
Status: CLOSED CURRENTRELEASE
QA Contact: Hubert Kario <hkario>
Severity: high
Priority: high
Version: 7.0
CC: codonell, hkario, ksrot, tmraz
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: [cat:lockup]
Fixed In Version: openssl-1.0.1e-21.el7
Doc Type: Bug Fix
Last Closed: 2014-06-13 10:43:58 UTC
Type: Bug
Attachments:
Description            Flags
dmesg output           none
/var/log/Xorg.0.log    none

Description Matěj Cepl 2013-09-20 15:18:18 UTC
Created attachment 800532 [details]
dmesg output

Description of problem:
I don't know what actually happened, but I am not able to get Xorg running. gdm, startx, and plain /usr/bin/X all lock up the screen. I can connect via ssh, but I cannot get any display by any means, not even a text console via Ctrl-Alt-F2.

On Adam's advice I have tried to play with

Option     "AccelMethod"      "UXA"

and now I have

Option     "NoAccel"           "yes"

but it doesn't seem to make any difference; I still get a backtrace which, to my naive eyes, looks the same (as shown below).

Version-Release number of selected component (if applicable):
xorg-x11-drv-intel-2.21.12-2.el7.x86_64
xorg-x11-server-common-1.14.2-10.el7.x86_64
xorg-x11-server-Xorg-1.14.2-10.el7.x86_64
libdrm-2.4.46-1.el7.x86_64
mesa-libGL-9.2-1.20130902.el7.i686


How reproducible:
Unfortunately 100%, so I am without my work computer :(

Steps to Reproduce:
1. See above.

Actual results:
A black screen (or a whitish-grey one when running gdm).

Expected results:
either perfectly working Xorg or some kind of degraded experience, but SOMETHING, please ... please.

Additional info:

Full backtrace of the Xorg server:

0x00007fe968d19bfc in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x7fe969d6d908 <_rtld_local+2312>) at pthread_mutex_unlock.c:52
52	      lll_unlock (mutex->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex));
Missing separate debuginfos, use: debuginfo-install freetype-2.4.11-6.el7.x86_64 libXdamage-1.1.4-3.el7.x86_64 libXext-1.3.2-1.el7.x86_64 libXfixes-5.0.1-1.el7.x86_64 libXxf86vm-1.1.3-1.el7.x86_64 libfontenc-1.1.1-3.el7.x86_64 libgcc-4.8.1-9.el7.x86_64 libxcb-1.9-3.el7.x86_64 mesa-libEGL-9.2-1.20130902.el7.x86_64 mesa-libGL-9.2-1.20130902.el7.x86_64 mesa-libgbm-9.2-1.20130902.el7.x86_64 mesa-libglapi-9.2-1.20130902.el7.x86_64 pcre-8.32-7.el7.x86_64 xorg-x11-drv-intel-2.21.12-2.el7.x86_64 zlib-1.2.7-10.el7.x86_64
(gdb) bt full
#0  0x00007fe968d19bfc in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x7fe969d6d908 <_rtld_local+2312>) at pthread_mutex_unlock.c:52
        type = 1
#1  __GI___pthread_mutex_unlock (mutex=0x7fe969d6d908 <_rtld_local+2312>) at pthread_mutex_unlock.c:297
No locals.
#2  0x00007fe969b4d069 in tls_get_addr_tail (ti=0x7fe969936f58, dtv=0x7fe969d34290, the_map=0x7fe969d6bb20) at dl-tls.c:753
No locals.
#3  0x00007fe96972077e in init_thread_destructor () at procattr.c:64
No locals.
#4  getprocattrcon_raw (context=context@entry=0x7fff5c993fa8, pid=pid@entry=0, attr=attr@entry=0x7fe96972f3e6 "current") at procattr.c:112
        buf = <optimized out>
        size = <optimized out>
        fd = <optimized out>
        ret = <optimized out>
        errno_hold = <optimized out>
        prev_context = <optimized out>
#5  0x00007fe969720a3e in getcon_raw_internal (c=c@entry=0x7fff5c993fa8) at procattr.c:325
No locals.
#6  0x00007fe96972eba0 in is_selinux_enabled_internal () at enabled.c:26
        enabled = 1
        con = 0x7fe969d6e208 ""
#7  0x00000000004f5e29 in SELinuxExtensionInit () at xselinux_ext.c:695
        extEntry = <optimized out>
#8  0x00000000004b9cb1 in InitExtensions (argc=argc@entry=12, argv=argv@entry=0x7fff5c994138) at ../../../mi/miinitext.c:337
        i = <optimized out>
        ext = <optimized out>
#9  0x00000000004260c0 in main (argc=12, argv=0x7fff5c994138, envp=<optimized out>) at main.c:208
        i = <optimized out>
        alwaysCheckForInput = {0, 1}
(gdb)

Comment 1 Matěj Cepl 2013-09-20 15:19:33 UTC
Created attachment 800542 [details]
/var/log/Xorg.0.log

Comment 3 Matěj Cepl 2013-09-23 13:44:59 UTC
After removing openssl-fips (and the packages that depend on it), Xorg works like a charm.

Comment 4 Tomas Mraz 2013-09-23 14:23:28 UTC
The situation as it stands:

1. openssl needs to do some initialization in a constructor due to new FIPS requirements.

2. Unless the code run in the constructor is trivial, X hangs in a loop in tls_get_addr_tail() from glibc, which is called from libselinux. This happens after the openssl constructor has already run.

Comment 5 Tomas Mraz 2013-09-23 14:26:50 UTC
Carlos, do you have any idea what could be the cause or at least how to find it?

Comment 6 Tomas Mraz 2013-09-23 16:29:17 UTC
So the cause was that I was dlopening libssl.so from libcrypto in the constructor, which I can avoid. I'd still like to know from some glibc guru why it must not be done, though.

Comment 7 Carlos O'Donell 2013-09-24 08:29:38 UTC
(In reply to Tomas Mraz from comment #6)
> So the cause was that I was dlopening libssl.so from libcrypto in the
> constructor. Which I can avoid. Although I'd still like to know from some
> glibc guru why it must not be done.

There should be no problem using dlopen to load libssl.so from a constructor in libcrypto e.g. __attribute__ ((constructor)).

Your particular backtrace doesn't show any dlopen-related calls. It doesn't show that the particular thread (are there threads?) is blocked, just that it's unlocking a mutex (which is normal). Where is the cycle?

Comment 8 Tomas Mraz 2013-09-24 08:44:17 UTC
Actually, the mutex lock/unlock returns immediately, and the loop is formed by the "goto again;" in tls_get_addr_tail(). Also note that there is only a single thread, so there is no reason why the mutex should block.

The problem might be that libssl.so depends on many other shared libraries and something gets messed up during the load. I don't know. Anyway, it seems much safer not to load it, as it is not strictly necessary.
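
For illustration only, here is a minimal sketch of one way to keep a library constructor trivial and move an optional dlopen() out of it, to the first call that needs it. This is not the change that went into openssl; the names trivial_init, get_optional_library, constructor_ran and load_was_attempted are hypothetical, and the lazy load is not thread-safe. On older glibc versions it may need to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

static int ctor_ran;           /* set by the constructor at load time */
static int load_attempted;     /* set on the first explicit load request */
static void *optional_handle;  /* NULL until a load succeeds */

/* Hypothetical constructor: only trivial work, no dlopen() here. */
__attribute__((constructor))
static void trivial_init(void)
{
    ctor_ran = 1;
}

/* Accessors so callers can inspect the state. */
int constructor_ran(void)    { return ctor_ran; }
int load_was_attempted(void) { return load_attempted; }

/*
 * Load the optional library on first use instead of at load time.
 * Returns the dlopen() handle, or NULL if the library is unavailable.
 * Assumption: callers are single-threaded, so a plain flag is enough.
 */
void *get_optional_library(const char *name)
{
    assert(ctor_ran);  /* the constructor must already have run */
    if (!load_attempted) {
        load_attempted = 1;
        optional_handle = dlopen(name, RTLD_NOW | RTLD_LOCAL);
    }
    return optional_handle;
}
```

A caller would invoke get_optional_library("libssl.so") only at the point where the library is really needed, and must handle a NULL result.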

Comment 9 Carlos O'Donell 2013-09-24 09:46:54 UTC
(In reply to Tomas Mraz from comment #8)
> Actually the mutex lock/unlock immediately returns and the cycle is done by
> goto again; in the tls_get_addr_tail() Also note that there is only a single
> thread so there is no reason why the mutex should block.

That's certainly odd. The only way for this to happen would be for l_tls_offset to be non-zero and positive while the module's block remains at TLS_DTV_UNALLOCATED. That should never happen.
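
To make the condition concrete, here is a toy model of the retry logic as described in this thread. It is a sketch built only from the description above, not glibc's actual source: all MODEL_* names, the struct, and the numeric values are hypothetical stand-ins, and a retry cap replaces the unbounded "goto again;" so that the stuck state can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for glibc internals; values are illustrative. */
#define MODEL_NO_TLS_OFFSET   0L             /* offset not decided yet */
#define MODEL_FORCED_DYNAMIC  (-1L)          /* module uses dynamic TLS */
#define MODEL_UNALLOCATED     ((void *) -1)  /* TLS block not set up */

struct model_map {
    long tls_offset;  /* models l_tls_offset */
};

enum model_result { MODEL_RESOLVED, MODEL_WOULD_SPIN };

/*
 * Models the lookup: if the module has a static TLS offset but its
 * block is still unallocated, the real code retries, expecting another
 * thread to finish the setup. With a single thread nothing changes
 * between retries, so the loop never ends; here it is capped instead.
 */
enum model_result model_tls_lookup(struct model_map *map, void **slot,
                                   void *fresh_block, unsigned max_retries)
{
    assert(map != NULL && slot != NULL);
    for (unsigned retries = 0; ; retries++) {
        if (map->tls_offset == MODEL_FORCED_DYNAMIC)
            break;                       /* allocate dynamically below */
        if (map->tls_offset == MODEL_NO_TLS_OFFSET) {
            map->tls_offset = MODEL_FORCED_DYNAMIC;  /* decide now */
            break;
        }
        /* A static offset was chosen for this module. */
        if (*slot != MODEL_UNALLOCATED)
            return MODEL_RESOLVED;       /* block already set up */
        if (retries >= max_retries)
            return MODEL_WOULD_SPIN;     /* real code would loop forever */
    }
    *slot = fresh_block;
    return MODEL_RESOLVED;
}
```

In this model, a map with a positive tls_offset and an unallocated slot is the only input that reaches MODEL_WOULD_SPIN, which corresponds to the state described above as one that should never happen.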

Comment 10 Tomas Mraz 2013-09-24 10:06:47 UTC
As I said, there is only a single thread, so the values can hardly change within the loop.

Comment 15 Ludek Smid 2014-06-13 10:43:58 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.