Description of problem:

After updating to xorg-x11-drv-intel-2.15.0-3.fc15, I get a hard X hang every time I log out. The screen is black, with the mouse pointer visible but immobile, and the system doesn't respond to any keyboard combination; the only way to recover the system is to ssh in, kill -9 the X server, and reboot.

Attaching to the X server with gdb shows that it has deadlocked trying to report a "corrupted double-linked list":

#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:100
#1 0x00007fb07f8c7c11 in _L_lock_10461 () at malloc.c:6486
#2 0x00007fb07f8c59d7 in __libc_malloc (bytes=140396034138592) at malloc.c:3657
#3 0x00007fb07f8bb35d in __libc_message (do_abort=2, fmt=0x7fb07f9a6fb8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:137
#4 0x00007fb07f8c196a in malloc_printerr (action=3, str=0x7fb07f9a3f92 "corrupted double-linked list", ptr=<optimized out>) at malloc.c:6283
#5 0x00007fb07f8c1d48 in malloc_consolidate (av=0x7fb07fbe21e0) at malloc.c:5161
#6 0x00007fb07f8c2669 in malloc_consolidate (av=0x7fb07fbe21e0) at malloc.c:5115
#7 _int_free (av=0x7fb07fbe21e0, p=<optimized out>, have_lock=0) at malloc.c:5034
#8 0x000000351360c01f in FontFileFreeDir (dir=0x1fef3d0) at fontdir.c:166
#9 0x000000351360ce18 in FontFileFreeFPE (fpe=0x1fef360) at fontfile.c:139
#10 0x000000351360f89e in CatalogueUnrefFPEs (fpe=<optimized out>) at catalogue.c:116
#11 0x000000351360fe41 in CatalogueFreeFPE (fpe=0x1fb8f00) at catalogue.c:272
#12 0x000000000042f09d in FreeFPE (fpe=0x1fb8f00) at dixfonts.c:218
#13 FreeFPE (fpe=0x1fb8f00) at dixfonts.c:214
#14 0x000000000042f107 in FreeFontPath (list=0x1fb54b0, n=2, force=1) at dixfonts.c:1628
#15 0x0000000000432257 in FreeFonts () at dixfonts.c:1998
#16 0x0000000000422f1e in main (argc=<optimized out>, argv=0x7fff89fd3fb8, envp=<optimized out>) at main.c:329

Downgrading back to xorg-x11-drv-intel-2.14.0-6.fc15.x86_64 makes the problem go away.
Version-Release number of selected component (if applicable):
xorg-x11-drv-intel-2.15.0-3.fc15.x86_64

How reproducible:
100%
Created attachment 499495
Full backtrace of X server deadlock
It looks like the problematic commit is one of:

e1ff5182304e00c0d392092069422cae7626cf8d Handle drawable/client destruction in pending swaps/flips
86f23f21ab57fcbc031bcd2b8f432a08ff4cc320 Skip client and drawable resource delete calls when deleting frame event

I wasn't able to test with only the first commit, because KDE gets stuck on its "splash screen".
Copied from https://bugs.freedesktop.org/show_bug.cgi?id=37420:

Ian Pilcher 2011-05-20 14:41:59 PDT

One other data point. I'm using the following script to reproduce the problem:

#!/bin/bash
export DISPLAY=:0
firefox http://www.cnn.com &>/dev/null &
kwrite &>/dev/null &
glxgears &>/dev/null &
sleep 15
qdbus org.kde.screensaver /ScreenSaver org.freedesktop.ScreenSaver.SetActive true
sleep 30
qdbus org.kde.screensaver /ScreenSaver org.freedesktop.ScreenSaver.SetActive false
sleep 15
killall kwrite
sleep 2
killall firefox
sleep 2
killall glxgears
sleep 2
qdbus org.kde.ksmserver /KSMServer logout 0 0 0

The interesting thing is that the problem does not occur without the "killall ..." commands. There's something about closing the windows (or the way that KWin does it) that triggers the issue.

Comment 3 Ian Pilcher 2011-05-20 17:39:24 PDT

I am unable to reproduce this problem when booting with maxcpus=1 or setting MALLOC_CHECK_ to any value.
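Comment 3 mentions setting MALLOC_CHECK_. As a minimal sketch, this is how the variable can be set for a single command from a shell. The function name run_with_malloc_check and the value 3 are illustrative assumptions; the report does not say which values were tried or how the variable was passed to the X server, which depends on how the display manager starts it.

```bash
#!/bin/bash
# Hypothetical helper, not from the original report: run one command with
# MALLOC_CHECK_ set in that command's environment only.
run_with_malloc_check() {
    local level="$1"   # value to assign to MALLOC_CHECK_
    shift
    MALLOC_CHECK_="$level" "$@"
}

# Confirm that the variable reaches the child process.
run_with_malloc_check 3 env | grep '^MALLOC_CHECK_='
```

Because the assignment is scoped to the one command, the calling shell and any other processes keep their normal allocator behaviour.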
With the latest glibc update, I'm not getting an abort, rather than a deadlock:

#0 0x0000003e06a36275 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003e06a37b8b in abort () at abort.c:93
#2 0x0000003e06a7232e in __libc_message (do_abort=2, fmt=0x3e06b5e060 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3 0x0000003e06a7896a in malloc_printerr (action=3, str=0x3e06b5b012 "corrupted double-linked list", ptr=<optimized out>) at malloc.c:6283
#4 0x0000003e06a78d80 in malloc_consolidate (av=0x3e06d991e0) at malloc.c:5169
#5 0x0000003e06a79669 in malloc_consolidate (av=0x3e06d991e0) at malloc.c:5115
#6 _int_free (av=0x3e06d991e0, p=<optimized out>, have_lock=0) at malloc.c:5034
#7 0x0000000000461094 in FreeOsBuffers (oc=0x21459e0) at io.c:1101
#8 0x000000000045f283 in CloseDownConnection (client=0x2145a20) at connection.c:1068
#9 0x000000000042e1c6 in CloseDownClient (client=0x2145a20) at dispatch.c:3432
#10 0x000000000042ec3a in Dispatch () at dispatch.c:441
#11 0x0000000000422e1a in main (argc=<optimized out>, argv=0x7fffdb2af6c8, envp=<optimized out>) at main.c:287
(In reply to comment #4)
> With the latest glibc update, I'm not getting an abort, rather than a
> deadlock:

s/not/now/ <sigh/>

Also, kdm is now able to restart X post-abort, so the problem is less severe from a system usability point of view.
Created attachment 500311
Backtrace of deadlock with glibc-2.13.90-13.x86_64

... or not. :-(

#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:100
#1 0x0000003e06a7ec11 in _L_lock_10461 () at malloc.c:6486
#2 0x0000003e06a7c9d7 in __libc_malloc (bytes=266402894304) at malloc.c:3657
#3 0x00000000004ea3ce in XIChangeDeviceProperty (dev=0x17db4c0, property=<optimized out>, type=19, format=8, mode=<optimized out>, len=<optimized out>, value=0x7fff435c4bcf, sendevent=1) at xiproperty.c:749
#4 0x0000000000426fa4 in DisableDevice (dev=0x17db4c0, sendevent=1 '\001') at devices.c:499
#5 0x0000000000427298 in RemoveDevice (dev=0x17db4c0, sendevent=1 '\001') at devices.c:1059
#6 0x000000000047da32 in DeleteInputDeviceRequest (pDev=0x17db4c0) at xf86Xinput.c:957
#7 0x0000000000424560 in CloseDeviceList (listHead=0x7e4b08) at devices.c:968
#8 0x0000000000424ac4 in CloseDownDevices () at devices.c:996
#9 0x00000000004612f8 in AbortServer () at log.c:409
#10 0x00000000004614e7 in FatalError (f=0x578e50 "Caught signal %d (%s). Server aborting\n") at log.c:536
#11 0x000000000046231e in OsSigHandler (sip=<optimized out>, signo=11, unused=<optimized out>) at osinit.c:153
#12 OsSigHandler (signo=11, sip=<optimized out>, unused=<optimized out>) at osinit.c:115
#13 <signal handler called>
#14 0x0000003e06a78bf5 in malloc_consolidate (av=0x3e06d991e0) at malloc.c:5169
#15 0x0000003e06a79669 in malloc_consolidate (av=0x3e06d991e0) at malloc.c:5115
#16 _int_free (av=0x3e06d991e0, p=<optimized out>, have_lock=0) at malloc.c:5034
#17 0x000000000044c64f in FreeClientResources (client=0x17d8140) at resource.c:858
#18 0x000000000042e0ce in CloseDownClient (client=0x17d8140) at dispatch.c:3461
#19 0x000000000042ec3a in Dispatch () at dispatch.c:441
#20 0x0000000000422e1a in main (argc=<optimized out>, argv=0x7fff435c5828, envp=<optimized out>) at main.c:287
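In the backtrace above, signal 11 arrives while frames #14 to #16 are inside glibc's free path, and the handler's call chain then reaches __libc_malloc at frame #2, where it waits on a lock (frames #0 and #1). malloc is not async-signal-safe, so a handler that allocates after interrupting the allocator can hang in this way. The following is a rough sketch of a check for that pattern in a saved gdb backtrace; the function name check_reentrant_malloc and the list of allocator symbols are assumptions made for illustration, not something taken from the report.

```bash
#!/bin/bash
# Hypothetical helper, not from the original report: reads a gdb backtrace
# on stdin and reports whether glibc allocator frames appear both in the
# call chain of the signal handler and in the code the signal interrupted.
check_reentrant_malloc() {
    awk '
        # gdb lists the innermost frame first, so frames printed before
        # this marker were called from the handler; frames after it
        # belong to the interrupted code.
        /<signal handler called>/ { seen_handler = 1; next }
        /^#[0-9]+ / && /__libc_malloc|_int_malloc|_int_free|malloc_consolidate/ {
            if (seen_handler) interrupted = 1; else in_handler = 1
        }
        END {
            if (seen_handler && in_handler && interrupted)
                print "allocator re-entered from signal handler"
            else
                print "no re-entrancy pattern found"
        }
    '
}

# Usage with four frames copied from the backtrace above.
check_reentrant_malloc <<'EOF'
#2 0x0000003e06a7c9d7 in __libc_malloc (bytes=266402894304) at malloc.c:3657
#13 <signal handler called>
#14 0x0000003e06a78bf5 in malloc_consolidate (av=0x3e06d991e0) at malloc.c:5169
#16 _int_free (av=0x3e06d991e0, p=<optimized out>, have_lock=0) at malloc.c:5034
EOF
```

The check only explains why the server hangs instead of exiting after the crash; it says nothing about what corrupted the heap in the first place.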
I haven't seen this for a couple of weeks now. Given the "raciness" of the symptoms, it's hard to say whether the problem is really fixed or still lurking (or even where the problem is/was). Closing for now.