Red Hat Bugzilla – Bug 1084244
Processing an event from a disabled device causes null-pointer dereference
Last modified: 2014-10-14 00:56:00 EDT
Description of problem:
/usr/bin/Xorg crashes on Red Hat Enterprise Linux 6.4

Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.13.0-11.el6

How reproducible:
Unknown; seen only once so far

Steps to Reproduce:
Unknown so far

Actual results:
/usr/bin/Xorg crashes

Expected results:
/usr/bin/Xorg does not crash

Additional info:

Core was generated by `/usr/bin/Xorg :0 -br -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-duDMMr/d'.
Program terminated with signal 6, Aborted.
#0 0x000000352c6328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x000000352c6328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x000000352c634085 in abort () at abort.c:92
#2 0x0000000000473c6e in OsAbort () at utils.c:1268
#3 0x000000000048d667 in ddxGiveUp (error=EXIT_ERR_ABORT) at xf86Init.c:1060
#4 0x0000000000470932 in AbortServer () at log.c:652
#5 0x0000000000471944 in FatalError (f=<value optimized out>) at log.c:793
#6 0x0000000000472b2e in OsSigHandler (signo=11, sip=<value optimized out>, unused=<value optimized out>) at osinit.c:146
#7 <signal handler called>
#8 0x000000000059397b in mieqMoveToNewScreen (dev=0x1126310, event=0x823500, screen=0xe6da00) at mieq.c:490
#9 mieqProcessDeviceEvent (dev=0x1126310, event=0x823500, screen=0xe6da00) at mieq.c:532
#10 0x0000000000593f94 in mieqProcessInputEvents () at mieq.c:623
#11 0x000000000048b7f9 in ProcessInputEvents () at xf86Events.c:164
#12 0x000000000048bccd in xf86VTSwitch (blockData=<value optimized out>, err=<value optimized out>, pReadmask=<value optimized out>) at xf86Events.c:455
#13 xf86Wakeup (blockData=<value optimized out>, err=<value optimized out>, pReadmask=<value optimized out>) at xf86Events.c:285
#14 0x000000000043b8bb in WakeupHandler (result=-1, pReadmask=0x82a740) at dixutils.c:423
#15 0x000000000046a4ef in WaitForSomething (pClientsReady=0x1120b20) at WaitFor.c:224
#16 0x00000000004379d2 in Dispatch () at dispatch.c:357
#17 0x000000000047cbca in main (argc=10, argv=<value optimized out>, envp=<value optimized out>) at main.c:295

The server aborts after intercepting a segfault raised while executing mieqMoveToNewScreen in frame 8. The signal handler OsSigHandler was called with signo 11 (SIGSEGV), so frame 8 is where we need to focus.

(gdb) f 8
#8 0x000000000059397b in mieqMoveToNewScreen (dev=0x1126310, event=0x823500, screen=0xe6da00) at mieq.c:490
490 if (dev && screen && screen != DequeueScreen(dev)) {

What instruction were we executing when we segfaulted?

(gdb) x/i $pc
=> 0x59397b <mieqProcessDeviceEvent+379>: cmp 0x118(%rax),%r13
(gdb) i r rax
rax 0x0 0

Okay, so we segfaulted because %rax was zero, but why was %rax zero?

(gdb) disass /m 0x59397b
Dump of assembler code for function mieqProcessDeviceEvent:
490 if (dev && screen && screen != DequeueScreen(dev)) {
0x000000000059395f <+351>: test %r13,%r13
0x0000000000593962 <+354>: je 0x593871 <mieqProcessDeviceEvent+113>
0x0000000000593968 <+360>: test %rbx,%rbx
0x000000000059396b <+363>: je 0x593871 <mieqProcessDeviceEvent+113>
0x0000000000593971 <+369>: mov 0x148(%rbx),%rax <--- NOTE
0x0000000000593978 <+376>: mov (%rax),%rax
=> 0x000000000059397b <+379>: cmp 0x118(%rax),%r13
0x0000000000593982 <+386>: je 0x593871 <mieqProcessDeviceEvent+113>

Note that %rax was loaded indirectly via %rbx, which probably holds one of the variables in that line of source code.

(gdb) i r rbx
rbx 0x1126310 17982224
(gdb) p dev
$1 = (struct _DeviceIntRec *) 0x1126310

It's dev. The value in %rax comes from the field at offset 0x148 in the _DeviceIntRec struct, which appears to be "spriteInfo".
(gdb) p/x &((struct _DeviceIntRec *)0x0)->spriteInfo
$2 = 0x148
(gdb) p dev->spriteInfo
$3 = (SpriteInfoPtr) 0x1126620
(gdb) x 0x1126620
0x1126620: 0x00000000
(gdb) p *dev->spriteInfo
$4 = {sprite = 0x0, spriteOwner = 0, paired = 0x0, anim = {pCursor = 0x0, pScreen = 0x0, elt = 0, time = 0}}

Why were we dealing with this field in the first place? That comes down to the definition of DequeueScreen().

#define DequeueScreen(dev) dev->spriteInfo->sprite->pDequeueScreen

(gdb) p dev->spriteInfo->sprite->pDequeueScreen
Cannot access memory at address 0x118

Segfault. dev->spriteInfo->sprite is NULL, so reading pDequeueScreen at offset 0x118 from it faults. I think someone with far greater knowledge of this code than me would need to speculate on why the contents of dev->spriteInfo weren't populated at the time.
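To make the failure mode concrete, here is a minimal C sketch of the same pointer chain with a NULL-safe accessor. The struct layouts are simplified stand-ins, not the real X server definitions, and dequeue_screen_or_null() and needs_screen_move() are hypothetical helper names introduced only for illustration. This is not the fix shipped in the advisory; it only shows why the unguarded macro faults when sprite is NULL.

```c
#include <stddef.h>

/* Simplified stand-ins for the X server types named in the report.
 * Field layouts are illustrative assumptions, not the real ones. */
typedef struct Screen { int id; } Screen;

typedef struct Sprite {
    Screen *pDequeueScreen;
} Sprite;

typedef struct SpriteInfo {
    Sprite *sprite;              /* NULL in the core dump above */
} SpriteInfo;

typedef struct Device {
    SpriteInfo *spriteInfo;
} Device;

/* Hypothetical NULL-safe variant of the DequeueScreen() chain:
 * returns NULL instead of faulting when any link is missing. */
Screen *dequeue_screen_or_null(const Device *dev)
{
    if (dev == NULL || dev->spriteInfo == NULL ||
        dev->spriteInfo->sprite == NULL)
        return NULL;
    return dev->spriteInfo->sprite->pDequeueScreen;
}

/* Mirrors the shape of the condition at mieq.c:490, but only reports
 * a screen change when the device has a sprite to compare against. */
int needs_screen_move(const Device *dev, const Screen *screen)
{
    const Screen *current = dequeue_screen_or_null(dev);
    return screen != NULL && current != NULL && screen != current;
}
```

With a device whose spriteInfo exists but whose sprite is NULL (the state seen in $4), needs_screen_move() returns 0 rather than dereferencing address 0x118.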
Comment #11 on a similar bug at https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1094097 says that it was resolved after updating to xserver version 1.13.1.
MODIFIED xorg-x11-server-1.15.0-14.el6 is available in brew
Note to testers: this is a race condition and thus inherently hard to trigger. The bug is triggered by a device generating events while it is being disabled, but the window is quite small. I only managed to reproduce it by modifying the server to send an event at the right time.

For blackbox-testing, an approach was described in https://bugs.freedesktop.org/show_bug.cgi?id=77884:
- Run xorg-server in valgrind to slow it down enough.
- Hammer on the touchpad like a madman.
- ssh in and chvt away while hammering.
- Observe the following crash in X.org: [...]
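The race described above can be modelled in a few lines of C: events are queued while the device is still enabled, then processed after it has been disabled. The sketch below is a toy model under stated assumptions. EventQueue, enqueue() and process_events() are hypothetical names, not the mieq API, and dropping events from disabled devices is one plausible mitigation rather than a description of the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Toy device: 'enabled' flips to false when the device is disabled,
 * 'processed' counts events that were actually delivered. */
typedef struct Device {
    bool enabled;
    int  processed;
} Device;

typedef struct Event {
    Device *dev;
} Event;

typedef struct EventQueue {
    Event  items[QUEUE_CAP];
    size_t count;
} EventQueue;

/* Queue an event for 'dev'; returns false when the queue is full. */
bool enqueue(EventQueue *q, Device *dev)
{
    if (q->count == QUEUE_CAP)
        return false;
    q->items[q->count++].dev = dev;
    return true;
}

/* Drain the queue. Events whose device was disabled between enqueue
 * and processing are dropped instead of being delivered.
 * Returns the number of dropped events. */
size_t process_events(EventQueue *q)
{
    size_t dropped = 0;
    for (size_t i = 0; i < q->count; i++) {
        Device *dev = q->items[i].dev;
        if (dev == NULL || !dev->enabled) {
            dropped++;
            continue;
        }
        dev->processed++;
    }
    q->count = 0;
    return dropped;
}
```

Queueing two events, disabling the device, and then calling process_events() drops both events without touching any per-device state, which is the window the valgrind recipe tries to hit.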
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1376.html