Bug 1084244

Summary:	Processing an event from a disabled device causes null-pointer dereference
Product:	Red Hat Enterprise Linux 6
Reporter:	Kazu Yoshida <kyoshida>
Component:	xorg-x11-server
Assignee:	Peter Hutterer <peter.hutterer>
Status:	CLOSED ERRATA
QA Contact:	Desktop QE <desktop-qa-list>
Severity:	urgent
Priority:	urgent
Version:	6.4
CC:	peter.hutterer, svashisht, tpelka
Target Milestone:	rc
Target Release:	6.6
Hardware:	Unspecified
OS:	Linux
Fixed In Version:	xorg-x11-server-1.15.0-14.el6
Doc Type:	Bug Fix
Last Closed:	2014-10-14 04:56:00 UTC
Type:	Bug

Description Kazu Yoshida 2014-04-04 02:24:40 UTC

Description of problem:
/usr/bin/Xorg crashes on Red Hat Enterprise Linux 6.4

Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.13.0-11.el6

How reproducible:
Unknown; it has happened only once so far.

Steps to Reproduce:
Unknown so far

Actual results:
/usr/bin/Xorg crashes 

Expected results:
/usr/bin/Xorg does not crash

Additional info:

Core was generated by `/usr/bin/Xorg :0 -br -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-duDMMr/d'.
Program terminated with signal 6, Aborted.
#0  0x000000352c6328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x000000352c6328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x000000352c634085 in abort () at abort.c:92
#2  0x0000000000473c6e in OsAbort () at utils.c:1268
#3  0x000000000048d667 in ddxGiveUp (error=EXIT_ERR_ABORT) at xf86Init.c:1060
#4  0x0000000000470932 in AbortServer () at log.c:652
#5  0x0000000000471944 in FatalError (f=<value optimized out>) at log.c:793
#6  0x0000000000472b2e in OsSigHandler (signo=11, sip=<value optimized out>, unused=<value optimized out>) at osinit.c:146
#7  <signal handler called>
#8  0x000000000059397b in mieqMoveToNewScreen (dev=0x1126310, event=0x823500, screen=0xe6da00) at mieq.c:490
#9  mieqProcessDeviceEvent (dev=0x1126310, event=0x823500, screen=0xe6da00) at mieq.c:532
#10 0x0000000000593f94 in mieqProcessInputEvents () at mieq.c:623
#11 0x000000000048b7f9 in ProcessInputEvents () at xf86Events.c:164
#12 0x000000000048bccd in xf86VTSwitch (blockData=<value optimized out>, err=<value optimized out>, pReadmask=<value optimized out>) at xf86Events.c:455
#13 xf86Wakeup (blockData=<value optimized out>, err=<value optimized out>, pReadmask=<value optimized out>) at xf86Events.c:285
#14 0x000000000043b8bb in WakeupHandler (result=-1, pReadmask=0x82a740) at dixutils.c:423
#15 0x000000000046a4ef in WaitForSomething (pClientsReady=0x1120b20) at WaitFor.c:224
#16 0x00000000004379d2 in Dispatch () at dispatch.c:357
#17 0x000000000047cbca in main (argc=10, argv=<value optimized out>, envp=<value optimized out>) at main.c:295

The server aborted (signal 6) only after its own signal handler intercepted a segfault raised while executing mieqMoveToNewScreen in frame 8. OsSigHandler was called with signo 11 (SIGSEGV) and then called FatalError(), which goes through AbortServer() and ends in abort(). So frame 8 is where we need to focus.
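
The core reports signal 6 (SIGABRT) even though the root cause was signal 11 (SIGSEGV), because the fatal-signal handler finishes by calling abort(). A minimal sketch of that pattern, assuming Linux/POSIX; the names sketch_fatal_handler and crash_in_child are hypothetical and only mimic the OsSigHandler -> FatalError -> abort() chain, not the X server's actual code. The child reads address 0x118, the same address as in this crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a fatal-signal handler: log with the
 * async-signal-safe write(), then abort() so the process dumps core. */
static void sketch_fatal_handler(int signo)
{
    static const char msg[] = "caught fatal signal, aborting\n";
    (void)signo;
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;
    abort(); /* the process now dies with SIGABRT, not SIGSEGV */
}

/* Fork a child that installs the handler and reads an unmapped address.
 * Returns the signal that terminated the child, or -1 on error. */
int crash_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core); /* avoid leaving a core file behind */

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = sketch_fatal_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* NULL + 0x118: page 0 is unmapped on Linux, so this load faults. */
        volatile uintptr_t *bad = (volatile uintptr_t *)(uintptr_t)0x118;
        uintptr_t value = *bad;
        (void)value;
        _exit(0); /* not reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
```

Calling crash_in_child() from a test program should return SIGABRT (6), matching "Program terminated with signal 6" above even though the first fault was a SIGSEGV.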

(gdb) f 8
#8  0x000000000059397b in mieqMoveToNewScreen (dev=0x1126310, event=0x823500, screen=0xe6da00) at mieq.c:490
490         if (dev && screen && screen != DequeueScreen(dev)) {

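What instruction were we executing when we segfaulted? (mieqMoveToNewScreen was inlined into mieqProcessDeviceEvent, so gdb reports the address relative to the latter.)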

(gdb) x/i $pc
=> 0x59397b <mieqProcessDeviceEvent+379>:       cmp    0x118(%rax),%r13
(gdb) i r rax
rax            0x0      0

So we segfaulted because %rax was zero: the cmp reads memory at 0x118(%rax), i.e. at address 0x118. But why was %rax zero?

(gdb) disass /m 0x59397b
Dump of assembler code for function mieqProcessDeviceEvent:
490         if (dev && screen && screen != DequeueScreen(dev)) {
   0x000000000059395f <+351>:   test   %r13,%r13
   0x0000000000593962 <+354>:   je     0x593871 <mieqProcessDeviceEvent+113>
   0x0000000000593968 <+360>:   test   %rbx,%rbx
   0x000000000059396b <+363>:   je     0x593871 <mieqProcessDeviceEvent+113>
   0x0000000000593971 <+369>:   mov    0x148(%rbx),%rax   <-------------============== NOTE
   0x0000000000593978 <+376>:   mov    (%rax),%rax
=> 0x000000000059397b <+379>:   cmp    0x118(%rax),%r13
   0x0000000000593982 <+386>:   je     0x593871 <mieqProcessDeviceEvent+113>

Note that %rax was loaded indirectly via %rbx: the first mov loads the pointer stored at offset 0x148 from %rbx, and the second mov dereferences that pointer. %rbx is probably one of the variables on this source line.
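
In C terms, the instructions after the two NULL checks amount to the step-by-step function below. This is a sketch with hypothetical, heavily reduced structs: only the field names spriteInfo, sprite and pDequeueScreen come from the X server, and the real structures have many more members (which is why the real offsets are 0x148 and 0x118). The register comments follow the disassembly above; %r13 is presumably screen.

```c
#include <stddef.h>

/* Hypothetical reduced types; the real X server structs are much larger. */
typedef struct ScreenSketch { int id; } ScreenSketch;
typedef struct SpriteSketch { ScreenSketch *pDequeueScreen; } SpriteSketch;
typedef struct SpriteInfoSketch {
    SpriteSketch *sprite; /* first member, so its offset is 0 */
} SpriteInfoSketch;
typedef struct DeviceSketch { SpriteInfoSketch *spriteInfo; } DeviceSketch;

/* Same condition as mieq.c:490, spelled out one load per step. */
int needs_screen_move(const DeviceSketch *dev, const ScreenSketch *screen)
{
    if (dev == NULL || screen == NULL)         /* the two test/je pairs */
        return 0;
    SpriteInfoSketch *info = dev->spriteInfo;  /* mov 0x148(%rbx),%rax */
    SpriteSketch *sprite = info->sprite;       /* mov (%rax),%rax: NULL in the core */
    return screen != sprite->pDequeueScreen;   /* cmp 0x118(%rax),%r13: faults if sprite is NULL */
}
```

With a fully populated device the function simply compares screens; with info->sprite == NULL it faults on the last line, exactly where the server did.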

(gdb) i r rbx
rbx            0x1126310        17982224
(gdb) p dev
$1 = (struct _DeviceIntRec *) 0x1126310

It's dev. So %rax was first loaded from the field at offset 0x148 in struct _DeviceIntRec, which appears to be spriteInfo:

(gdb) p/x &((struct _DeviceIntRec *)0x0)->spriteInfo                      
$2 = 0x148
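
The &((struct _DeviceIntRec *)0)->spriteInfo expression is the usual gdb idiom for a field offset; in C source the portable equivalent is the standard offsetof macro from <stddef.h>. A sketch with hypothetical structs whose padding is chosen only so that spriteInfo lands at 0x148 as in this build, plus a reduced SpriteInfo whose first member is sprite (as the gdb print of *dev->spriteInfo shows):

```c
#include <stddef.h>

/* Hypothetical layout: the padding stands in for the real struct's
 * earlier members so that spriteInfo sits at offset 0x148. */
struct DeviceIntSketch {
    unsigned char earlier_members[0x148];
    void *spriteInfo;
};

/* Hypothetical reduced SpriteInfo: sprite is the first member, so its
 * offset is 0, which is why the second load has no displacement. */
struct SpriteInfoSketch {
    void *sprite;
    void *paired;
};

size_t sprite_info_offset(void)
{
    return offsetof(struct DeviceIntSketch, spriteInfo);
}

size_t sprite_offset(void)
{
    return offsetof(struct SpriteInfoSketch, sprite);
}
```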

(gdb) p dev->spriteInfo
$3 = (SpriteInfoPtr) 0x1126620
(gdb) x 0x1126620
0x1126620:      0x00000000
(gdb) p *dev->spriteInfo
$4 = {sprite = 0x0, spriteOwner = 0, paired = 0x0, anim = {pCursor = 0x0, pScreen = 0x0, elt = 0, time = 0}}

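So spriteInfo itself is a valid pointer, but its first member, sprite, is NULL; that NULL is what the second mov (the one with no displacement) loaded into %rax. Why were we dealing with these fields in the first place? That comes down to the definition of the DequeueScreen() macro: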

#define DequeueScreen(dev) dev->spriteInfo->sprite->pDequeueScreen
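
The macro expands to a chain of unchecked dereferences, so a device whose spriteInfo->sprite is NULL faults inside the macro itself. For illustration only, a hypothetical NULL-tolerant accessor next to an equivalent of the macro (reduced types; not the X server's code and not necessarily how this bug was fixed):

```c
#include <stddef.h>

/* Hypothetical reduced types mirroring the field names in the macro. */
typedef struct ScreenSketch { int id; } ScreenSketch;
typedef struct SpriteSketch { ScreenSketch *pDequeueScreen; } SpriteSketch;
typedef struct SpriteInfoSketch { SpriteSketch *sprite; } SpriteInfoSketch;
typedef struct DeviceSketch { SpriteInfoSketch *spriteInfo; } DeviceSketch;

/* Same shape as the original macro: no check on any link of the chain. */
#define DequeueScreenSketch(dev) ((dev)->spriteInfo->sprite->pDequeueScreen)

/* Checked variant: returns NULL instead of faulting when a link is missing. */
static inline ScreenSketch *dequeue_screen_checked(const DeviceSketch *dev)
{
    if (dev == NULL || dev->spriteInfo == NULL || dev->spriteInfo->sprite == NULL)
        return NULL;
    return dev->spriteInfo->sprite->pDequeueScreen;
}
```

A caller of the checked variant would still have to decide what a NULL result means (for example, skip the event), which is a policy choice the macro never had to make.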

(gdb) p dev->spriteInfo->sprite->pDequeueScreen
Cannot access memory at address 0x118

That is the segfault: sprite is NULL, so reading pDequeueScreen faults at address 0x118. Someone with far greater knowledge of this code than me would need to explain why dev->spriteInfo->sprite wasn't populated at the time.
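
The reported address is plain pointer arithmetic: the load reads sprite + 0x118, and with sprite == NULL that is address 0x118, the same value gdb printed and the same displacement as in cmp 0x118(%rax),%r13. A sketch that computes the address without dereferencing it, using a hypothetical struct padded so that pDequeueScreen sits at 0x118:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding stands in for the real sprite struct's
 * earlier members so that pDequeueScreen sits at offset 0x118. */
struct SpriteSketch {
    unsigned char earlier_members[0x118];
    void *pDequeueScreen;
};

/* Address the CPU would load for sprite->pDequeueScreen, computed as
 * integer arithmetic so a NULL sprite is never dereferenced. */
uintptr_t pdequeue_load_address(const struct SpriteSketch *sprite)
{
    return (uintptr_t)sprite + offsetof(struct SpriteSketch, pDequeueScreen);
}
```

For a NULL sprite this yields 0x118, which is why a fault at a small address like this usually means "NULL base pointer plus field offset".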

Comment 3 Siteshwar Vashisht 2014-04-22 07:28:31 UTC

Comment #11 on a similar bug at https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1094097 reports that it was resolved after updating to xserver version 1.13.1.

Comment 9 Peter Hutterer 2014-05-20 05:32:22 UTC

MODIFIED

xorg-x11-server-1.15.0-14.el6 is available in brew

Comment 10 Peter Hutterer 2014-05-20 05:36:23 UTC

Note to testers: this is a race condition and thus inherently hard to trigger. The bug is triggered by a device generating events while it is being disabled, and the window for that is quite small. I only managed to reproduce it by modifying the server to send an event at the right time.
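
The race can be pictured as an event queue that still holds events from a device after that device was disabled and lost its sprite. The bug summary points at the cause (processing an event from a disabled device); one plausible guard is to drop such events when the queue is drained. A minimal sketch with hypothetical types and names (DeviceSketch, EventSketch, drain_queue) that are not the actual mieq.c code or patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sprite: a screen index stands in for the real screen pointer. */
typedef struct SpriteSketch { int dequeue_screen; } SpriteSketch;

/* Hypothetical device: disabling it clears its sprite. */
typedef struct DeviceSketch {
    bool enabled;
    SpriteSketch *sprite; /* assumed non-NULL whenever enabled is true */
} DeviceSketch;

typedef struct EventSketch { DeviceSketch *dev; int screen; } EventSketch;

/* Process queued events, dropping those whose device was disabled after
 * they were enqueued. Returns the number of events actually processed. */
size_t drain_queue(const EventSketch *queue, size_t n)
{
    size_t processed = 0;
    for (size_t i = 0; i < n; i++) {
        DeviceSketch *dev = queue[i].dev;
        if (dev == NULL || !dev->enabled)
            continue; /* without this check, dev->sprite may be NULL below */
        if (dev->sprite->dequeue_screen != queue[i].screen)
            dev->sprite->dequeue_screen = queue[i].screen; /* "move to new screen" */
        processed++;
    }
    return processed;
}
```

In the sketch, the event from the disabled device is skipped instead of dereferencing its NULL sprite; in the real server the equivalent decision has to be made somewhere between enqueueing and mieqMoveToNewScreen.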

For blackbox-testing, an approach was described in https://bugs.freedesktop.org/show_bug.cgi?id=77884:
   - Run xorg-server in valgrind to slow it down enough.
   - Hammer on the touchpad like a madman.
   - ssh in and chvt away while hammering.
   - Observe the following crash in X.org: [...]

Comment 12 errata-xmlrpc 2014-10-14 04:56:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1376.html